

Across the data landscape, we see the same challenges recur among our clients. Unlocking individual data sources is one thing; doing so in a scalable and standardised way is quite another. How do you get data pipelines and the infrastructure they require into production reliably? And how does this work when multiple developers – let alone multiple developer teams – are working on this at the same time?
An important part of the answer lies with Databricks: the go-to platform for organisations where multiple developers – or even multiple developer teams – are responsible for unlocking data and creating data products.
Here, Databricks Asset Bundles (DABs) are your best friend: a tool to configure data pipelines and the resources they need as code – similar to Infrastructure-as-Code, but for your data platform. These bundles integrate easily with CI pipelines to deploy from environment to environment. With DABs, you lay the foundation for standardised, repeatable and scalable deployments of your Databricks environments – and exactly that makes the difference between a proof-of-concept and production.
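To give an idea of what that integration can look like, the sketch below shows a minimal deployment step, assuming GitHub Actions as the CI tool; the workflow name, secret names and target are illustrative rather than prescriptive, and the bundle itself is defined in a databricks.yml file (an example follows later in this article).

```yaml
# .github/workflows/deploy.yml -- illustrative sketch, not a prescribed setup
name: deploy-databricks-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the Databricks CLI, which includes the bundle commands
      - uses: databricks/setup-cli@main
      - name: Deploy the bundle to the production target
        run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}    # workspace URL (placeholder secret name)
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}  # access token (placeholder secret name)
```

Every merge to the main branch then rolls the same bundle definition out to the chosen environment, without manual clicking in the workspace.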


With Databricks Asset Bundles, you declaratively configure which resources you need and where they should run. Which notebooks make up a job? In other words: which pieces of code form the pipeline that transforms raw data into data products that deliver value to the business? Which clusters are needed to run these pipelines? Which environments do we have, and which jobs belong in which environment?
The declarative way you configure DABs provides a simple, scalable and replicable answer to all of these questions. Declarative means that you do not describe the steps needed to connect and deploy resources, only the desired end state. To illustrate: you simply describe which notebooks, in what order, work together to turn raw financial data from source A into a usable quarterly report. You configure which cluster should perform these calculations and on which Databricks environment the job should run. A single deploy command from the Databricks CLI then takes care of provisioning and updates automatically.
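As a concrete illustration, a minimal databricks.yml could look like the sketch below. The structure follows the bundle format; the job name, notebook paths, cluster settings and workspace URLs are hypothetical placeholders.

```yaml
# databricks.yml -- minimal sketch with illustrative names and values
bundle:
  name: quarterly_report_pipeline

resources:
  jobs:
    quarterly_report_job:
      name: quarterly-report
      job_clusters:
        - job_cluster_key: etl_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_DS3_v2   # example Azure node type
            num_workers: 2
      tasks:
        # Which notebooks, in what order, make up the pipeline
        - task_key: ingest_raw_financials
          job_cluster_key: etl_cluster
          notebook_task:
            notebook_path: ./notebooks/ingest_source_a.py
        - task_key: build_quarterly_report
          depends_on:
            - task_key: ingest_raw_financials
          job_cluster_key: etl_cluster
          notebook_task:
            notebook_path: ./notebooks/build_report.py

# One bundle, multiple environments: the same definition deploys to dev or prod
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net  # placeholder
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net  # placeholder
```

With this in place, databricks bundle validate checks the configuration, and databricks bundle deploy -t dev (or -t prod) provisions the job and its cluster definition in the corresponding workspace.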
As proven with several clients, Blenddata is the knowledge partner for provisioning and automating your Databricks environments. We successfully helped one of our customers migrate from a Dataiku platform to a Databricks platform, reducing costs and increasing data stability. Another satisfied client is a large financial services provider, where we helped build a robust Databricks platform from the ground up, with more than 10 teams now running hundreds of data streams daily. We think along with you and work towards the best solution for your organisation.
In short, Databricks Asset Bundles let you declaratively configure a Databricks workspace, bundle infrastructure and code together, and replicate that setup automatically across multiple environments.
Wondering how your organisation can make use of this? Then get in touch!