

Across the data landscape, we see the same challenges recur among our clients. Unlocking individual data sources is one thing; doing so in a scalable and standardised way is quite another. How do you get data pipelines and the infrastructure they require into production reliably? And how does this work when multiple developers – let alone multiple developer teams – are working on this at the same time?
An important part of the answer lies with Databricks: the go-to platform for organisations where multiple developers – or even multiple developer teams – are responsible for unlocking data and creating data products.
Here, Databricks Asset Bundles (DABs) are your best friend: a tool to configure data pipelines and the resources they need as code – similar to Infrastructure-as-Code, but for your data platform. These bundles integrate easily with CI pipelines to deploy from environment to environment. With DABs, you lay the foundation for standardised, repeatable and scalable deployments of your Databricks environments – and exactly that makes the difference between a proof-of-concept and production.
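To give an idea of what that integration can look like, the sketch below shows a minimal deployment step, assuming GitHub Actions as the CI tool; the workflow name, secret names and target are illustrative rather than prescriptive, and the bundle itself is defined in a databricks.yml file (an example follows later in this article).

```yaml
# .github/workflows/deploy.yml -- illustrative sketch, not a prescribed setup
name: deploy-databricks-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the Databricks CLI, which includes the bundle commands
      - uses: databricks/setup-cli@main
      - name: Deploy the bundle to the production target
        run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}    # workspace URL (placeholder secret name)
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}  # access token (placeholder secret name)
```

Every merge to the main branch then rolls the same bundle definition out to the chosen environment, without manual clicking in the workspace.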


With Databricks Asset Bundles, you declaratively configure which resources you need and where they should run. Which notebooks make up a job? In other words: which pieces of code form the pipeline that transforms raw data into data products that deliver value to the business? Which clusters are needed to run these pipelines? Which environments do we have, and which jobs belong in which environment?
The declarative way you configure DABs provides a simple, scalable and replicable answer to all of these questions. Declarative means that you do not describe the steps needed to connect and deploy resources, only the desired end state. To illustrate: you simply describe which notebooks, in what order, work together to turn raw financial data from source A into a usable quarterly report. You configure which cluster should perform these calculations and on which Databricks environment the job should run. A single deploy command from the Databricks CLI then takes care of provisioning and updates automatically.
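As a concrete illustration, a minimal databricks.yml could look like the sketch below. The structure follows the bundle format; the job name, notebook paths, cluster settings and workspace URLs are hypothetical placeholders.

```yaml
# databricks.yml -- minimal sketch with illustrative names and values
bundle:
  name: quarterly_report_pipeline

resources:
  jobs:
    quarterly_report_job:
      name: quarterly-report
      job_clusters:
        - job_cluster_key: etl_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_DS3_v2   # example Azure node type
            num_workers: 2
      tasks:
        # Which notebooks, in what order, make up the pipeline
        - task_key: ingest_raw_financials
          job_cluster_key: etl_cluster
          notebook_task:
            notebook_path: ./notebooks/ingest_source_a.py
        - task_key: build_quarterly_report
          depends_on:
            - task_key: ingest_raw_financials
          job_cluster_key: etl_cluster
          notebook_task:
            notebook_path: ./notebooks/build_report.py

# One bundle, multiple environments: the same definition deploys to dev or prod
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net  # placeholder
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net  # placeholder
```

With this in place, databricks bundle validate checks the configuration, and databricks bundle deploy -t dev (or -t prod) provisions the job and its cluster definition in the corresponding workspace.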
As proven with several clients, Blenddata is the knowledge partner for provisioning and automating your Databricks environments. We successfully helped one of our customers migrate from a Dataiku platform to a Databricks platform, reducing costs and increasing data stability. Another satisfied client is a large financial services provider, where we helped build a robust Databricks platform from the ground up, with more than 10 teams now running hundreds of data streams daily. We think along with you and work towards the best solution for your organisation.
In short, Databricks Asset Bundles let you declaratively configure a Databricks workspace, bundle infrastructure and code together, and replicate that setup automatically across multiple environments.
Wondering how your organisation can make use of this? Then get in touch!