5 lessons for a successful migration
Written by: Levy van Kempen
The data platform is the beating heart of the client's consultancy activities: it drives valuable insights and strategic advice. With the ambition to scale up further, the client is fully committed to growth. The platform unlocks client data and external sources, interprets them, and translates them into concrete insights for supply chain optimisation. The move to Databricks was therefore a well-considered choice, based on several strategic advantages:
In Dataiku, transformations could occur on any dataset, leading to a complex and cluttered data flow. During the migration, we redesigned the logic according to a medallion architecture (data layers based on Databricks’ bronze, silver, gold principle). This structure simplifies the data set-up and assigns clear responsibilities to each layer. This improves insight into data lineage and makes it easier to detect errors. This prevents proliferation of logic and keeps the architecture scalable as the platform grows. Combined with Unity Catalog, we manage data and access rights centrally. This keeps data governance in order, whether you are working with one Databricks workspace or five.
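The bronze/silver/gold layering described above can be sketched as a chain of small, single-responsibility transformations. The sketch below uses plain Python dicts rather than Spark tables, and all column and function names are illustrative, not taken from the actual platform:

```python
# Minimal sketch of the medallion (bronze/silver/gold) idea.
# All names and fields are illustrative.

def to_bronze(raw_records):
    """Bronze: land the raw client data as-is, plus ingestion metadata."""
    return [{**r, "_ingested": True} for r in raw_records]

def to_silver(bronze):
    """Silver: clean and conform — drop rows without an order id,
    normalise the quantity to an int."""
    return [
        {"order_id": r["order_id"], "qty": int(r["qty"])}
        for r in bronze
        if r.get("order_id") is not None
    ]

def to_gold(silver):
    """Gold: a business-level aggregate ready for supply chain insights."""
    return {"total_units_ordered": sum(r["qty"] for r in silver)}

raw = [
    {"order_id": "A1", "qty": "3"},
    {"order_id": None, "qty": "5"},   # invalid: filtered out in silver
    {"order_id": "A2", "qty": "2"},
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'total_units_ordered': 5}
```

Because each layer has one clear responsibility, an error can be traced to the layer where it was introduced, which is exactly what makes the lineage easier to follow.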
Dataiku’s drag-and-drop flows offer a low-threshold way to iterate quickly, making them ideal for quick proofs of concept. However, as multiple developers work together and complexity increases, this way of working also brings challenges:
By switching to code-first in Databricks, we achieve the following:
This approach promotes transparency, facilitates collaboration, and results in robust pipelines that grow with the data team.
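One concrete way code-first pays off: a transformation that would be a visual flow step becomes an ordinary function that can be version-controlled, code-reviewed, and unit-tested. A hypothetical example (function and field names are ours, not the client's):

```python
# A transformation expressed as a plain, version-controlled Python
# function instead of a visual flow step. Names are illustrative.

def enrich_with_lead_time(orders, lead_times):
    """Join each order with its supplier lead time (in days).
    Unknown suppliers get None, so downstream checks can flag them."""
    return [
        {**o, "lead_time_days": lead_times.get(o["supplier"])}
        for o in orders
    ]

# Because it is just a function, it can be unit-tested directly:
orders = [{"supplier": "acme", "qty": 10}]
result = enrich_with_lead_time(orders, {"acme": 4})
assert result[0]["lead_time_days"] == 4
```

The same function runs unchanged in a notebook, a scheduled job, or a CI test suite, which is what makes the pipelines robust as the team grows.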
In the Dataiku setup, it took developers longer to start delivering value. Before business logic could be used, it had to be made available, which required extra code around UI components and cost additional development time. With the new way of working in Databricks, teams can focus from the start on implementing logic that directly contributes to insights and decision-making. Fewer detours, more focus on innovation and concrete impact.
With Infrastructure as Code, we define all Databricks resources in code, eliminating manual creation. This provides a reliable and consistent way to make new components available across all environments.
This way of working delivers clear advantages over manual processes:
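The core of Infrastructure as Code is declare-then-reconcile: the desired resources are written down as data, and tooling creates whatever is missing in each environment. In practice that role is played by a tool such as Terraform; the toy sketch below only illustrates the idea, and every resource name in it is made up:

```python
# Toy illustration of the Infrastructure-as-Code idea: resources are
# declared as data, and a reconcile step computes what must be created.
# In a real setup a tool such as Terraform does this; everything here
# is a simplified stand-in with invented names.

DESIRED = {
    "cluster:etl-small": {"workers": 2},
    "job:nightly-refresh": {"schedule": "0 2 * * *"},
}

def reconcile(desired, existing):
    """Return the resources that must be created so that every
    environment ends up in the same, declared state."""
    return {name: spec for name, spec in desired.items()
            if name not in existing}

# A fresh environment has nothing yet, so everything gets created:
to_create = reconcile(DESIRED, existing={})
print(sorted(to_create))  # ['cluster:etl-small', 'job:nightly-refresh']
```

Because the declaration is the single source of truth, a new environment is guaranteed to get the same components as the existing ones, with no manual clicking involved.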
In the previous setup, debugging was cumbersome: finding the root cause of a data error took a long time.
With the new setup in Databricks, data quality is built into the pipeline from the start. Important checks, such as completeness, logical values and identifying outliers, are performed automatically. As soon as the data does not comply, the process is stopped immediately and the client receives a notification.
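The stop-and-notify behaviour can be sketched as quality gates that raise as soon as a check fails. The checks below mirror the three categories named above (completeness, logical values, outliers); the thresholds, column names, and notification hook are illustrative assumptions, not the client's actual rules:

```python
# Sketch of built-in data quality gates: completeness, logical values,
# and a simple outlier check. A failure raises immediately, which halts
# the pipeline and is the point where the client notification would be
# triggered. Thresholds and column names are illustrative.

def run_quality_checks(rows):
    errors = []
    # Completeness: every row needs an order_id
    if any(r.get("order_id") is None for r in rows):
        errors.append("missing order_id")
    # Logical values: quantities must be positive
    if any(r["qty"] <= 0 for r in rows):
        errors.append("non-positive qty")
    # Outliers: flag quantities far above the batch median
    qtys = sorted(r["qty"] for r in rows)
    median = qtys[len(qtys) // 2]
    if any(r["qty"] > 10 * median for r in rows):
        errors.append("outlier qty")
    if errors:
        # In the real pipeline: stop processing and notify the client
        raise ValueError(f"data quality failed: {errors}")
    return True

good = [{"order_id": "A1", "qty": 3}, {"order_id": "A2", "qty": 4}]
assert run_quality_checks(good)

bad = [{"order_id": None, "qty": 2}]
try:
    run_quality_checks(bad)
except ValueError as e:
    print(e)  # data quality failed: ['missing order_id']
```

Failing fast like this means a bad batch never reaches the gold layer, so the insights downstream can be trusted.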
This delivers concrete benefits:
KPI: Consumption costs
Before migration: Fixed licensing model
After migration: Pay-per-use model, savings of 65%
Exact € amounts vary by workload; the biggest savings come from moving from a fixed licence fee to a pay-per-use model.
With this migration, the supply chain platform is ready for the future. It matches the technological preferences of its customers, is scalable, reliable and maintenance-friendly. Moreover, it lays the foundation for further innovation in AI and machine learning.
Want to know more about how we can help your organisation migrate to a scalable and future-proof data platform? Contact us or check out our Cases.