5 lessons for a successful migration
Written by: Levy van Kempen
The data platform is the beating heart of the client's consultancy activities: it drives valuable insights and strategic advice. With the ambition to scale up further, the client is fully committed to growth. The platform unlocks client data and external sources, interprets them, and translates them into concrete insights for supply chain optimisation. The move to Databricks was therefore a well-considered choice, based on several strategic advantages:
In Dataiku, transformations could occur on any dataset, leading to a complex and cluttered data flow. During the migration, we redesigned the logic according to a medallion architecture (data layers based on Databricks’ bronze, silver, gold principle). This structure simplifies the data set-up and assigns clear responsibilities to each layer. This improves insight into data lineage and makes it easier to detect errors. This prevents proliferation of logic and keeps the architecture scalable as the platform grows. Combined with Unity Catalog, we manage data and access rights centrally. This keeps data governance in order, whether you are working with one Databricks workspace or five.
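The bronze/silver/gold layering described above can be sketched as a chain of small, single-responsibility transformations. The sketch below uses plain Python dicts rather than Spark tables, and all column and function names are illustrative, not taken from the actual platform:

```python
# Minimal sketch of the medallion (bronze/silver/gold) idea.
# All names and fields are illustrative.

def to_bronze(raw_records):
    """Bronze: land the raw client data as-is, plus ingestion metadata."""
    return [{**r, "_ingested": True} for r in raw_records]

def to_silver(bronze):
    """Silver: clean and conform — drop rows without an order id,
    normalise the quantity to an int."""
    return [
        {"order_id": r["order_id"], "qty": int(r["qty"])}
        for r in bronze
        if r.get("order_id") is not None
    ]

def to_gold(silver):
    """Gold: a business-level aggregate ready for supply chain insights."""
    return {"total_units_ordered": sum(r["qty"] for r in silver)}

raw = [
    {"order_id": "A1", "qty": "3"},
    {"order_id": None, "qty": "5"},   # invalid: filtered out in silver
    {"order_id": "A2", "qty": "2"},
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'total_units_ordered': 5}
```

Because each layer has one clear responsibility, an error can be traced to the layer where it was introduced, which is exactly what makes the lineage easier to follow.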
Dataiku’s drag-and-drop flows offer a low-threshold way to iterate quickly, making them ideal for quick proofs of concept. However, as multiple developers work together and complexity increases, this way of working also brings challenges:
By switching to code-first in Databricks, we achieve the following:
This approach promotes transparency, facilitates collaboration, and results in robust pipelines that grow with the data team.
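One concrete way code-first pays off: a transformation that would be a visual flow step becomes an ordinary function that can be version-controlled, code-reviewed, and unit-tested. A hypothetical example (function and field names are ours, not the client's):

```python
# A transformation expressed as a plain, version-controlled Python
# function instead of a visual flow step. Names are illustrative.

def enrich_with_lead_time(orders, lead_times):
    """Join each order with its supplier lead time (in days).
    Unknown suppliers get None, so downstream checks can flag them."""
    return [
        {**o, "lead_time_days": lead_times.get(o["supplier"])}
        for o in orders
    ]

# Because it is just a function, it can be unit-tested directly:
orders = [{"supplier": "acme", "qty": 10}]
result = enrich_with_lead_time(orders, {"acme": 4})
assert result[0]["lead_time_days"] == 4
```

The same function runs unchanged in a notebook, a scheduled job, or a CI test suite, which is what makes the pipelines robust as the team grows.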
In the Dataiku setup, it took developers longer to start delivering value. Before business logic could be used, it had to be made available, which required extra code around UI components and cost additional development time. With the new way of working in Databricks, teams can focus from the start on implementing logic that directly contributes to insights and decision-making. Fewer detours, more focus on innovation and concrete impact.
With Infrastructure as Code, we define all Databricks resources in code, eliminating manual creation. This provides a reliable and consistent way to make new components available across all environments.
This way of working delivers clear advantages over manual processes:
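The core of Infrastructure as Code is declare-then-reconcile: the desired resources are written down as data, and tooling creates whatever is missing in each environment. In practice that role is played by a tool such as Terraform; the toy sketch below only illustrates the idea, and every resource name in it is made up:

```python
# Toy illustration of the Infrastructure-as-Code idea: resources are
# declared as data, and a reconcile step computes what must be created.
# In a real setup a tool such as Terraform does this; everything here
# is a simplified stand-in with invented names.

DESIRED = {
    "cluster:etl-small": {"workers": 2},
    "job:nightly-refresh": {"schedule": "0 2 * * *"},
}

def reconcile(desired, existing):
    """Return the resources that must be created so that every
    environment ends up in the same, declared state."""
    return {name: spec for name, spec in desired.items()
            if name not in existing}

# A fresh environment has nothing yet, so everything gets created:
to_create = reconcile(DESIRED, existing={})
print(sorted(to_create))  # ['cluster:etl-small', 'job:nightly-refresh']
```

Because the declaration is the single source of truth, a new environment is guaranteed to get the same components as the existing ones, with no manual clicking involved.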
In the previous setup, debugging was cumbersome: finding the root cause of a data error took a long time.
With the new setup in Databricks, data quality is built into the pipeline from the start. Important checks, such as completeness, logical values and identifying outliers, are performed automatically. As soon as the data does not comply, the process is stopped immediately and the client receives a notification.
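The stop-and-notify behaviour can be sketched as quality gates that raise as soon as a check fails. The checks below mirror the three categories named above (completeness, logical values, outliers); the thresholds, column names, and notification hook are illustrative assumptions, not the client's actual rules:

```python
# Sketch of built-in data quality gates: completeness, logical values,
# and a simple outlier check. A failure raises immediately, which halts
# the pipeline and is the point where the client notification would be
# triggered. Thresholds and column names are illustrative.

def run_quality_checks(rows):
    errors = []
    # Completeness: every row needs an order_id
    if any(r.get("order_id") is None for r in rows):
        errors.append("missing order_id")
    # Logical values: quantities must be positive
    if any(r["qty"] <= 0 for r in rows):
        errors.append("non-positive qty")
    # Outliers: flag quantities far above the batch median
    qtys = sorted(r["qty"] for r in rows)
    median = qtys[len(qtys) // 2]
    if any(r["qty"] > 10 * median for r in rows):
        errors.append("outlier qty")
    if errors:
        # In the real pipeline: stop processing and notify the client
        raise ValueError(f"data quality failed: {errors}")
    return True

good = [{"order_id": "A1", "qty": 3}, {"order_id": "A2", "qty": 4}]
assert run_quality_checks(good)

bad = [{"order_id": None, "qty": 2}]
try:
    run_quality_checks(bad)
except ValueError as e:
    print(e)  # data quality failed: ['missing order_id']
```

Failing fast like this means a bad batch never reaches the gold layer, so the insights downstream can be trusted.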
This delivers concrete benefits:
KPI: Consumption costs
Before migration: Fixed licensing model
After migration: Pay-per-use model, savings of 65%
Exact € amounts vary by workload; the biggest savings come from moving from a fixed licence fee to a pay-per-use model.
With this migration, the supply chain platform is ready for the future. It matches the technological preferences of its customers, is scalable, reliable and maintenance-friendly. Moreover, it lays the foundation for further innovation in AI and machine learning.
Want to know more about how we can help your organisation migrate to a scalable and future-proof data platform? Contact us or check out our Cases.