MLOps in Focus: Unpacking Modular Strategies & Industry Insights

Thu Apr 17 2025
Raamstraat 7, 1016 XL Amsterdam
Event
Data & Drinks
The first Data & Drinks event of 2025 will offer an in-depth look at Apache DataFusion and its potential to shape the future of data systems! DataFusion features a full query planner; a columnar, streaming, multi-threaded, vectorized execution engine; and partitioned data sources.

This edition takes place on Thursday the 23rd of January at Xomnia's HQ in the heart of Amsterdam. It features three talks by Apache DataFusion contributors covering the inner workings of the project, real-world applications, and the possibilities Apache DataFusion opens up for data-centric systems.

This free event includes dinner, drinks and plenty of networking opportunities with data professionals from Amsterdam and beyond.

Abstracts

Talk 1: Intro to DataFusion: Technology, Community, and Not Quite Enough Time by Andrew Lamb
Andrew delves into the architecture, modularity, and tradeoffs of Apache DataFusion, a high-performance Rust-based query engine, and how it's employed in building advanced data systems.

Talk 2: Building A Unified Compute Engine with Apache DataFusion by Mehmet Ozan Kabak
This talk explores how DataFusion’s modular architecture enables the vision of “unified” compute engines. Ozan will discuss how its extensibility addresses core engine-level limitations and empowers streamlined solutions for data and AI workloads, while also considering the challenges that remain.

Talk 3: Distributed Joins with DataFusion at Coralogix by Daniël Heres
A technical deep dive into optimizing performance for large-scale group aggregations with Apache DataFusion.

Biographies of the speakers:

  • Andrew Lamb, InfluxData, Staff Engineer, Apache DataFusion PMC Chair:
  • After many years as a C/C++ systems programmer (databases and compilers) and a stint at machine learning startups (as one does), Andrew now works with a talented team of engineers at InfluxData on InfluxDB IOx, a new engine for time series data.
  • Mehmet Ozan Kabak, CEO & Co-founder of Synnada, Apache DataFusion PMC:
  • After working on distributed systems and big data at various startups and at Meta, he now leads Synnada as CEO, bringing his Stanford Ph.D. and extensive machine learning expertise to building next-generation data infrastructure. His career has consistently revolved around tackling large-scale distributed systems challenges and advancing applied machine learning.
  • Daniël Heres, Apache DataFusion & Arrow PMC, Senior Software Engineer at Coralogix:
  • Daniël Heres is a member of the Apache Arrow and DataFusion PMCs and a Software Engineer (Query Engine) at Coralogix. He previously worked as a Data / ML Engineer at Godatadriven and at bol.com.

Agenda

  • Purpose: To provide highly processed data ready for business intelligence (BI), analytics and machine learning.
  • Description: The Gold layer contains data that has been aggregated, summarized, and structured to support specific business use cases. This layer delivers high-quality data ready for the end-users, enabling them to generate insights and make data-driven decisions.
  • Inclusions: Aggregated data, key performance indicators (KPIs), business metrics and features for ML models.
  • Example: Aggregating taxi ride data to calculate total trips per day, average fare per trip, and other relevant metrics.
  • Who can work with it: business analysts, who use the aggregated and processed data to generate business insights, KPIs, reports and other business metrics; executives and managers, who rely on dashboards and reports built from this layer for decision-making; data analysts, who access business-ready data to build reports and dashboards and conduct detailed analysis; data scientists, who perform advanced analytics and predictive modeling on refined datasets; AI/ML engineers, who develop, optimize, and deploy AI and machine learning models on this structured, high-quality data; and BI tools (e.g., Power BI), which access this data to provide visualizations and insights for various stakeholders.
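The taxi example above can be sketched as a small Gold-layer aggregation. This is a minimal, hypothetical illustration in plain Python: the record types `TaxiRide` and `DailyTripMetrics` and the function `build_gold_daily_metrics` are assumptions made for this sketch, not part of any specific pipeline. In practice this step would usually be a GROUP BY executed by a query engine over much larger, already-cleaned tables.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class TaxiRide:
    # Hypothetical cleaned (pre-Gold) record: one taxi ride.
    pickup_time: datetime
    fare: float


@dataclass(frozen=True)
class DailyTripMetrics:
    # Hypothetical Gold-layer record: one row of KPIs per calendar day.
    day: date
    total_trips: int
    avg_fare: float
    total_fare: float


def build_gold_daily_metrics(rides: list[TaxiRide]) -> list[DailyTripMetrics]:
    """Aggregate rides into per-day KPIs: total trips and average fare per trip."""
    trips: dict[date, int] = defaultdict(int)
    fares: dict[date, float] = defaultdict(float)
    for ride in rides:
        day = ride.pickup_time.date()  # group key: calendar day of pickup
        trips[day] += 1
        fares[day] += ride.fare
    # Emit one summary row per day, ordered by date.
    return [
        DailyTripMetrics(
            day=day,
            total_trips=trips[day],
            avg_fare=round(fares[day] / trips[day], 2),
            total_fare=round(fares[day], 2),
        )
        for day in sorted(trips)
    ]


if __name__ == "__main__":
    sample = [
        TaxiRide(datetime(2025, 1, 23, 8, 15), 12.50),
        TaxiRide(datetime(2025, 1, 23, 18, 40), 20.00),
        TaxiRide(datetime(2025, 1, 24, 9, 5), 15.25),
    ]
    for row in build_gold_daily_metrics(sample):
        print(row.day, row.total_trips, row.avg_fare)
```

Keeping the Gold output as a small, typed, per-day table is what lets dashboards and BI tools read KPIs directly, without re-aggregating raw rides on every query.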