Insights from the First European DataFusion meetup

October 17, 2024

This blog is co-authored by Marko Grujic and Jezz Kelway.

Commitment to open source is one of EnterpriseDB’s defining principles, with 20+ years of Postgres innovation and adoption behind it. As the use cases for Postgres expand, EDB grows to address these exciting spaces, as can be seen with the emergence of a new analytics and AI team.

This new team is heavily invested in Rust, which seems well on its way to becoming an important part of data engineering. Specifically, we rely on DataFusion to accelerate our analytical queries, among other things, and on the surrounding ecosystem (arrow-rs, delta-rs, etc.) to facilitate the separation of compute and storage layers. Together, these have allowed us to implement a modern Lakehouse architecture centered on Postgres.

Intro to DataFusion

DataFusion is a performant query engine at minimum, and more generally a fully-fledged, versatile database-building framework with high extensibility and modularity. 

It is also an open source Apache project, with a strong, talented and growing community.

Besides 6k+ GitHub stars, hundreds of distinct contributors and dozens of committers, the community's enthusiasm spilled over into the physical world this year, with a number of meetups taking place around the world, mostly in North America.

After some initial introductions and discussions with members of the community, and seeing an opportunity to give something back (aside from bug reports and occasional fixes), the analytics team decided, on behalf of EDB, to throw itself into organizing the first European DataFusion meetup.

Belgrade DataFusion meetup 

The meetup took place on the last Friday of September in Belgrade, Serbia, where key members of the EDB analytics team are located. Microsoft Development Center Serbia was kind enough to provide its new office space near the confluence of the Danube and Sava rivers (thank you!) as an excellent venue for the evening.

The turnout was quite good, with close to 70 attendees from various data-related roles such as database development, data engineering, data science and machine learning, all of which fit DataFusion's target audience. The format was a quick succession of 15-minute talks, with in-depth face-to-face discussions and networking before and after.

Key topics covered

As for the talks themselves, we had six speakers, two of whom were from EDB:

  • Andrew Lamb (PMC chair, InfluxDB) gave a quick intro to the idea of deconstructed databases, and his analogy between DataFusion and LLVM.
  • Artjoms Iskovs (Principal Engineer @ EDB) talked about using a caching layer at the object store level for speeding up queries.
  • Mehmet Ozan Kabak (PMC, Synnada CEO) presented a case for unified streaming data & AI workloads based on DataFusion, as offered by the Synnada platform.
  • Marko Grujic (EDB) demonstrated the versatility of DataFusion by showing how it can be used to build database replication technology.
  • Nick Karlov (R&D head @ Tarantool) gave a talk on building scalable HTAP systems with DataFusion.
  • Piotr Findeisen (SDF) spoke about the current state of the type system in DataFusion and Arrow, and presented a vision of how to improve it by separating logical and physical types explicitly.

Future Outlook

The event was a success, with feedback indicating a very positive reception from the attendees, especially given that this was the first meetup in Europe of a very popular Rust-based, data-centric Apache project.

From the organizers' point of view, the community's excitement was palpable, and it's clear that there is a shared desire to push the boundaries of what can be achieved with DataFusion. This energy is essential, as it fuels innovation and rapid advancement in the framework.

Whether you’re a seasoned contributor or someone just discovering DataFusion, we encourage everyone to join the conversation, attend the meetups, and contribute ideas or code. Together, we can help shape the future of the broader open-source data ecosystem.

And as for EDB, we look forward to closer collaboration with the DataFusion community in the future and a repeat of the meetup in due time.
