Xomnia Helps Alliander with 98% Cost Reduction using Polars

In this case study, we discuss how we converted a large codebase, consisting of approximately 20,000 lines of code, to Polars. This conversion resulted in a 98% reduction in operational costs and significantly improved the maintainability of the codebase, running a computation that would have otherwise cost €140,000 for just €5,000. We explain how we accomplished this and share the lessons we learned along the way.

Challenge

Alliander is one of the largest utility companies in the Netherlands, managing over a third of the electrical and gas distribution grid. The company oversees more than 3 million connections (both commercial and residential), 36,000 networks, and 48,000 km of cables. Every year, about 1,000 networks and 500 km of cables are added or renewed.

Due to the increasing popularity of solar panels, heat pumps, and EV charging stations, many networks are already at capacity. The original network simply wasn't built for this. As a result, requests for installing solar panels, heat pumps, or EV charging stations are more often being denied, especially for large corporations or apartment complexes. Requests for ordinary homes are typically accepted, although this is increasingly at risk.

As such, Alliander has to renew and strengthen the entire network over the next decade, which is both costly and time-consuming. The challenge is determining where to begin, given feasibility constraints (i.e., available workforce and resources). Which of the 36,000 networks should be improved first? This is where the DELVI project comes into play.

The Decision Enabler on Low Voltage Impact (DELVI) project focuses on predicting the health and future performance of these networks. It starts by forecasting how usage patterns could change in the future and predicts how this impacts the load profiles of customers. Furthermore, leveraging a stochastic approach, it uses these probabilities and Monte Carlo sampling to provide a what-if view into how the load might vary in the future on a 15-min timescale. It then performs a power-flow analysis to compute the behavior of these loads on the grid, indicating where possible contingencies or bottlenecks might arise. Its results are often used for strategic or operational planning.

In the past, the codebase consisted of both Python and R. On the Python side, we used the pandas package for processing data. The R code was invoked using the rpy2 package. Processing one network could take up to five hours. The simulation could only sample 25 data points, as anything more would result in running out of memory. This process used over 500 GB of RAM, and the associated costs were enormous.

At a certain point, management set a target budget of €5,000 for processing all 36,000 networks. As this was obviously not feasible with the old codebase, something had to change.

Discovering Polars

Polars was first developed by Ritchie Vink while he was our colleague at Xomnia. Eventually Ritchie left Xomnia to found Polars Inc. together with co-founder Chiel Peters, also a former Xomnian.

When we first tried out Polars, we were immediately convinced of its potential. Polars wasn't just fast; its elegant syntax was a joy to use. A common saying in the Polars community is: “Come for the speed, stay for the API.”

In fact, we were so enthusiastic about Polars that we figured it deserved a proper book. Luckily, O'Reilly agreed with us, and after 1.5 years of writing daily (and nightly) we completed Python Polars: The Definitive Guide.

Soon after we started writing, it hit us: why don't we use Polars for the DELVI project? We were certain it could handle the task, but the rest of the team wasn't so easily convinced. At the time, Polars was still relatively unknown, and there wasn't a stable release (no 1.0 version). Furthermore, the client was concerned about the effort required to refactor a large pandas codebase. So, we had three key challenges to overcome:

  1. Convincing our colleagues to use a relatively new tool developed by a small company.
  2. The lack of a stable Polars release at the time.
  3. The complexity of migrating from pandas to Polars.

Here's how we overcame these challenges and what we learned along the way.

Solution

Polars is a blazingly fast and increasingly popular package for processing data. We ran several benchmarks ourselves, and while benchmarking can be challenging, it's clear that Polars outperforms pandas on every query we tested (see the figure below).

Additionally, the popularity of Polars is growing. Below we compare the GitHub stars of various data processing packages. While popularity is not an end goal, it is indicative of the number of users and the long-term viability of the tool. When you're considering a new technology, it’s important to assess whether it will continue to be supported and if there’s an active community to address issues.

Key Lessons Learned

We've collaborated on this project for over two years, and along the way we've learned a few lessons we think are worth sharing.

Show, Don’t Tell

In order to convince the team, we started by rewriting a small piece of code from pandas to Polars and measured the difference in performance. The results were striking: processing time dropped from 30 seconds to just 1 second.

So, the first and perhaps most important lesson was to demonstrate the performance benefits rather than just talking about them. Show the numbers and results, not opinions. Once we presented the hard data, the team was much more willing to give Polars a chance.

Benchmarking is Key

Another important lesson was to benchmark early and often. Whenever we made changes, we benchmarked to see the impact on both performance and memory usage. For instance, by switching from loading all data into memory at once to a more efficient batching approach, we reduced memory usage significantly. This optimization saved us money and allowed us to run more jobs in parallel.

Lazy Execution in Polars

Polars supports a "lazy" execution model or API, which was a game-changer for us. Unlike the eager API—where every operation is executed immediately as it's called—lazy execution defers these computations until they're needed. Instead of processing each step one-by-one (which can be wasteful and redundant), the lazy API builds an optimized execution plan that combines multiple operations before running them. This approach not only cuts down on unnecessary memory usage but also allows Polars to apply smart optimizations (like reordering or eliminating redundant steps), yielding a significant performance boost. In our own benchmarks, we found that the lazy API was much faster than the eager API.

Optimizing Memory Usage

While working with Polars, we noticed that its memory management was much more efficient compared to pandas. One important trick we used was to cache intermediate results and only load the data that was necessary. This allowed us to process larger datasets with significantly less memory.

Polars is Less Surprising

Polars behaves in a more predictable manner than pandas. It doesn’t rely on the Index, which can cause unexpected behavior in pandas, such as incorrect joins or unintended results when performing operations like sorting. Additionally, Polars uses a more consistent approach to missing values, which simplifies working with different data types.

Transitioning Slowly from pandas to Polars

Switching from pandas to Polars was relatively easy because we didn't translate everything at once. Because Polars allows for seamless conversion between Polars DataFrames and pandas DataFrames, you can take baby steps. If your pandas DataFrame is using the Arrow backend, the transition is even more efficient, as it's a zero-copy operation.

Impact

After applying all the lessons learned, we were able to significantly improve our codebase. The processing time for a single network was reduced by 20%. The simulation could be based on 50 samples instead of 25, effectively doubling the amount of data processed. Memory usage was reduced by 92%, down from 500 GB of RAM to just 40 GB.

The entire codebase was converted to Python, eliminating the need for R, and fully migrated to Polars. This allowed us to process the entire grid (about 36,000 networks) in just 70% of the time and at a fraction of the cost. Had we used the old code, the project would have cost €140,000 to run; with the new code, we brought that down dramatically to the budgeted €5,000.

In summary

The key takeaways from our experience with Polars are:

  1. Show, don’t tell: Use data to prove the benefits of new tools
  2. Benchmark early and often: Track performance and memory usage continuously
  3. Use the lazy API: It’s highly optimized for performance
  4. Cache intermediate results: Process larger datasets
  5. Polars is consistent: Fewer surprises compared to pandas
  6. Start small and iterate: Gradually refactor the codebase

By applying these principles, we were able to significantly improve both the performance and maintainability of our codebase. We highly recommend giving Polars a try, and we hope our experiences will help others get started with it. If you would like to explore what opportunities might be available for your organisation, we’re very happy to help you. Get in touch!

Written by

Jeroen Janssens - Senior ML Engineer at Xomnia, Author of Python Polars
Thijs Nieuwdorp - Senior Data Scientist at Xomnia, Author of Python Polars
Bram Timmers - Data Science Engineer at Alliander