PyQuant News: 8 ways pandas is really losing to Polars for quick market data analysis


In today’s newsletter, you’ll use Polars, a high-speed data-handling tool that's becoming essential in quantitative finance and algorithmic trading.

You’ll see how to compare its performance to pandas for many common data manipulation techniques.

By the end of this post, you'll understand how Polars can improve your data processing speed, especially when working with large datasets.

So if you've been looking for a more efficient DataFrame library, this is one issue you don't want to miss.

Let's dive in!

8 ways pandas is really losing to Polars for quick market data analysis

Polars is a DataFrame library designed for speed and efficiency.

It’s written in Rust and uses parallel execution to process data across multiple CPU cores. This often makes it faster than other DataFrame libraries, including pandas, and a good choice for tasks that involve large amounts of data.

Despite being written in Rust, Polars provides a Python API that is easy to use and familiar to those who have experience with Python.

This makes it accessible to a wide range of users, including data scientists and researchers.

The choice between pandas and Polars will depend on the size of your data and how crucial performance is for your work.


Imports and setup

Make sure you run the code in a Jupyter Notebook so you can use the %timeit magic. Then, start by importing pandas, Polars, and OpenBB.

Now we're ready.

Reading data from CSV

Reading data from CSVs is common. Here’s how to do it.

If you’ve never seen the output of %timeit before, the first number is the mean time it takes to run the operation. In this case, pandas took 458 ms per loop and Polars took 3.57 ms per loop.

Polars read the CSV in about 99% less time than pandas, which is roughly 128 times faster.

Selecting data

Selecting data from columns is also common.

Notice the difference in syntax. Here, the list of columns to select is wrapped in the pl.col function, although select in Polars also accepts plain column names.

Filtering data

How about filtering data?

Polars takes about half the time for simple filter operations.

Grouping data

pandas groups and aggregates in 113 ms while Polars does it in 16.5 ms.

Adding new columns

Imputing missing data

Polars is slower than pandas when filling nulls and NaNs, by about 3x.

Sorting data

pandas and Polars are pretty close in sorting, but Polars is still faster.

Calculating rolling statistics

Polars beats pandas in a simple 20-day rolling mean calculation.

This demonstration only scratches the surface of Polars. Note that the syntax differs from pandas, so there is a learning curve. And as always, use the tool that does the job for you. If you’re dealing with massive datasets of tens or hundreds of GB, Polars is a good option. If not, pandas will work fine.

Action steps

Your action steps today are to get Polars installed and start getting comfortable with the syntax. Then, download a multi-gigabyte dataset from your favorite source and run some tests on your own.