Mbkuae Stack

Zero-Copy Data Loading: mssql-python Now Natively Supports Apache Arrow for Blazing Fast SQL Server Queries

mssql-python now natively fetches SQL Server data as Apache Arrow structures, eliminating Python objects and GC pressure for major speed and memory gains.

Mbkuae Stack · 2026-05-06 01:47:36 · Data Science

Breaking News — The mssql-python database driver for SQL Server has just received a massive performance upgrade: native support for Apache Arrow data structures. This new feature, contributed by community developer Felix Graßl (@ffelixg), allows Python data engineers to fetch millions of rows directly into Arrow-native libraries like Polars, Pandas, DuckDB, and Hugging Face datasets without creating a single intermediate Python object.

“Fetching a million rows from SQL Server into a Polars DataFrame used to mean a million Python objects, a million garbage-collection allocations, and then throwing it all away to build a DataFrame. Not anymore,” said Sumit Sarabhai, reviewer of the mssql-python project. “This approach eliminates Python object creation per row and dramatically reduces memory pressure.”

The update taps into Apache Arrow’s zero-copy interoperability through the Arrow C Data Interface, a cross-language ABI (Application Binary Interface). With this, the entire fetch loop runs in C++ and writes values directly into Arrow buffers—no serialization, no copies, and no re-parsing.

Background: What Is Apache Arrow?

Apache Arrow defines a stable, columnar in-memory format that stores all values for a column contiguously in a typed buffer. Nulls are tracked via a compact bitmap rather than per-cell None objects. This design enables direct, zero-copy data exchange between languages such as C++ and Python.

Zero-Copy Data Loading: mssql-python Now Natively Supports Apache Arrow for Blazing Fast SQL Server Queries
Source: devblogs.microsoft.com

For a database driver, this means that the DataFrame library receives a pointer to that memory and can operate on it immediately. Subsequent operations like filters, joins, and aggregations also work in-place on the same buffers—never materializing intermediate Python objects.

What This Means for Developers

The integration translates into four concrete benefits:

  • Speed: The columnar fetch path avoids Python object creation per row, especially improving performance for temporal types like DATETIME and DATETIMEOFFSET, where per-value conversions are eliminated.
  • Lower memory usage: A column of one million integers becomes a single contiguous C array, not a million individual Python objects.
  • Seamless interoperability: Data can be passed directly to Polars, Pandas (via ArrowDtype), DuckDB, Hugging Face datasets, and other Arrow-native libraries without conversion overhead.
  • Simpler code: No need to manually convert result sets—fetch Arrow data in one call and process immediately.

“This is a game-changer for Python data workflows connecting to SQL Server,” said Felix Graßl, the contributor. “Systems that rely on high-throughput data pipelines will see immediate gains.”

Zero-Copy Data Loading: mssql-python Now Natively Supports Apache Arrow for Blazing Fast SQL Server Queries
Source: devblogs.microsoft.com

Technical Details

Under the hood, mssql-python now implements the Arrow C Data Interface. This standard ABI allows a C++ driver and a Python DataFrame library to operate on the exact same memory without either knowing about the other’s internals. The implementation is the work of Felix Graßl, who contributed it as a pull request to the mssql-python repository.

Users can start using the feature immediately by upgrading to the latest version of mssql-python and enabling the Arrow fetch mode in their connection settings. The change is backward-compatible—existing row-based fetch code continues to work without modification.

Outlook

With this update, mssql-python joins a growing list of database drivers adopting Arrow-native data exchange. The move signals a broader industry shift toward zero-copy, columnar data processing, particularly relevant for machine learning, real-time analytics, and large-scale ETL pipelines.

For more details, refer to the official mssql-python documentation or the Apache Arrow specification.

Recommended