{"id":10301,"date":"2026-05-03T20:33:00","date_gmt":"2026-05-04T04:33:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/python\/?p=10301"},"modified":"2026-05-03T20:33:00","modified_gmt":"2026-05-04T04:33:00","slug":"introducing-apache-arrow-support-in-mssql-python","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/python\/introducing-apache-arrow-support-in-mssql-python\/","title":{"rendered":"Introducing Apache Arrow Support in mssql-python"},"content":{"rendered":"<p><span style=\"font-family: helvetica, arial, sans-serif;\"><!--ScriptorStartFragment--><\/span><\/p>\n<div>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2025\/07\/c1014e61-a66d-4807-ab58-655671044f49.png\"><img decoding=\"async\" class=\"alignnone wp-image-5513 size-large\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2025\/07\/c1014e61-a66d-4807-ab58-655671044f49-1024x519.png\" alt=\"c1014e61 a66d 4807 ab58 655671044f49 image\" width=\"1024\" height=\"519\" \/><\/a><\/span><\/p>\n<p><em><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Reviewed by Sumit Sarabhai<\/span><\/em><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Fetching a million rows from SQL Server into a <a href=\"https:\/\/pola.rs\/\">Polars<\/a> DataFrame used to mean a million Python objects, a million GC allocations, and then throwing it all away to build a DataFrame. Not anymore. <a href=\"https:\/\/github.com\/microsoft\/mssql-python\">mssql-python<\/a> now supports fetching SQL Server data directly as Apache Arrow structures &#8211; a faster and more memory-efficient path for anyone working with SQL Server data in Polars, Pandas, DuckDB, or any other Arrow-native library. This feature was contributed by community developer <strong>Felix Gra\u00dfl (@ffelixg)<\/strong>, and we are thrilled to ship it.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><div class=\"alert alert-primary\"><p class=\"alert-divider\"><i class=\"fabric-icon fabric-icon--Info\"><\/i><strong>Key Terms<\/strong><\/p><strong>API (Application Programming Interface):<\/strong> a source-code contract that defines how to call a function or library.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>ABI (Application Binary Interface):<\/strong> a binary-level contract that specifies how compiled code is laid out in memory. Two programs built in different languages can share an ABI and exchange data directly &#8211; no serialization is needed.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>Arrow C Data Interface:<\/strong> Apache Arrow&#8217;s ABI specification &#8211; the standard that makes zero-copy data exchange between languages possible.<\/div><\/span><\/p>\n<h2><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">What Is Apache Arrow?<\/span><\/h2>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The key insight behind <a href=\"https:\/\/arrow.apache.org\/\">Apache Arrow<\/a> is <strong>zero-copy language interoperability<\/strong>. Arrow defines a stable shared-memory layout &#8211; the <strong>Arrow C Data Interface<\/strong>, a cross-language ABI &#8211; that any language can produce or consume by exchanging a pointer, with no serialization, no copies, and no re-parsing. A C++ database driver and a Python DataFrame library can work on the exact same memory without either one knowing about the other.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Built on top of that, Arrow uses a <strong>columnar in-memory format<\/strong>: instead of representing a table as a list of rows, each row a collection of Python objects, Arrow stores all values for a column contiguously in a typed buffer. Nulls are tracked in a compact bitmap rather than per-cell <code>None<\/code> objects.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">For a database driver, this means the entire fetch loop can run in C++ and write values directly into Arrow buffers &#8211; no Python object creation per row, no garbage-collector pressure. The DataFrame library receives a pointer to that memory and can begin operating on it immediately. Crucially, subsequent operations &#8211; filters, joins, aggregations &#8211; also work in-place on those same buffers. A Polars pipeline reading from mssql-python never needs to materialize intermediate Python objects at any stage, making Arrow the right foundation for high-throughput data processing pipelines.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">For users of mssql-python, this translates into four concrete benefits:<\/span><\/p>\n<ul>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>Speed:<\/strong> The columnar fetch path avoids Python object creation per row, which should make fetching noticeably faster for many SQL Server types &#8211; especially temporal types like <code>DATETIME<\/code> and <code>DATETIMEOFFSET<\/code>, where Python-side per-value conversions are eliminated entirely.<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>Lower memory usage:<\/strong> A column of one million integers is a single contiguous C array, not a million individual Python objects.<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>Seamless interoperability:<\/strong> Polars, Pandas (via <code>ArrowDtype<\/code>), DuckDB, Hugging Face datasets, and many other libraries all speak Arrow natively. Zero-copy hand-off between mssql-python and those tools.<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><strong>Purely additive:<\/strong> Your existing <code>fetchone<\/code>, <code>fetchmany<\/code>, and <code>fetchall<\/code> code is completely unaffected. You opt in only where you need it.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><div class=\"alert alert-warning\"><strong>Try it here: <a href=\"https:\/\/github.com\/microsoft\/mssql-python\">pip install mssql-python<\/a><\/strong><\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Calling all Python + SQL developers! We invite the community to try out mssql-python and help us shape the future of high-performance <a href=\"https:\/\/github.com\/microsoft\/mssql-python\">SQL Server connectivity in Python<\/a>.!<\/div><\/span><\/p>\n<h2><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The Arrow Fetch APIs<\/span><\/h2>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Three APIs have been added to the <code>Cursor<\/code> object.<\/span><\/p>\n<h4><span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 12pt;\"><code>1. cursor.arrow_batch(batch_size=8192)<\/code> \u2192 <code>pyarrow.RecordBatch<\/code><\/span><\/h4>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Fetches the next batch of up to <code>batch_size<\/code> rows as an Arrow <code>RecordBatch<\/code> and advances the cursor. <code>RecordBatch<\/code>es are the building block for more high-level Arrow data types like tables and the batch reader interface.<\/span><\/p>\n<pre style=\"background-color: #f5f5f5; border: 1px solid #ddd; padding: 15px; font-family: 'Courier New', monospace; font-size: 10.5pt; overflow-x: auto; line-height: 1.5;\">import mssql_python\r\n\r\nconn   = mssql_python.connect(conn_str)\r\ncursor = conn.cursor()\r\ncursor.execute(\"SELECT * FROM SalesData\")\r\n\r\npartial_data = cursor.arrow_batch(batch_size=50000)\r\nprocess(partial_data)   # pyarrow.RecordBatch<\/pre>\n<h4><span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 12pt;\"><code>2. cursor.arrow(batch_size=8192)<\/code> \u2192 <code>pyarrow.Table<\/code><\/span><\/h4>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Eagerly fetches the entire result set into a single Arrow <code>Table<\/code>. This is the simplest path and works well for analytics queries where the result fits comfortably in memory. However, because it materialises the full result set at once, it can cause high peak RAM usage or out-of-memory errors on very large or unbounded queries. For large exports or ETL workloads, prefer <code>cursor.arrow_reader()<\/code> (streaming, fetches lazily) or <code>cursor.arrow_batch()<\/code> (fetch one batch at a time). In both cases, <code>batch_size<\/code> is a tuning knob: larger batches improve throughput but increase peak memory; smaller batches reduce memory at the cost of slightly more per-batch overhead.<\/span><\/p>\n<pre style=\"background-color: #f5f5f5; border: 1px solid #ddd; padding: 15px; font-family: 'Courier New', monospace; font-size: 10.5pt; overflow-x: auto; line-height: 1.5;\">cursor.execute(\"SELECT customer_id, order_date, amount FROM Orders\")\r\ntable = cursor.arrow()\r\n\r\n# Zero-copy conversion to Polars\r\nimport polars as pl\r\ndf = pl.DataFrame(table)\r\n\r\n# Or to Pandas with Arrow-backed dtypes\r\nimport pandas as pd\r\ndf = table.to_pandas(types_mapper=pd.ArrowDtype)<\/pre>\n<h4><span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 12pt;\"><code>3. cursor.arrow_reader(batch_size=8192)<\/code> \u2192 <code>pyarrow.RecordBatchReader<\/code><\/span><\/h4>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Returns a lazy <code>RecordBatchReader<\/code>. Batches are fetched only when the reader is iterated, enabling streaming over very large result sets. <code>RecordBatchReader<\/code> is also accepted directly by DuckDB, Lance, and other Arrow-native libraries.<\/span><\/p>\n<pre style=\"background-color: #f5f5f5; border: 1px solid #ddd; padding: 15px; font-family: 'Courier New', monospace; font-size: 10.5pt; overflow-x: auto; line-height: 1.5;\">cursor.execute(\"SELECT * FROM LargeEventLog\")\r\nreader = cursor.arrow_reader(batch_size=100000)\r\n\r\nfor batch in reader:\r\n    sink.write(batch)<\/pre>\n<h2><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Testing<\/span><\/h2>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">We validated the Arrow fetch path against the standard Python row fetch path across a range of SQL Server types \u2014 numeric, temporal, string, and UUID &#8211; for both single-column and wide (20-column) tables. The full test script and results are available in the Resources section; we encourage you to run them on your own hardware to see the difference for your workload.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">In our testing, the Arrow path was consistently faster for most SQL Server types. Temporal types showed the largest gains: types like <code>DATETIME<\/code> and <code>DATETIMEOFFSET<\/code> benefit significantly because the Arrow path handles timezone normalization and value encoding entirely in C++, eliminating per-value Python-side conversions. <code>DATETIMEOFFSET<\/code> in particular showed some of the most pronounced speedups we observed.<\/span><\/p>\n<h5><span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 12pt;\">JSON Serialization Bonus<\/span><\/h5>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">The Arrow path can also benefit API workloads that serialize results to JSON. Instead of <code>fetchall()<\/code> + <code>json.dumps()<\/code>, fetch via <code>cursor.arrow()<\/code>, wrap in a Polars DataFrame, and call <code>df.write_json()<\/code> &#8211; the entire pipeline bypasses Python objects and can be noticeably faster, especially for types like <code>DATETIMEOFFSET<\/code>.<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><div class=\"alert alert-warning\"><p class=\"alert-divider\"><i class=\"fabric-icon fabric-icon--Warning\"><\/i><strong>NVARCHAR on Linux<\/strong><\/p>Our Linux tests show longer fetch times for <code>NVARCHAR<\/code> due to the current UTF-16 \u2192 UTF-8 conversion path. On Windows, <code>NVARCHAR<\/code> fetches consistently faster with Arrow. A fix is targeted for a follow-up release.<\/div><\/span><\/p>\n<h2><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Getting Started<\/span><\/h2>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Install or upgrade mssql-python, then add pyarrow:<\/span><\/p>\n<pre style=\"background-color: #f5f5f5; border: 1px solid #ddd; padding: 15px; font-family: 'Courier New', monospace; font-size: 10.5pt; overflow-x: auto; line-height: 1.5;\">pip install mssql-python pyarrow<\/pre>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">For IDE type hints and static type checking:<\/span><\/p>\n<pre style=\"background-color: #f5f5f5; border: 1px solid #ddd; padding: 15px; font-family: 'Courier New', monospace; font-size: 10.5pt; overflow-x: auto; line-height: 1.5;\">pip install pyarrow-stubs<\/pre>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Then swap in <code>cursor.arrow()<\/code> wherever you would have called <code>fetchall()<\/code> and converted to a DataFrame. Your existing code is completely unaffected \u2014 Arrow support is purely additive.<\/span><\/p>\n<pre style=\"background-color: #f5f5f5; border: 1px solid #ddd; padding: 15px; font-family: 'Courier New', monospace; font-size: 10.5pt; overflow-x: auto; line-height: 1.5;\">import mssql_python\r\nimport polars as pl\r\n\r\nconn   = mssql_python.connect(conn_str)\r\ncursor = conn.cursor()\r\n\r\ncursor.execute(\"SELECT * FROM dbo.LargeSalesTable\")\r\ndf = pl.DataFrame(cursor.arrow())\r\n\r\nprint(df.describe())<\/pre>\n<h2><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">What&#8217;s Next<\/span><\/h2>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">One known area we are actively working on to improve is <a href=\"https:\/\/github.com\/microsoft\/mssql-python\/pull\/526\">NVARCHAR performance on Linux<\/a>. SQL Server returns Unicode string data in UTF-16 encoding, which the driver must convert to UTF-8 before handing it to Arrow. On Windows this conversion uses a native system API that is very fast, but the current Linux code path goes through a slower chain of intermediate steps. As a result, <code>NVARCHAR<\/code> columns on Linux show longer fetch times compared to the Python fetch path \u2014 the opposite of every other type. A fix using a more efficient codec is in progress for a follow-up release. On Windows, our tests show <code>NVARCHAR<\/code> fetching noticeably faster with Arrow, and Linux will follow.<\/span><\/p>\n<h2><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">A Note of Thanks<\/span><\/h2>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">This feature was contributed by <strong>Felix Gra\u00dfl (@ffelixg)<\/strong>, the author of <a href=\"https:\/\/github.com\/ffelixg\/zodbc\">zodbc<\/a>, his own Zig-based ODBC driver. His deep familiarity with ODBC and Arrow made this a thorough, well-tested contribution covering both Linux and Windows, and all three fetch patterns. We are very grateful for his work and the care he brought to this feature.<\/span><\/p>\n<h2><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Resources<\/span><\/h2>\n<ul>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><a href=\"https:\/\/github.com\/microsoft\/mssql-python\">mssql-python on GitHub<\/a><\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><a href=\"https:\/\/github.com\/microsoft\/mssql-python\/pull\/354\">Arrow Support PR<\/a><\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><a href=\"https:\/\/gist.github.com\/ffelixg\/cb302e606920c88f5450f9fd20758e86\">Full test script and results<\/a><\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><a href=\"https:\/\/arrow.apache.org\/docs\/format\/CDataInterface.html\">Apache Arrow C Data Interface<\/a><\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><a href=\"https:\/\/github.com\/ffelixg\/zodbc\">zodbc \u2014 Felix&#8217;s Zig-based ODBC driver<\/a><\/span><\/li>\n<\/ul>\n<h3><strong><span style=\"font-family: 'trebuchet ms', geneva, sans-serif; font-size: 14pt;\">Try It and Share Your Feedback!\u00a0<\/span><\/strong><\/h3>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">We invite you to:\u00a0<\/span><\/p>\n<ol>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Check-out the <a href=\"https:\/\/github.com\/microsoft\/mssql-python\">mssql-python\u00a0<\/a>driver and integrate it into your projects.\u00a0<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Share your thoughts:\u00a0Open <a href=\"https:\/\/github.com\/microsoft\/mssql-python\/issues\">issues<\/a>, suggest features, and contribute to the project.\u00a0<\/span><\/li>\n<li><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">Join the conversation:\u00a0<a href=\"https:\/\/github.com\/microsoft\/mssql-python\/discussions\">GitHub Discussions<\/a>\u00a0|\u00a0<a href=\"https:\/\/techcommunity.microsoft.com\/category\/sql-server\/blog\/sqlserver\">SQL Server Tech Community<\/a>.\u00a0<\/span><\/li>\n<\/ol>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><div class=\"alert alert-success\"><p class=\"alert-divider\"><i class=\"fabric-icon fabric-icon--Lightbulb\"><\/i><strong>Use Python Driver with Free Azure SQL Database<\/strong><\/p>You can use the Python Driver with the free version of Azure SQL Database!<\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">\u2705 <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-sql\/database\/free-offer?view=azuresql\">Deploy Azure SQL Database for free<\/a><\/span><\/p>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\">\u2705 <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-sql\/managed-instance\/free-offer?view=azuresql\">Deploy Azure SQL Managed Instance for free<\/a> Perfect for testing, development, or learning scenarios without incurring costs.<\/div><\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\"><\/li>\n<\/ul>\n<p><span style=\"font-family: 'trebuchet ms', geneva, sans-serif;\"><div class=\"alert alert-primary\">Have questions or feedback? Open an <a href=\"https:\/\/github.com\/microsoft\/mssql-python\/issues\">issue<\/a> or <a href=\"https:\/\/github.com\/microsoft\/mssql-python\/discussions\">discussion<\/a> on GitHub, or reach out to the team at <a href=\"mailto:mssql-python@microsoft.com\">mssql-python@microsoft.com<\/a><\/div><\/span><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Reviewed by Sumit Sarabhai Fetching a million rows from SQL Server into a Polars DataFrame used to mean a million Python objects, a million GC allocations, and then throwing it all away to build a DataFrame. Not anymore. mssql-python now supports fetching SQL Server data directly as Apache Arrow structures &#8211; a faster and more [&hellip;]<\/p>\n","protected":false},"author":194671,"featured_media":10119,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3,1],"tags":[1277,1281,17,1283,1282,1278,1280],"class_list":["post-10301","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure","category-python","tag-azure-sql","tag-client-driver","tag-python","tag-python-driver-for-azure-sql","tag-python-driver-for-sql-server","tag-sql-server","tag-sql-server-2025"],"acf":[],"blog_post_summary":"<p>Reviewed by Sumit Sarabhai Fetching a million rows from SQL Server into a Polars DataFrame used to mean a million Python objects, a million GC allocations, and then throwing it all away to build a DataFrame. Not anymore. mssql-python now supports fetching SQL Server data directly as Apache Arrow structures &#8211; a faster and more [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/posts\/10301","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/users\/194671"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/comments?post=10301"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/posts\/10301\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/media\/10119"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/media?parent=10301"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/categories?post=10301"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/python\/wp-json\/wp\/v2\/tags?post=10301"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}