Announcing Version 1.0 of .NET for Apache Spark
Today, we announce the release of version 1.0 of .NET for Apache® Spark™, an open source package that brings .NET development to the Apache® Spark™ platform. This release was made possible by the combined efforts of Microsoft and the open source community. Version 1.0 supports .NET applications targeting .NET Standard 2.0 or later. It also provides access to the Apache® Spark™ DataFrame APIs (Spark versions 2.3, 2.4 and 3.0), along with the ability to write Spark SQL and create user-defined functions (UDFs).
The following code snippet is an example of using Spark to produce a word count from a document (browse the full sample here):
    var docs = spark.Read().Option("header", true).Csv("documents.csv");
    var fileCol = Functions.Col("file");
    var words = docs
        .Select(
            fileCol,
            // "a b c" => ["a", "b", "c"]
            Functions.Split(Functions.Col("words"), " ").Alias("wordList"))
        // flatten into one row per word
        .Select(
            fileCol,
            // 1: ["a", "b", "c"] => 1: "a", 2: "b", 3: "c"
            Functions.Explode(Functions.Col("wordList")).Alias("word"))
        .GroupBy(fileCol, Functions.Lower(Functions.Col("word")))
        .Count();
.NET for Apache® Spark™ launched two years ago to address increasing demand from the .NET community for an easier way to build big data applications. A recent survey confirmed the biggest motivation to use the package is to take advantage of existing .NET development skills and resources, including the enormous .NET ecosystem of existing libraries and frameworks. The team is committed to the continuous evolution of the product to integrate the latest features and keep the API current with the latest Spark versions. For more about the history of the project and key contributors, read the full announcement.
There are several options to get started. First, read the full .NET for Apache Spark 1.0 announcement. Then you can:
- Browse our online .NET for Apache Spark documentation
- Take the tutorial: Get started with .NET for Apache Spark
- Submit jobs to run on Azure and analyze data in real-time notebooks using .NET for Apache Spark with Azure Synapse Analytics
- Visit and consider contributing to our open source repository