Announcing Version 1.0 of .NET for Apache Spark

Jeremy

Today, we announce the release of version 1.0 of .NET for Apache® Spark™, an open source package that brings .NET development to the Apache® Spark™ platform. This release is possible due to the combined efforts of Microsoft and the open source community. Version 1.0 includes support for .NET applications targeting .NET Standard 2.0 or later. Access to the Apache® Spark™ DataFrame APIs (versions 2.3, 2.4 and 3.0) and the ability to write Spark SQL and create user-defined functions (UDFs) are also included in the release.


The following code snippet is an example of using Spark to produce a word count from a document (browse the full sample here):

var docs = spark.Read().Option("header", true).Csv("documents.csv");
var fileCol = Functions.Col("file");
var words = docs
    .Select(
        fileCol,
        // "a b c" => ["a", "b", "c"]
        Functions.Split(
            Functions.Col("words"), " ")
        .Alias("wordList"))
    // flatten into one row per word
    .Select(
        fileCol,
        // 1: ["a", "b", "c"] => 1: "a", 2: "b", 3: "c"
        Functions.Explode(
            Functions.Col("wordList"))
        .Alias("word"))
    .GroupBy(fileCol, Functions.Lower(Functions.Col("word")))
    .Count();
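The release also covers Spark SQL and user-defined functions (UDFs), which the word count sample does not exercise. The following is a minimal sketch of both, not code from the original sample. It assumes the Microsoft.Spark NuGet package, a local Spark installation, and the same documents.csv with "file" and "words" columns; the names wordLength, word_count, and the docs view are hypothetical and chosen only for illustration.

```csharp
using System;
using Microsoft.Spark.Sql;

class UdfSketch
{
    static void Main()
    {
        // Assumption: launched through spark-submit with the .NET for Apache Spark runner.
        SparkSession spark = SparkSession.Builder().AppName("udf-sketch").GetOrCreate();

        // Assumption: same input as the word count sample (columns "file" and "words").
        DataFrame docs = spark.Read().Option("header", true).Csv("documents.csv");

        // Column-level UDF: wraps a C# lambda so it can be applied to a Column.
        // Null cells are mapped to 0 instead of throwing.
        Func<Column, Column> wordLength =
            Functions.Udf<string, int>(text => text == null ? 0 : text.Length);

        docs.Select(
                Functions.Col("file"),
                wordLength(Functions.Col("words")).Alias("chars"))
            .Show();

        // SQL-level UDF: register the lambda under a name, then call it from Spark SQL.
        spark.Udf().Register<string, int>(
            "word_count",
            text => text == null ? 0 : text.Split(' ').Length);

        docs.CreateOrReplaceTempView("docs");
        spark.Sql("SELECT file, word_count(words) AS wordCount FROM docs").Show();

        spark.Stop();
    }
}
```

Registering the function by name is what makes it visible to Spark SQL; the Functions.Udf form is usually more convenient when staying inside the DataFrame API.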

Background

.NET for Apache® Spark™ launched two years ago to address increasing demand from the .NET community for an easier way to build big data applications. A recent survey confirmed the biggest motivation to use the package is to take advantage of existing .NET development skills and resources, including the enormous .NET ecosystem of existing libraries and frameworks. The team is committed to the continuous evolution of the product to integrate the latest features and keep the API current with the latest Spark versions. For more about the history of the project and key contributors, read the full announcement.

Get Started

There are several options to get started. First, read the full .NET for Apache Spark 1.0 announcement. Then you can:

6 comments


  • saint4eva

    The acquisition process is still not easy.

    A lot of moving parts that need to be stitched together before the proper coding.

    • Michael Rys (Microsoft employee)

      @saint4eva: How are you planning on using it? We provide support for it out of the box in Azure HDInsight and Azure Synapse Spark pools. If you would like to see it in Databricks, I suggest reaching out to Databricks (if they see customer demand, I am sure they will consider it).

  • kirts

    Is there support yet for Delta Lake? If not, is that on the roadmap?

  • Dusan F

    Meanwhile, our company moved from Apache Spark (Java) to Flink. Even courses (Pluralsight) are comparing Hadoop to 3G, Spark to 4G and Flink to 5G. Is there a plan for 5G in .NET? Is there some project trying to port Flink to .NET?

    • Michael Rys (Microsoft employee)

      Hi Dusan, if you have good use cases where you prefer Flink, I would suggest filing a feature request at the Azure Synapse UserVoice. Once we see an increase in demand, we can look into it.