{"id":5915,"date":"2018-05-01T13:14:10","date_gmt":"2018-05-01T20:14:10","guid":{"rendered":"\/developerblog\/?p=5915"},"modified":"2020-03-20T07:33:03","modified_gmt":"2020-03-20T14:33:03","slug":"runtime-configuration-of-spark-streaming-jobs-via-azure-service-bus","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/runtime-configuration-of-spark-streaming-jobs-via-azure-service-bus\/","title":{"rendered":"Runtime Configuration of Spark Streaming Jobs"},"content":{"rendered":"<h2>Background<\/h2>\n<p>Project Fortis\u00a0is a social data ingestion, analysis, and visualization platform. Originally developed by Microsoft in collaboration with the\u00a0<a href=\"https:\/\/www.unocha.org\/\">United Nations Office for the Coordination of Humanitarian Affairs<\/a>\u00a0(UN OCHA), Fortis provides planners and scientists with tools to gain insight from social media, public websites and custom data sources. We&#8217;ve explored previous work on Project Fortis in other code stories on this blog, such as\u00a0<a href=\"http:\/\/www.microsoft.com\/developerblog\/2017\/05\/10\/graphql-providing-context-into-global-crisiss-and-social-public-data-sources\/\">Project Fortis: Accelerating UN Humanitarian Aid Planning with GraphQL<\/a>\u00a0and\u00a0<a href=\"http:\/\/devblogs.microsoft.com\/cse\/2017\/11\/01\/building-a-custom-spark-connector-for-near-real-time-speech-to-text-transcription\/\">Building a Custom Spark Connector for Near Real-Time Speech-to-Text Transcription<\/a>.<\/p>\n<h2>Introduction<\/h2>\n<p>In Project Fortis, we use Spark Streaming to consume and process real-time news and events from a variety of sources, including social media outlets such as Twitter and Facebook. 
<strong>This article assumes familiarity with Apache\u00a0<a href=\"https:\/\/spark.apache.org\/docs\/latest\/streaming-programming-guide.html\">Spark Streaming application development<\/a>.<\/strong> As a part of this processing, we perform sentiment and geolocation analysis as well as entity extraction, ultimately aggregating this information into a Cassandra database, which powers a web service used to visualize it.<\/p>\n<p>One notable project requirement for Fortis is that many aspects of the processing done by Spark must be user-tunable; for example, our Spark application makes use of both a whitelist and a blacklist to filter out incoming events that are uninteresting to the user. These lists, among other settings, are maintained through an admin dashboard from within our web service. It&#8217;s important that the user is able to update these lists at any time.<\/p>\n<p>To satisfy this requirement of our work on Project Fortis, we needed to be able to communicate with Spark to reconfigure our data processing logic at runtime in response to configuration changes from an external service. In this code story, we&#8217;ll explore how we were able to achieve this goal using Spark transformations and Azure Service Bus.<\/p>\n<h2>Runtime Reconfiguration<\/h2>\n<p>One of Spark&#8217;s limitations is that it&#8217;s not possible to change an application&#8217;s\u00a0<code>DStream<\/code>\u00a0graph after the streaming context has been started. From an API perspective, this graph is composed of all <code>DStream<\/code> transformation lineages which lead from an output operation (e.g., <code>print<\/code>, <code>foreachRDD<\/code>) back to a source input stream (receiver, direct input stream, etc.). 
In other words, the hierarchy of the\u00a0<code>DStream<\/code>\u00a0graph defines the configuration and ordering of the stages which comprise an application&#8217;s data-processing pipeline(s).<\/p>\n<p>For example,\u00a0<code>myDStream.transform(...).map(...).filter(...).print()<\/code> defines a pipeline in which all <code>RDD<\/code>s arriving on the stream &#8220;<code>myDStream<\/code>&#8221; will be transformed by some function, mapped by another, filtered by yet another, and finally printed.<\/p>\n<p>While we cannot change pipeline stages or their ordering, we can use generic transformations and output operations that operate at the\u00a0<code>RDD<\/code> level (namely <code>transform<\/code>\u00a0and <code>foreachRDD<\/code>, respectively), which give us a chance to run arbitrary code for each incoming <code>RDD<\/code>\u00a0(batch).<\/p>\n<p>Functions (callbacks) we provide to <code>foreachRDD<\/code>\u00a0and <code>transform<\/code>\u00a0are executed\u00a0within the driver process. They&#8217;re designed to build an execution plan for each <code>RDD<\/code>; they set up the\u00a0<code>RDD<\/code>-level transformations that will occur for each individual\u00a0<code>RDD<\/code>, and scheduling is done from the Spark driver. 
We were able to make use of this to satisfy our goal; before we process each <code>RDD<\/code>, we poll an external service for configuration changes and adjust the logic applied to the current and future\u00a0<code>RDD<\/code>s.<\/p>\n<h2>Designing our Approach<\/h2>\n<p>For a deeper understanding of the approach covered here, check out our supplemental blog post, <a href=\"https:\/\/medium.com\/@kevin_81668\/spark-streaming-transformations-a-deep-dive-b82787e53288\">Spark Streaming Transformations: A Deep-dive<\/a>, which captures some non-obvious behaviors and limitations of Spark that we uncovered while designing it.<\/p>\n<p>By writing some setup code to run on the driver for each incoming <code>RDD<\/code>, we can reference an external service and reactively change how we process data (i.e., in response to another service). Useful external services for this purpose include databases, web services, and message queue services like Azure Service Bus (more on this option below), but the options are of course limitless.<\/p>\n<p>For Project Fortis&#8217;s requirements, it made the most sense to place the code to do this within a <code>transform<\/code>\u00a0callback (a transformation). We chose this over <code>foreachRDD<\/code>\u00a0(an output operation) since it allowed us to modify our configuration the farthest upstream within our data pipeline. We were then able to use the resulting <code>DStream<\/code>\u00a0to supply data to completely separate downstream transformation sequences (data pipelines). For simpler applications with a single output operation, using <code>foreachRDD<\/code>\u00a0should suffice.<\/p>\n<p>When using\u00a0<code>transform<\/code>, however, work done in the callback must not be long-running (a Spark thread will invoke the callback). This limitation rules out blocking network communication within the callback, making it more difficult to communicate with external services. 
To solve this, we used a worker\/background thread to fetch and prepare data from our external service, handing it off to the Spark driver thread(s) only once it is ready.<\/p>\n<p>However, this approach itself added complexity due to serialization concerns around Spark checkpointing. We needed to find a way to share the mutable state between the Spark thread executing our <code>transform<\/code>\u00a0callback and our background thread, which we&#8217;d started prior to the streaming context, near our application&#8217;s entry point. This problem is non-trivial given that the state of the <code>transform<\/code>\u00a0function is serialized by Spark during checkpointing, so upon recovery\/deserialization, no direct references from this code to or from objects we had control over in the old JVM process would exist.<\/p>\n<p>We found the most elegant solution to the above was to declare the mutable state through which these threads would communicate as static memory. In this way, the Spark thread(s) always refer to the current copy, as does startup code. This works because static fields are not serialized, and instead follow the lifetime of the process; our <code>transform<\/code>\u00a0callback would have a code reference to the shared state. The implication of this is that we needed to ensure that the state was initialized properly prior to starting the Spark streaming context, even after recovery from checkpointing. In practice, we could just add initialization code at the entry point to our application, both before running the background thread and (transitively) before telling the streaming context to start.<\/p>\n<h2>Azure Service Bus<\/h2>\n<p>When designing Project Fortis, we chose to use Azure Service Bus as the external service from which we&#8217;d receive runtime configuration changes. 
Azure Service Bus is a highly available, reliable message queue service that can be used to achieve exactly-once semantics for messages, which is what our use cases called for. Furthermore, its API supports long-polling, allowing our Spark application to listen on a socket under the hood in order to begin processing configuration changes immediately.<\/p>\n<p>To best illustrate the concepts of the approach we took with Project Fortis, we&#8217;ve created\u00a0a bare-bones <a href=\"http:\/\/github.com\/kevinhartman\/streaming-conf-sb\">sample project<\/a>. In the sections below, we&#8217;ll explore its design. While the sample is based on Azure Service Bus, the approach it illustrates can be easily extended for use with other external services.<\/p>\n<h3>Sample Project<\/h3>\n<p>The <a href=\"http:\/\/github.com\/kevinhartman\/streaming-conf-sb\">sample project<\/a> consists of a simple Spark application (with checkpointing enabled) which processes a self-generated sample stream of string\u00a0<code>RDD<\/code>s. The logic of the Spark application prefixes each string in the stream with data from a configuration object, which is updated in real time using Azure Service Bus. For each <code>RDD<\/code>\u00a0(batch) in the stream, the contents are printed to the console (the batch interval is <code>5 seconds<\/code>). 
By sending new configuration objects to the Service Bus Queue on which the application is configured to listen, the output of each batch is affected.<\/p>\n<p>The video below shows a demonstration of the semantics achievable using the sample project.<\/p>\n<p><iframe title=\"Spark Service Bus Demo\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/tBA6MnyQqCU?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<h4>Video Steps<\/h4>\n<ol>\n<li>Start the Spark application.<\/li>\n<li>Observe the console output.<\/li>\n<li>Send an updated config JSON object using <a href=\"http:\/\/github.com\/paolosalvatori\/ServiceBusExplorer\/releases\">Service Bus Explorer<\/a> to change the computation of future batches.<\/li>\n<li>Observe the console output has changed in response.<\/li>\n<li>Repeat a few times with different data.<\/li>\n<\/ol>\n<h4>Project Structure<\/h4>\n<h5>build.sbt<\/h5>\n<p>In order to integrate our Spark Streaming application with Azure Service Bus, we first needed to import its client library by adding a project reference to the latest version (at the time of writing, 1.1.0) in our <code>build.sbt<\/code>:<\/p>\n<pre class=\"lang:scala decode:true\">\/\/ https:\/\/mvnrepository.com\/artifact\/com.microsoft.azure\/azure-servicebus\r\nlibraryDependencies += \"com.microsoft.azure\" % \"azure-servicebus\" % \"1.1.0\"<\/pre>\n<h5>Main.scala<\/h5>\n<p>Next, let&#8217;s look at the entry-point to the application: the\u00a0<code>Main<\/code> object. It&#8217;s here that we need to set up the client to listen for new messages on our Azure Service Bus queue, configure our data stream, and start the streaming context.<\/p>\n<p>The first interesting section initializes <code>ConfigManagerSingleton<\/code>\u00a0with a configuration object. 
This is the code mentioned above which will initialize our static shared mutable state on program start.<\/p>\n<pre class=\"lang:scala decode:true\">\/\/ Initialize ConfigManagerSingleton, which is used to pass updated configs from Service Bus to Spark\r\nConfigManagerSingleton.init(\r\n  initialConfig = Config(\"Hello \")\r\n)<\/pre>\n<p>The <code>ConfigManagerSingleton<\/code>\u00a0will be covered in more detail in a section below. For now, it&#8217;s important to know that the above\u00a0<code>Config<\/code> will be available to our Spark logic until it is replaced by a new one arriving on our Service Bus queue. In a production application, you may want to consider storing initial configuration values in a database, loading them as above (we did this in Project Fortis).<\/p>\n<p>Next, we configure the Service Bus client library (<code>QueueClient<\/code>) to listen for messages.<\/p>\n<pre class=\"lang:scala decode:true\">\/\/ Initialize connection to Service Bus\r\nval queueClient = new QueueClient(\r\n  new ConnectionStringBuilder(busConnStr, busQueueName),\r\n  ReceiveMode.PEEKLOCK\r\n)<\/pre>\n<p>We use the <code>PEEKLOCK<\/code>\u00a0receive mode, which avoids removing incoming messages from the queue until we&#8217;ve explicitly acknowledged them as processed. This guarantees that we&#8217;ll have had a chance to handle each message at least once. 
Next, we will:<\/p>\n<ul>\n<li>register a callback that will be invoked for each incoming message on a Service Bus client library worker thread<\/li>\n<li>register a callback invoked by Spark to set up our stream (<code>createStreamingContext<\/code>)<\/li>\n<li>start the streaming context<\/li>\n<\/ul>\n<pre class=\"lang:scala decode:true\">try {\r\n  \/\/ Register config manager as handler\r\n  queueClient.registerMessageHandler(ConfigManagerSingleton,\r\n    new MessageHandlerOptions(\r\n      1, \/\/ Max concurrent calls\r\n      true,\r\n      Duration.ofMinutes(5)\r\n    )\r\n  )\r\n\r\n  \/\/ Get StreamingContext from checkpoint data or create a new one\r\n  val context = StreamingContext.getOrCreate(checkpointDir, createStreamingContext)\r\n\r\n  \/\/ Start the context\r\n  context.start()\r\n  context.awaitTermination()\r\n} finally {\r\n  queueClient.close()\r\n}<\/pre>\n<p>Note the values passed to the <code>MessageHandlerOptions<\/code>\u00a0constructor. The first specifies to the client library that we do <strong>not<\/strong> want our callback to be invoked concurrently. The second indicates that we want the client library to automatically mark messages as handled if the callback executes fully. 
The last indicates that the max auto renew duration is 5 minutes (which is the default, but we need to pass it to satisfy this constructor&#8217;s argument list).<\/p>\n<p>The\u00a0<code>createStreamingContext<\/code>\u00a0callback is executed by Spark on startup if a checkpoint does not exist:<\/p>\n<pre class=\"nums-toggle:false lang:scala decode:true\">private def createStreamingContext(): StreamingContext = {\r\n  ...\r\n\r\n  \/\/ Create a static stream with a few hundred RDDs, each containing a set of test strings.\r\n  val stream = ExampleStream(ssc)\r\n\r\n  val transformedStream = stream.transform(rdd =&gt; {\r\n    \/\/ The body of transform will execute on the driver for each RDD.\r\n    \/\/ It's here that we can fetch the latest config from the Service Bus thread.\r\n    val config = ConfigManagerSingleton.get()\r\n\r\n    rdd.map(value =&gt; config.prefix + value)\r\n  })\r\n\r\n  ...\r\n}<\/pre>\n<p>The <code>ExampleStream<\/code>\u00a0class creates a <code>DStream<\/code>\u00a0of sample data. We won&#8217;t cover it here, but <a href=\"http:\/\/github.com\/kevinhartman\/streaming-conf-sb\/blob\/master\/src\/main\/scala\/mn\/hart\/ExampleStream.scala\">its implementation<\/a> can be found in the project&#8217;s repo.<\/p>\n<p>We use the <code>transform<\/code>\u00a0function of <code>DStream<\/code>\u00a0to register a callback which is invoked for each <code>RDD<\/code>\u00a0arriving on the stream. At the start of this callback, we get the latest <code>Config<\/code>\u00a0instance from the Service Bus thread (more on this in the <code>ConfigManagerSingleton<\/code>\u00a0section), and then define an <code>RDD<\/code>\u00a0transformation for the current <code>RDD<\/code>\u00a0which prefixes each of its elements\u00a0with the value of\u00a0<code>config.prefix<\/code>. 
Recall that the\u00a0<code>map<\/code>\u00a0function applied to the <code>RDD<\/code>\u00a0will be executed in parallel on worker threads, while <code>ConfigManagerSingleton.get()<\/code>\u00a0will be executed on the driver.<\/p>\n<h5>ConfigManagerSingleton.scala<\/h5>\n<p>The static\u00a0<code>ConfigManagerSingleton<\/code>\u00a0object encapsulates the shared mutable state which we&#8217;ll use to pass new configuration changes (instances of <code>Config<\/code>\u00a0sent over Service Bus) to our Spark code. It consists of:<\/p>\n<ul>\n<li>fields for mutable state<\/li>\n<li>a state initialization function<\/li>\n<li>an update function called by a Service Bus thread when a configuration change occurs<\/li>\n<li>a state accessor function called by our Spark driver-side code<\/li>\n<\/ul>\n<p>Our state includes:<\/p>\n<table class=\" alignleft\">\n<tbody>\n<tr>\n<td>Field<\/td>\n<td>Purpose<\/td>\n<\/tr>\n<tr>\n<td><code>nextConfig<\/code><\/td>\n<td>Used to synchronously hand off new <code>Config<\/code>\u00a0instances from our Service Bus worker thread to Spark threads. The Service Bus worker thread will synchronously offer a new config to the Spark threads, blocking until one of them takes the config.<\/td>\n<\/tr>\n<tr>\n<td><code>config<\/code><\/td>\n<td>Holds the current\u00a0<code>Config<\/code>, which will be updated by a Spark thread upon a successful handoff. 
Its initial value must be set manually from within the startup code, using the <code>init<\/code>\u00a0function shown below.<br \/>\nNote that this field is volatile since it declares a variable reference which will be updated and read by different threads.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Here is how this state appears in the code:<\/p>\n<pre class=\"lang:scala decode:true\">private val nextConfig: SynchronousQueue[Config] = new SynchronousQueue[Config]()\r\n\r\n@volatile private var config: Config = _\r\n\r\ndef init(initialConfig: Config): Unit = {\r\n  config = initialConfig\r\n}<\/pre>\n<p>When a new <code>Config<\/code>\u00a0arrives on the Service Bus queue, a Service Bus thread will invoke the following update callback:<\/p>\n<pre class=\"lang:scala decode:true\">override def onMessageAsync(message: IMessage): CompletableFuture[Void] = {\r\n  implicit val _ = net.liftweb.json.DefaultFormats\r\n\r\n  val messageStr = new String(message.getBody, \"UTF-8\")\r\n  val json = parse(messageStr)\r\n  val config = json.extractOpt[Config]\r\n\r\n  if (config.isDefined) {\r\n    \/\/ Block for up to two minutes for a Spark thread to acknowledge the updated\r\n    \/\/ state.\r\n    if (!nextConfig.offer(config.get, 2, TimeUnit.MINUTES)) {\r\n      throw new Exception(\"No Spark thread acknowledged the update message within the timeout.\")\r\n    }\r\n  }\r\n\r\n  CompletableFuture.completedFuture(null)\r\n}<\/pre>\n<p>The body of the Service Bus message (JSON) is parsed for a\u00a0<code>Config<\/code>\u00a0instance.<\/p>\n<p>The\u00a0<code>Config<\/code>\u00a0is passed to a Spark thread by offering it on our <code>SynchronousQueue<\/code>. The <code>offer<\/code>\u00a0call will block until either a Spark thread has received the <code>Config<\/code>\u00a0or the timeout has elapsed. If the timeout elapses, an exception is thrown to let the Service Bus client library know that the message was not acknowledged. 
If the <code>offer<\/code>\u00a0succeeds, a Spark thread has received the new <code>Config<\/code>.<\/p>\n<p>The success of the\u00a0<code>offer<\/code>\u00a0call only guarantees that a consuming Spark thread got possession of the <code>Config<\/code>. This should be sufficient for our purposes (it&#8217;s unimportant whether the corresponding batch is processed successfully using this exact config, assuming we initialize <code>ConfigManagerSingleton<\/code>\u00a0with the latest config on (re)startup).<\/p>\n<p>On arrival of each batch, Spark threads invoke our <code>transform<\/code>\u00a0callback, which accesses <code>ConfigManagerSingleton<\/code>&#8217;s shared state via the following function:<\/p>\n<pre class=\"lang:scala decode:true\">def get(): Config = this.synchronized {\r\n  \/\/ Grab the next config from the service bus client thread if one is ready. Else, return the current config.\r\n  val value = Option(nextConfig.poll(0, TimeUnit.SECONDS))\r\n\r\n  if (value.isDefined) {\r\n    config = value.get\r\n  }\r\n\r\n  config\r\n}<\/pre>\n<p>This function runs quickly since it does not wait for the Service Bus client thread to offer a <code>Config<\/code>; if a new <code>Config<\/code>\u00a0is not available <em>right now<\/em>, then the old one is returned instead, which satisfies the requirement that our <code>transform<\/code>\u00a0function&#8217;s callback must not be long-running.<\/p>\n<p>Note that the function itself is a critical section to avoid interleavings of calling (Spark) threads which could otherwise cause the value of <code>ConfigManagerSingleton.config<\/code> to become stale. This is only relevant when <code>spark.streaming.concurrentJobs<\/code>\u00a0is set to a value greater than 1, but even then shouldn&#8217;t cause any Spark thread to wait too long since the critical section itself does not block.<\/p>\n<h2>Notes and Caveats<\/h2>\n<p>For many applications (Fortis included), more work may be required to prepare configuration data\u00a0for use by Spark threads. 
In that case, this work should be done on the (Service Bus) worker thread.<\/p>\n<p>In Fortis, the filter word lists we send along with our configuration data\u00a0are large. To avoid serializing them with each Spark task, we broadcast them within our equivalent of <code>ConfigManagerSingleton.get<\/code>\u00a0(and hence on a Spark driver thread). Broadcasting is not especially expensive since it&#8217;s implemented lazily (Spark executors reach back to the driver when referencing a broadcast), so this should be fine to do within <code>transform<\/code>. While it may be tempting to do this from the (Service Bus) worker thread, Spark threads appear to have custom thread-local state which we expect may be required for a successful broadcast.<\/p>\n<p>For complex applications, it may be necessary to think carefully about the interplay between configuration changes and data checkpointing and persistence. For example, if an <code>RDD<\/code>\u00a0processed with configuration <code>A<\/code>\u00a0were persisted, a new configuration <code>B<\/code> applied at a later point in time would not affect that data. As another example, consider a pipeline defined as follows, where <code>upstream<\/code> and <code>downstream<\/code> configurations control separate data processing settings: <code>myDStream.transform(&lt;upstream config&gt;).checkpoint().transform(&lt;downstream config&gt;)<\/code>. When recovering from a checkpoint, the effects of the upstream config may be much older than those of the downstream config, since its resulting <code>RDD<\/code>\u00a0will have been recovered from disk rather than recomputed.<\/p>\n<h3>Sample Project<\/h3>\n<p>The sample project&#8217;s <code>ExampleStream<\/code>\u00a0is backed by Spark&#8217;s\u00a0<code>QueueStream<\/code>, which does not support checkpointing. Consequently, recovering the sample application from a checkpoint will fail. 
Checkpointing is included to demonstrate how the approach taken here can be correctly integrated into a production scenario in which checkpointing is enabled. Before running the sample, ensure the specified checkpoint folder is empty.<\/p>\n<p>Since the Service Bus thread synchronously and serially passes <code>Config<\/code>s to Spark, each <code>Config<\/code>\u00a0in the Service Bus queue will be used for the configuration of at least one <code>RDD<\/code>, even if a newer one is already waiting behind it in the queue. For some applications, it may be desirable to skip older <code>Config<\/code>s waiting in the queue and apply only the newest. In this case, it should be possible to manually poll additional <code>Config<\/code>s from the queue within the <code>onMessageAsync<\/code>\u00a0callback until no more can be found, passing only the latest one to Spark.<\/p>\n<p>Both the sample application and Fortis use the Service Bus client library&#8217;s\u00a0<code>QueueClient<\/code>\u00a0to take advantage of the library&#8217;s built-in message pump loop\/thread pool. However, using this client does not necessarily guarantee that messages will be delivered in the order in which they are submitted to the Service Bus queue (though for our human\/UI-triggered use cases, ordering appeared to be temporally sequential), since it does not support message sessions. For applications which require ordering guarantees, enable <a href=\"http:\/\/docs.microsoft.com\/en-us\/azure\/service-bus-messaging\/message-sessions\">Message Sessions<\/a> for the queue. 
The drawback to this approach is of course that the <code>QueueClient<\/code>\u00a0message loop cannot be used, and no equivalent exists today within the Service Bus Java client library (though an <a href=\"http:\/\/github.com\/Azure\/azure-service-bus-java\/issues\/121\">issue captures this feature request on GitHub<\/a>).<\/p>\n<h2>Epilogue<\/h2>\n<p>Throughout this article, we\u2019ve covered the approach we took to enable runtime reconfiguration of the Spark Streaming data pipeline used in <a href=\"http:\/\/github.com\/CatalystCode\/project-fortis\">Project Fortis<\/a>. Using Azure Service Bus, we were able to communicate with our Spark application from Fortis&#8217;s admin UI, allowing administrators of Fortis deployments to control settings such as information filtering, geofences, and more without code modification and without application restarts (zero downtime). Developers should be able to extend the principles employed by the provided sample project and <a href=\"https:\/\/medium.com\/@kevin_81668\/spark-streaming-transformations-a-deep-dive-b82787e53288\">technical deep-dive<\/a>\u00a0to work with other external services for applications which cannot feasibly use Azure Service Bus.<\/p>\n<p>Feel free to reach out in the comments section below with questions and feedback!<\/p>\n<h2>Resources<\/h2>\n<ul>\n<li><a href=\"http:\/\/github.com\/CatalystCode\/project-fortis\">Project Fortis on GitHub<\/a><\/li>\n<li><a href=\"https:\/\/medium.com\/@kevin_81668\/spark-streaming-transformations-a-deep-dive-b82787e53288\">Spark Streaming Transformations: A Deep-dive<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/kevinhartman\/streaming-conf-sb\/\">Sample Project<\/a><\/li>\n<li><a href=\"http:\/\/github.com\/CatalystCode\/project-fortis\/blob\/master\/project-fortis-spark\/src\/main\/scala\/com\/microsoft\/partnercatalyst\/fortis\/spark\/Pipeline.scala\">Project Fortis&#8217;s Implementation<\/a><\/li>\n<li><a 
href=\"http:\/\/github.com\/paolosalvatori\/ServiceBusExplorer\/releases\">Service Bus Explorer Download<\/a><\/li>\n<li><a href=\"https:\/\/spark.apache.org\/docs\/latest\/streaming-programming-guide.html\">Spark Streaming Programming Guide<\/a><\/li>\n<li><a href=\"http:\/\/bahir.apache.org\/\">Apache Bahir<\/a><\/li>\n<li><a href=\"https:\/\/www.slideshare.net\/spark-project\/deep-divewithsparkstreaming-tathagatadassparkmeetup20130617\">Spark Deepdive Meetup Slides<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>We achieved zero-downtime reconfiguration and management of the Spark Streaming job used in Project Fortis with Azure Service Bus.<\/p>\n","protected":false},"author":21392,"featured_media":13042,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[11],"tags":[93,320,333,334],"class_list":["post-5915","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data","tag-azure-service-bus","tag-scala","tag-spark","tag-spark-streaming"],"acf":[],"blog_post_summary":"<p>We achieved zero-downtime reconfiguration and management of the Spark Streaming job used in Project Fortis with Azure Service 
Bus.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/5915","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21392"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=5915"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/5915\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/13042"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=5915"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=5915"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=5915"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}