{"id":9429,"date":"2018-12-12T13:45:51","date_gmt":"2018-12-12T21:45:51","guid":{"rendered":"https:\/\/www.microsoft.com\/developerblog\/?p=9429"},"modified":"2020-03-14T14:11:51","modified_gmt":"2020-03-14T21:11:51","slug":"databricks-ci-cd-pipeline-using-travis","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/databricks-ci-cd-pipeline-using-travis\/","title":{"rendered":"Social Stream Pipeline on Databricks with auto-scaling and CI\/CD using Travis"},"content":{"rendered":"<h2>Background<\/h2>\n<p>For the tech companies designing tomorrow\u2019s smart cities, making local authorities able to collect and analyze large quantities of data from many different sources and mediums\u00a0is critical. Data can come from different sources \u2013 from posts on social media and data automatically collected from IoT devices, to information submitted by citizens on a range of different channels.<\/p>\n<p>To consolidate a continuous stream from this myriad of sources, these companies need an infrastructure that is strong enough to support the load. But, they also require a flexible infrastructure that offers the right tools, has the ability to automatically scale up or down, and which offers an environment that is dynamic enough to support quick changes to model processing and data\u00a0scoring.<\/p>\n<p>One such company, <a href=\"https:\/\/zencity.io\/\">ZenCity,<\/a>\u00a0is dedicated to making cities smarter by processing social, IoT and LOB data to identify and aggregate exceptional trends. In June\u00a02018, ZenCity approached CSE to partner in building a pipeline that could analyze a varying array of data sources, scale according to need, and potentially scale separately for specific customers. 
At the outset of our collaboration with ZenCity, our team evaluated ZenCity&#8217;s existing infrastructure, which consisted of manually managed VMs that were proving difficult to maintain and support given the startup&#8217;s rapidly growing customer base. It was very important to\u00a0understand ZenCity&#8217;s needs and to try and predict how those needs would evolve in the near future as the company grows.<\/p>\n<p>Our primary role in the collaboration was to investigate\u00a0<a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/databricks\/\">Azure Databricks<\/a>\u00a0and other streaming alternatives that might meet their requirements and to recommend the best approach from a technical standpoint. However, we also aimed to integrate the systems with other Azure services and online OSS libraries that could support the sort of pipeline ZenCity needed.<\/p>\n<p>As the project progressed, our team discovered that there are very few online open source examples that demonstrate building a CI\/CD pipeline for a Spark-based solution. And, to the best of our knowledge, none of the examples demonstrated a Databricks-based solution that can utilize its rich features. 
As a result, we decided to provide a CI\/CD sample and\u00a0address the challenges of continuous integration and continuous deployment for a Databricks-based solution.<\/p>\n<p>This code story describes the challenges, solutions and technical details of the approach we decided to take.<\/p>\n<h2>Challenges<\/h2>\n<p>In searching out the most suitable solution for ZenCity, we faced the following challenges:<\/p>\n<ul>\n<li>Finding an Event Stream Processing solution capable of near real-time processing of events coming in from social networks, LOB (line of business) systems, IoT devices, etc.<\/li>\n<li>Building a solution that scales and could support a growing market of customers<\/li>\n<li>Constructing a CI\/CD pipeline around the solution that supports several environments (e.g., development, staging and production)<\/li>\n<\/ul>\n<h2>Solution<\/h2>\n<p>For simplicity, the architecture diagram below describes a single workload, chosen as an example from several somewhat similar ones that we focused on.\nThe diagram depicts ingesting tweets from a Twitter feed and analyzing them.<\/p>\n<h3>Architecture<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2018\/12\/ci-cd-pipeline-cloud-architecture.png\" alt=\"Image ci cd pipeline cloud architecture\" width=\"1069\" height=\"584\" class=\"aligncenter size-full wp-image-10617\" srcset=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2018\/12\/ci-cd-pipeline-cloud-architecture.png 1069w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2018\/12\/ci-cd-pipeline-cloud-architecture-300x164.png 300w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2018\/12\/ci-cd-pipeline-cloud-architecture-1024x559.png 1024w, 
https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2018\/12\/ci-cd-pipeline-cloud-architecture-768x420.png 768w\" sizes=\"(max-width: 1069px) 100vw, 1069px\" \/><\/p>\n<p>The above architecture uses Databricks notebooks (written in <a href=\"https:\/\/www.scala-lang.org\/\">Scala<\/a>) and <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/event-hubs\/\">Event Hubs<\/a>\u00a0to separate the computational blocks and enable each one to scale independently.<\/p>\n<p>The pipeline works as a stream that flows in the following manner:<\/p>\n<ul>\n<li>Ingest tweets and push them into the pipeline for processing<\/li>\n<li>Each tweet is enriched with its language and an associated topic<\/li>\n<li>From here the stream diverges into 3 parallel parts:\n<ul>\n<li>Each enriched tweet is saved in a table on an Azure-based SQL Database<\/li>\n<li>A model runs over a sliding window, scanning the last 10 minutes of tweets for topic anomalies<\/li>\n<li>Once a day, the entire batch of tweets is processed for topic anomalies<\/li>\n<\/ul>\n<\/li>\n<li>Each time an anomaly is detected, it is passed to a function app that sends an email describing the anomaly<\/li>\n<\/ul>\n<h3>Databricks<\/h3>\n<p>Databricks\u00a0is a management layer on top of Spark that exposes a rich UI with a scaling mechanism (including a REST API and a CLI tool) and a simplified development process. 
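The 10-minute topic-anomaly window in the flow above can be illustrated with a small plain-Python sketch. In the real pipeline this logic runs as a Spark streaming job in a Scala notebook; the function name and the simple count threshold here are illustrative only:

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)

def find_topic_anomalies(events, now, threshold=50):
    """Count topics seen in the trailing 10-minute window and flag any
    whose volume crosses the threshold. `events` is an iterable of
    (timestamp, topic) pairs with datetime timestamps."""
    window_start = now - WINDOW
    counts = Counter(topic for ts, topic in events if ts >= window_start)
    return [topic for topic, n in counts.items() if n >= threshold]

# Example: a burst of "roadworks" tweets inside the last 10 minutes
now = datetime(2018, 12, 12, 13, 45)
events = [(now - timedelta(minutes=m % 10), "roadworks") for m in range(60)]
events += [(now - timedelta(minutes=m % 10), "weather") for m in range(5)]
print(find_topic_anomalies(events, now, threshold=50))  # prints ['roadworks']
```

The same idea maps onto a windowed aggregation over the enriched-tweet stream, with the alert pushed to the function app instead of printed.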
We chose Databricks specifically because it enables us to:<\/p>\n<ul>\n<li>Create clusters that automatically scale up and down<\/li>\n<li>Schedule jobs to run periodically<\/li>\n<li>Co-edit notebooks (*)<\/li>\n<li>Run Scala notebooks interactively and see results immediately<\/li>\n<li>Integrate with GitHub<\/li>\n<\/ul>\n<p>The option to create clusters on demand can also potentially enable a separate execution environment for a specific customer that scales according to their individual need.<\/p>\n<blockquote><p>* The idea of notebooks used in Databricks is borrowed from <a href=\"http:\/\/jupyter.org\/try\">Jupyter Notebooks<\/a>\u00a0and is meant to provide an easy interface to manipulate queries and interact with data during development.<\/p><\/blockquote>\n<h2>Databricks Deployment Scripts<\/h2>\n<p>In order to create a maintainable solution that supports CI\/CD and manual deployment with ease, it was necessary to have a suite of scripts that could support granular actions like <strong>&#8220;deploy resources to Azure&#8221;<\/strong>\u00a0or\u00a0<strong>&#8220;upload environment secrets to Databricks.&#8221;<\/strong>\u00a0While almost all of these actions are achievable using azure-cli, databricks-cli, or the other libraries needed to deploy, build and test such a solution, it is essential to be able to invoke them quickly when developing the solution and checking changes. 
More importantly, it is critical for supporting a CI\/CD pipeline that doesn&#8217;t require any manual interaction.<\/p>\n<p>To aggregate all scripts\/actions into a manageable and coherent collection of commands, we built upon <a href=\"https:\/\/github.com\/devlace\/azure-databricks-recommendation\">Lace Lofranco&#8217;s work<\/a> and used <a href=\"https:\/\/www.tutorialspoint.com\/unix_commands\/make.htm\">make<\/a>,\u00a0which can be run locally from a Linux terminal or on Travis.<\/p>\n<p>Using <strong>make<\/strong>, the entire solution can be deployed by running\u00a0<span class=\"lang:sh decode:true crayon-inline\">make deploy<\/span>\u00a0, while providing (according to prompt) the appropriate parameters for\u00a0<strong>the resource group name<\/strong>, <strong>region<\/strong> and <strong>subscription id<\/strong>.<\/p>\n<p>The Makefile deployment target runs a collection of scripts that use <a href=\"https:\/\/docs.microsoft.com\/en-us\/cli\/azure\/install-azure-cli?view=azure-cli-latest\">azure-cli<\/a>, <a href=\"https:\/\/docs.azuredatabricks.net\/user-guide\/dev-tools\/index.html\">databricks-cli<\/a>\u00a0and Python.<\/p>\n<h3>Getting Started<\/h3>\n<p>Using our <a href=\"https:\/\/github.com\/Azure-Samples\/twitter-databricks-topic-extractor\">sample project on GitHub,<\/a>\u00a0you can run deployment on Azure from your local environment and follow the prompt for any details.<\/p>\n<p style=\"direction: ltr\">The <span class=\"lang:default decode:true crayon-inline \">make deploy<\/span>\u00a0command can also be run with all parameters supplied up front, though it will still prompt for a Databricks token:<\/p>\n<pre class=\"lang:default decode:true\">make deploy resource-group-name=sample-social-rg region=westeurope 
subscription-id=5b86ec85-0709-4021-b73c-7a089d413ff0<\/pre>\n<p>To set up a test environment, follow the <a href=\"https:\/\/github.com\/morsh\/social-posts-pipeline#integration-tests\">Integration Tests<\/a>\u00a0section of the README file.<\/p>\n<h3>Deploying the ARM template<\/h3>\n<p>The ARM template enabled us to deploy all the resources in the solution in a single resource group, while associating the keys and secrets between them. Using ARM deployment, all resources except for Databricks could also be configured, with the various secrets and keys wired up quickly.<\/p>\n<p>We also used the <strong>ARM\u00a0output<\/strong> feature to export all keys and secrets into a separate\u00a0<strong>.env<\/strong> file which could later be used to configure Databricks.<\/p>\n<h6>deploy\/deploy.sh [Shell]<\/h6>\n<pre title=\"deploy\/deploy.sh\" class=\"lang:sh decode:true\"># Deploying an ARM template with all resources to Azure\r\narm_output=$(az group deployment create \\\r\n    --name \"$deploy_name\" \\\r\n    --resource-group \"$rg_name\" \\\r\n    --template-file \".\/azuredeploy.json\" \\\r\n    --parameters @\".\/azuredeploy.parameters.json\" \\\r\n    --output json)\r\n\r\n# Extracting deployment output parameters\r\nstorage_account_key=$(az storage account keys list \\\r\n    --account-name $storage_account \\\r\n    --resource-group $rg_name \\\r\n    --output json |\r\n    jq -r '.[0].value')\r\n\r\n# Dumping the secrets into a local .env file\r\necho \"BLOB_STORAGE_KEY=${storage_account_key}\" &gt;&gt; $env_file<\/pre>\n<div>\n<h3>Configuring Databricks Remotely<\/h3>\n<p>To configure Databricks, we used\u00a0<strong>databricks-cli<\/strong>, which is a command line interface tool designed to provide easy remote access to Databricks and most of the API it offers.<\/p>\n<p>The first script uploads all the relevant secrets into the Databricks environment, making them available to all clusters that will be created in it. 
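The .env-parsing half of that first script can be sketched as follows. This is a simplified illustration of reading the file produced by deploy.sh into key\/value pairs before pushing each one as a Databricks secret; the helper name and the simplified quoting rules are assumptions, not the project's actual code:

```python
import re

def parse_env_file(text):
    """Parse KEY=VALUE lines from a .env file into a dict,
    skipping blank lines and comments."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"([A-Za-z_][A-Za-z0-9_]*)=(.*)", line)
        if match:
            secrets[match.group(1)] = match.group(2)
    return secrets

env = "# secrets dumped by deploy.sh\nBLOB_STORAGE_KEY=abc123\nEVENTHUB_CONN=Endpoint=sb://x\n"
print(parse_env_file(env))
```

Each resulting pair would then be sent to the secrets API, one request per variable.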
The second script configures the libraries, clusters, and jobs that are required to run as part of the pipeline.<\/p>\n<h6>deploy\/databricks\/create_secrets.py [Python]<\/h6>\n<div>\n<pre title=\"deploy\/databricks\/create_secrets.py\" class=\"lang:python decode:true\"># Using the Databricks REST API to create a secret for every environment variable in the .env file\r\nimport json\r\nimport requests\r\n\r\ndef create_secret(dbi_domain, token, secret_name, secret_value):\r\n    api_url = \"https:\/\/\" + dbi_domain + \"\/api\/2.0\/\"\r\n    scope = \"storage_scope\"\r\n\r\n    r = requests.post(api_url + 'preview\/secret\/secrets\/write',\r\n                      headers={\"Authorization\": \"Bearer \" + token},\r\n                      json={\"scope\": scope, \"key\": secret_name,\r\n                            \"string_value\": secret_value})\r\n    response_body = r.json()\r\n    if r.status_code != 200:\r\n        raise Exception('Error creating secret: ' + json.dumps(response_body))\r\n    return response_body<\/pre>\n<div>\n<h6>deploy\/databricks\/configure.sh [Shell]<\/h6>\n<pre class=\"lang:default decode:true\"># Configure the databricks cli profile to use the user-generated token\r\n&gt; ~\/.databrickscfg\r\necho \"[DEFAULT]\" &gt;&gt; ~\/.databrickscfg\r\necho \"host = $DATABRICKS_URL\" &gt;&gt; ~\/.databrickscfg\r\necho \"token = $DATABRICKS_ACCESS_TOKEN\" &gt;&gt; ~\/.databrickscfg\r\necho \"\" &gt;&gt; ~\/.databrickscfg\r\n\r\n# Create + start a cluster to use for library deployments\r\ndatabricks clusters create --json-file \".\/config\/cluster.config.json\"\r\ndatabricks clusters start --cluster-id $cluster_id\r\n\r\n# Installing libraries on Databricks\r\ndatabricks libraries install --maven-coordinates com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.1 --cluster-id $cluster_id\r\ndatabricks libraries install --maven-coordinates org.apache.bahir:spark-streaming-twitter_2.11:2.2.0 --cluster-id $cluster_id\r\ndatabricks libraries install --maven-coordinates org.json4s:json4s-native_2.11:3.5.4 --cluster-id $cluster_id\r\n\r\n# Uploading build artifacts from the Java project (containing the twitter wrapper 
# and test validator)\r\nblob_file_name=\"social-source-wrapper-1.0-SNAPSHOT.jar\"\r\nblob_local_path=\"src\/social-source-wrapper\/target\/$blob_file_name\"\r\nblob_dbfs_path=\"dbfs:\/mnt\/jars\/$blob_file_name\"\r\n\r\ndatabricks fs cp --overwrite \"$blob_local_path\" \"$blob_dbfs_path\"\r\ndatabricks libraries install --cluster-id $cluster_id --jar \"$blob_dbfs_path\"\r\n\r\n# Uploading notebooks to Databricks\r\ndatabricks workspace import_dir \"..\/..\/notebooks\" \"\/notebooks\" --overwrite\r\n\r\n# Executing the jobs according to configuration paths\r\nfor filePath in $(ls -v $PWD\/config\/run.*.config.json); do\r\n    declare jobjson=$(cat \"$filePath\")\r\n    databricks runs submit --json \"$jobjson\"\r\ndone<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<p>To make sure the notebooks run with test configuration, it is important to execute the notebooks with a declared parameter, setting the social source to &#8220;CUSTOM&#8221;:<\/p>\n<pre class=\"lang:sh decode:true \">jobjson=$(echo \"$jobjson\" | \\\r\n          jq '.notebook_task.base_parameters |= { \"socialSource\": \"CUSTOM\" }')<\/pre>\n<p>This line adds a parameter to the job execution of each notebook. The notebooks affected by this parameter switch to running with mock data.<\/p>\n<h3>Cleanup<\/h3>\n<p>When running in a test environment, it was necessary to remove the excess resources once the test had completed its execution. In a full scenario, we&#8217;d be able to completely delete the ARM resource group and re-create it in the next execution. But, because there&#8217;s currently no API for creating a Databricks token, it was necessary to generate the token manually. This meant we were not able to delete the resources between executions, and it was important to keep the Databricks resource (although not its clusters) alive between test runs. 
Because this was the case, we needed to make sure all jobs we initiated were terminated so that they didn&#8217;t continue to consume resources &#8211; the cleanup script would <strong>only<\/strong> be run\u00a0in a test deployment, after a test run completed, whether it succeeded or failed.<\/p>\n<h6>deploy\/databricks\/cleanup.sh [Shell]<\/h6>\n<pre title=\"deploy\/databricks\/cleanup.sh\" class=\"lang:sh decode:true\"># Stopping all active jobs deployed in the databricks deployment\r\nfor filePath in $(ls -v $PWD\/config\/run.*.config.json); do\r\n    for jn in \"$(cat \"$filePath\" | jq -r \".run_name\")\"; do\r\n        declare runids=$(databricks runs list --active-only --output JSON | \\\r\n                         jq -c \".runs \/\/ []\" | \\\r\n                         jq -c \"[.[] | \\\r\n                         select(.run_name == \\\"$jn\\\")]\" | \\\r\n                         jq .[].run_id)\r\n        for id in $runids; do\r\n            databricks runs cancel --run-id $id\r\n        done\r\n    done\r\ndone\r\n<\/pre>\n<h3>Java Packages and Build<\/h3>\n<p>The build stage is used to build the Java packages that are uploaded and used in the job execution on Databricks.<\/p>\n<p>The following test is run by Travis-CI to connect to Azure Event Hubs and listen on the last event hub in the pipeline &#8211; the one receiving the alerts.\u00a0If a new alert is identified, the test process exits successfully. 
Otherwise, it will fail the test.<\/p>\n<h6>src\/integration-tests\/src\/main\/java\/com\/microsoft\/azure\/eventhubs\/checkstatus\/ReceiveByDateTime.java [Java]<\/h6>\n<pre title=\"src\/integration-tests\/src\/main\/java\/com\/microsoft\/azure\/eventhubs\/checkstatus\/ReceiveByDateTime.java\" class=\"lang:java decode:true\">package com.microsoft.azure.eventhubs.checkstatus;\r\n\r\n\/\/ Creating an Event Hub events receiver\r\nfinal EventHubClient ehClient = \r\n                    EventHubClient.createSync(connStr.toString(), executorService);\r\nfinal PartitionReceiver receiver = ehClient.createEpochReceiverSync(\r\n                    EventHubClient.DEFAULT_CONSUMER_GROUP_NAME, partitionId,\r\n                    EventPosition.fromEnqueuedTime(Instant.EPOCH), 2345);\r\n\r\nfinal LocalDateTime checkupStartTime = LocalDateTime.now();\r\n\r\n\/\/ Making sure 15 minutes haven't passed since the test started\r\nwhile (LocalDateTime.now().minusMinutes(15).isBefore(checkupStartTime)) {\r\n    receiver.receive(100).thenAcceptAsync(receivedEvents -&gt; {\r\n\r\n        \/\/ After parsing each event, check the event time is after the start of execution\r\n        if (eventDateTime.isAfter(startTime)) {\r\n            System.out.println(\"Found a processed alert: \" + dataString);\r\n            System.exit(0);\r\n        }\r\n    }, executorService).get();\r\n}\r\n\r\n\/\/ If no event was found for 15 minutes, fail the process\r\nSystem.exit(1);<\/pre>\n<h3>Databricks Continuous Integration Using Travis<\/h3>\n<p><a href=\"https:\/\/travis-ci.org\">Travis-CI<\/a> is a great tool for continuous integration, listening to GitHub changes, and running the appropriate deployment scripts.<\/p>\n<p>In this project, we used Travis to listen to any change in the master branch and execute a test deployment. 
This configuration can also be changed to run once a day if you don&#8217;t want every change to trigger a build.<\/p>\n<p>All the configuration of Azure and Databricks can currently be done remotely and automatically via the scripts described in this article, except for one task &#8211; <a href=\"https:\/\/docs.azuredatabricks.net\/api\/latest\/authentication.html\">creating an authentication token in Databricks.<\/a>\u00a0This task requires manual interaction with the Databricks UI.<\/p>\n<p>For that reason, to run tests in a test environment, it was necessary to first deploy a test environment, and use the output from the deployment to configure Travis repository settings with those parameters. To see how to do that, continue reading <a href=\"https:\/\/github.com\/morsh\/social-posts-pipeline#connect-to-travis-ci\">here<\/a>.<\/p>\n<h3>Integration Testing<\/h3>\n<p>Together with ZenCity, we discussed adding unit testing to the Databricks pipeline notebooks, but we found that running unit tests on a stream-based pipeline was not something we wanted to invest in at that moment. For more information on unit testing Spark-based streams, <a href=\"http:\/\/mkuthan.github.io\/blog\/2015\/03\/01\/spark-unit-testing\/\">see here<\/a>.<\/p>\n<p>Integration testing, on the other hand, promised to help us gauge the maturity of the code end to end, but also seemed quite a challenge. The idea behind the integration testing was to spin up an entire test environment, let the tests run with mock data, and &#8220;watch&#8221; the end of the pipeline while waiting for specific results.<\/p>\n<h3>Mocking Cognitive Services<\/h3>\n<p>We used <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/text-analytics\/\">Text Analytics<\/a> in <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/\">Cognitive Services<\/a>\u00a0to get a quick analysis of the language and topics on each tweet. 
Although this solution worked great, the throttling limits on Cognitive Services (which are present on all SKU levels, with varying limits) could not support a pipeline at the scale we were hoping for.\u00a0Therefore, we used those services as an implementation example only; in the customer deployment, we used scalable proprietary models developed by the customer.<\/p>\n<p>For that reason, in the published sample, we chose to mock those requests using a <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-functions\/functions-create-function-app-portal\">Function App<\/a>\u00a0with constant REST responses \u2013 in a real-life scenario this should be replaced with Cognitive Services (in the case of a small-scale stream), an external REST API, or a model that can be run by Spark.<\/p>\n<h3>Twitter API<\/h3>\n<p>In this sample, the production version uses the <a href=\"https:\/\/github.com\/DanielaSfregola\/twitter4s\">Twitter API<\/a>\u00a0to read tweets on a certain hashtag\/topic and ingest them into the data pipeline. This API was encapsulated to enable mocking using predefined data in a test environment. 
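The source-swapping idea can be sketched in Python as below. The class names, the `build_source` helper, and the configuration value are illustrative stand-ins, not the project's actual classes (the project implements this pattern in Java, as shown next):

```python
class MockSocialSource:
    """Replays predefined posts, mirroring the mock source used in
    the test environment."""

    def __init__(self, canned_posts):
        self.canned_posts = list(canned_posts)

    def search(self, query):
        # Ignore the query and replay the predefined data
        return self.canned_posts


class TwitterSource:
    """Production source; would wrap the real Twitter client."""

    def search(self, query):
        raise NotImplementedError("calls the Twitter API in production")


def build_source(social_source, canned_posts=()):
    # The pipeline picks its source from configuration (e.g. socialSource=CUSTOM)
    if social_source == "CUSTOM":
        return MockSocialSource(canned_posts)
    return TwitterSource()

source = build_source("CUSTOM", ["roadworks on main st", "flooding downtown"])
print(source.search("#anything"))
```

Because both sources expose the same `search` method, the rest of the pipeline is unaware of which one is feeding it.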
Both the Twitter and the mock implementations use the following interface:<\/p>\n<h6>src\/social-source-wrapper\/src\/main\/java\/social\/pipeline\/source\/SocialSource.java [Java]<\/h6>\n<pre title=\"src\/social-source-wrapper\/src\/main\/java\/social\/pipeline\/source\/SocialSource.java\" class=\"lang:default decode:true\">\/\/ Defining the interface to query a social service\r\npublic interface SocialSource {\r\n  SocialQueryResult search(SocialQuery query) throws Exception;\r\n\r\n  void setOAuthConsumer(String key, String secret);\r\n\r\n  void setOAuthAccessToken(String accessToken, String tokenSecret);\r\n}<\/pre>\n<h2>Conclusion<\/h2>\n<p>We started out looking to help ZenCity solve a challenge around building a cloud-based data pipeline, but found that the real challenge was finding and creating a CI\/CD pipeline that could support that kind of solution.<\/p>\n<p>Along the way we worked on generalizing the solution in a way that would allow it to be configured and enhanced to work for any CI\/CD pipeline around a Databricks-based pipeline. 
The generalization process made our solution flexible and adaptable for other similar projects.<\/p>\n<p>Additionally, the approach we took with the pipeline \u2013 which includes integration testing of the entire pipeline in a full test environment \u2013 can also be integrated into other projects that include a data pipeline.<\/p>\n<p>There are two requirements for adopting this approach: first, a streaming pipeline in which it&#8217;s hard to test each component or whose SDK does not expose an easily testable API; and second, a data pipeline that enables controlling the input and monitoring the output.<\/p>\n<p>The scripts in this solution can be changed to support the deployment and configuration of any remotely controlled streaming infrastructure.<\/p>\n<h2>Resources<\/h2>\n<ul>\n<li>Link to GitHub repo: <a href=\"https:\/\/github.com\/Azure-Samples\/twitter-databricks-topic-extractor\">https:\/\/github.com\/Azure-Samples\/twitter-databricks-topic-extractor<\/a><\/li>\n<li>This solution was built on top of the great work done by <a href=\"https:\/\/github.com\/devlace\">Lace Lofranco<\/a> in <a href=\"https:\/\/github.com\/devlace\/azure-databricks-recommendation\">azure-databricks-recommendation<\/a>.<\/li>\n<li><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/databricks\/\">Databricks on Azure<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This code story describes CSE&#8217;s work with ZenCity to create a data pipeline on Azure Databricks supported by a CI\/CD pipeline on TravisCI. 
The aim of the collaboration was to create a pipeline capable of processing a stream of social posts, analyzing them, and identifying trends.<\/p>\n","protected":false},"author":21371,"featured_media":10618,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[10,11,14,16],"tags":[60,67,76,124,144,166,177,291,320,333,334],"class_list":["post-9429","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-app-services","category-big-data","category-cognitive-services","category-devops","tag-azure","tag-azure-cli","tag-azure-event-hubs","tag-cognitive-services","tag-databricks","tag-email","tag-featured","tag-pipelines","tag-scala","tag-spark","tag-spark-streaming"],"acf":[],"blog_post_summary":"<p>This code story describes CSE&#8217;s work with ZenCity to create a data pipeline on Azure Databricks supported by a CI\/CD pipeline on TravisCI. The aim of the collaboration was to create a pipeline capable of processing a stream of social posts, analyzing them, and identifying 
trends.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/9429","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21371"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=9429"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/9429\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/10618"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=9429"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=9429"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=9429"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}