{"id":2950,"date":"2017-05-10T20:35:19","date_gmt":"2017-05-10T20:35:19","guid":{"rendered":"https:\/\/www.microsoft.com\/reallifecode\/?p=2950"},"modified":"2020-03-15T05:08:44","modified_gmt":"2020-03-15T12:08:44","slug":"graphql-providing-context-into-global-crisiss-and-social-public-data-sources","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/graphql-providing-context-into-global-crisiss-and-social-public-data-sources\/","title":{"rendered":"Project Fortis: Accelerating UN Humanitarian Aid Planning with GraphQL"},"content":{"rendered":"<h2>Background<\/h2>\n<p>The <a href=\"https:\/\/www.unocha.org\/\">United Nations Office for the Coordination of Humanitarian Affairs (OCHA)<\/a> is in charge of planning responses to emergencies and humanitarian crises around the world, between dozens of organizations. Their activities include establishing camps for refugees and reacting to events like epidemics, famines and terrorist attacks. Very often, the places that most desperately need humanitarian aid are in the most dangerous and inaccessible areas on the planet.<\/p>\n<h3>The Problem<\/h3>\n<p>In order to create and execute an accurate response plan, UN OCHA needs to understand what\u2019s happening on the ground in these disaster areas. In locations where aid workers are present, these experts provide much of the insight and data that informs the response. In addition, for places too dangerous for aid workers to travel, response planning requires quantitative insight and analysis to understand the conditions that drive a need for humanitarian aid. OCHA actively monitors these hostile situations by reviewing social media outlets, local radio, RSS feeds, blogs, news sites and television stations on a daily basis to understand what local people are talking about. 
This manual data gathering process is time-consuming and imprecise, which means that h<span>umanitarian aid plans are often built from sparse and limited datasets.<\/span><\/p>\n<h3>Engagement<\/h3>\n<p>Through a collaboration between Microsoft&#8217;s Partner Catalyst Team and UN OCHA, Project Fortis started with the specific narrow goal of providing planning insight to the UN OCHA team for Libya, where the post-Gadhafi refugee and terrorist\u00a0conflict makes it too dangerous to have aid workers on the ground. \u00a0The project kicked off with a hackfest,\u00a0in which UN OCHA field officers who monitor Libya shared their operational process and existing issues with Microsoft&#8217;s engineers. On a daily basis, UN OCHA Libya specialists monitored approximately 330 distinct data sources, including Twitter and Facebook, local radio, TV and news sources. \u00a0Understanding the magnitude of this daily work enabled Microsoft&#8217;s engineers to propose a pipeline that would automate this data collection and visualize the results.<\/p>\n<p>Providing a more timely and accurate humanitarian response plan helps save lives. Our goal was to accelerate UN OCHA\u2019s ability to respond to these disasters by improving how they monitor their numerous data sources and gain insight from them.<\/p>\n<h2><strong>The Solution<\/strong><\/h2>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-3346\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/fortis_screenshot.png\" alt=\"\" width=\"1863\" height=\"889\" \/><\/p>\n<p>At a high level, we designed and built a data ingestion, analysis, and visualization pipeline. \u00a0The pipeline collects social media conversations and postings from the public web and darknet data sources. It then\u00a0performs\u00a0feature extraction and infers relationships between\u00a0targeted keywords\u00a0in real-time. 
Conversational message streams are paired with sentiment analysis and mood inference modeling, alongside other machine-learning techniques, to gain quantitative insight into the topics, demographics, and indicators that drive the key humanitarian conditions for a targeted set of locations. \u00a0Finally, results are visualized on a dashboard so that a user can see trending topics and keywords over time and geography.<\/p>\n<p>As we spoke to other organizations, we noticed a remarkable consistency in the data analysis needs across a range of industries. Based on this feedback, we generalized the pipeline we built with the United Nations and created Fortis. \u00a0The Fortis solution was\u00a0used to provide\u00a0deeper insights\u00a0into the relationships between medicines, diseases and epidemic risk zones (such as those for dengue fever or Zika virus). Fortis provides users with the ability to configure\u00a0a watch list of topics, locations, and data sources. Insights can\u00a0be reshaped to meet the needs of any user scenario involving deep quantitative insight into social, public or private data sources in real time.<\/p>\n<h3><strong>Additional Application: Dengue Fever Monitoring<\/strong><\/h3>\n<p>Ume\u00e5 University is one of the leading research institutes for using innovative methods to monitor outbreaks of dengue fever. We worked closely with the university to forecast areas at high risk of dengue fever in Sri Lanka and Indonesia. The predictive dataset was generated from R-based statistical models that referenced weather forecast humidity, temperature, and precipitation data.<\/p>\n<p>Like the UN, the university needed help visualizing trends and insights about dengue fever incidents and interventions, represented across time and space. 
This information had to be available in real time, and we wanted to combine data from the predictive models and social media into a single unified view of reported incidents and symptoms.<\/p>\n<h3><strong>Data Sources<\/strong><\/h3>\n<p>For social media and public data sources, we used Twitter\u2019s public filter streaming <a href=\"https:\/\/dev.twitter.com\/streaming\/reference\/post\/statuses\/filter\">API<\/a> to receive Tweets in real time, <a href=\"https:\/\/developers.facebook.com\/docs\/graph-api\">Facebook&#8217;s Graph API<\/a>, <a href=\"http:\/\/www.acleddata.com\/\">ACLED\u2019s REST API<\/a> for armed conflict-related events across Africa and Asia, and <a href=\"https:\/\/www.tadaweb.com\/\">Tadaweb<\/a> to query a pre-defined list of public web and darknet sites. We also used\u00a0<a href=\"https:\/\/www.wunderground.com\/weather\/api\/\">Weather Underground<\/a>\u00a0for the weather forecast dataset.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-2980\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/fortis-current-architecture.png\" alt=\"\" width=\"1728\" height=\"485\" \/><\/p>\n<h3><strong>High-Level Functional Architecture<\/strong><\/h3>\n<p>In this section, we&#8217;ll provide a high-level overview of the Fortis core stack, and then dive into the use of <a href=\"http:\/\/graphql.org\/learn\/\">GraphQL<\/a>\u00a0in the next section.<\/p>\n<p>To begin the ingestion process, data from sources like Facebook or Twitter is streamed into Azure Event Hubs. <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-functions\/\">Azure Functions<\/a>\u00a0are used across the system, in keeping with an event-driven, compute-on-demand serverless architecture. 
Azure Functions let us run small, microservice-style code blocks that are triggered by Azure storage and messaging operations (Blob, Queue, Tables, NoSQL DBs, Event Hub writes) or through exposed HTTP endpoints.<\/p>\n<p>We use\u00a0<a href=\"http:\/\/spark.apache.org\/streaming\/\">Spark <\/a>to scale and distribute the feature extraction operations\u00a0across an HDInsight cluster. The next version of Fortis will focus on\u00a0minimizing any disk-related operations outside of the Postgres and Cassandra persistence layers. Spark is an important component of the Fortis architecture, as we have models utilizing libraries from <a href=\"http:\/\/scikit-learn.org\/stable\/\">scikit-learn<\/a>, R and <a href=\"http:\/\/spark.apache.org\/graphx\/\">GraphX <\/a>data structures. Where possible, we leverage the native and open source <a href=\"https:\/\/github.com\/stefanobaghino\/spark-twitter-stream-example\">Twitter4J <\/a>and <a href=\"http:\/\/facebook4j.github.io\/en\/index.html\">Facebook4J <\/a>Spark streaming data connectors to minimize compute latency. Our goal is to leverage the streaming connectors\u00a0available in Apache <a href=\"http:\/\/bahir.apache.org\/\">Bahir<\/a>, and contribute any missing connectors (Instagram, Snapchat, Bing, etc.) back to <a href=\"https:\/\/github.com\/apache\/bahir\">Bahir<\/a>.<\/p>\n<h2><strong>GraphQL<\/strong><\/h2>\n<p>GraphQL is our service layer, and provides Fortis clients with the ability to directly query and interface with the processed data results. Our React-based web interface is fully powered by Fortis GraphQL services.<\/p>\n<h3><strong>Why Facebook\u00a0Created Another Framework<\/strong><\/h3>\n<p>GraphQL originated with the Facebook newsfeed team in 2012, as a response to data retrieval latency, especially for their mobile users. 
The goal was to retrieve as much relevant data as possible in a single request to the web server, minimizing round trips and decreasing latency.<\/p>\n<h3><strong>What is GraphQL?\u00a0<\/strong><\/h3>\n<p>GraphQL is a query language for APIs and an alternative to REST. What makes GraphQL so appealing when querying data is that you request exactly the response payload you want and you get exactly that, in a predictable shape. Let\u2019s step\u00a0through a simple Fortis GraphQL query.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-3000\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/gql_req_rsp.png\" alt=\"\" width=\"851\" height=\"463\" \/><\/p>\n<p>The request is written in a\u00a0JSON-esque query language that returns\u2014you guessed it\u2014JSON. You can see that the query and the response have the same structure. The curly braces delimit what GraphQL calls a selection set, which maps to a JSON object in the response. Within a selection set, the object keys are called fields.<\/p>\n<p>Fields can also have selection sets. \u00a0This powerful feature allows you to represent deeply nested data models and complex data structures. We see this with the edge selection set, which returns a JSON array\u00a0of edge objects. In a traditional REST approach, you&#8217;d need to chain multiple service requests or hide complex SQL joins behind a purpose-built endpoint. With GraphQL, you&#8217;re able to achieve complex data retrieval\u00a0through a single\u00a0trip to your API\u00a0server. We can\u00a0represent these queries at deeper levels (e.g. lists within lists, within lists). Think of it like running a recursive map operation across your entire dataset.<\/p>\n<p>You can also pass arguments to fields, enabling front-end developers to parameterize your query directly. 
Common scenarios include paginating datasets, complex filters, and ordering conditions.<\/p>\n<p>In\u00a0the spirit of <span>minimizing\u00a0ambiguity in the expected data response, GraphQL\u00a0forces you<\/span>\u00a0to query down to the scalar leaf nodes throughout your query. So, you cannot use something like:\n<code>select * from my_rest_endpoint<\/code>.<\/p>\n<h3><strong>Static Type System<\/strong><\/h3>\n<p>When a query is submitted to a GraphQL server, how does the GraphQL server know the query is valid? Because the GraphQL server is backed by a\u00a0static type system, defined in its schema. This type system describes each data type, its fields and the arguments those fields accept. Anything not declared in the schema is rejected, so GraphQL knows what&#8217;s possible and what&#8217;s not.\u00a0Below is a snippet of the Event type schema:<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-3020\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/gql_typeII.png\" alt=\"\" width=\"586\" height=\"592\" \/><\/p>\n<p>The type system determines whether a query is valid, and returns a descriptive error to clients when it is not. You&#8217;ll notice in the example above, our event query returns an object of the Event type. The square brackets represent a list (a JSON array in the response), and the exclamation mark indicates that null values are not allowed (see\u00a0<a href=\"http:\/\/graphql.org\/learn\/schema\/#scalar-types\">supported scalar types<\/a>\u00a0for more info).<\/p>\n<h3><strong>Resolver Functions<\/strong><\/h3>\n<p>We need to tell\u00a0GraphQL how it should respond to queries, which we do through resolver functions. Every field can have its own resolver function, which defines how that field should return data. The value resolved for a field&#8217;s parent node is passed in as an argument to the resolver. This occurs recursively until all\u00a0scalar leaf nodes are\u00a0evaluated. 
Resolvers can return either plain values (such as JSON objects) or Promises.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-3024\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/gql_resolver.png\" alt=\"\" width=\"751\" height=\"365\" \/><\/p>\n<h3><strong>Introspection<\/strong><\/h3>\n<p>GraphQL also supports <a href=\"http:\/\/graphql.org\/learn\/introspection\/\">introspection<\/a>, meaning that clients can query the schema itself, including its types and their documentation. This is a\u00a0powerful feature and enables GraphQL to be used as\u00a0a tool to build other tools. You can create rich IDEs and also interact directly with your codebase to learn how your GraphQL server works.<\/p>\n<p>Many GraphQL server packages ship with GraphiQL, a\u00a0browser-based IDE that enables front-end developers to write and test queries against your web server. GraphiQL supports typeahead and highlights errors and issues as you type. The IDE also has an integrated documentation explorer.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-3030\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/graphiqlII.gif\" alt=\"\" width=\"1200\" height=\"670\" \/><\/p>\n<h3>Mutations<\/h3>\n<p>This code story has focused on fetching data, but no data platform\u00a0is complete without a way to mutate data, too. Mutations are the equivalent of POST, PUT, PATCH and DELETE in HTTP\/REST. As with queries, you specify the fields you want returned once the mutation has been committed and resolved by your API. 
This feature can be useful for retrieving the new state following the mutation.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-3037\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/gql_mutation.png\" alt=\"\" width=\"330\" height=\"365\" \/><\/p>\n<h3><strong>Why did we decide to use GraphQL?<\/strong><\/h3>\n<p>To create a\u00a0more reactive front-end solution, we decided to use\u00a0the React\u00a0Flux paradigm.\u00a0Our visualization components subscribe to GraphQL observables, and data is pushed from backend services to React components, which recompute their state on the fly. When it came to which API paradigm to follow, we had faced several challenges with REST, as service clients made subtle assumptions about how the data would come back. REST-based architectures\u00a0also introduced data latency\u00a0and unpredictable results\u00a0for our users.<\/p>\n<p>As we move towards a more real-time pub\/sub model, social and public media postings will stream continuously to client devices through GraphQL subscriptions. Fortis users will be viewing results on both web and mobile, including in locations with poor internet connections where latency can reach thousands of milliseconds. As a result, minimizing the response payload is crucial.<\/p>\n<p>In addition, clients can plug in their own React-based dashboard visualization components. Maintaining an acceptable developer experience requires decorating React components with higher-order querying integration\u00a0into Fortis APIs. For example, imagine running a live Fortis GraphQL query during the next US presidential election debates to gain an understanding of public opinion around foreign policy as distributed across the country. 
GraphQL gives front-end engineers this sort of flexibility to\u00a0easily reshape trends and insights across\u00a0Fortis.<\/p>\n<h3>Using GraphQL Subscriptions with Spark Streaming<\/h3>\n<p>Feature extraction computations such as inferring\u00a0discussion topics, gender, entities, mood, sentiment, etc. are all distributed across our Spark cluster. The same applies to our predictive R-based models, which use\u00a0weather forecast data\u00a0to identify high-risk zones for health-related epidemics. The resulting data is persisted to a Postgres server, as we aggregate\u00a0the underlying dataset in the form of <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/bb259689.aspx\">geotiles<\/a> to visualize the trends on a heat map.<\/p>\n<p>So, how should\u00a0we write\u00a0the Spark-computed result to Postgres in an efficient way?\u00a0GraphQL subscriptions are another\u00a0operation type that allows clients to wire up GraphQL with a pub-sub system (e.g. Event Hub, Redis, Kafka). Subscriptions allow us to build more reactive backend extensions and real-time services. The general idea is that GraphQL becomes the consumer of\u00a0an Event Hub or Kafka topic, while Spark acts as\u00a0the message producer. The GraphQL subscription\u00a0resolver\u00a0handlers respond to these published messages\u00a0and\u00a0commit the\u00a0tile data entries into Postgres.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-3234\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/fortis-subscription-overviewII.png\" alt=\"\" width=\"2460\" height=\"1002\" \/><\/p>\n<p>First, we create a root schema definition and resolver, similar to Query and Mutation. 
In this case, a list of event\u00a0objects is streamed to all Fortis clients.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-3069\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/gql_subscription_chema.png\" alt=\"\" width=\"393\" height=\"150\" \/><\/p>\n<p>Clients subscribe to the observable\u00a0<code>eventsAdded<\/code>, through which\u00a0newly published events are pushed down to all active subscribers. The shape of the response is described in the form of a GraphQL query.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-3073\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/gql_subscription.png\" alt=\"GraphQL Subscription request\" width=\"305\" height=\"294\" \/><\/p>\n<p><span>Apollo is a production-grade GraphQL client framework, supporting\u00a0platforms like Android, React, Express, React Native, Angular, C# and iOS. We recommend using a client framework like Apollo or Relay to get the most out of your GraphQL server. Both Apollo and Relay support features such as query and data caching, optimistic UI updates, batching, pagination, routing, test harnesses and real-time subscription utilities. Check out this Apollo <a href=\"http:\/\/dev.apollodata.com\/tools\/graphql-subscriptions\/index.html\">tutorial<\/a> for instructions on setting up a GraphQL server to support subscriptions.<\/span><\/p>\n<h3>Does GraphQL require React?<\/h3>\n<p>No. GraphQL is typically served over HTTP and has no dependencies on React. Apollo supports most of the popular device platforms used today.<\/p>\n<h3>Setting up GraphQL on Azure<\/h3>\n<p>Apollo helps ease the developer experience when onboarding a new GraphQL Express server to Azure. 
We&#8217;ll start with a boilerplate Node-based Express app published by Apollo.<\/p>\n<p>Follow the commands below to bootstrap the baseline Node app.<\/p>\n<pre class=\"lang:default decode:true\">git clone https:\/\/github.com\/apollostack\/apollo-starter-kit\r\ncd apollo-starter-kit\r\ngit checkout server-only\r\nnpm install\r\nnpm start<\/pre>\n<p>Once the GraphQL server is online, you should be able to access the GraphiQL IDE and run a sample query at\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/localhost:8080\/graphql\" target=\"_blank\" rel=\"noopener noreferrer\">localhost:8080\/graphql<\/a>.<\/p>\n<pre class=\"lang:js decode:true\">{\r\n  testString\r\n}<\/pre>\n<p>You should see a response in the IDE response pane. You can walk through some other setup steps\u00a0by following\u00a0<a href=\"https:\/\/dev-blog.apollodata.com\/tutorial-building-a-graphql-server-cddaa023c035\">this tutorial <\/a>to help get your feet wet with GraphQL.<\/p>\n<p>When you&#8217;re ready to deploy to Azure, push your changes to a GitHub repo. Then create an <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/app-service\/api\/\">Azure\u00a0API App<\/a> resource via the portal, and set up GitHub <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/app-service-web\/app-service-continuous-deployment\">continuous deployment<\/a> via Kudu in the Deployment tab in Azure.<\/p>\n<h2>Final Thoughts<\/h2>\n<p>Adding GraphQL to Fortis gives Fortis clients a\u00a0flexible convention for\u00a0interacting directly\u00a0with our data APIs. There&#8217;s a growing set of\u00a0open source tools \u2014 including <a href=\"https:\/\/www.graph.cool\/\">Graphcool<\/a>, Apollo, and Relay <span>\u2014 <\/span>that make it easier to get started with GraphQL.<\/p>\n<p>The goal of Fortis is to provide deeper insight and analysis across a broad range of data, geographies, and scenarios. 
\u00a0While we started by focusing on the challenges of humanitarian aid planning in Libya, Fortis&#8217; generalized infrastructure enabled developers to leverage the pipeline in other domains.<\/p>\n<p>We accept Pull Requests of all shapes and sizes and encourage developers to check out the\u00a0<a href=\"https:\/\/github.com\/CatalystCode\/project-fortis\">Fortis GitHub repository<\/a> and<a href=\"https:\/\/github.com\/CatalystCode\/project-fortis\/issues\"> issue list<\/a> and contribute to this <span>incredibly important\u00a0<\/span>effort.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Using GraphQL and Azure to create a data processing pipeline for identifying trends and providing insights about global humanitarian crises.<\/p>\n","protected":false},"author":21362,"featured_media":10984,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[10,11,17],"tags":[60,190,307,333,334],"class_list":["post-2950","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-app-services","category-big-data","category-frameworks","tag-azure","tag-graphql","tag-reactjs","tag-spark","tag-spark-streaming"],"acf":[],"blog_post_summary":"<p>Using GraphQL and Azure to create a data processing pipeline for identifying trends and providing insights about global humanitarian 
crises.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2950","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21362"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=2950"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2950\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/10984"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=2950"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=2950"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=2950"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}