{"id":4929,"date":"2017-10-09T09:18:32","date_gmt":"2017-10-09T16:18:32","guid":{"rendered":"https:\/\/www.microsoft.com\/developerblog\/?p=4929"},"modified":"2020-03-20T09:31:50","modified_gmt":"2020-03-20T16:31:50","slug":"migration-story-aws-azure","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/migration-story-aws-azure\/","title":{"rendered":"Moving High Scale Data and Compute from AWS to Azure for The Emedgene Genetics Intelligence Platform"},"content":{"rendered":"<h2>Background<\/h2>\n<p>While artificial intelligence (AI) is offering amazing data-driven insights across a range of industries, one of its most important applications for us as humans has been in the field of genomics. Technologists have teamed up with geneticists to use AI to better understand human DNA, in an attempt to tackle common diseases around the world. By comparing millions of samples of genetic data, experts can better understand the causes of diseases and identify the patient populations most at risk from certain conditions.<\/p>\n<p>However, while the field has advanced extensively over the last two decades, thousands of genetic diseases remain poorly understood, and many research centers and healthcare organizations do not have the time or resources to appropriately analyze and draw insights from their patient DNA data.<\/p>\n<p>To help alleviate this problem, Microsoft recently partnered with <a href=\"http:\/\/emedgene.com\/\">Emedgene<\/a> to develop a next-generation genomics intelligence platform which incorporates advanced artificial intelligence technologies to streamline the interpretation and evidence presentation process. 
With this platform, healthcare providers around the world will be better able to provide individualized care to more patients through improved yields on their diagnostic data.<\/p>\n<p>To scale efficiently, Emedgene wanted to migrate their solution from AWS to Azure with support from Microsoft. However, the huge volumes of data and the computing power needed to run AI applications of this type made the migration a challenge. To meet it, our team migrated the compute resources to Azure, transferred more than 100 TB of blob storage, and handled application secrets without embedding the Azure SDK in the application code:<\/p>\n<p><!--more--><\/p>\n<h2>Architecture &amp; Migration<\/h2>\n<p>A key part of Emedgene&#8217;s architecture is the provisioning of new EC2 Spot instances to execute compute-heavy analytics processes that take large sets of genomics data in S3 as input. Metadata for each analytics job is enqueued for processing by the EC2 instances. The number of instances is dynamic and varies with the number of messages in the queue.<\/p>\n<p><strong>Compute:<\/strong> EC2 instances are provisioned by another EC2 instance that monitors a Redis queue for additional jobs.<\/p>\n<p><strong>Data:<\/strong> The genomic datasets can comprise over one million individual files with a cumulative size of up to 100 TB. To perform very fast analytics, Emedgene copies the relevant sets of files from S3 to the instances&#8217; attached disks each time an instance is provisioned, gaining higher throughput and lower network latency between the instances and the data. 
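<\/p>\n<p>As a rough sketch of the Azure side of that provisioning-time bulk copy, the azure-datalake-store Python package offers a multithreaded downloader. The store name, paths, and credential values below are placeholders, and the exact API should be verified against the current SDK documentation:<\/p>\n<pre class=\"lang:default decode:true\"># Sketch: bulk-download a genomics dataset from ADLS to the VM's attached disk\r\nfrom azure.datalake.store import core, lib, multithread\r\n\r\n# Authenticate with service principal credentials (placeholder values)\r\ntoken = lib.auth(tenant_id='my-tenant', client_id='my-client', client_secret='my-secret')\r\nadl = core.AzureDLFileSystem(token, store_name='myadlsstore')\r\n\r\n# Pull the whole dataset folder with many threads for higher throughput\r\nmultithread.ADLDownloader(adl, rpath='\/datasets\/sample-set', lpath='\/mnt\/data',\r\n                          nthreads=64, overwrite=True)\r\n<\/pre>\n<p>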
In Azure, we copy these files from <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/data-lake-store\/\">Azure Data Lake Store<\/a> to the VM&#8217;s attached disks in the same way.<\/p>\n<p>To support native scalability without an additional application module (the AWS solution required Redis queues and extra EC2 instances), we used <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/virtual-machine-scale-sets\/virtual-machine-scale-sets-overview\">Virtual Machine Scale Sets (VMSS)<\/a>. VMSS enables us to monitor an <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/service-bus\/\">Azure Service Bus<\/a> queue for messages and provision new instances when the queue reaches a certain threshold. Once the application finishes its task, it invokes a <a href=\"https:\/\/github.com\/CatalystCode\/self-destroy-instance\">script (Self Destroy Instance)<\/a> that deletes the VM instance from the VMSS. The script can be invoked in a Docker container for maximum flexibility in the deployment process.<\/p>\n<p>Note: We considered working with <a href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/announcing-public-preview-of-azure-batch-low-priority-vms\/\">Azure Batch Low Priority VMs<\/a>, but scaling with Azure Service Bus and custom VM images were not fully supported at the time.<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/10\/emedgene-1-1024x555-1.png\" alt=\"Emedgene solution architecture on Azure\" width=\"1024\" height=\"555\" class=\"aligncenter size-full wp-image-10884\" srcset=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2017\/10\/emedgene-1-1024x555-1.png 1024w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2017\/10\/emedgene-1-1024x555-1-300x163.png 300w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2017\/10\/emedgene-1-1024x555-1-768x416.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" 
\/><\/p>\n<h3>The DevOps flow<\/h3>\n<p>The Continuous Integration \/ Continuous Deployment (CI\/CD) process is managed with Jenkins. While Jenkins provides a lot of flexibility, we needed a way to provision and manage Azure resources in the pipeline. To do this, we used <a href=\"https:\/\/docs.microsoft.com\/en-us\/cli\/azure\/overview\">Azure CLI 2.0<\/a>, but we also needed to propagate results, such as resource names and paths, from each command to the next.<\/p>\n<p>For example, the JSON below is the output of a CLI command. We want to take the &#8220;name&#8221; property, which is dynamic, and propagate it to another command.<\/p>\n<pre class=\"lang:default decode:true\">{\r\n       \"id\": \"\/subscriptions\/some-guid\/resourceGroups\/test\",\r\n       \"location\": \"northeurope\",\r\n       \"managedBy\": null,\r\n       \"name\": \"test\",\r\n       \"properties\": {\r\n          \"provisioningState\": \"Succeeded\"\r\n       },\r\n       \"tags\": null\r\n}<\/pre>\n<p>To do this, we created the <a href=\"https:\/\/wiki.jenkins.io\/display\/JENKINS\/Azure+CLI+Plugin\">Azure CLI Jenkins Plugin<\/a>. 
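<\/p>\n<p>For simple one-off values, Azure CLI 2.0 can also extract a property inline through its built-in JMESPath support; the resource names below are illustrative:<\/p>\n<pre class=\"lang:default decode:true\"># Create the resource group and capture only the dynamic name property\r\nrg_name=$(az group create -n test -l northeurope --query name -o tsv)\r\n\r\n# Reuse the captured value in a later command\r\naz group show -n ${rg_name}\r\n<\/pre>\n<p>In a Jenkins pipeline, however, we wanted these values propagated between build steps as environment variables, which is what the plugin provides.<\/p>\n<p>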
The following steps describe how to provision a new VM, create an image from that VM, and create a Virtual Machine Scale Set (VMSS) from that image.<\/p>\n<ol>\n<li>Using the <a href=\"https:\/\/wiki.jenkins.io\/display\/JENKINS\/EnvInject+Plugin\">Environment Injector<\/a> plugin, inject environment variables into Jenkins\n<pre class=\"lang:default decode:true\"># Configs\r\n\r\nbase_vm_resource_group=\r\nbase_vm_name=\r\nlocation=\r\n\r\nsubscription_id=\r\nbase_image_resource_group=\r\nbase_image_name=base-image-$(date +%Y%m%d-%H%M)\r\nbase_image_source=\r\n\r\nvmss_name=\r\nvmss_resource_group=\r\nvmss_image=\r\nvmss_vnet_name=\r\nvmss_subnet_name=\r\nvmss_ssh_key_path=\r\nimage_user=\r\ninstance_count=1\r\ninstance_type=Standard_F8S<\/pre>\n<\/li>\n<li>Provision a new VM with an attached disk\n<pre class=\"lang:default decode:true\"># Recreate resource group and vm\r\naz group create -n ${base_vm_resource_group} -l ${location}\r\naz vm create --resource-group ${base_vm_resource_group} --name ${base_vm_name} \\\r\n    --image ${vmss_image} --storage-sku Premium_LRS --ssh-key-value ${vmss_ssh_key_path} \\\r\n    --admin-username ${image_user} --size ${instance_type}\r\n<\/pre>\n<\/li>\n<li>SSH into the VM using the <a href=\"https:\/\/plugins.jenkins.io\/ssh\">Jenkins SSH Plugin<\/a><\/li>\n<li>Deploy the application using a simple Bash script<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/python\/api\/overview\/azure\/data-lake-store?view=azure-python\">Copy the data from Azure Data Lake Store<\/a> to the attached disk to give the application maximum read\/write performance<\/li>\n<li>Deprovision the VM\n<pre title=\"Deprovision the VM\" class=\"lang:default decode:true\">sudo waagent -deprovision+user --force\r\n<\/pre>\n<\/li>\n<li>Make an image from the VM\n<pre title=\"Create an image\" class=\"lang:default decode:true\"># Deallocate (stop) the base image vm\r\naz vm deallocate --resource-group 
${base_vm_resource_group} --name ${base_vm_name}\r\n\r\n# Flag vm to be able to create an image from it\r\naz vm generalize --resource-group ${base_vm_resource_group} --name ${base_vm_name}\r\n\r\n# Create the image\r\naz image create --resource-group ${base_image_resource_group} --name ${base_image_name} --source ${base_image_source}\r\n\r\n# Export the id of the image to an environment variable\r\nid|vmss_image\r\n<\/pre>\n<\/li>\n<li>Delete the VM and associated resources\n<pre class=\"lang:default decode:true\">az group delete -n ${base_vm_resource_group} --yes\r\n<\/pre>\n<\/li>\n<li>Create or update the VMSS with the current VM image\n<pre class=\"lang:default decode:true\"># Create (or update) VMSS with new image\r\naz vmss create -n ${vmss_name} -g ${vmss_resource_group} --image ${vmss_image} --vnet-name ${vmss_vnet_name} \\\r\n    --subnet ${vmss_subnet_name} --storage-sku Premium_LRS --ssh-key-value ${vmss_ssh_key_path} \\\r\n    --admin-username ${image_user} --instance-count ${instance_count} --vm-sku ${instance_type} \\\r\n    --data-disk-sizes-gb 10 --disable-overprovision<\/pre>\n<p>The image below shows the Azure CLI commands with environment variables as parameters:<\/p>\n<\/li>\n<\/ol>\n<p> <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/10\/jenkins2-1024x651-1.jpg\" alt=\"Jenkins build configuration with Azure CLI commands\" width=\"1024\" height=\"651\" class=\"aligncenter size-full wp-image-10886\" srcset=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2017\/10\/jenkins2-1024x651-1.jpg 1024w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2017\/10\/jenkins2-1024x651-1-300x191.jpg 300w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2017\/10\/jenkins2-1024x651-1-768x488.jpg 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<h2>Transferring the Data<\/h2>\n<p>Emedgene provides its customers the option of supplying data on either Azure or AWS 
S3. For Azure, Emedgene decided that they wanted to store their data in <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/data-lake-store\/\">Azure Data Lake Store (ADLS)<\/a>, which enables the capture of data of any size, type, or ingestion speed. To achieve this, they needed to transfer more than 100 TB of customer data securely from S3 to ADLS using <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/data-factory\/\">Azure Data Factory<\/a>. Azure Data Factory allows users to create a workflow that can ingest data from both on-premises and cloud data stores, then transform or process that data using existing compute services.<\/p>\n<p>During the migration process, Emedgene faced a challenge involving their need to periodically pull data from their customers&#8217; S3 buckets. While Azure Data Factory can be used for a full migration from S3, it only supports incremental data copy from external data sources by date stamp if the data store is properly structured, for example:<\/p>\n<pre class=\"lang:default decode:true\">|-- year\r\n    |-- month\r\n        |-- day<\/pre>\n<p>Since S3 does not enforce a rigid store hierarchy, Emedgene needed a mechanism to support incremental copy for improperly organized data stores. To resolve this problem, we created a <a href=\"https:\/\/github.com\/catalystcode\/s3toadl\">Docker container for incremental data copy from S3 to ADLS<\/a>. This service enables Emedgene to copy new data incrementally from S3 to ADLS by date stamp without a dependency on data store structure.<\/p>\n<h3>Application Secrets<\/h3>\n<p>Emedgene\u2019s microservices architecture rests upon a large number of interdependent internal and external services. Emedgene needed a secure and centralized way to manage access to and between these services. 
<a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/key-vault\/\">Azure Key Vault<\/a> provides key and secret management, which allowed Emedgene to generate and manage secure access tokens for their services.<\/p>\n<p>However, Emedgene wanted to be able to easily query Azure Key Vault locally using their Azure service principal credentials without having to parse a response object. This approach presented a couple of challenges. While the Azure REST API supports service principal authentication, the client-level APIs require credential-based authentication for querying Key Vault. In addition, certain Key Vault management features, such as retrieving a list of all key vaults, are not supported by the REST API.<\/p>\n<p>To support Emedgene\u2019s request, we created a scalable <a href=\"https:\/\/github.com\/CatalystCode\/azure-key-vault-secret-as-a-service\">Docker container for querying Azure key vaults for secrets using Azure Service Provider Credentials<\/a>. Once the container service is running, users can choose either to retrieve a secret from a vault or search all vaults for a given secret.<\/p>\n<h2><strong>Summary<\/strong><\/h2>\n<p>Not all Azure services have direct 1:1 parity with AWS. 
When migrating from AWS to Azure, our team had to address three important questions:<\/p>\n<ol>\n<li>What was the most efficient path to move the data from AWS?<\/li>\n<li>How could we achieve the same or better compute functionality upon migration?<\/li>\n<li>How could we manage access to the newly migrated services?<\/li>\n<\/ol>\n<p>Our solution addresses these concerns by providing Emedgene with better native compute scalability, efficient data transfer support, and access management through Azure Key Vault.<\/p>\n<p>The solution outlined in this code story is adaptable to any workload that requires:<\/p>\n<ul>\n<li>Continuous transfer of a large amount of data from S3 to Azure Data Lake<\/li>\n<li>Migrating a Jenkins DevOps process from AWS resources to Azure resources<\/li>\n<li>Handling Key Vault secrets without embedding the Key Vault API in your code<\/li>\n<\/ul>\n<p>We encourage any team undertaking a similar migration to Azure to take advantage of our code, which can be found in the following GitHub links:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/jenkinsci\/azure-cli-plugin\">Azure CLI Plugin<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/Azure\/S3ToAdl\">Transfer data from S3 to ADL<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/CatalystCode\/azure-key-vault-secret-as-a-service\">Application Secrets<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/CatalystCode\/self-destroy-instance\">Self Destroy Instance<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>We demonstrate how we migrated the compute resources of a genomics intelligence platform to Azure, transferring more than 100 TB of blob storage and handling application secrets without embedding the Azure 
SDK.<\/p>\n","protected":false},"author":21405,"featured_media":13048,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[15,16],"tags":[39,60,131,151,221],"class_list":["post-4929","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-containers","category-devops","tag-amazon-web-services-aws","tag-azure","tag-containers","tag-devops","tag-jenkins-ci"],"acf":[],"blog_post_summary":"<p>We demonstrate how we migrated the compute resources of a genomics intelligence platform to Azure, transferring more than 100 TB of blob storage and handling application secrets without embedding the Azure SDK.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/4929","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21405"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=4929"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/4929\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/13048"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=4929"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=4929"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=4929"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}