{"id":472,"date":"2021-06-11T12:11:25","date_gmt":"2021-06-11T19:11:25","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/startups\/?p=472"},"modified":"2021-06-11T12:11:25","modified_gmt":"2021-06-11T19:11:25","slug":"cto-choices-what-happens-when-you-need-to-choose-your-database","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/startups\/cto-choices-what-happens-when-you-need-to-choose-your-database\/","title":{"rendered":"CTO Choices \u2013 What happens when you need to choose your database?"},"content":{"rendered":"<blockquote>\n<p>Building a robust technology stack from scratch is a challenging task. CTO Choices is a series of articles aimed at CTOs, Tech founders, and project creators. They help you ask yourself the right question to pick up your tools. We try hard to give you honest, enduring, and agnostic advice. That is why \u2013 to let our true self speak \u2013 the last section of each article will present a heavily opinionated choice. And you will have the right questions to assess if this choice is good for you.\n  After an exploration of the criteria to <a href=\"https:\/\/devblogs.microsoft.com\/startups\/cto-choices-choose-your-frontend-stack\/\">pick a Frontend framework<\/a>, let&#8217;s look at another important component of your architecture: the database.<\/p>\n<\/blockquote>\n<p>Almost any application needs to keep track of some sort of state. For persistent state, you will probably have a database. Now, you have one of the most complicated choice of your architecture to make: which database to pick?<\/p>\n<p>You may not see it at first glance, but choosing a database is a complex task for two main reasons. First, there is dozen dimensions to think about \u2013 we will see them in a minute. Then, is one of the choices that is the most complicated to change afterwards. Changing your main database while having a product in production is an extremely complex task. You better have made the right choice the first time \ud83d\ude0a.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/startups\/wp-content\/uploads\/sites\/66\/2021\/06\/undraw_server_status_5pbv.png\" alt=\"A person in front of a database\" width=\"1296\" height=\"934\" class=\"size-full wp-image-490\" srcset=\"https:\/\/devblogs.microsoft.com\/startups\/wp-content\/uploads\/sites\/66\/2021\/06\/undraw_server_status_5pbv.png 1296w, https:\/\/devblogs.microsoft.com\/startups\/wp-content\/uploads\/sites\/66\/2021\/06\/undraw_server_status_5pbv-300x216.png 300w, https:\/\/devblogs.microsoft.com\/startups\/wp-content\/uploads\/sites\/66\/2021\/06\/undraw_server_status_5pbv-1024x738.png 1024w, https:\/\/devblogs.microsoft.com\/startups\/wp-content\/uploads\/sites\/66\/2021\/06\/undraw_server_status_5pbv-768x553.png 768w\" sizes=\"(max-width: 1296px) 100vw, 1296px\" \/><\/p>\n<h2>What are your constrains?<\/h2>\n<p>Before going into a list of criteria, you need to understand what you\u2019re building. Do you have performance constrains? Do you have the slightest idea of the volume of data and the volume growth over the next 3 to 5 years (in your biggest dreams)? Applications are born with quite different data requirements and\/or constrains. While I will give you my personal choice later in this article, some projects may need special attention.<\/p>\n<p>That\u2019s why I recommend doing a \u201cballpark\u201d exercise of estimating the amount of data per user\/customer and their growth over time. You may also want to understand if you need specific features within your database \u2013 like manipulating geospatial data or doing AI right within the DB \u2013 or later in the architecture \u2013 like multi-region deployments, or if you have specific requirements \u2013 like performance constrains. We will dig in more constrains considerations later, however, this first exercise should give you some hints.<\/p>\n<h2>A lot of criteria to consider.<\/h2>\n<p>So, how to make your own choice while picking a database? It may be not your first criteria, but as a developer, I am looking at <strong>developer experience<\/strong>: Can I execute the database in a container or locally? Can I query the DB both from CLI and from an UI? I\u2019m also looking at the DevOps side of the database: can we automate all the deployment \u2013 from creating a database, to updating a schema or creating new users -? Can we use the database in our automated test pipeline? There is a stark difference between databases on these topics.<\/p>\n<p>There is also a related question to ask: is there a client for the programming language you will be using? You may have a great DB choice, but if it\u2019s only working with few languages, it may be complicated to pick it. You can still create a service to expose DB features through an API, or wrap a native library, but this overhead may be used for other purposes.<\/p>\n<h3>Read and write patterns<\/h3>\n<p>Most of the data stores in use have a read\/write ratio largely in favor of reads. When the numbers of reads is consequent, we tend to add a cache layer in front of our database \u2013 like Redis. However, some constraints like consistency or heavy analytical queries on the transactional database may refrain this addition. You the need to evaluate the maximum capacity of your database: is there a maximum number of concurrent connections? Do you have read-only replication easily available?<\/p>\n<p>Write scalability is an interesting and complex problem area. From sequential IDs to sharding, there is a lot of techniques that you can leverage in both your database engine choice and your data architecture. Most of the (relational) databases are deployed on a single-node, but you need to think what you\u2019ll need in 1, 2 or 3 years from now. Does your hosting provider will be able to grow this server up to what you need in 3 years?<\/p>\n<p>The answer here may be pragmatic: choose something now and accept a redesign or a redeployment later. But ensure you have the proper analytics in your DB in place, so you don\u2019t hit the ceiling too late for this redesign.<\/p>\n<blockquote>\n<p>Few years ago, my team had to move a huge database from one datacenter to another (over 200 Tb). The fastest solution was to buy hundred of hard drives, copy data to them, ship them to the new location and plug them one by one in a \u201cload server\u201d. Even with shipping and handling times, the transfer rate was better than transferring the data over Internet.<\/p>\n<\/blockquote>\n<h3>Scalability<\/h3>\n<p>I\u2019ve already touched a bit the scalability criteria in the previous section. Scalability is also about deploying your database across regions \u2013 for High-availability or regional performance reasons. For example, some opensource databases are great in a single node deployment, but requires paid add-ons when you need to scale to several nodes.<\/p>\n<h3>Hosting<\/h3>\n<p>When you pickup a database, you may not know where you\u2019ll host it now or in 2 years. Hence, having multiple hosting capabilities is probably a clever idea. We can evaluate a database hosting solution with several categories:<\/p>\n<ul>\n<li><strong>Developer and Continuous Integration hosting<\/strong>: Can you run the database locally, on your development machine? Can it be spin up easily and in an automated way if you want to run integration tests in a DevOps pipeline? <\/li>\n<li><strong>Cloud hosting<\/strong>: As a developer, I\u2019ll favor a PaaS hosting over deploying my database on my own server. <\/li>\n<li><strong>Local deployment<\/strong>: While I\u2019m mostly using cloud hosting, there is still a lof of scenarios where local deployment make sense. Some innovative databases are only available as a cloud service. For databases supporting local deployment, you may want to have a detailed look at the options, like support for Kubernetes deployment. <\/li>\n<li><strong>IoT\/Edge deployment<\/strong>: Not all projects need an Edge deployment. If you (may) need it in the future, it\u2019s interesting to learn about the options there. Having different database engines for Edge and cloud deployment can be a solid choice, and sometimes you can have the same engine running both in the cloud and at the edge. <\/li>\n<\/ul>\n<h3>Analytics capabilities<\/h3>\n<p>At some point in the future, I can bet that you\u2019ll need to run some analytics on your database. There is enough to say about this topic for several articles. And yet, there are simple things to take into consideration. For example, is your database compatible with data ingestion and analytics tools you\u2019re planning to use? Is there any analytics capabilities embedded in the database? Some databases &#8211; or hosting providers &#8211; also leverage AI to give you recommendations about your database (regarding performance, security, &#8230;)!<\/p>\n<h3>Operations<\/h3>\n<p>You need to take a loving care of your database. That means doing proper backups (and testing them!), but also monitoring \u2013 for operations reasons but also for debugging purposes \u2013, database engine version upgrade. Even if you\u2019re not an Ops\/DBA person, there are important points to take into consideration. I\u2019ve seen some horror stories when some opensource NoSQL databases went wrong in production. Understand me correctly: there was nothing inherently wrong with these databases. The team was simply not prepared for handling a production incident.<\/p>\n<p>The scope of \u201coperations\u201d is also covering some \u201cgood\u201d incidents. Let\u2019s say you have a traffic spike. You may need to scale up your DB for a few hours. Can you do it with your database (and hosting)?<\/p>\n<h3>Security &amp; Compliance<\/h3>\n<p>Every project is concerned by these two criteria. If the database you pick need to have some features around them, the minimum acceptable level will considerably vary depending on your business, your industry and the type of data you\u2019re handling. Being exhaustive is therefore quite complicated. Here is few questions to help you:<\/p>\n<ul>\n<li>Does your database support a robust user and RBAC management? <\/li>\n<li>Does your database support encryption at rest? In transit? While processing the data? <\/li>\n<li>Is there special Personally identifiable information (PII) features? <\/li>\n<li>Is there data exploration\/labelling features to identify and manage sensitive data? <\/li>\n<li>Can you trace everything that is happening to a server? To a specific database object? <\/li>\n<\/ul>\n<h3>Cost<\/h3>\n<p>Some database engines are known to cost the price of a small apartment. Other ones are open source, and so they are free, right?<\/p>\n<p>As for many other components in your architecture, calculating the true cost (or \u201cTCO\u201d) is a bit more complex. In my previous startup, we were using MySQL (pre-MariaDB area), but we were paying an addon to get replication capabilities. Elasticsearch is a great opensource product that you can start using right away. If you require Security and observability features, then you need to switch to a paid plan (or get it through the Azure Marketplace). It\u2019s easy to lean towards free offerings \u2013 opensource or not. However, it\u2019s your duty to understand the costs across the lifetime of your usage to make an informed decision.<\/p>\n<h2>How many databases solution you need?<\/h2>\n<p>Sometimes, having only one database in your architecture is not the best option. And with managed databases in the cloud, adding a second storage option in your architecture is easier than before.<\/p>\n<p>When I\u2019m designing an architecture, I\u2019m starting with one data store. Adding a second one is only done when there is a clear use case, or when some criteria need to be met. For example, I\u2019m almost never storing files in the database directly, I use the Blob Storage for storing these images, videos, documents, etc. For some of my apps, I\u2019ve even used only the blob storage as a database, due to read-write performance constraints, scale and zero need for querying data. When adding a new data store, you need to go through all your criteria to validate this store. Sometimes optimizing for few milliseconds in query time will hurt several important criteria areas, making the addition of this datastore a bad decision.<\/p>\n<h2>Opinionated choice: Azure SQL and Cosmos DB<\/h2>\n<blockquote>\n<p>Disclaimer 1: I work for Microsoft, and I\u2019ve been in this ecosystem for more than 15 years. Yes it\u2019s an opinionated choice \ud83d\ude09. Disclaimer 2: Azure also offers fully-managed PostgreSQL, MySQL and MariaDB solutions. These database can be a great choice for your project. As always in this series of articles, we invite you to take the criteria detailed above to make your own decision.<\/p>\n<\/blockquote>\n<p>I know, \u201cAzure SQL\u201d. You think you know this product, which is coming from an old world with SQL Server and .NET. However, I bet you\u2019ll learn few things if you keep reading!<\/p>\n<h3>Developer Experience<\/h3>\n<p>Azure SQL can be used in a wide range of languages through ODBC, and natively in <a href=\"https:\/\/docs.microsoft.com\/azure\/azure-sql\/database\/connect-query-java?wt.mc_id=startup-17048-chmaneu\">Java<\/a>, <a href=\"https:\/\/docs.microsoft.com\/azure\/azure-sql\/database\/connect-query-nodejs?tabs=macos&amp;wt.mc_id=startup-17048-chmaneu\">Node.js<\/a>, PHP, <a href=\"https:\/\/docs.microsoft.com\/azure\/azure-sql\/database\/connect-query-python?wt.mc_id=startup-17048-chmaneu\">Python<\/a>, <a href=\"https:\/\/docs.microsoft.com\/azure\/azure-sql\/database\/connect-query-ruby?wt.mc_id=startup-17048-chmaneu\">Ruby<\/a> and .NET, and there is even an <a href=\"https:\/\/docs.microsoft.com\/azure\/azure-sql\/database\/spark-connector?wt.mc_id=startup-17048-chmaneu\">Apache Spark connector<\/a>. With the free and open-source <a href=\"https:\/\/docs.microsoft.com\/sql\/azure-data-studio\/what-is-azure-data-studio?view=sql-server-ver15&amp;wt.mc_id=startup-17048-chmaneu\">Azure Data Studio<\/a>, you have an editor to work on your Database projects, even on Linux while running SQL Server locally in a container.<\/p>\n<p>Doing so not only give you an editor with autocompletion, but also support for Database projects, that allows you to package your app, support schema compare and update in a CI\/CD system and even <a href=\"https:\/\/docs.microsoft.com\/sql\/azure-data-studio\/notebooks\/notebooks-guidance?view=sql-server-ver15&amp;wt.mc_id=startup-17048-chmaneu\">create Notebooks to explore your data with SQL<\/a> (or Spark, Python and Powershell). Oh and did I mention the <a href=\"https:\/\/docs.microsoft.com\/sql\/azure-data-studio\/extensions\/sanddance-extension?view=sql-server-ver15&amp;wt.mc_id=startup-17048-chmaneu\">SandDance extension<\/a> for data exploration?<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/user-images.githubusercontent.com\/11507384\/54236654-52d42800-44d1-11e9-859e-6c5d297a46d2.gif\" alt=\"An animation showing the visualizations generated by SandDance extension\" ><\/img><\/p>\n<p>You can of course use SQL CLI if you want to automate things or are in love with your Windows Terminal (or bash on Linux or WSL2). Did I mention that you can run, manage and develop with SQL Server on a Linux machine?<\/p>\n<p>You\u2019re looking for a NoSQL or a Graph database? Try Azure SQL! With it\u2019s JSON and Graph capabilities, it may suit your needs, even for non-relational storage.<\/p>\n<h3>Scalability<\/h3>\n<p>From <a href=\"https:\/\/docs.microsoft.com\/sql\/relational-databases\/in-memory-oltp\/introduction-to-memory-optimized-tables?view=sql-server-ver15&amp;wt.mc_id=startup-17048-chmaneu\">Memory-optimized tables<\/a> to multi-region highly-available deployment, SQL Server and Azure SQL offers you a wide range of scalability features that should help support a wide range of workloads.<\/p>\n<p>Do you have more than 4 Tb on a single database or do you need very high performance or instant backup? Switch to Azure SQL Hyperscale to unlock your database with independently scalable compute and storage resources, while retaining your knowledge and code. Just need a database for a few years a month? Leverage Azure SQL Serverless offering, that auto shutdown and restart your database when you open a connection.<\/p>\n<h3>Analytics<\/h3>\n<p>I\u2019m surprised by the analytics platform of most of the startups I know. The data is scattered across a wide range of data stores, and the reporting tools are either quite expensive or very low-tech. While not part of Azure SQL, Power BI is a great reporting tool \u2013 and the <strong>desktop version is free<\/strong>. Combined with an Azure SQL database, you\u2019re already ready for a lot of analytics needs, while maintaining all the security features.<\/p>\n<p>If you need machine learning capabilities, run your models directly within the Azure SQL engine! And if you land into the Big Data world, you can reuse your knowledge with <a href=\"https:\/\/docs.microsoft.com\/azure\/synapse-analytics\/overview-what-is?wt.mc_id=startup-17048-chmaneu\">Azure Synapse Analytics<\/a>.<\/p>\n<h3>Security &amp; Operations<\/h3>\n<p>I\u2019m not an expert in this domain. I\u2019ve just stumbled upon a lot of cool features \u2013 at least for me \ud83d\ude0a- during my career. For example, dynamic data masking allows you to mask some columns only for specific users. A credit cart number column can only reveal the last 4 digits for a call center operator while the payment processing service has access to the full number, while this being enforced \u2013 and thus audited- in a single place: the database. Row-level security allows you to hide some rows for some users, while secure enclaves protect the data even from other processes running in the machine.<\/p>\n<p>Do you love blockchain or use blockchain in your architecture? With <a href=\"https:\/\/docs.microsoft.com\/azure\/azure-sql\/database\/ledger-overview?wt.mc_id=startup-17048-chmaneu\">Azure SQL Database ledger<\/a>, you can create an auditable and immutable log of your data contained in a SQL Table&#8230; of offer a trusted off-chain storage for your blockchain operations.<\/p>\n<h3>Cost<\/h3>\n<p>Pricing for Azure SQL and SQL Server can be complicated, and I\u2019m not a licensing specialist. However, there are few things to know about it. First: <strong>SQL Server can be free!<\/strong> Yes, totally free. All developers\u2019 tiers are free, and if you need a development or test database hosted on Azure, you can benefit from the Dev &amp; Test pricing. Express Edition is also free including for production workloads with a limit of 10Gb of data per database. Azure-based pricing can also quite competitive: a replicated database \u2013 with the secondary replica accessible and up &#8211; starts around $14 USD \u2013 11\u20ac\/Month (At time of writing). Yes, that\u2019s a pricing for a database deployed in Azure two regions with an automatic region failover. Oh, and if you\u2019re deploying Azure SQL on an Edge device, it can be as low as $60USD (51\u20ac) per year.<\/p>\n<h2>Comsos DB as a specialized secondary store<\/h2>\n<p><a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/introduction&amp;wt.mc_id=startup-17048-chmaneu\">Cosmos DB<\/a> is an Azure-only multi-model data store created for performance \u2013 with a SLA on read and write times \u2013 and multi-master cross-region deployments. With it&#8217;s multiple APIs support, you can use native MongoDB, Gremlin, or Cassandra SDKs to query and store data within a Cosmos DB collection.<\/p>\n<p>While Cosmos DB doesn\u2019t check all my previous criteria \u2013 you can\u2019t deploy Cosmos DB on premises \u2013 it can be a great solution for some scenario thanks to a set of features: true serverless offering (pay per query), auto scaling, change feed, integration with other Azure services (like Azure Functions) and more recently a Linux Emulator.<\/p>\n<p>Some startups are already using Cosmos DB in production. You can read about <a href=\"https:\/\/devblogs.microsoft.com\/startups\/corlife-insights-cosmosdb\/\">Corlife using CosmosDB for analytical purposes in this article<\/a>.<\/p>\n<h2>What criteria are YOU using to pick your frontend stack?<\/h2>\n<p>Creating Startups DevBlog is all about sharing real-world experiences. So we want to hear from you! Do you have another approach for choosing your tech stack? Do you use other criteria to refine your choices? You should share them in the comments sections!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Almost any application needs to keep track of some sort of state. Now, you have one of the most complicated choice of your architecture to make: which database to pick? You better have made the right choice the first time \ud83d\ude0a.<\/p>\n","protected":false},"author":42265,"featured_media":490,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[11,50,12],"class_list":["post-472","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-startups","tag-architecture","tag-database","tag-serverless"],"acf":[],"blog_post_summary":"<p>Almost any application needs to keep track of some sort of state. Now, you have one of the most complicated choice of your architecture to make: which database to pick? You better have made the right choice the first time \ud83d\ude0a.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/posts\/472","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/users\/42265"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/comments?post=472"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/posts\/472\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/media\/490"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/media?parent=472"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/categories?post=472"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/startups\/wp-json\/wp\/v2\/tags?post=472"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}