{"id":2160,"date":"2015-11-02T18:01:52","date_gmt":"2015-11-03T02:01:52","guid":{"rendered":"https:\/\/www.microsoft.com\/reallifecode\/index.php\/2015\/11\/02\/working-with-spatial-data-in-table-storage\/"},"modified":"2020-03-19T10:17:13","modified_gmt":"2020-03-19T17:17:13","slug":"working-with-spatial-data-in-table-storage","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/working-with-spatial-data-in-table-storage\/","title":{"rendered":"Working with Spatial Data in Table Storage"},"content":{"rendered":"<p>Several of our customers needed to correlate and aggregate large volumes of location-tagged data. While many higher level databases provide rich capabilities for spatial data (such as SQL Server or MongoDB), most large scale cloud based table storages (e.g. Microsoft Azure Table Storage) don\u2019t provide native support for spatial queries. In this case study, we discuss how we implemented spatial capabilities by mapping entities to unique grid numbers. While our implementation uses Table Storage, this approach is platform independent and can be reused with other storage technologies.<\/p>\n<h2 id=\"overview-of-the-solution\">Overview of the Solution<\/h2>\n<p>In one of our projects, we needed to overlay additional geo-tagged data on top of an interactive map. For instance, we visualized building constructions and overlaid that with demographic information and heat maps to show the movement of people during a specific period. When using a storage system that supports spatial data, this would be a simple query &#8211; just filtering results that are visible within the current zooming window. However if the storage technology doesn\u2019t support spatial data, the query is not so simple. Considering that our data is stored in <a href=\"https:\/\/azure.microsoft.com\/en-us\/documentation\/articles\/storage-table-design-guide\/\">Azure Table Storage<\/a> or ingested through a real-time analytics pipeline (using <a href=\"https:\/\/azure.microsoft.com\/en-us\/documentation\/services\/event-hubs\/\">EventHub<\/a> and <a href=\"https:\/\/azure.microsoft.com\/en-us\/documentation\/services\/stream-analytics\/\">Stream Analytics<\/a>), we had to implement our own approach to query spatial data.<\/p>\n<p>One of the key challenges of spatial data is that each geo-tagged entity is described by at least two properties (latitude and longitude). Therefore to retrieve all entities which are contained within a rectangular boundary requires at least the filtering of two properties (latitude and longitude), followed by an inner join:<\/p>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>[(Entity.Lon \u2265 Query.SouthEast.Lon) AND (Entity.Lon \u2264 Query.NorthWest.Lon)]\r\nINNER JOIN\r\n[(Entity.Lat \u2265 Query.SouthEast.Lat) AND (Entity.Lat \u2264 Query.NorthWest.Lat)]\r\n<\/code><\/pre>\n<\/div>\n<p>This is straightforward if the storage technology indexes all properties (columns); however it is more challenging if the columns used for storing spatial data are not indexed. For instance, using Azure Table Storage, only the partition key and the row key are indexed. (That\u2019s why every query should contain at least one of these keys \u2013 if not, it will result into a full table scan). Furthermore, row keys must be unique across a partition and therefore only one discrete spatial entity can be stored for a given location. Also, both partition and row keys are of datatype string, therefore the use of numeric filters for latitude and longitude aren\u2019t directly supported.<\/p>\n<p>To address these constraints, we decided not to work with latitude and longitude independently, but to create a combined property that refers to a geographical boundary; in our case, a grid. Each set of latitude and longitude values belongs to a discrete grid, and each grid is identified by a unique grid number. We simply use this grid number as the partition key. This allows us to efficiently look up all entities that belong to a certain rectangular boundary by simply filtering the partition key for the required grid identification numbers. Because Table Storage filters are restricted to 15 discrete comparisons, we decided to create a new table for each required grid size. For instance, we store the building construction entities in two tables; one with a grid size of 50 meters, and one with a grid size of 1000 meters.<\/p>\n<h2 id=\"implementation\">Implementation<\/h2>\n<p>To map latitude and longitude to a discrete grid, we used J\u00fcrgen Pfeifer\u2019s Geodesy library which is available on GitHub ( https:\/\/github.com\/juergenpf\/Geodesy) as well as a NuGet package.<\/p>\n<p>Geodesy handles the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Universal_Transverse_Mercator_coordinate_system\">Universal Transverse Mercator (UTM)<\/a> projection, dividing the earth into smaller grids which are then each mapped to a flat map. Based on this, J\u00fcrgen implemented an algorithm to put a finer-grain mesh over the mapped area of the earth to be able to classify a geo-location by a discrete, globally unique mesh number, which we use as our grid number.<\/p>\n<p>Using the library is straightforward: We just instantiate the mesh for a specific grid size and retrieve the grid number for each set of latitude and longitude values. This grid number becomes the partition key for the entity to be stored.\nBelow is a walkthrough of the key steps to implement this approach:<\/p>\n<ul>\n<li>Add the Geodesy NuGet package<\/li>\n<li>Define the using statement for the library:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>using Geodesy;\r\n<\/code><\/pre>\n<\/div>\n<ul>\n<li>Defining the spatial entity which stores the gridId as its PartitionKey:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>    class SpatialEntity : TableEntity\r\n    {\r\n        public SpatialEntity() {}\r\n        public SpatialEntity(long gridId, double lat, double lon)\r\n        {\r\n           this.PartitionKey = gridId.ToString();\r\n           this.RowKey = Guid.NewGuid().ToString();\r\n           this.lat = lat;\r\n           this.lon = lon;\r\n        }\r\n        public double lon { get; set; }\r\n        public double lat { get; set; }\r\n        public string additionalProperty { get; set; }\r\n    }\r\n<\/code><\/pre>\n<\/div>\n<ul>\n<li>Create the mesh(es). In our example, we create one for 50 meters and another one for 1000 meters:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>GlobalMesh mesh50m = new GlobalMesh(50);\r\nGlobalMesh mesh1000m = new GlobalMesh(1000);\r\n<\/code><\/pre>\n<\/div>\n<ul>\n<li>Get the table references for the two tables which are used to store the spatial data:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>CloudTableClient tableClient = storageAccount.CreateCloudTableClient();\r\nCloudTable spatialTable50m = tableClient.GetTableReference(\u201cspatial50m\u201d);\r\nspatialTable50m.CreateIfNotExists();\r\nCloudTable spatialTable1000m = tableClient.GetTableReference(\u201cspatial1000m\u201d);\r\nspatialTable1000m.CreateIfNotExists();\r\n<\/code><\/pre>\n<\/div>\n<ul>\n<li>Retrieve the grid ids for the lat\/lon pair, one for each grid size:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>long gridId50m = mesh50m.MeshNumber(lat, lon);\r\nlong gridId1000m = mesh1000m.MeshNumber(lat, lon);\r\n<\/code><\/pre>\n<\/div>\n<ul>\n<li>Create two spatial entities which will be stored in Table Storage:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>var entity50m = new SpatialEntity(gridId50m, lat, lon);\r\nvar entity1000m = new SpatialEntity(gridId1000m, lat, lon);\r\n<\/code><\/pre>\n<\/div>\n<ul>\n<li>Insert the two spatial entities into their respective tables:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>spatialTable50m.Execute(TableOperation.Insert(entity50m));\r\nspatialTable1000m.Execute(TableOperation.Insert(entity1000m));\r\n<\/code><\/pre>\n<\/div>\n<ul>\n<li>Create the table query by filtering the gridId that maps to the requested latitude and longitude:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>var spatialQuery50m = new TableQuery&lt;SpatialEntity&gt;().Where(TableQuery.GenerateFilterCondition(\"PartitionKey\", QueryComparisons.Equal, mesh50m.MeshNumber(lat, lon)));\r\nvar spatialQuery1000m = new TableQuery&lt;SpatialEntity&gt;().Where(TableQuery.GenerateFilterCondition(\"PartitionKey\", QueryComparisons.Equal, mesh1000m.MeshNumber(lat, lon)));\r\n<\/code><\/pre>\n<\/div>\n<ul>\n<li>Query the entities from their respective tables:<\/li>\n<\/ul>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>var entities50m = spatialTable50m.ExecuteQuery(spatialQuery50m);\r\nvar entities1000m = spatialTable1000m.ExecuteQuery(spatialQuery1000m);\r\n<\/code><\/pre>\n<\/div>\n<p>If you\u2019re not developing in .NET, you can use the http:\/\/github.com\/timfpark\/geo-tile library, which uses a slightly less accurate approach. The source code is available in C#, JavaScript, and Python.<\/p>\n<h2 id=\"challenges\">Challenges<\/h2>\n<p>This approach works very well for rectangle-based spatial queries. If there is the need for querying polygons or other geometric structures, a two-step approach is required:<\/p>\n<ol>\n<li>Calculate the bounding rectangle for the polygon and query all entities that belong to grids that are contained within, or intersect with, the bounding rectangle<\/li>\n<li>Run a second local geometric function to select the entities that actually belong to the polygon<\/li>\n<\/ol>\n<p>To simplify the geometric functions, it might be useful to project longitude and latitude onto a Mercator map and transform them into X and Y coordinates in meters. X and Y will be stored as part of the entity, in addition to latitude and longitude. This can be easily done by using one of the provided Geodesy projections:<\/p>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>SphericalMercatorProjection geoTransform = new SphericalMercatorProjection();\r\ndouble x = GeoTransform.LongitudeToX(lon);\r\ndouble y = GeoTransform.LatitudeToY(lat);\r\n<\/code><\/pre>\n<\/div>\n<h2 id=\"opportunities-for-reuse\">Opportunities for Reuse<\/h2>\n<p>This approach can be used across a wide range of storage technologies, even if they don\u2019t provide native spatial capabilities. Even without the need to store data, the Geodesy library is of great help in computing spatial data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Implementing spatial capabilities by mapping entities to unique grid numbers in large-scale cloud-based table storage, like Microsoft Azure Table Storage.<\/p>\n","protected":false},"author":21354,"featured_media":1086,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[11],"tags":[76,95,96,184,185,336],"class_list":["post-2160","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data","tag-azure-event-hubs","tag-azure-stream-analytics","tag-azure-table-storage","tag-geodesy","tag-geotagging","tag-spatial-data"],"acf":[],"blog_post_summary":"<p>Implementing spatial capabilities by mapping entities to unique grid numbers in large-scale cloud-based table storage, like Microsoft Azure Table Storage.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21354"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=2160"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2160\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=2160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=2160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=2160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}