This article showcases how to take advantage of a highly distributed framework provided by spark engine, to load data into a Clustered Columnstore Index of a relational database like SQL Server or Azure SQL Database, by carefully partitioning the data before insertion.
Executive summary
Our testing shows that Azure SQL Database can be used as a highly scalable low latency key-value store. Starting with a cost-efficient 4-core General Purpose database, we see an order of magnitude increase in workload throughput as we increase dataset size by 100x and scale across the spectrum of database SKUs to a Business ...
Customers often need to move a dataset from a source system into a new destination, inserting rows that doesn't exist in a target table and update those that already exists. With this technique, we've been able to reduce the time needed to upsert a dataset of 2M rows against a target table with 30M rows from 20 hours to 20 minutes.