September 17th, 2025

Record Scanner for vinyl collectors cuts costs with Azure Cosmos DB vector search

Azure Cosmos DB Team

by Artur Drozdz, Founder of Record Scanner (recordscanner.com)

If you’re like me, there’s at least one room in your home with an entire cabinet dedicated to your growing vinyl collection. At this point, you’ve probably heard that vinyl is making a comeback; it’s not just vintage anymore. In fact, after hitting a low during the 90s and 2000s, sales of records grew by 10 percent in 2024, largely driven by Gen Z-ers.

About seven years ago, I looked at my collection and decided it was time to finally organize it. I went searching for an app to help me catalog my treasures, but the solutions I found were not easy to use and they weren’t mobile friendly. That’s when I decided to create a prototype for a cover scanner.


What started out as a fun side project for a vinyl enthusiast has grown into an award-winning app for music lovers that has reached a quarter of a million downloads. But to get Record Scanner to where it is today, I had to learn all about vector image scanning, find tools to do embeddings, and prototype with open-source solutions. The first iteration of our scanner was prohibitively expensive because it required a machine with a huge amount of RAM. That’s when we learned about vector search with Azure Cosmos DB, which was nearly 20 times less expensive than our open-source-based solution.

In this blog, I’ll discuss how we first developed Record Scanner, what we built with Azure Cosmos DB, and what we learned about vector search.

Building a vinyl record catalog mobile app with Azure Cosmos DB

When we set out to build Record Scanner, I wanted it to be as user-friendly as possible. That meant capturing and transforming text instead of having the user manually input characters. It also meant providing a fast, simple way to scan the cover art with a mobile phone. Currently, Record Scanner offers three options for scanning—soon to be four—including:

  1. Barcodes
  2. Catalog numbers
  3. Record cover images
  4. Inner label—coming soon

Developing a scanner for barcodes was straightforward because barcodes are a universal system. However, many records were released before barcodes became standard, and they were marked with catalog numbers instead. For those, we use optical character recognition (OCR) to turn the catalog number into text, so users don’t have to type it in. But the capability we’re best known for is our innovative record cover scanner.

Creating the cover scanner was quite a long journey for us. It started with prototyping a reverse image search using APIs available on the market, but that wasn’t reliable in the long run: we often experienced downtime or rule changes we had no control over. From there, we decided to build our own using open-source solutions. Although this worked, we found it was very expensive to hold large numbers of images in the cloud, and it required a virtual machine with a ton of RAM.

Azure Cosmos DB delivers 20X cheaper vector search

When we looked around for alternative ways to do reverse image search, we found Azure Cosmos DB and its vector capabilities. It’s not easy to build a database that performs vector search over images without relying on a massive machine. It’s a really complex problem, but it looks like the guys at Microsoft figured it out.


We gave Azure Cosmos DB a try, and the first thing we noticed was how fast it was. Its query syntax is SQL-based and familiar, so we could start writing queries right away. It’s also around 20 times cheaper than our open-source-based solution because it doesn’t require many resources. Consider this cost comparison: let’s say we have 15M vectors to store in the database. If we want to run the database ourselves, we need a virtual machine, and to get good performance the whole index needs to be kept in RAM. Let’s pick a memory-optimized Ebdsv5 VM with 256 GB of RAM. Azure Cosmos DB comes out to $85/month, while the VM option is $1,950/month, roughly 23 times more. Our investigation showed that storing a similar amount of data and running a similar workload cost around 20 times less on Azure Cosmos DB than on a virtual-machine-based solution.

Figure: Cost comparison between an Ebdsv5 VM and Azure Cosmos DB
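To make the arithmetic behind that comparison explicit, here is a small sketch that uses only the two monthly figures quoted above. The function names are illustrative, and the numbers are the ones from this post rather than current list prices.

```javascript
// Monthly figures quoted in the comparison above (USD).
const COSMOS_DB_MONTHLY_USD = 85;
const VM_MONTHLY_USD = 1950;

// How many times more expensive one option is than the other.
function costRatio(expensiveUsd, cheapUsd) {
  return expensiveUsd / cheapUsd;
}

// Absolute saving over twelve months, in USD.
function yearlySavings(expensiveUsd, cheapUsd) {
  return (expensiveUsd - cheapUsd) * 12;
}

const ratio = costRatio(VM_MONTHLY_USD, COSMOS_DB_MONTHLY_USD);
const savings = yearlySavings(VM_MONTHLY_USD, COSMOS_DB_MONTHLY_USD);

console.log(`The VM costs ${ratio.toFixed(1)}x more per month`); // 22.9x
console.log(`Yearly saving: $${savings}`); // $22380
```

The exact ratio for these two figures is a little above 20, which is why the post rounds it to "around 20 times".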

Endless possibilities with Transformers.js

As we considered JavaScript-based solutions, the release of Transformers.js was a game-changer for us. Since this release, machine learning is no longer mainly a Python thing. There are many models to choose from when it comes to generating embeddings. And the best part is, we can run it in a browser! The sample code below shows how easy it is to extract embeddings from an image using the Transformers.js library.

Code
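As an illustration of that step, here is a minimal sketch of image embedding extraction with Transformers.js. It is not the code Record Scanner runs: the package name `@huggingface/transformers`, the model id, and the helper names `getCoverExtractor`, `l2Normalize`, and `embedImage` are assumptions chosen for the example.

```javascript
// Assumption: this model id is an illustrative choice, not necessarily
// the model Record Scanner uses.
const COVER_MODEL_ID = 'Xenova/clip-vit-base-patch32';

let coverExtractorPromise = null;

// Lazily load Transformers.js and build the pipeline only once,
// because loading model weights is the slow part.
function getCoverExtractor() {
  if (!coverExtractorPromise) {
    coverExtractorPromise = import('@huggingface/transformers').then(
      ({ pipeline }) => pipeline('image-feature-extraction', COVER_MODEL_ID),
    );
  }
  return coverExtractorPromise;
}

// Scale a vector to unit length so cosine and dot-product rankings agree.
function l2Normalize(values) {
  const norm = Math.sqrt(values.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) return Array.from(values);
  return Array.from(values, (v) => v / norm);
}

// Turn an image URL into a plain array of numbers that can be stored as JSON.
async function embedImage(imageUrl) {
  const extractor = await getCoverExtractor();
  const features = await extractor(imageUrl); // a Tensor; its data is a typed array
  return l2Normalize(Array.from(features.data));
}
```

Calling `await embedImage(url)` should yield one vector per image. Its length depends on the chosen model, and it has to match the dimensions configured in the database.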

While searching through dozens of vectors is not a heavy task, doing it at scale requires compute power. This is where Azure Cosmos DB comes in, ready to store and index a huge number of vectors. It’s also pretty simple thanks to the official Azure SDK:

Code
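To give a feel for what storing vectors can look like, here is a hedged sketch based on the `@azure/cosmos` JavaScript SDK. The database id, container id, partition key, `/embedding` path, and the `quantizedFlat` index type are all assumptions made for the example, and the string values for the vector policy should be checked against the SDK version you use.

```javascript
// Assumption: ids, paths, and index type are hypothetical; the dimensions
// must match the embedding model that produced the vectors.
function buildCoverContainerDefinition(dimensions) {
  return {
    id: 'covers',
    partitionKey: { paths: ['/id'] },
    vectorEmbeddingPolicy: {
      vectorEmbeddings: [
        {
          path: '/embedding',
          dataType: 'float32',
          dimensions,
          distanceFunction: 'cosine',
        },
      ],
    },
    indexingPolicy: {
      includedPaths: [{ path: '/*' }],
      // Keep the raw vector out of the regular index; the vector index covers it.
      excludedPaths: [{ path: '/embedding/*' }],
      vectorIndexes: [{ path: '/embedding', type: 'quantizedFlat' }],
    },
  };
}

// Create the database and container if needed, then store one cover document.
// `cover` is expected to look like { id, title, embedding: number[] }.
async function saveCover(endpoint, key, cover) {
  const { CosmosClient } = await import('@azure/cosmos');
  const client = new CosmosClient({ endpoint, key });
  const { database } = await client.databases.createIfNotExists({ id: 'recordscanner' });
  const { container } = await database.containers.createIfNotExists(
    buildCoverContainerDefinition(cover.embedding.length),
  );
  const { resource } = await container.items.create(cover);
  return resource;
}
```

The vector policy is set when the container is created, so it is worth settling on the embedding model and its dimensions before loading data.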

Since we’re talking about handling vectors on a large scale, it’s worth mentioning that Azure Cosmos DB also supports bulk operations, which we found really useful. It’s also really straightforward to write queries.

Code
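The two ideas above, bulk loading and similarity queries, can be sketched as follows. This is an illustration under assumptions rather than our production code: the field names `title` and `embedding`, the helper names, and the batch size of 100 (the per-call limit of `items.bulk()` as I understand it) are not taken from the original listing.

```javascript
// Assumption: items.bulk() takes a limited number of operations per call.
const COVER_BULK_BATCH_SIZE = 100;

// Split documents into batches of upsert operations for container.items.bulk().
// Depending on the SDK version, each operation may also need a partitionKey.
function toBulkBatches(covers, batchSize = COVER_BULK_BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < covers.length; i += batchSize) {
    batches.push(
      covers.slice(i, i + batchSize).map((cover) => ({
        operationType: 'Upsert',
        resourceBody: cover,
      })),
    );
  }
  return batches;
}

// Build a parameterized query that returns the closest covers to an embedding.
function buildNearestCoversQuery(embedding, limit = 10) {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError('limit must be a positive integer');
  }
  return {
    query:
      `SELECT TOP ${limit} c.id, c.title, ` +
      'VectorDistance(c.embedding, @embedding) AS score ' +
      'FROM c ORDER BY VectorDistance(c.embedding, @embedding)',
    parameters: [{ name: '@embedding', value: embedding }],
  };
}

// Load documents in batches, then look up the nearest covers.
// `container` is a Container object from the @azure/cosmos SDK.
async function importAndSearch(container, covers, queryEmbedding) {
  for (const batch of toBulkBatches(covers)) {
    await container.items.bulk(batch);
  }
  const { resources } = await container.items
    .query(buildNearestCoversQuery(queryEmbedding))
    .fetchAll();
  return resources;
}
```

Passing the embedding as a query parameter keeps the query text short and avoids building a very long string for every search.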

Maybe it’s obvious, but the VectorDistance function can be used in the WHERE clause as well as in the SELECT and ORDER BY clauses. Knowing the exact distance for each returned record allows us to further improve the user experience.
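Here is a small sketch of both uses: filtering by score in the WHERE clause, and turning the returned score into something the user can understand. It assumes the container uses the cosine distance function, where a higher score means a closer match. The thresholds, labels, and the `embedding` field name are hypothetical and would need tuning on real data.

```javascript
// Build a query that only returns covers whose similarity exceeds minScore.
function buildThresholdQuery(embedding, minScore, limit = 10) {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError('limit must be a positive integer');
  }
  return {
    query:
      `SELECT TOP ${limit} c.id, ` +
      'VectorDistance(c.embedding, @embedding) AS score ' +
      'FROM c WHERE VectorDistance(c.embedding, @embedding) > @minScore ' +
      'ORDER BY VectorDistance(c.embedding, @embedding)',
    parameters: [
      { name: '@embedding', value: embedding },
      { name: '@minScore', value: minScore },
    ],
  };
}

// Map a cosine similarity score to a label for the UI.
// Assumption: these cut-off values are placeholders, not tuned thresholds.
function labelMatch(score) {
  if (score >= 0.9) return 'exact';
  if (score >= 0.75) return 'likely';
  return 'uncertain';
}

// Attach a label to every row returned by the query.
function labelResults(rows) {
  return rows.map((row) => ({ ...row, match: labelMatch(row.score) }));
}
```

With labels like these, an app could show a single result for an "exact" match and a short list of candidates when the best match is only "likely".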

Easy control with autoscaling

Request Units (RUs), the currency Azure Cosmos DB uses for throughput, are quite a technical concept, and sometimes you have to dig a bit to find answers about them. To help, Microsoft provides a higher-level setting that makes usage and cost easy to control and understand: you set the throughput and see the expected maximum cost. Everything else looks familiar if you’ve ever browsed Metrics or Insights in the Azure portal.

Figure: Cosmos DB autoscaling
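For a rough idea of how a maximum throughput setting translates into a cost range, here is a back-of-the-envelope sketch. It assumes autoscale scales between 10% of the configured maximum and the maximum itself, and that throughput is billed per 100 RU/s per hour. The hourly rate is a hypothetical placeholder, since real prices depend on region and configuration, and storage is billed separately.

```javascript
const HOURS_PER_MONTH = 730; // common approximation used for monthly estimates

// Estimate the monthly throughput cost range for an autoscale container.
// usdPerHundredRuHour is a placeholder rate supplied by the caller.
function autoscaleMonthlyBounds(maxRuPerSecond, usdPerHundredRuHour) {
  const minRuPerSecond = maxRuPerSecond * 0.1; // autoscale floor: 10% of the maximum
  const monthlyUsd = (ruPerSecond) =>
    (ruPerSecond / 100) * usdPerHundredRuHour * HOURS_PER_MONTH;
  return {
    minUsd: monthlyUsd(minRuPerSecond),
    maxUsd: monthlyUsd(maxRuPerSecond),
  };
}

// Example with a hypothetical rate of $0.012 per 100 RU/s per hour.
const bounds = autoscaleMonthlyBounds(1000, 0.012);
console.log(`Between $${bounds.minUsd.toFixed(2)} and $${bounds.maxUsd.toFixed(2)} per month`);
```

The upper bound is the number to budget for, and the lower bound is what an idle container would cost under these assumptions.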

OCR rounds out our app

Right now, we’re working on a prototype for Record Scanner’s fourth scanning capability—the inner center label of records. Sometimes records can share the same cover art, but different releases contain different information on the inner label. Our prototype for this scanner uses OCR to recognize the text and numbers and we’ve already picked the best model to extract embeddings.

With Azure Cosmos DB powering our search functions, we’re ready to push Record Scanner even further.

Figure: Record Scanner app with a vinyl album cover

See Record Scanner in action

Download on the App Store


About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.

To stay in the loop on Azure Cosmos DB updates, follow Azure Cosmos DB on X, YouTube, and LinkedIn.

Author

Azure Cosmos DB Team

Azure Cosmos DB is a fully managed NoSQL, relational, and vector database. It offers single-digit millisecond response times, automatic and instant scalability, along with guaranteed speed at any scale. Business continuity is assured with SLA-backed availability and enterprise-grade security.
