Embedding vector caching

Craig Dunn

Hello prompt engineers,

A few weeks ago I added a custom datastore (the droidcon SF schedule) to the Jetchat OpenAI chat sample. One of the ‘hacks’ I used was generating the embeddings used for similarity comparisons on every app startup and caching them in memory:

// for each session
val embeddingResult = openAI.embeddings(embeddingRequest) // API request
val vector = embeddingResult.embeddings[0].embedding.toDoubleArray()
vectorCache[session.key] = vector

This results in ~70 web requests each time, plus the (albeit low) monetary cost of the OpenAI embeddings endpoint. It is a fast and easy way to build a demo, but in a production application you would want to avoid both the startup delay and the cost!

In this post I’ll discuss my first attempt building a vector cache on-device, and then some other alternatives to consider when building production-quality apps.

Hardcoding vectors

The solution I tried to improve performance and reduce cost was to hardcode the embeddings. In the GitHub repo you can check out the testing-vectors branch where I experimented with this idea.

The first step was to edit the DroidconEmbeddingsWrapper.kt file to generate a hardcoded class with all the vectors.

// this code emits a file that is itself valid Kotlin code
val fileOut = StringBuilder()
fileOut.append("""package com.example.compose.jetchat.data

/** GENERATED CODE: do not edit */
class DroidconSessionVectors {
    companion object {
        var vectorCache: Map<String, DoubleArray> = mapOf(""")

for (session in DroidconSessionData.droidconSessions) {
    // OMITTED EMBEDDING REQUEST (produces vectorKey and vectorString)
    fileOut.append("\r\n            \"$vectorKey\" to doubleArrayOf($vectorString),")
    Log.i("ABC", "\"$vectorKey\" to doubleArrayOf($vectorString),")
}

fileOut.append("""
        )
    }
}""")
writeToLocalFile("DroidconSessionVectors.kt", fileOut.toString())

The resulting file – DroidconSessionVectors.kt – looks like valid Kotlin and seemed like it would work. Each session key has an associated vector that looks like this (except that there are thousands of elements!):

"CRAIG DUNN" to doubleArrayOf(0.0015004711, -0.015774444, 0.006049357, -0.001246154, 0.019341666, 0.022664743, -0.022352781, 0.012363204, 9.825119E-4, -0.018202325, 0.02708647, 0.03222707, -0.012180096, -3.1005498E-4, -0.011149263, 0.0071276617, 0.0040283836, -0.003821539, 0.012132623, -0.03179303, ...

I added this Kotlin file to my app project, but a compilation error occurred (MethodTooLargeException)!

It turns out the JVM imposes a 64KB limit on the size of a compiled method (hence the MethodTooLargeException), so these seventy vectors couldn’t all be compiled into the planned helper class. The maximum number of vectors that could be hardcoded in a single class was five! I probably could have broken the data up into multiple (fourteen!) functions and still used this approach, but realistically it’s still a hack. For now, I’ve dropped this idea (the ‘serialization’ code will stay in the testing-vectors branch for reference).
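For the curious, here’s a rough sketch of what that chunked workaround might have looked like – the part1()/part2() helpers are hypothetical and not in the repo:

// Hypothetical workaround: emit at most five vectors per function so no
// single compiled method exceeds the JVM's 64KB limit, then merge the maps
class DroidconSessionVectors {
    companion object {
        val vectorCache: Map<String, DoubleArray> by lazy {
            part1() + part2() // ...continued through part14()
        }

        private fun part1(): Map<String, DoubleArray> = mapOf(
            "CRAIG DUNN" to doubleArrayOf(0.0015004711, -0.015774444 /* , ... */),
            // ...up to five session vectors per function
        )

        private fun part2(): Map<String, DoubleArray> = mapOf(
            // ...the next five session vectors
        )
    }
}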

The sample in GitHub continues to generate the embeddings each time, but some other approaches to cache vectors for comparison could include:

  • Local database
  • Cloud storage

Local database

A better alternative than hardcoding is probably to store vectors in a local database – either populated locally via OpenAI calls or created ahead-of-time and packaged with the app. SQLite is a common choice on mobile platforms and is readily available on Android. There is also the Room library which provides a fluent API for SQLite that supports coroutines, minimizes repetitive code, and adds support for data migrations between schema versions.
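As a concrete illustration, here’s a minimal Room sketch – the entity, DAO, and converter are my own naming, not code from the sample – that persists each embedding as comma-separated text:

import androidx.room.*

// one row per session, with the embedding stored as TEXT via a TypeConverter
@Entity(tableName = "session_vectors")
data class SessionVector(
    @PrimaryKey val sessionKey: String,
    val embedding: DoubleArray
)

class DoubleArrayConverter {
    @TypeConverter
    fun fromArray(value: DoubleArray): String = value.joinToString(",")

    @TypeConverter
    fun toArray(value: String): DoubleArray =
        value.split(",").map { it.toDouble() }.toDoubleArray()
}

@Dao
interface SessionVectorDao {
    @Insert(onConflict = OnConflictStrategy.REPLACE)
    suspend fun insert(vector: SessionVector)

    @Query("SELECT * FROM session_vectors")
    suspend fun getAll(): List<SessionVector>
}

@Database(entities = [SessionVector::class], version = 1)
@TypeConverters(DoubleArrayConverter::class)
abstract class VectorDatabase : RoomDatabase() {
    abstract fun sessionVectorDao(): SessionVectorDao
}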

Microsoft Semantic Kernel is an open-source AI orchestration SDK, and its SQLite database connector stores embeddings in a TEXT column – the schema is shown below:

  CREATE TABLE IF NOT EXISTS {TableName}(
      collection TEXT,
      key TEXT,
      metadata TEXT,
      embedding TEXT,
      timestamp TEXT,
      PRIMARY KEY(collection, key))

This is a generic table definition, but you can adapt the idea: store each vector as serialized text, then read it back into an array of values before running a vector similarity function. You can learn how to use SQLite on developer.android.com.
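For illustration (these helpers are hypothetical, not from Semantic Kernel or the Jetchat sample), deserializing the TEXT column and comparing vectors could look like this:

import kotlin.math.sqrt

// parse the TEXT column back into a numeric vector
fun deserializeEmbedding(text: String): DoubleArray =
    text.split(",").map { it.toDouble() }.toDoubleArray()

// cosine similarity, a common scoring function for embedding search
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}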

In the future there might be an Android version of this SQLite Vector Similarity Search extension – sqlite-vss – which adds the ability to store and compare vectors directly, or other Android-local options might become available.

AI in the cloud

An even better solution for productizing OpenAI solutions is probably to run them in the cloud and create an API specifically for your client-side apps.

By building your AI functionality in the cloud, you can more easily update prompts and take advantage of Azure features like vector database storage, other cognitive services, and AI safety tools and processes.

Check out this blog post on Vector Similarity Search with Azure SQL Database and OpenAI for an example of how to tackle vector storage and embedding search in the cloud.

Resources and feedback

The code for this sample and the others that were presented at droidcon SF 2023 is available on GitHub.

If you have any questions, use the feedback forum or message us on Twitter @surfaceduodev.

There will be a livestream this week, covering our blog topic from last week. You can also check out the archives on YouTube.
