Revolutionizing AI Search with Weaviate: An Interview with CEO Bob van Luijt

John Maeda

July 4th, 2023

To use Weaviate with Semantic Kernel, visit the sample notebook on the Semantic Kernel GitHub repo. And if you like the repo, please give it a star!

In this interview, Bob van Luijt, CEO and co-founder of Weaviate, shares his journey from starting an internet business at a young age to co-founding Weaviate. Bob discusses the inspiration behind Weaviate, its integration with AI SDKs like Semantic Kernel, and its role in the future of LLMs (Large Language Models). He explains how Weaviate’s vector database differentiates itself in the market, provides examples of companies benefiting from Weaviate’s capabilities, and highlights the challenges faced during its development. Bob envisions Weaviate as a transformative platform for data interaction and intelligence, shaping the future of LLMs and enabling generative AI.

“I started my first internet business when I was 15. Later, I went to study music in The Netherlands and the US while running my software business on the side.” —Bob van Luijt, CEO of Weaviate

SK: Can you tell us about your background and how it led you to co-found Weaviate?

Bob van Luijt: From a young age, I’ve had two major interests: software and art (mostly music). Because I’m of the generation that grew up with the internet (I was born in ’85), I started my first internet business when I was 15. This wasn’t a very fancy business (mostly small companies I knew who needed websites), but I never left the industry. I went to study music in The Netherlands and the US while running my software business on the side. Years later, in my mid-20s, my interest in the business side of software grew as well. When I saw Sam Ramji (back then, the CEO of Cloud Foundry) talk about open-source business models, I knew that this was the thing I wanted to do: create an open-source software business.

SK: What is the main insight or idea that inspired you to create Weaviate?

BvL: This dates back to GloVe and FastText. I remember very well when I saw the first demo based on GloVe back in 2016. This had a huge impact on me because you could see the first glimpse of the magic of machine learning (that is, by doing simple distance calculations, we could capture the meaning of what were, back then, single words). The idea for Weaviate was simple: if you take a paragraph of text, you can take all the individual words and calculate a centroid to represent the paragraph in vector space. A couple of years later, it was my co-founder Etienne who proposed the idea of building an end-to-end database where the vector embeddings were a first-class citizen with a purpose-built vector index (i.e., the ANN index). Today it’s hard to believe, but while Etienne took over the development of the database, I was traveling around in the US, EU, and even Japan to showcase the capabilities of Weaviate, trying to convince people that vector embeddings (or, more specifically, machine learning) were the future of search.
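Editor's illustration: the centroid idea Bob describes can be sketched in a few lines of Python. This is a minimal sketch, not Weaviate code. The three-dimensional word vectors below are invented for the example (real GloVe or FastText vectors have hundreds of dimensions), and the helper names `centroid` and `cosine_similarity` are hypothetical.

```python
import math

# Toy word vectors, invented for illustration only (an assumption, not real GloVe data).
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.7, 0.3, 0.1],
    "stock": [0.0, 0.9, 0.4],
    "market": [0.1, 0.8, 0.5],
}


def centroid(text):
    """Represent a text as the average of the vectors of its known words."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in text")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


pets = centroid("cat dog pet")
animals = centroid("dog cat")
finance = centroid("stock market")

# Texts about similar things end up closer together in vector space.
print(cosine_similarity(pets, animals) > cosine_similarity(pets, finance))
```

With these toy vectors, the two animal-related texts score far closer to each other than either does to the finance text, which is the "simple distance calculation" the answer refers to.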

SK: Can you explain how Weaviate integrates with AI SDKs like Semantic Kernel?

BvL: One thing I’ve learned during my time at Weaviate is that when a new category emerges (in this case, LLMs, MLLMs (Multimodal Large Language Models) and vector databases), developers need the tooling (SDKs, frameworks, etc) to interact with the models and infrastructure. For many developers, an AI SDK like Semantic Kernel is their point of entry into the new category. When they want persistent storage for their embeddings, an out-of-the-box integration is a welcome removal of abstractions. So, to answer your question, that’s exactly how Weaviate integrates with Semantic Kernel; it’s one of the persistent data stores.

BvL: On a side note, this also shows the beauty of both Semantic Kernel and Weaviate being open-source. It was a Weaviate community member who submitted the pull request for the integration.

SK: What role do you see Weaviate playing in the future of LLMs?

BvL: Machine learning models are stateless. This means that the knowledge captured in the weights lacks the data of the present moment. A vector database like Weaviate is stateful and directly integrates with generative machine learning models (for vector embeddings, text, or any other modality). This comes with three big upsides. First, you often don’t have to fine-tune because you can directly inject the data into the prompt. Secondly, you can easily store large amounts of proprietary data for the generative models (this is often called “the memory of the AI”). Thirdly, you can create generative feedback loops back into the database, a concept we have not seen before.
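Editor's illustration: the first upside, injecting retrieved data into the prompt instead of fine-tuning, can be sketched as follows. This is a minimal sketch under stated assumptions, not Weaviate's or Semantic Kernel's API. The in-memory `DOCUMENTS` list stands in for a vector database, the embeddings are invented, and `retrieve` and `build_prompt` are hypothetical helper names. No model is called; the sketch only shows how the prompt is assembled.

```python
import math

# Hypothetical stand-in for a vector database: (text, embedding) pairs.
# The embeddings are made up for this example.
DOCUMENTS = [
    ("Our refund window is 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.1, 0.9, 0.1]),
    ("Refunds are issued to the original payment method.", [0.8, 0.2, 0.1]),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, k=2):
    """Return the k stored texts whose embeddings are closest to the query."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: cosine_similarity(query_vector, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, query_vector, k=2):
    """Inject the retrieved texts into the prompt as context for the model."""
    context = "\n".join(f"- {text}" for text in retrieve(query_vector, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# In a real system the query vector would come from an embedding model.
prompt = build_prompt("How do refunds work?", [0.85, 0.15, 0.05])
print(prompt)
```

Because the model reads the proprietary data from the prompt at query time, updating the stored documents changes the answers without retraining anything.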

SK: Can you share some concrete examples of how companies are using Weaviate to solve their data challenges?

BvL: We have seen hundreds of use cases in different industries leveraging Weaviate’s (hybrid) vector search or generative search capabilities. Two examples of use cases I like are Instabase and NASA. The former shows how tech companies are embracing the capabilities LLMs and vector databases bring to their products, and the latter shows how a vector DB like Weaviate allows organizations to solve existing data challenges in innovative ways.

SK: How does Weaviate differentiate itself from other vector search engines in the market?

BvL: There are a few differentiating factors. First of all, Weaviate is open source; the business we’ve built on top of this is both SaaS and hybrid SaaS (where customers run Weaviate inside their VPC). Secondly, Weaviate has a rich modular ecosystem integrating many different ML-model providers. Thirdly, Weaviate is built for scale with many enterprise-ready features, including (but certainly not limited to) hybrid search, filtering, data storage, cross-references, multitenancy, and replication. We aim to become a scalable, feature-rich one-stop shop for developers looking for an enterprise-ready vector database.

SK: Can you share any stories of failures or challenges you faced while developing Weaviate?

BvL: Challenges emerged along two axes. First of all, the technology. To put it simply, building a database (or any infrastructure for that matter) is very, very, very hard. As an investor once told me: “creating a database takes you eight years; the first week you spend on designing the API, the rest on making the actual database.” We can confirm that this is true! On top of this, we are not creating a better, faster, cheaper version of an existing database; the whole category of vector DBs is completely new as well! Secondly, there is the time it takes to promote a new category. I’ve been doing a lot of evangelizing of the space, and especially in the early days it was very hard to find people building actual, meaningful use cases. I remember that we started to see the first production use cases in 2021, like this one from Keenious.

SK: How do you envision the future of Weaviate and its impact on the industry?

BvL: I believe that Weaviate has the potential to revolutionize how companies interact with and derive insights from their data. As the adoption of AI and machine learning continues to grow, Weaviate can become a fundamental component of intelligent systems, enabling them to understand and process natural language and make more informed decisions. I envision Weaviate becoming the go-to platform for companies looking to leverage the power of vector search technology.

SK: What qualities or unique expertise do you bring to the field of technology entrepreneurship and new media art?

BvL: This is a surprisingly difficult question to answer, mostly because all the amazing things that are currently happening with Weaviate and AI as an industry at large have quite a humbling effect on me. But if I had to pick three things, I would say: relentless curiosity to make things, presence, and grit. The first one is something I was born with; I simply like to create things with the tools I have a talent for. Presence is the most difficult one, but because everything is moving so fast, I’m trying to stay in the now; I aim to enjoy interacting with our amazing community (made up of our users, customers, and colleagues) every day and see how I can contribute to Weaviate now without dwelling too much on the past. Grit is easy with presence; you move forward one step at a time.

SK: What are the key takeaways you would like readers to have about you and your relationship to the integration of Weaviate with AI SDKs and its role in the future of LLMs?

BvL: I want readers to understand that Weaviate is not just a search engine but a powerful tool that can unlock the potential of AI and machine learning models. By integrating with AI SDKs and providing a scalable infrastructure, Weaviate enables companies to leverage these models in their own applications. It has the potential to shape the future of LLMs by enabling them to process and understand vast amounts of unstructured data. Everything started with the search engine, but it is now evolving into core infrastructure for generative AI, together with all these amazing AI SDKs.

About Bob van Luijt

Bob van Luijt is a technology entrepreneur, technologist, and new media artist from the Netherlands. He is the co-founder of Weaviate and the chairman of the Creative Software Foundation. In March 2016, Van Luijt started the open-source vector search engine Weaviate. He has published and lectured about (open-source) software business models and the positioning of broadly applicable infrastructure software (e.g., databases and search engines). During a presentation for TEDxUniversiteitVanAmsterdam, he shared his ideas about how the evolution of language impacts ideas in software development.