Humanizing Artificial Intelligence

Tokens

So, vector databases use lists of numbers called vectors.

We use human language.

LLMs, like ChatGPT, let us type in our human language.

So, how do vectors come in?

Well, LLMs need to translate text into vectors.

To do that, they break our input into smaller pieces. These pieces are called “tokens”.

For example, the search “dogs that need a lot of space” becomes: [“dogs”, “that”, “need”, “a”, “lot”, “of”, “space”]
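As a toy sketch, you can think of tokenizing as splitting the phrase into pieces. (Real LLM tokenizers usually split text into sub-word pieces rather than whole words, but the idea is the same.)

```python
# Toy tokenizer: split a search phrase on whitespace.
# Real LLM tokenizers use sub-word pieces, but the concept is the same.
def tokenize(text):
    return text.lower().split()

tokens = tokenize("dogs that need a lot of space")
print(tokens)  # ['dogs', 'that', 'need', 'a', 'lot', 'of', 'space']
```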

Each of these tokens needs to be translated into a numeric representation - a vector.

They each get their own place in the multi-dimensional space of the vector database. Sort of like latitude and longitude.

“Dog” might live at [0.1]

“Need” might live at [3.2]

“Space” might live at [9.0]

Those numbers are all made up to illustrate the concept.

The important thing to understand is that they all get an “address” in the vector store.
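A minimal way to picture that “address book” is a lookup table from token to vector. The numbers here are the same made-up ones from the example above.

```python
# Made-up "addresses": each token maps to its vector (its coordinates).
# The numbers are invented purely to illustrate the concept.
embeddings = {
    "dog":   [0.1],
    "need":  [3.2],
    "space": [9.0],
}

# Looking up a token returns its address in the vector store.
print(embeddings["dog"])  # [0.1]
```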

Here’s where things get really interesting: vectors don’t just hold words. They can also embed meaning.
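One way to glimpse what “embedding meaning” looks like: words with similar meanings get vectors that sit close together. The words and numbers below (including “puppy” and “car”) are invented for illustration, not from a real model.

```python
# Sketch: similar meanings -> nearby vectors.
# All words and numbers are made up to illustrate the concept.
import math

dog   = [0.10, 0.80]
puppy = [0.12, 0.79]  # close to "dog": similar meaning
car   = [0.90, 0.05]  # far from "dog": different meaning

# math.dist gives the straight-line distance between two points.
print(math.dist(dog, puppy) < math.dist(dog, car))  # True
```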