Humanizing Artificial Intelligence

Language Models: What Are They?

They’re computer programs designed to understand and generate human language.

But they don’t understand language the way people do.

What they do is identify language patterns so that they can predict what comes next. They’re like autocomplete on a whole new level.

They use probability to make their predictions. To make sure that these predictions are relevant, the models need training.
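To make that concrete, here's a toy sketch of probability-based next-word prediction. The prompt, word list, and probabilities are all invented for illustration; a real model computes probabilities over tens of thousands of possible next words.

```python
import random

# Toy next-word probabilities for the prompt "The cat sat on the ..."
# (these numbers are made up for illustration, not from a real model)
next_word_probs = {"mat": 0.6, "sofa": 0.25, "roof": 0.15}

# Option 1: always pick the single most probable next word
best = max(next_word_probs, key=next_word_probs.get)
print(best)  # mat

# Option 2: sample a word according to its probability,
# which is closer to how real models generate varied text
words, probs = zip(*next_word_probs.items())
print(random.choices(words, weights=probs)[0])
```

Sampling instead of always taking the top word is why the same question can produce different answers each time.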

To train a language model, we have to feed it. We feed it lots of text: books, websites, news articles, blogs, conversations. You get the idea.

The more that we feed these models, the better they get at spotting patterns in how we use language.
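At toy scale, "spotting patterns in text" can be as simple as counting which words tend to follow which. This sketch (using a made-up training sentence) shows the idea behind the simplest kind of language model:

```python
from collections import Counter, defaultdict

# A tiny made-up "training corpus"
text = "the cat sat on the mat and the cat slept"
words = text.split()

# Count which word follows which -- the simplest pattern-spotter
follows = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

# After "the", which word is most common in our training data?
print(follows["the"].most_common(1))  # [('cat', 2)]
```

Feed it more text and the counts become better estimates, which is exactly why more training data means better predictions. Real models learn far richer patterns than word pairs, but the principle is the same.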

Because ChatGPT was trained on an enormous amount of text, it’s really good at spotting patterns. It uses that pattern recognition to give us the most probable response.

Behind ChatGPT are Large Language Models at an insane scale. These things are HUGE.

But not all models are the same. And that’s by design.

Not all models need to be the size of those powering ChatGPT and others like Google Gemini.

In fact, some models are better precisely because they’re smaller: they’re faster, cheaper to run, and can even fit on a phone or laptop.

Let’s explore some key differences of model sizes.