Humanizing Artificial Intelligence

Transformers

Yeah, that pun was bad.

Attention became the center of a powerful new model design: the Transformer Architecture.

With Transformers, attention evolved into something called Multi-Head Attention. It lets a model pay attention to several different aspects of the input at the same time.

Imagine a book club where everyone reads the same book at the same time. Each person takes notes on something different:

  • One focuses on the setting

  • Another on the main character

  • One on the main conflict

  • And so on…

Then they put their notes together to understand the story from different perspectives.

This gives the model a richer understanding of the content.

Transformers work like that. And they can do it in an instant.
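If you're curious what that looks like under the hood, here is a minimal sketch of multi-head attention in Python. The sizes, the random weights, and the multi_head_attention function are all made up for illustration; a real model learns its projection weights during training rather than drawing them at random.

```python
# A minimal sketch of multi-head attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """x: (seq_len, d_model) token embeddings for one sentence."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads          # each "reader" gets a slice of the model
    heads = []
    for _ in range(num_heads):
        # Each head gets its own projections -- its own kind of notes to take.
        # (Random here for illustration; a trained model learns these.)
        W_q = rng.normal(size=(d_model, d_head))
        W_k = rng.normal(size=(d_model, d_head))
        W_v = rng.normal(size=(d_model, d_head))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        # Scaled dot-product attention: how much each word looks at every other word.
        scores = softmax(Q @ K.T / np.sqrt(d_head))
        heads.append(scores @ V)           # this head's view of the sentence
    # "Put the notes together": concatenate every head's output.
    return np.concatenate(heads, axis=-1)  # (seq_len, d_model)

rng = np.random.default_rng(0)
sentence = rng.normal(size=(5, 32))        # 5 tokens, 32-dimensional embeddings
out = multi_head_attention(sentence, num_heads=4, rng=rng)
print(out.shape)                           # (5, 32)
```

Each head is one member of the book club; concatenating their outputs is the group pooling its notes.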

That’s why ChatGPT seems to know exactly what you’re talking about, as you type.

It responds on topic. It stays on track. Even though it can hallucinate, it can also give answers that feel helpful, even human.

That’s why transformers are the foundation of today’s Large Language Models (LLMs).