Attention Mechanism
So what was this big breakthrough?
Early models read each word one at a time. They just processed words in order, treating every word as equally important.
But when you and I read, we don’t process words one at a time. We take in the whole sentence and understand the context.
Researchers realized that models could benefit from learning to “pay attention” to the most important parts of a sentence, too.
Kinda like giving them a highlighter.
This idea is called the attention mechanism.
When a model pays attention, it can weigh which words are most relevant as it reads.
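If you want to see the highlighter in action, here’s a minimal sketch of the standard attention math (scaled dot-product attention) in Python with NumPy. The toy vectors are made up for illustration; in a real model, the queries, keys, and values come from learned transformations of the input, not reused raw like this.

```python
# A minimal sketch of scaled dot-product attention, using NumPy.
# The numbers here are toy values, not from a real model.
import numpy as np

def attention(Q, K, V):
    """Blend each word's representation using relevance weights."""
    d_k = Q.shape[-1]
    # How relevant is each word to each other word?
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax: turn scores into weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Mix the words together according to those weights
    return weights @ V

# 3 words, each represented by a 4-dimensional vector
np.random.seed(0)
x = np.random.randn(3, 4)
# In a real model, Q, K, and V are learned projections of x;
# reusing x directly keeps this sketch short.
print(attention(x, x, x))
```

Each row of the output is a mix of all the words, weighted by how relevant they are to each other. That weighting is the highlighting.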
This led to an even more transformative development in language models. That pun will hit in a second.