Attention Mechanism
So what was this big breakthrough?
Early models read each word one at a time. They just processed words in order, treating every word as equally important.
But when you and I read, we don’t process words one at a time. We take in the whole sentence and understand the context.
Researchers realized that models could benefit from learning to “pay attention” to the most important parts of a sentence, too.
Kinda like giving them a highlighter.
This idea is called the attention mechanism.
When a model pays attention, it can weigh which words are most relevant as it reads.
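If you want to see the highlighter in action, here’s a minimal sketch of the standard attention math (scaled dot-product attention) in Python with NumPy. The toy vectors are made up for illustration; in a real model, the queries, keys, and values come from learned transformations of the input, not reused raw like this.

```python
# A minimal sketch of scaled dot-product attention, using NumPy.
# The numbers here are toy values, not from a real model.
import numpy as np

def attention(Q, K, V):
    """Blend each word's representation using relevance weights."""
    d_k = Q.shape[-1]
    # How relevant is each word to each other word?
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax: turn scores into weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Mix the words together according to those weights
    return weights @ V

# 3 words, each represented by a 4-dimensional vector
np.random.seed(0)
x = np.random.randn(3, 4)
# In a real model, Q, K, and V are learned projections of x;
# reusing x directly keeps this sketch short.
print(attention(x, x, x))
```

Each row of the output is a mix of all the words, weighted by how relevant they are to each other. That weighting is the highlighting.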
This led to an even more transformative development in language models. That pun will hit in a second.