Introduction:

Before reading this article, you may want to skim the previous articles:

Self-Attention & Multi-Head Attention

The Illustrated Whole structure of Transformer