Brief:

Before reading this article, you may want to skim these pages:

Attention Structure

Self-Attention Structure

Multi-Head Attention Structure
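
As a quick orientation before diving into those structures, single-head self-attention can be sketched in a few lines of NumPy. This is a minimal illustration only, not code from the referenced posts; the projection matrices `Wq`, `Wk`, `Wv` are random placeholders standing in for learned weights:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the same input X into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))          # a toy sequence of 4 tokens
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one attended vector per input token
```

Multi-head attention simply runs several such projections in parallel on smaller `d_k` slices and concatenates the results; the referenced posts cover that structure in detail.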


References:

自然语言处理中的自注意力机制 (The Self-Attention Mechanism in Natural Language Processing)

Self Attention和Multi-Head Attention的原理和实现 (The Principles and Implementation of Self-Attention and Multi-Head Attention), CSDN blog

Attention机制详解(二)--Self-Attention与Transformer (Attention Mechanisms Explained, Part 2: Self-Attention and the Transformer)