Brief:
Before reading this article, you may want to skim this page first:
Attention Structure
Reference:
The Self-Attention Mechanism in Natural Language Processing (Self-attention Mechanism)
Self Attention and Multi-Head Attention: Principles and Implementation (陈建驱's blog, CSDN)
Attention Mechanism Explained (Part 2): Self-Attention and the Transformer