QWEN 2.5 Model Architecture Analysis and Code Walkthrough


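Alibaba has open-sourced a series of strong large models. The Qwen 2.5 series in particular performs well, and because of that performance DeepSeek chose models from this series as the student models for knowledge distillation (the DeepSeek-R1-Distill-Qwen models).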

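I also studied the Qwen source code to learn more about the latest techniques in LLM development. Architecturally, Qwen2.5 is a decoder-only Transformer that adopts GQA (grouped-query attention), the SwiGLU activation, RoPE (rotary position embedding), bias terms on the Q/K/V projections, and RMSNorm normalization.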
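To make two of these components concrete, here is a minimal numpy sketch of RMSNorm and the SwiGLU feed-forward block. It mirrors the gate_proj / up_proj / down_proj plus SiLU layout of Qwen2MLP in the printed structure later in this post, but it is an illustrative re-implementation, not the transformers code; the function names, the toy sizes (8 and 32 instead of 896 and 4864), and the random weights are assumptions made for illustration.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: divide by the root mean square over the hidden axis, then scale.
    # Unlike LayerNorm there is no mean subtraction and no bias term.
    variance = np.mean(x ** 2, axis=-1, keepdims=True)
    return weight * (x / np.sqrt(variance + eps))


def silu(x):
    # SiLU (Swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def swiglu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU: down_proj(silu(gate_proj(x)) * up_proj(x)); all three projections are bias-free
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


# Toy sizes (hypothetical); the real 0.5B model uses hidden=896, intermediate=4864
rng = np.random.default_rng(0)
hidden, inter = 8, 32
x = rng.standard_normal((4, hidden))          # 4 tokens
h = rms_norm(x, np.ones(hidden))              # normalized input
y = swiglu_mlp(
    h,
    rng.standard_normal((hidden, inter)),     # gate_proj weight (hypothetical)
    rng.standard_normal((hidden, inter)),     # up_proj weight (hypothetical)
    rng.standard_normal((inter, hidden)),     # down_proj weight (hypothetical)
)
print(y.shape)  # (4, 8)
```

The gate branch controls, per intermediate unit, how much of the up branch passes through, which is why this MLP needs three weight matrices instead of the two in a classic FFN.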

Taking the 0.5B-parameter Qwen 2.5 model as an example, the following code prints the model's architecture:

from transformers import AutoModel, AutoConfig


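model_path = '../models/Qwen2.5-0.5B'
# Load from a local directory (config.json and model.safetensors must exist there)
config = AutoConfig.from_pretrained(model_path)
# AutoModel resolves to the bare Qwen2Model (no LM head) for this config
model = AutoModel.from_pretrained(
    model_path,
    config=config,
    use_safetensors=True  # force loading from the safetensors format
)

print(model)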
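In the structure printed next, q_proj maps 896 to 896 while k_proj and v_proj map 896 to only 128. That asymmetry is GQA: several query heads share one key/value head. Assuming the head counts from Qwen2.5-0.5B's config.json are 14 query heads and 2 key/value heads (worth confirming via config.num_attention_heads and config.num_key_value_heads), head_dim = 896 / 14 = 64 and the K/V width is 2 × 64 = 128. The block below checks that arithmetic and then gives a minimal numpy sketch of one causal attention layer combining GQA, Q/K/V bias, and RoPE. It is an illustration, not the transformers implementation; the weight-dict layout, the toy sizes, and the RoPE base of 10000 are assumptions.

```python
import numpy as np


def proj_widths(hidden, num_heads, num_kv_heads):
    # Output widths of q_proj and of k_proj/v_proj under GQA
    head_dim = hidden // num_heads
    return num_heads * head_dim, num_kv_heads * head_dim


def rotate_half(x):
    # Split the last axis into halves (x1, x2) and return (-x2, x1)
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)


def rope(x, base=10000.0):
    # x: (heads, seq_len, head_dim); rotate each pair (i, i + head_dim/2)
    # by an angle proportional to the token position
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    angles = np.outer(np.arange(seq_len), inv_freq)        # (seq_len, head_dim/2)
    emb = np.concatenate([angles, angles], axis=-1)        # (seq_len, head_dim)
    return x * np.cos(emb) + rotate_half(x) * np.sin(emb)


def gqa_attention(x, p, num_heads, num_kv_heads):
    # x: (seq_len, hidden); p: dict of weights/biases (hypothetical layout)
    seq_len, hidden = x.shape
    head_dim = hidden // num_heads
    group = num_heads // num_kv_heads

    def split(t, n):
        # (seq_len, n * head_dim) -> (n, seq_len, head_dim)
        return t.reshape(seq_len, n, head_dim).transpose(1, 0, 2)

    # Q/K/V projections carry a bias; the output projection does not
    q = split(x @ p["wq"] + p["bq"], num_heads)
    k = split(x @ p["wk"] + p["bk"], num_kv_heads)
    v = split(x @ p["wv"] + p["bv"], num_kv_heads)
    q, k = rope(q), rope(k)

    # Each K/V head is shared by `group` consecutive query heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, seq, seq)
    causal = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(causal, -np.inf, scores)              # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, hidden)
    return out @ p["wo"]


print(proj_widths(896, 14, 2))  # (896, 128)

# Toy run: hidden=16, 4 query heads, 2 K/V heads (head_dim=4)
rng = np.random.default_rng(0)
hidden, n_heads, n_kv = 16, 4, 2
kv_width = proj_widths(hidden, n_heads, n_kv)[1]
p = {
    "wq": rng.standard_normal((hidden, hidden)), "bq": rng.standard_normal(hidden),
    "wk": rng.standard_normal((hidden, kv_width)), "bk": rng.standard_normal(kv_width),
    "wv": rng.standard_normal((hidden, kv_width)), "bv": rng.standard_normal(kv_width),
    "wo": rng.standard_normal((hidden, hidden)),
}
x = rng.standard_normal((5, hidden))
out = gqa_attention(x, p, n_heads, n_kv)
print(out.shape)  # (5, 16)
```

Repeating each K/V head along the head axis is what "grouped" means here; at inference time only the 2 K/V heads need to be cached instead of 14, which is the memory saving GQA is designed for.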

The model structure is as follows:

Qwen2Model(
  (embed_tokens): Embedding(151936, 896)
  (layers): ModuleList(
    (0-23): 24 x Qwen2DecoderLayer(
      (self_attn): Qwen2SdpaAttention(
        (q_proj): Linear(in_features=896, out_features=896, bias=True)
        (k_proj): Linear(in_features=896, out_features=128, bias=True)
        (v_proj): Linear(in_features=896, out_features=128, bias=True)
        (o_proj): Linear(in_features=896, out_features=896, bias=False)
        (rotary_emb): Qwen2RotaryEmbedding()
      )
      (mlp): Qwen2MLP(
        (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
        (up_proj): Linear(in_features=896, out_features=4864, bias=False)
        (down_proj): Linear(in_features=4864, out_features=896, bias=False)
        (act_fn): SiLU()
      )
      (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1