[LLM] 8.5 A Retrieval-Augmented Generation (RAG) Question-Answering System with Conversation History Awareness

Latest recommended article published on 2026-06-13 13:48:05

This content is licensed under CC 4.0 BY-SA.

Tags

#langchain #AI

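1. Brief Overview of What the Code Does

The code uses the LangChain framework to build a retrieval-augmented generation (RAG) question-answering system that is aware of the conversation history. Its core capabilities and flow are:

Data layer: scrapes text from a given web page (Lilian Weng's blog post on agents), splits it into short chunks, converts the chunks to vectors, and stores them in Chroma, a lightweight vector database;
Retrieval layer: builds a "history-aware retriever" that uses the conversation history to rewrite a vague follow-up (e.g. "What are common ways of doing it?") into a complete question that can be retrieved on its own (e.g. "What are common ways of Task Decomposition?");
QA layer: combines the retriever with conversation-history management and generates answers from the retrieved chunks plus the conversation context. Histories are isolated per session_id, so the result is question answering that is grounded in the web document and linked to earlier turns.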
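
To make the data layer more concrete, here is a minimal, dependency-free sketch of similarity search over text chunks. It is not how Chroma or OpenAIEmbeddings work internally: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `cosine` and `top_k` are illustrative helpers that do not appear in the article's code.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "task decomposition breaks a complex task into smaller steps",
    "memory lets an agent recall past information",
    "tool use lets an agent call external APIs",
]
best = top_k("what is task decomposition", chunks, k=1)
print(best)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step has the same shape: embed the query, score every stored chunk, and keep the best matches.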

2. Complete Code with Line-by-Line Comments

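# Import os: (1) configure a network proxy (works around network restrictions when reaching OpenAI or the web page from mainland China); (2) set LangChain environment variables
import os
# Import bs4 (BeautifulSoup): parses the page HTML and keeps only elements with the given class names (the core blog content)
import bs4

# Import create_stuff_documents_chain: core RAG component that stuffs the retrieved documents into the prompt template and passes it to the model to generate an answer
from langchain.chains.combine_documents import create_stuff_documents_chain
# Import create_history_aware_retriever: builds the "history-aware retriever", which rewrites vague user questions using the conversation history
from langchain.chains.history_aware_retriever import create_history_aware_retriever
# Import create_retrieval_chain: combines a retriever with a QA chain to form the full "retrieve → answer" RAG flow
from langchain.chains.retrieval import create_retrieval_chain
# Import Chroma: LangChain's integration with the Chroma vector database, used to store and search text vectors
from langchain_chroma import Chroma
# Import WebBaseLoader: web page loader that fetches the text content of the given URLs
from langchain_community.document_loaders import WebBaseLoader
# Import prompt-template classes: build prompt templates with variables and a chat-history placeholder
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# Import RunnableWithMessageHistory: adds conversation-history management to a chain, with isolation by session_id
from langchain_core.runnables import RunnableWithMessageHistory
# Import RecursiveCharacterTextSplitter: recursive character splitter that breaks long text into short chunks (sized for vector retrieval)
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Import ChatMessageHistory: message-history store for a single session, holding the user and assistant messages
from langchain_community.chat_message_histories import ChatMessageHistory
# Import the OpenAI components: ChatOpenAI (chat model) and OpenAIEmbeddings (embedding model that turns text into vectors)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings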

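# Configure the HTTP proxy: 127.0.0.1:7890 is the local port of the proxy tool; the URL includes the scheme so that HTTP clients parse it reliably
os.environ['http_proxy'] = 'http://127.0.0.1:7890'
# Configure the HTTPS proxy: the web page and the OpenAI API are served over HTTPS, so this proxy is needed as well
os.environ['https_proxy'] = 'http://127.0.0.1:7890'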

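# Enable LangChain Tracing V2: traces each step of the chain (retrieval, question rewriting, answer generation) for debugging
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Set the LangChain project name: traces are grouped under this project, which keeps different applications apart
os.environ["LANGCHAIN_PROJECT"] = "LangchainDemo"
# Set the LangSmith API key (required for tracing): replace the placeholder with your own key and never publish a real key
os.environ["LANGCHAIN_API_KEY"] = '<your-langsmith-api-key>'
# os.environ["TAVILY_API_KEY"] = '<your-tavily-api-key>'  # Tavily search is not used in this code, so this stays commented out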

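# ===================== Core step 1: initialize the OpenAI chat model =====================
# Chatbot example
# Create the ChatOpenAI instance with the gpt-4-turbo model (the OpenAI key is read from the OPENAI_API_KEY environment variable)
model = ChatOpenAI(model='gpt-4-turbo')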

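# ===================== Core step 2: load the web document =====================
# 1. Load the data: the content of one blog post
# Initialize WebBaseLoader to fetch the content of the given URL
# - web_paths: list of URLs to fetch (this example fetches a single blog post on agents)
# - bs_kwargs: BeautifulSoup parsing arguments; parse_only restricts parsing to tags whose class is post-header, post-title, or post-content
#   Effect: drops the navigation bar, ads, and other unrelated content, keeping only the core blog text
loader = WebBaseLoader(
    web_paths=['https://lilianweng.github.io/posts/2023-06-23-agent/'],
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=('post-header', 'post-title', 'post-content'))
    )
)

# Run the load: fetch the page and convert it into a list of LangChain Document objects
docs = loader.load()

# Check the load result (commented out): inspect the number of documents and their content
# print(len(docs))
# print(docs)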

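# ===================== Core step 3: split the long text into short chunks =====================
# 2. Splitting large text
# Sample text (commented out): useful for experimenting with what the splitter does
# text = "hello world, how about you? thanks, I am fine.  the machine learning class. So what I wanna do today is just spend a little time going over the logistics of the class, and then we'll start to talk a bit about machine learning"

# Initialize the recursive character text splitter:
# - chunk_size=1000: maximum number of characters per chunk (keeps each chunk focused and well within the model's context window)
# - chunk_overlap=200: number of characters shared by adjacent chunks (reduces the chance that a sentence cut at a boundary loses its context)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Run the split: break the loaded document into short chunks (a list of Document objects)
splits = splitter.split_documents(docs)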

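# ===================== Core step 4: store the chunks in the Chroma vector store =====================
# 2. Storage (note: the original code comment is misnumbered; this is actually step 4)
# Initialize the Chroma vector store: embed the chunks and store the vectors
# - documents=splits: the list of chunks to store
# - embedding=OpenAIEmbeddings(): uses OpenAI's embedding model (text-embedding-ada-002 by default) to turn text into 1536-dimensional vectors
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# ===================== Core step 5: create the base retriever =====================
# 3. Retriever
# Turn the Chroma vector store into a standard LangChain retriever: by default it returns the 4 most similar chunks, and its parameters can be tuned later
retriever = vectorstore.as_retriever()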

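# ===================== Core step 6: build the base RAG question-answering chain =====================
# Putting it together

# Prompt for the question: answer from the retrieved context and admit when the answer is unknown, so the model is less likely to make things up
# {context} is the variable that receives the retrieved blog chunks. Notes like this must stay outside the string,
# because everything inside the triple quotes is sent to the model as part of the prompt.
system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer 
the question. If you don't know the answer, say that you 
don't know. Use three sentences maximum and keep the answer concise.\n

{context}
"""

# Build the prompt template: system instruction + chat-history placeholder + user question
# - ("system", system_prompt): system instruction that sets the answering rules and where the context is filled in
# - MessagesPlaceholder("chat_history"): chat-history placeholder, filled at run time with the messages of the current session_id
# - ("human", "{input}"): user-question placeholder that receives the user's input
prompt = ChatPromptTemplate.from_messages(  # template that includes the question-and-answer history
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),  # chat-history placeholder, links in earlier turns
        ("human", "{input}"),
    ]
)

# Build the base QA chain: create_stuff_documents_chain fills the retrieved context into system_prompt and passes the result to the model
# Flow: context → prompt template → model → answer
chain1 = create_stuff_documents_chain(model, prompt)

# Test the base RAG chain (commented out): combines the retriever and the QA chain directly, without history awareness
# chain2 = create_retrieval_chain(retriever, chain1)
# resp = chain2.invoke({'input': "What is Task Decomposition?"})
# print(resp['answer'])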

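'''
Note:
Normally the chain we build uses the question-and-answer history directly to relate to earlier context. In this example, however, the query sent to the retriever also needs the conversation context in order to be understood.

Solution:
Add a sub-chain that takes the latest user question and the chat history, and rephrases the question whenever it refers to anything in the history. This can be thought of simply as building a new "history-aware" retriever.
Purpose of the sub-chain: bring the conversation context into the retrieval step.
'''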

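# ===================== Core step 7: build the history-aware retriever =====================
# Create a sub-chain that rewrites a vague user question into a question that can be retrieved on its own
# Prompt for the sub-chain: "only rewrite the question, do not answer it"; the point is to produce a standalone question from the history
contextualize_q_system_prompt = """Given a chat history and the latest user question 
which might reference context in the chat history, 
formulate a standalone question which can be understood 
without the chat history. Do NOT answer the question, 
just reformulate it if needed and otherwise return it as is."""

# Build the prompt template for rewriting: system instruction + chat history + user question
retriever_history_temp = ChatPromptTemplate.from_messages(
    [
        ('system', contextualize_q_system_prompt),
        MessagesPlaceholder('chat_history'),  # chat-history placeholder, used to interpret the user's follow-up
        ("human", "{input}"),
    ]
)

# Create the history-aware retriever:
# Flow: vague user question → rewritten into a standalone question using the history → base retriever is called → relevant chunks are returned
history_chain = create_history_aware_retriever(model, retriever, retriever_history_temp)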

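# ===================== Core step 8: initialize the conversation-history store =====================
# Keep the question-and-answer history: the dict store holds the history of every session
# key: session_id (unique session identifier); value: ChatMessageHistory object (the messages of that session)
store = {}

# Define the history lookup function (RunnableWithMessageHistory expects a callable of this shape): returns the history object for a session_id
def get_session_history(session_id: str):
    # If the session_id is not in store yet (new user), create a ChatMessageHistory and save it
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    # Return the history object for this session_id
    return store[session_id]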

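# ===================== Core step 9: build the complete history-aware RAG chain =====================
# Create the parent chain: combines the history-aware retriever with the base QA chain for the full "rewrite question → retrieve → answer" flow
chain = create_retrieval_chain(history_chain, chain1)

# Add conversation-history management to the chain:
# - chain: the complete RAG chain
# - get_session_history: the history lookup function
# - input_messages_key='input': key of the user input (matches {input} in the prompt template)
# - history_messages_key='chat_history': key of the chat history (matches chat_history in the prompt template)
# - output_messages_key='answer': key of the answer in the output
result_chain = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key='input',
    history_messages_key='chat_history',
    output_messages_key='answer'
)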

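# ===================== Core step 10: test the history-aware RAG chain =====================
# Round 1: ask for the definition of Task Decomposition (no dependence on earlier context)
print('=== Round 1 (session_id=zs123456) ===')  # improved print: label the session ID
resp1 = result_chain.invoke(
    {'input': 'What is Task Decomposition?'},
    config={'configurable': {'session_id': 'zs123456'}}  # set the session ID; each ID has its own history
)
print(f'User question: What is Task Decomposition?')  # improved print: show the user question
print(f'Model answer: {resp1["answer"]}\n')  # improved print: show the model answer, followed by a blank line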

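# Round 2: the follow-up "What are common ways of doing it?" (depends on round 1, so the question has to be rewritten)
print('=== Round 2 (session_id=zs123456) ===')  # improved print: label the session ID
resp2 = result_chain.invoke(
    {'input': 'What are common ways of doing it?'},
    # Same session ID as round 1, so the stored history is available for rewriting the question.
    # A different ID (e.g. 'ls123456') starts with an empty history, and "it" could not be resolved.
    config={'configurable': {'session_id': 'zs123456'}}
)
print(f'User question: What are common ways of doing it?')  # improved print: show the user question
print(f'Model answer: {resp2["answer"]}')  # improved print: show the model answer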

3. The Code Without Comments

import os
import bs4
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.history_aware_retriever import create_history_aware_retriever
from langchain.chains.retrieval import create_retrieval_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableWithMessageHistory
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

os.environ['http_proxy'] = 'http://127.0.0.1:7890'
os.environ['https_proxy'] = 'http://127.0.0.1:7890'

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "LangchainDemo"
os.environ["LANGCHAIN_API_KEY"] = '<your-langsmith-api-key>'

model = ChatOpenAI(model='gpt-4-turbo')

loader = WebBaseLoader(
    web_paths=['https://lilianweng.github.io/posts/2023-06-23-agent/'],
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=('post-header', 'post-title', 'post-content'))
    )
)

docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

splits = splitter.split_documents(docs)

vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer 
the question. If you don't know the answer, say that you 
don't know. Use three sentences maximum and keep the answer concise.\n

{context}
"""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

chain1 = create_stuff_documents_chain(model, prompt)

contextualize_q_system_prompt = """Given a chat history and the latest user question 
which might reference context in the chat history, 
formulate a standalone question which can be understood 
without the chat history. Do NOT answer the question, 
just reformulate it if needed and otherwise return it as is."""

retriever_history_temp = ChatPromptTemplate.from_messages(
    [
        ('system', contextualize_q_system_prompt),
        MessagesPlaceholder('chat_history'),
        ("human", "{input}"),
    ]
)

history_chain = create_history_aware_retriever(model, retriever, retriever_history_temp)

store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chain = create_retrieval_chain(history_chain, chain1)

result_chain = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key='input',
    history_messages_key='chat_history',
    output_messages_key='answer'
)

print('=== Round 1 (session_id=zs123456) ===')
resp1 = result_chain.invoke(
    {'input': 'What is Task Decomposition?'},
    config={'configurable': {'session_id': 'zs123456'}}
)
print(f'User question: What is Task Decomposition?')
print(f'Model answer: {resp1["answer"]}\n')

print('=== Round 2 (session_id=zs123456) ===')
resp2 = result_chain.invoke(
    {'input': 'What are common ways of doing it?'},
    config={'configurable': {'session_id': 'zs123456'}}
)
print(f'User question: What are common ways of doing it?')
print(f'Model answer: {resp2["answer"]}')

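4. Core Concepts in Detail (Systematic Summary and Tables)

4.1 Core Components and Concepts at a Glance

| Component / concept | Import path | Plain-language explanation | Core usage | Role in this example |
|---|---|---|---|---|
| WebBaseLoader | langchain_community.document_loaders.WebBaseLoader | Web page loader: fetches the text of the given URLs and can filter out unrelated content | `WebBaseLoader(web_paths=[URL], bs_kwargs=<filter args>).load()` | Fetches the agent blog post and keeps only the core text (drops navigation and ads) |
| RecursiveCharacterTextSplitter | langchain_text_splitters.RecursiveCharacterTextSplitter | Recursive character splitter: breaks long text into short chunks while limiting lost context at boundaries | `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(<docs>)` | Splits the blog text into chunks of at most 1000 characters with a 200-character overlap, sized for vector retrieval |
| Chroma | langchain_chroma.Chroma | Lightweight vector database: stores text vectors and offers similarity search | `Chroma.from_documents(documents=<chunks>, embedding=OpenAIEmbeddings())` | Embeds and stores the chunks, providing the data that retrieval runs on |
| create_stuff_documents_chain | langchain.chains.combine_documents.create_stuff_documents_chain | Base RAG QA chain: fills the retrieved context into the prompt template and generates an answer | `create_stuff_documents_chain(<model>, <prompt template>)` | Implements the basic "context + history + question → answer" logic |
| create_history_aware_retriever | langchain.chains.history_aware_retriever.create_history_aware_retriever | History-aware retriever: rewrites a vague follow-up into a standalone, retrievable question | `create_history_aware_retriever(<model>, <base retriever>, <rewrite template>)` | Handles "What are common ways of doing it?", which has no context on its own, by rewriting it into a query that names Task Decomposition |
| create_retrieval_chain | langchain.chains.retrieval.create_retrieval_chain | Top-level RAG chain: combines the "retrieve → answer" steps | `create_retrieval_chain(<retriever>, <QA chain>)` | Links "rewrite question → retrieve → answer" into the full RAG flow |
| RunnableWithMessageHistory | langchain_core.runnables.RunnableWithMessageHistory | Conversation-history wrapper: keeps a separate history per session_id | `RunnableWithMessageHistory(<chain>, <history function>, <input/history/output keys>)` | Adds session management to the RAG chain and isolates conversations by session_id |
| ChatMessageHistory | langchain_community.chat_message_histories.ChatMessageHistory | Message store for a single session: holds the user and assistant messages | `ChatMessageHistory()` (creates an empty history) | Stores the history of each session_id so later turns can refer to earlier ones |
| MessagesPlaceholder | langchain_core.prompts.MessagesPlaceholder | History placeholder in a prompt template, filled with the session messages at run time | `MessagesPlaceholder("chat_history")` | Reserves the slot for the history in the prompt, so the model and the retriever see the context |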

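4.2 Key Points Explained in Depth

(1) Why the history-aware retriever matters

Pain point of plain RAG: a follow-up such as "What are common ways of doing it?" carries no context of its own, so the retriever cannot tell what "it" refers to and may return irrelevant chunks.
How the history-aware retriever solves it: the vague follow-up is combined with the chat history and rewritten by the model into a standalone question; the base retriever is then called with that question and returns the relevant chunks.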
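
The flow can be sketched in plain Python as follows. This is an illustration of the idea, not LangChain's implementation: `history_aware_retrieve`, `fake_rewrite`, and `fake_retrieve` are hypothetical names, and the rewrite step that a real system delegates to the model is hard-coded here. The sketch assumes, in line with how `create_history_aware_retriever` is documented, that an empty history means the question goes to the retriever unchanged.

```python
from typing import Callable

History = list[tuple[str, str]]  # (role, text) pairs

def history_aware_retrieve(
    question: str,
    chat_history: History,
    rewrite: Callable[[str, History], str],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """With no history, search with the question as-is; otherwise rewrite it first."""
    if not chat_history:
        return retrieve(question)
    standalone = rewrite(question, chat_history)
    return retrieve(standalone)

def fake_rewrite(question: str, chat_history: History) -> str:
    # Hypothetical stand-in for the model call: resolves "it" to a fixed topic.
    return question.replace("doing it", "Task Decomposition")

queries_seen: list[str] = []

def fake_retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for the vector-store retriever: records the query it receives.
    queries_seen.append(query)
    return [f"chunk matching: {query}"]

history = [("human", "What is Task Decomposition?"), ("ai", "It breaks a task into steps.")]
follow_up = "What are common ways of doing it?"
with_history = history_aware_retrieve(follow_up, history, fake_rewrite, fake_retrieve)
without_history = history_aware_retrieve(follow_up, [], fake_rewrite, fake_retrieve)
print(queries_seen)
```

With history, the retriever receives the rewritten question; without it, the raw follow-up is searched as typed, which is why the follow-up has to share a session with the question it refers to.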

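(2) Key parameters for text splitting

chunk_size=1000: maximum number of characters per chunk. It should fit comfortably in the model's context window (gpt-4-turbo supports 128k tokens, but smaller chunks tend to make retrieval more precise);
chunk_overlap=200: number of characters shared by adjacent chunks. It reduces the context lost when a sentence is cut at a chunk boundary (for example, a definition such as "Task Decomposition is xxx" is more likely to appear whole in at least one chunk).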
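
To show how the two parameters interact, here is a minimal fixed-window splitter. It is a simplification under stated assumptions: `split_with_overlap` is a hypothetical helper, and unlike RecursiveCharacterTextSplitter it does not look for separators such as paragraph breaks or spaces, so it only illustrates the size and overlap arithmetic.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-window splitter: each chunk starts chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so text near a boundary appears in both neighbors. With chunk_size=1000 and chunk_overlap=200, the same arithmetic gives a new window roughly every 800 characters.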

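(3) Core parameters for conversation-history management

input_messages_key='input': the key of the user input (matches {'input': '<question>'});
history_messages_key='chat_history': the key of the chat history (matches the chat_history placeholder in the prompt template);
output_messages_key='answer': the key of the answer in the output (matches resp['answer']);
session_id: isolates conversations. For example, the histories of zs123456 and ls123456 do not affect each other, so a follow-up question must reuse the session_id of the question it refers to.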
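
Conceptually, the wrapper loads the history for the given session_id, passes it to the chain together with the input, and then appends the new question and answer. The sketch below mimics that behavior with plain lists. It is an analogue under stated assumptions, not the RunnableWithMessageHistory implementation: `invoke_with_history` and `echo_chain` are hypothetical, and messages are stored as simple (role, text) tuples instead of ChatMessageHistory objects.

```python
from typing import Callable

store: dict[str, list[tuple[str, str]]] = {}

def get_session_history(session_id: str) -> list[tuple[str, str]]:
    """Return the message list for a session, creating it on first use."""
    return store.setdefault(session_id, [])

def invoke_with_history(chain: Callable[[dict], dict], user_input: str, session_id: str) -> dict:
    """Load the history, run the chain, then append the new human and AI messages."""
    history = get_session_history(session_id)
    result = chain({"input": user_input, "chat_history": list(history)})
    history.append(("human", user_input))
    history.append(("ai", result["answer"]))
    return result

def echo_chain(inputs: dict) -> dict:
    """Hypothetical chain that only reports how many earlier messages it received."""
    return {"answer": f"saw {len(inputs['chat_history'])} earlier messages"}

first = invoke_with_history(echo_chain, "What is Task Decomposition?", "zs123456")
follow_up = invoke_with_history(echo_chain, "What are common ways of doing it?", "zs123456")
other_user = invoke_with_history(echo_chain, "What are common ways of doing it?", "ls123456")
print(first["answer"], follow_up["answer"], other_user["answer"], sep=" | ")
```

The follow-up in session zs123456 sees the two messages from the first round, while the same question under ls123456 starts from an empty history.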

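5. Notes on the print Changes

5.1 Comparison and Rationale

| Original print | Improved print | Key improvements |
|---|---|---|
| `print(resp1['answer'])` `print(resp2['answer'])` | `print('=== Round 1 (session_id=zs123456) ===')` `print(f'User question: xxx')` `print(f'Model answer: xxx')` | 1. Labels the session ID, so different users and rounds are easy to tell apart; 2. Shows each user question next to the model's answer, which is easier for beginners to follow; 3. Separates the two rounds with a blank line for readability |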
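
If more rounds are added, the repeated print calls could be folded into a small helper. The function below is a hypothetical convenience, not part of the original code; `format_turn` and its parameters are illustrative names.

```python
def format_turn(round_no: int, session_id: str, question: str, answer: str) -> str:
    """Build the labelled block printed for one question-and-answer round."""
    return (
        f"=== Round {round_no} (session_id={session_id}) ===\n"
        f"User question: {question}\n"
        f"Model answer: {answer}\n"
    )

block = format_turn(1, "zs123456", "What is Task Decomposition?", "<model answer here>")
print(block)
```

In the article's code this would be called as `print(format_turn(1, 'zs123456', question, resp1['answer']))`, where `question` holds the text passed as `input`.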

5.2 Sample Output (for reference; actual model output will vary)

=== Round 1 (session_id=zs123456) ===
User question: What is Task Decomposition?
Model answer: Task Decomposition is the process of breaking down complex tasks into smaller, manageable sub-tasks to simplify execution. It is a core technique in agent systems to handle complicated objectives efficiently. It helps reduce the cognitive load of the agent and improve task completion accuracy.

=== Round 2 (session_id=zs123456) ===
User question: What are common ways of doing it?
Model answer: Common ways of Task Decomposition include hierarchical decomposition, which breaks tasks into nested sub-tasks; functional decomposition, which splits tasks by different functions; and temporal decomposition, which divides tasks based on time sequence. It may also involve goal-oriented decomposition to align sub-tasks with final objectives.

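Summary (Key Points)

Core capability: the history-aware retriever addresses the RAG pain point of vague follow-ups that lack context, so retrieval targets the right topic;
Core flow: load web page → split text → store vectors → history-aware retrieval → answer → manage conversation history;
Key components:
- create_history_aware_retriever is the core of history awareness;
- RunnableWithMessageHistory is what provides session isolation and links turns to earlier context;
Practical tips:
- Set chunk_overlap when splitting text to limit context lost at chunk boundaries;
- Use session_id to keep each user's conversation separate and avoid mixing histories, and reuse the same session_id for follow-up questions;
- State in the prompt template that answers should come from the retrieved context, which makes fabricated answers less likely.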