To make AI models effective at specific tasks, we need to supply them with domain-specific knowledge. For example, a customer-support chatbot needs business-related information, while a legal assistant depends on a corpus of historical case data. Developers typically use Retrieval-Augmented Generation (RAG) to fetch relevant knowledge from a database and improve the AI's responses. However, traditional RAG approaches often lose context during retrieval, which leads to failed or inaccurate answers.
In this post, we introduce "Contextual Retrieval", a method that uses contextual embeddings to improve retrieval accuracy, combined with reranking to further reduce failures.
For larger knowledge bases, Retrieval-Augmented Generation (RAG) offers a scalable solution. Modern RAG systems combine two complementary retrieval methods: semantic (dense) search over embeddings, which matches on meaning, and lexical (sparse) search such as BM25, which matches on exact terms. The best RAG implementations combine both and merge their candidate lists, as sketched below.
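As an illustration, here is a minimal sketch of merging the two result lists with reciprocal rank fusion (RRF), a common fusion technique; the function and its inputs are hypothetical and are not part of the pipeline built later in this post:

def reciprocal_rank_fusion(dense_ids, sparse_ids, k=60):
    # Combine two ranked lists of document ids into one fused ranking
    scores = {}
    for ranked_ids in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked_ids):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Documents ranked highly by both retrievers get the highest fused score
    return sorted(scores, key=scores.get, reverse=True)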
The challenge with traditional RAG is that splitting documents into smaller chunks for efficient retrieval can discard important context. For example, consider an academic database and the question: "What was Dr. Smith's primary research focus in 2021?" If the retrieved chunk only says "The research focused on artificial intelligence," it lacks clarity: it names neither Dr. Smith nor the year, making it hard to confirm the answer. This problem degrades both the accuracy and the usefulness of retrieval in knowledge-intensive domains.
Contextual Retrieval addresses this by prepending chunk-specific explanatory context to each chunk before embedding it ("contextual embeddings"). We generate this context text for every chunk.
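To make this concrete, here is a hypothetical before-and-after for the example above (the context string is invented for illustration; in the pipeline below it is generated by an LLM):

original_chunk = "The research focused on artificial intelligence."
# Hypothetical LLM-generated context naming the entities the chunk omits
context = "Describes Dr. Smith's primary research focus in 2021, from the lab's annual report."
contextualized_chunk = f"{context}\n\n{original_chunk}"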
A typical RAG pipeline has the following components. User input is first authenticated and passed through a content-safety system. The next step is a query rewriter based on the conversation history; you can also attach query expansion to improve the generated answers. Then come the retriever and the reranker.

In a RAG pipeline, the retriever and the reranker play crucial, complementary roles in finding and prioritizing relevant context. The retriever acts as an initial filter, efficiently searching a large document collection to identify potentially relevant chunks by their semantic similarity to the query. Common retrieval approaches include dense retrievers (such as embedding-based search) and sparse retrievers (such as BM25). The reranker then acts as a more sophisticated second stage: it takes the retriever's candidate passages and performs detailed relevance scoring. A reranker can use a powerful language model to analyze the deep semantic relationship between the query and each passage, weighing factors such as factual alignment, answer coverage, and contextual relevance. This two-stage approach balances efficiency and accuracy: the retriever quickly narrows the search space, while the reranker applies more computationally intensive analysis to a small set of promising candidates to select the most relevant context for the generation stage.
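Conceptually, the two-stage flow looks like this (a minimal sketch; retrieve, rerank, and generate stand in for the concrete implementations built below):

def answer_query(query, retrieve, rerank, generate, k_retrieve=50, k_rerank=5):
    # Stage 1: cheap, broad retrieval narrows the whole collection to candidates
    candidates = retrieve(query, top_k=k_retrieve)
    # Stage 2: expensive scoring over the small candidate set
    best_passages = rerank(query, candidates, top_k=k_rerank)
    # Generation grounded in the top-ranked context
    return generate(query, best_passages)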
In this example, we will use LangChain as our framework to build it.
import os
from typing import List, Tuple
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import AzureChatOpenAI
from langchain.prompts import ChatPromptTemplate
from rank_bm25 import BM25Okapi
import cohere
import logging
import time
from azure.ai.documentintelligence.models import DocumentAnalysisFeature
from langchain_community.document_loaders.doc_intelligence import AzureAIDocumentIntelligenceLoader
# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
load_dotenv('azure.env', override=True)
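The load_dotenv call expects an azure.env file containing the credentials this code reads. A minimal template, based on the environment variables referenced below (all values are placeholders):

AZURE_OPENAI_API_KEY=<your-azure-openai-key>
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=<your-document-intelligence-key>
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://<your-di-resource>.cognitiveservices.azure.com/
COHERE_API_KEY=<your-cohere-key>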
Now let's create a custom retriever class that implements contextual embeddings. Here is the code.
class ContextualRetrieval:
    def __init__(self):
        # Splitter for breaking documents into overlapping chunks
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=800,
            chunk_overlap=100,
        )
        # Embedding model (also usable for an optional dense index)
        self.embeddings = AzureOpenAIEmbeddings(
            api_key=os.getenv("AZURE_OPENAI_API_KEY"),
            azure_deployment="text-embedding-ada-002",
            openai_api_version="2024-03-01-preview",
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        )
        # Chat model used for contextualization, query expansion, and answering
        self.llm = AzureChatOpenAI(
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            azure_deployment="gpt-4o",
            temperature=0,
            max_tokens=None,
            timeout=None,
            max_retries=2,
        )
        # Cohere client for reranking
        self.cohere_client = cohere.Client(os.getenv("COHERE_API_KEY"))
    def load_pdf_and_parse(self, pdf_path: str) -> str:
        # Parse the PDF to markdown with Azure Document Intelligence
        loader = AzureAIDocumentIntelligenceLoader(
            file_path=pdf_path,
            api_key=os.getenv("AZURE_DOCUMENT_INTELLIGENCE_KEY"),
            api_endpoint=os.getenv("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"),
            api_model="prebuilt-layout",
            api_version="2024-02-29-preview",
            mode='markdown',
            analysis_features=[DocumentAnalysisFeature.OCR_HIGH_RESOLUTION],
        )
        try:
            documents = loader.load()
            if not documents:
                raise ValueError("No content extracted from the PDF.")
            return " ".join([doc.page_content for doc in documents])
        except Exception as e:
            logging.error(f"Error while parsing the file '{pdf_path}': {str(e)}")
            raise
    def process_document(self, document: str) -> Tuple[List[Document], List[Document]]:
        # Split the document and produce both plain and contextualized chunks
        if not document.strip():
            raise ValueError("The document is empty after parsing.")
        chunks = self.text_splitter.create_documents([document])
        contextualized_chunks = self._generate_contextualized_chunks(document, chunks)
        return chunks, contextualized_chunks

    def _generate_contextualized_chunks(self, document: str, chunks: List[Document]) -> List[Document]:
        # Prepend LLM-generated context to each chunk
        contextualized_chunks = []
        for chunk in chunks:
            context = self._generate_context(document, chunk.page_content)
            contextualized_content = f"{context}\n\n{chunk.page_content}"
            contextualized_chunks.append(Document(page_content=contextualized_content, metadata=chunk.metadata))
        return contextualized_chunks
    def _generate_context(self, document: str, chunk: str) -> str:
        # Ask the LLM for a short, search-oriented context for the chunk
        prompt = ChatPromptTemplate.from_template("""
You are an AI assistant specializing in document analysis. Your task is to provide brief, relevant context for a chunk of text from the given document.
Here is the document:
<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Provide a concise context (2-3 sentences) for this chunk, considering the following guidelines:
1. Identify the main topic or concept discussed in the chunk.
2. Mention any relevant information or comparisons from the broader document context.
3. If applicable, note how this information relates to the overall theme or purpose of the document.
4. Include any key figures, dates, or percentages that provide important context.
5. Do not use phrases like "This chunk discusses" or "This section provides". Instead, directly state the context.
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
Context:
""")
        messages = prompt.format_messages(document=document, chunk=chunk)
        response = self.llm.invoke(messages)
        return response.content
    def create_bm25_index(self, chunks: List[Document]) -> BM25Okapi:
        # Build a BM25 index over whitespace-tokenized chunk text
        tokenized_chunks = [chunk.page_content.split() for chunk in chunks]
        return BM25Okapi(tokenized_chunks)
    def generate_answer(self, query: str, relevant_chunks: List[str]) -> str:
        # Answer the query grounded in the retrieved chunks
        prompt = ChatPromptTemplate.from_template("""
Based on the following information, please provide a concise and accurate answer to the question.
If the information is not sufficient to answer the question, say so.
Question: {query}
Relevant information:
{chunks}
Answer:
""")
        messages = prompt.format_messages(query=query, chunks="\n\n".join(relevant_chunks))
        response = self.llm.invoke(messages)
        return response.content
    def rerank_results(self, query: str, documents: List[Document], top_n: int = 3) -> List[Document]:
        # Rerank BM25 candidates with Cohere's reranker, retrying on rate limits
        logging.info(f"Reranking {len(documents)} documents for query: {query}")
        doc_contents = [doc.page_content for doc in documents]
        max_retries = 3
        for attempt in range(max_retries):
            try:
                reranked = self.cohere_client.rerank(
                    model="rerank-english-v2.0",
                    query=query,
                    documents=doc_contents,
                    top_n=top_n,
                )
                break
            except cohere.errors.TooManyRequestsError:
                if attempt < max_retries - 1:
                    logging.warning(f"Rate limit hit. Waiting for 60 seconds before retry {attempt + 1}/{max_retries}")
                    time.sleep(60)  # Wait for 60 seconds before retrying
                else:
                    logging.error("Rate limit hit. Max retries reached. Returning original documents.")
                    return documents[:top_n]
        logging.info(f"Reranking complete. Top {top_n} results:")
        reranked_docs = []
        for idx, result in enumerate(reranked.results):
            original_doc = documents[result.index]
            reranked_docs.append(original_doc)
            logging.info(f"  {idx+1}. Score: {result.relevance_score:.4f}, Index: {result.index}")
        return reranked_docs
    def expand_query(self, original_query: str) -> str:
        # Expand the query with related terms and concepts to improve recall
        prompt = ChatPromptTemplate.from_template("""
You are an AI assistant specializing in document analysis. Your task is to expand the given query to include related terms and concepts that might be relevant for a more comprehensive search of the document.
Original query: {query}
Please provide an expanded version of this query, including relevant terms, concepts, or related ideas that might help in summarizing the full document. The expanded query should be a single string, not a list.
Expanded query:
""")
        messages = prompt.format_messages(query=original_query)
        response = self.llm.invoke(messages)
        return response.content
Now let's load a sample PDF and create two BM25 indexes: one over the plain chunks and one over the context-aware chunks.
cr = ContextualRetrieval()
pdf_path = "1.pdf"
document = cr.load_pdf_and_parse(pdf_path)
# Process the document
chunks, contextualized_chunks = cr.process_document(document)
# Create BM25 index
contextualized_bm25_index = cr.create_bm25_index(contextualized_chunks)
normal_bm25_index = cr.create_bm25_index(chunks)
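The class also initializes Azure OpenAI embeddings, and FAISS is imported above; if you additionally want a dense vector index over the contextualized chunks to complement BM25, a minimal sketch (the example query string is hypothetical):

# Optional: dense vector index over the contextualized chunks
contextualized_vector_index = FAISS.from_documents(contextualized_chunks, cr.embeddings)
# Semantic search example
dense_hits = contextualized_vector_index.similarity_search("termination clause", k=5)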
Now let's define a process_query helper that ties the pipeline together, and run the same query against both indexes to compare the results.
def process_query(cr: ContextualRetrieval, query: str, bm25_index: BM25Okapi, chunks: List[Document]) -> str:
    # Expand the query for better recall
    expanded_query = cr.expand_query(query)
    # Score all chunks with BM25 and keep the top candidates
    scores = bm25_index.get_scores(expanded_query.split())
    top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:10]
    candidates = [chunks[i] for i in top_indices]
    # Rerank the candidates, then generate the final answer
    reranked_docs = cr.rerank_results(query, candidates, top_n=3)
    answer = cr.generate_answer(query, [doc.page_content for doc in reranked_docs])
    print(f"Answer: {answer}")
    return answer
original_query = "When does the term of the Agreement commence and how long does it last?"
print(f"\nOriginal Query: {original_query}")
process_query(cr, original_query, normal_bm25_index, chunks)
Context-aware index:
original_query = "When does the term of the Agreement commence and how long does it last?"
print(f"\nOriginal Query: {original_query}")
process_query(cr, original_query, contextualized_bm25_index, contextualized_chunks)
Thanks to the contextual retriever, you will most likely get a better answer from the latter. Now let's evaluate this against a benchmark. We will use the Azure AI evaluation SDK for RAG evaluation. First, let's load the dataset.
You can create your ground truth as JSON Lines records like the following:
{"chat_history":[],"question":"What is short-term memory in the context of the model?","ground_truth":"Short-term memory involves utilizing in-context learning to learn."}
import pandas as pd
df = pd.read_json(output_file, lines=True, orient="records")
df.head()
Once the dataset is loaded, we can run it through both the standard retrieval strategy and the contextual-embedding retrieval strategy.
normal_answers = []
contextual_answers = []
for index, row in df.iterrows():
    normal_answers.append(process_query(cr, row["question"], normal_bm25_index, chunks))
    contextual_answers.append(process_query(cr, row["question"], contextualized_bm25_index, contextualized_chunks))
Let's evaluate against the ground truth; in this case I use the similarity score. You can use any other built-in or custom metric.
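SimilarityEvaluator needs a configuration for the judge model, which the snippet below assumes is already defined; a minimal sketch, reusing the same Azure OpenAI deployment:

# Judge-model configuration for the evaluator (assumed to reuse the GPT-4o deployment)
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",
}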
from azure.ai.evaluation import SimilarityEvaluator
# Initializing the Similarity Evaluator
similarity_eval = SimilarityEvaluator(model_config)
df["answer"] = normal_answers
df['score'] = df.apply(lambda x : similarity_eval(
response=x["answer"],
ground_truth = x["ground_truth"],
query=x["question"],
), axis = 1)
df["answer_contextual"] = contexual_answers
df['score_contextual'] = df.apply(lambda x : similarity_eval(
response=x["answer_contextual"],
ground_truth = x["ground_truth"],
query=x["question"],
), axis = 1)
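To compare the two strategies numerically, you can average the evaluator output; a minimal sketch, assuming SimilarityEvaluator returns a dict with a "similarity" key:

# Extract the numeric score from each evaluator result (key name is an assumption)
df["similarity"] = df["score"].apply(lambda s: s["similarity"])
df["similarity_contextual"] = df["score_contextual"].apply(lambda s: s["similarity"])
print(df[["similarity", "similarity_contextual"]].mean())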
As you can see, contextual embeddings improve retrieval, and the similarity scores reflect this. The contextual retrieval system outlined in this post demonstrates a sophisticated approach to document analysis and question answering. By integrating several NLP techniques, such as contextualization with GPT-4o, efficient indexing with BM25, reranking with a Cohere model, and query expansion, the system not only retrieves relevant information but also understands and synthesizes it to provide accurate answers. This modular architecture keeps the design flexible, allowing individual components to be enhanced or replaced as better techniques emerge. As the field of natural language processing continues to evolve, systems like this will become increasingly important in making large volumes of text more accessible, searchable, and actionable across different domains.