To make AI models effective at specific tasks, we need to supply them with domain-specific knowledge. For example, a customer-support chatbot needs business-related information, while a legal assistant depends on a corpus of historical case data. Developers typically use Retrieval-Augmented Generation (RAG) to fetch relevant knowledge from a database and improve the AI's responses. However, traditional RAG approaches often lose context during retrieval, which leads to failed or inaccurate answers.
In this post, we introduce "Contextual Retrieval", a method that uses contextual embeddings to improve retrieval accuracy, combined with reranking to further reduce failures.
For larger knowledge bases, Retrieval-Augmented Generation (RAG) offers a scalable solution. Modern RAG systems combine two complementary retrieval methods: semantic (dense) search over embeddings, which matches on meaning, and lexical (sparse) search such as BM25, which matches on exact terms. The best RAG implementations combine both and merge their candidate lists, as sketched below.
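As an illustration, here is a minimal sketch of merging the two result lists with reciprocal rank fusion (RRF), a common fusion technique; the function and its inputs are hypothetical and are not part of the pipeline built later in this post:

def reciprocal_rank_fusion(dense_ids, sparse_ids, k=60):
    # Combine two ranked lists of document ids into one fused ranking
    scores = {}
    for ranked_ids in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked_ids):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Documents ranked highly by both retrievers get the highest fused score
    return sorted(scores, key=scores.get, reverse=True)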
The challenge with traditional RAG is that splitting documents into smaller chunks for efficient retrieval can discard important context. For example, consider an academic database and the question: "What was Dr. Smith's primary research focus in 2021?" If the retrieved chunk only says "The research focused on artificial intelligence," it lacks clarity: it names neither Dr. Smith nor the year, making it hard to confirm the answer. This problem degrades both the accuracy and the usefulness of retrieval in knowledge-intensive domains.
Contextual Retrieval addresses this by prepending chunk-specific explanatory context to each chunk before embedding it ("contextual embeddings"). We generate this context text for every chunk.
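To make this concrete, here is a hypothetical before-and-after for the example above (the context string is invented for illustration; in the pipeline below it is generated by an LLM):

original_chunk = "The research focused on artificial intelligence."
# Hypothetical LLM-generated context naming the entities the chunk omits
context = "Describes Dr. Smith's primary research focus in 2021, from the lab's annual report."
contextualized_chunk = f"{context}\n\n{original_chunk}"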
A typical RAG pipeline has the following components. User input is first authenticated and passed through a content-safety system. The next step is a query rewriter based on the conversation history; you can also attach query expansion to improve the generated answers. Then come the retriever and the reranker.

In a RAG pipeline, the retriever and the reranker play crucial, complementary roles in finding and prioritizing relevant context. The retriever acts as an initial filter, efficiently searching a large document collection to identify potentially relevant chunks by their semantic similarity to the query. Common retrieval approaches include dense retrievers (such as embedding-based search) and sparse retrievers (such as BM25). The reranker then acts as a more sophisticated second stage: it takes the retriever's candidate passages and performs detailed relevance scoring. A reranker can use a powerful language model to analyze the deep semantic relationship between the query and each passage, weighing factors such as factual alignment, answer coverage, and contextual relevance. This two-stage approach balances efficiency and accuracy: the retriever quickly narrows the search space, while the reranker applies more computationally intensive analysis to a small set of promising candidates to select the most relevant context for the generation stage.
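Conceptually, the two-stage flow looks like this (a minimal sketch; retrieve, rerank, and generate stand in for the concrete implementations built below):

def answer_query(query, retrieve, rerank, generate, k_retrieve=50, k_rerank=5):
    # Stage 1: cheap, broad retrieval narrows the whole collection to candidates
    candidates = retrieve(query, top_k=k_retrieve)
    # Stage 2: expensive scoring over the small candidate set
    best_passages = rerank(query, candidates, top_k=k_rerank)
    # Generation grounded in the top-ranked context
    return generate(query, best_passages)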
In this example, we will use LangChain as our framework to build it.
import os
from typing import List, Tuple
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import AzureChatOpenAI
from langchain.prompts import ChatPromptTemplate
from rank_bm25 import BM25Okapi
import cohere
import logging
import time
from azure.ai.documentintelligence.models import DocumentAnalysisFeature
from langchain_community.document_loaders.doc_intelligence import AzureAIDocumentIntelligenceLoader
# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
load_dotenv('azure.env', override=True)
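The load_dotenv call expects an azure.env file containing the credentials this code reads. A minimal template, based on the environment variables referenced below (all values are placeholders):

AZURE_OPENAI_API_KEY=<your-azure-openai-key>
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_DOCUMENT_INTELLIGENCE_KEY=<your-document-intelligence-key>
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://<your-di-resource>.cognitiveservices.azure.com/
COHERE_API_KEY=<your-cohere-key>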
Now let's create a custom retriever class that implements contextual embeddings. Here is the code.
class ContextualRetrieval:
    def __init__(self):
        # Splitter for breaking documents into overlapping chunks
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=800,
            chunk_overlap=100,
        )
        # Embedding model (also usable for an optional dense index)
        self.embeddings = AzureOpenAIEmbeddings(
            api_key=os.getenv("AZURE_OPENAI_API_KEY"),
            azure_deployment="text-embedding-ada-002",
            openai_api_version="2024-03-01-preview",
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        )
        # Chat model used for contextualization, query expansion, and answering
        self.llm = AzureChatOpenAI(
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            azure_deployment="gpt-4o",
            temperature=0,
            max_tokens=None,
            timeout=None,
            max_retries=2,
        )
        # Cohere client for reranking
        self.cohere_client = cohere.Client(os.getenv("COHERE_API_KEY"))
    def load_pdf_and_parse(self, pdf_path: str) -> str:
        # Parse the PDF to markdown with Azure Document Intelligence
        loader = AzureAIDocumentIntelligenceLoader(
            file_path=pdf_path,
            api_key=os.getenv("AZURE_DOCUMENT_INTELLIGENCE_KEY"),
            api_endpoint=os.getenv("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"),
            api_model="prebuilt-layout",
            api_version="2024-02-29-preview",
            mode='markdown',
            analysis_features=[DocumentAnalysisFeature.OCR_HIGH_RESOLUTION],
        )
        try:
            documents = loader.load()
            if not documents:
                raise ValueError("No content extracted from the PDF.")
            return " ".join([doc.page_content for doc in documents])
        except Exception as e:
            logging.error(f"Error while parsing the file '{pdf_path}': {str(e)}")
            raise
    def process_document(self, document: str) -> Tuple[List[Document], List[Document]]:
        # Split the document and produce both plain and contextualized chunks
        if not document.strip():
            raise ValueError("The document is empty after parsing.")
        chunks = self.text_splitter.create_documents([document])
        contextualized_chunks = self._generate_contextualized_chunks(document, chunks)
        return chunks, contextualized_chunks

    def _generate_contextualized_chunks(self, document: str, chunks: List[Document]) -> List[Document]:
        # Prepend LLM-generated context to each chunk
        contextualized_chunks = []
        for chunk in chunks:
            context = self._generate_context(document, chunk.page_content)
            contextualized_content = f"{context}\n\n{chunk.page_content}"
            contextualized_chunks.append(Document(page_content=contextualized_content, metadata=chunk.metadata))
        return contextualized_chunks
    def _generate_context(self, document: str, chunk: str) -> str:
        # Ask the LLM for a short, search-oriented context for the chunk
        prompt = ChatPromptTemplate.from_template("""
You are an AI assistant specializing in document analysis. Your task is to provide brief, relevant context for a chunk of text from the given document.
Here is the document:
<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Provide a concise context (2-3 sentences) for this chunk, considering the following guidelines:
1. Identify the main topic or concept discussed in the chunk.
2. Mention any relevant information or comparisons from the broader document context.
3. If applicable, note how this information relates to the overall theme or purpose of the document.
4. Include any key figures, dates, or percentages that provide important context.
5. Do not use phrases like "This chunk discusses" or "This section provides". Instead, directly state the context.
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
Context:
""")
        messages = prompt.format_messages(document=document, chunk=chunk)
        response = self.llm.invoke(messages)
        return response.content
    def create_bm25_index(self, chunks: List[Document]) -> BM25Okapi:
        # Build a BM25 index over whitespace-tokenized chunk text
        tokenized_chunks = [chunk.page_content.split() for chunk in chunks]
        return BM25Okapi(tokenized_chunks)
    def generate_answer(self, query: str, relevant_chunks: List[str]) -> str:
        # Answer the query grounded in the retrieved chunks
        prompt = ChatPromptTemplate.from_template("""
Based on the following information, please provide a concise and accurate answer to the question.
If the information is not sufficient to answer the question, say so.
Question: {query}
Relevant information:
{chunks}
Answer:
""")
        messages = prompt.format_messages(query=query, chunks="\n\n".join(relevant_chunks))
        response = self.llm.invoke(messages)
        return response.content
    def rerank_results(self, query: str, documents: List[Document], top_n: int = 3) -> List[Document]:
        # Rerank BM25 candidates with Cohere's reranker, retrying on rate limits
        logging.info(f"Reranking {len(documents)} documents for query: {query}")
        doc_contents = [doc.page_content for doc in documents]
        max_retries = 3
        for attempt in range(max_retries):
            try:
                reranked = self.cohere_client.rerank(
                    model="rerank-english-v2.0",
                    query=query,
                    documents=doc_contents,
                    top_n=top_n,
                )
                break
            except cohere.errors.TooManyRequestsError:
                if attempt < max_retries - 1:
                    logging.warning(f"Rate limit hit. Waiting for 60 seconds before retry {attempt + 1}/{max_retries}")
                    time.sleep(60)  # Wait for 60 seconds before retrying
                else:
                    logging.error("Rate limit hit. Max retries reached. Returning original documents.")
                    return documents[:top_n]
        logging.info(f"Reranking complete. Top {top_n} results:")
        reranked_docs = []
        for idx, result in enumerate(reranked.results):
            original_doc = documents[result.index]
            reranked_docs.append(original_doc)
            logging.info(f"  {idx+1}. Score: {result.relevance_score:.4f}, Index: {result.index}")
        return reranked_docs
    def expand_query(self, original_query: str) -> str:
        # Expand the query with related terms and concepts to improve recall
        prompt = ChatPromptTemplate.from_template("""
You are an AI assistant specializing in document analysis. Your task is to expand the given query to include related terms and concepts that might be relevant for a more comprehensive search of the document.
Original query: {query}
Please provide an expanded version of this query, including relevant terms, concepts, or related ideas that might help in summarizing the full document. The expanded query should be a single string, not a list.
Expanded query:
""")
        messages = prompt.format_messages(query=original_query)
        response = self.llm.invoke(messages)
        return response.content
Now let's load a sample PDF and create two BM25 indexes: one over the plain chunks and one over the context-aware chunks.
cr = ContextualRetrieval()
pdf_path = "1.pdf"
document = cr.load_pdf_and_parse(pdf_path)
# Process the document
chunks, contextualized_chunks = cr.process_document(document)
# Create BM25 index
contextualized_bm25_index = cr.create_bm25_index(contextualized_chunks)
normal_bm25_index = cr.create_bm25_index(chunks)
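The class also initializes Azure OpenAI embeddings, and FAISS is imported above; if you additionally want a dense vector index over the contextualized chunks to complement BM25, a minimal sketch (the example query string is hypothetical):

# Optional: dense vector index over the contextualized chunks
contextualized_vector_index = FAISS.from_documents(contextualized_chunks, cr.embeddings)
# Semantic search example
dense_hits = contextualized_vector_index.similarity_search("termination clause", k=5)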
Now let's define a process_query helper that ties the pipeline together, and run the same query against both indexes to compare the results.
def process_query(cr: ContextualRetrieval, query: str, bm25_index: BM25Okapi, chunks: List[Document]) -> str:
    # Expand the query for better recall
    expanded_query = cr.expand_query(query)
    # Score all chunks with BM25 and keep the top candidates
    scores = bm25_index.get_scores(expanded_query.split())
    top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:10]
    candidates = [chunks[i] for i in top_indices]
    # Rerank the candidates, then generate the final answer
    reranked_docs = cr.rerank_results(query, candidates, top_n=3)
    answer = cr.generate_answer(query, [doc.page_content for doc in reranked_docs])
    print(f"Answer: {answer}")
    return answer
original_query = "When does the term of the Agreement commence and how long does it last?"
print(f"\nOriginal Query: {original_query}")
process_query(cr, original_query, normal_bm25_index, chunks)
Context-aware index:
original_query = "When does the term of the Agreement commence and how long does it last?"
print(f"\nOriginal Query: {original_query}")
process_query(cr, original_query, contextualized_bm25_index, contextualized_chunks)
Thanks to the contextual retriever, you will most likely get a better answer from the latter. Now let's evaluate this against a benchmark. We will use the Azure AI evaluation SDK for RAG evaluation. First, let's load the dataset.
You can create your ground truth as JSON Lines records like the following:
{"chat_history":[],"question":"What is short-term memory in the context of the model?","ground_truth":"Short-term memory involves utilizing in-context learning to learn."}
import pandas as pd
df = pd.read_json(output_file, lines=True, orient="records")
df.head()
Once the dataset is loaded, we can run it through both the standard retrieval strategy and the contextual-embedding retrieval strategy.
normal_answers = []
contextual_answers = []
for index, row in df.iterrows():
    normal_answers.append(process_query(cr, row["question"], normal_bm25_index, chunks))
    contextual_answers.append(process_query(cr, row["question"], contextualized_bm25_index, contextualized_chunks))
Let's evaluate against the ground truth; in this case I use the similarity score. You can use any other built-in or custom metric.
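SimilarityEvaluator needs a configuration for the judge model, which the snippet below assumes is already defined; a minimal sketch, reusing the same Azure OpenAI deployment:

# Judge-model configuration for the evaluator (assumed to reuse the GPT-4o deployment)
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",
}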
from azure.ai.evaluation import SimilarityEvaluator
# Initializing the Similarity Evaluator
similarity_eval = SimilarityEvaluator(model_config)
df["answer"] = normal_answers
df['score'] = df.apply(lambda x : similarity_eval(
response=x["answer"],
ground_truth = x["ground_truth"],
query=x["question"],
), axis = 1)
df["answer_contextual"] = contexual_answers
df['score_contextual'] = df.apply(lambda x : similarity_eval(
response=x["answer_contextual"],
ground_truth = x["ground_truth"],
query=x["question"],
), axis = 1)
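To compare the two strategies numerically, you can average the evaluator output; a minimal sketch, assuming SimilarityEvaluator returns a dict with a "similarity" key:

# Extract the numeric score from each evaluator result (key name is an assumption)
df["similarity"] = df["score"].apply(lambda s: s["similarity"])
df["similarity_contextual"] = df["score_contextual"].apply(lambda s: s["similarity"])
print(df[["similarity", "similarity_contextual"]].mean())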
As you can see, contextual embeddings improve retrieval, and the similarity scores reflect this. The contextual retrieval system outlined in this post demonstrates a sophisticated approach to document analysis and question answering. By integrating several NLP techniques, such as contextualization with GPT-4o, efficient indexing with BM25, reranking with a Cohere model, and query expansion, the system not only retrieves relevant information but also understands and synthesizes it to provide accurate answers. This modular architecture keeps the design flexible, allowing individual components to be enhanced or replaced as better techniques emerge. As the field of natural language processing continues to evolve, systems like this will become increasingly important in making large volumes of text more accessible, searchable, and actionable across different domains.