Lost in the Middle: the problem with long contexts

“No matter the architecture of your model, there is a substantial performance degradation when you include more than 10 retrieved documents. In short: when models must access relevant information in the middle of long contexts, they tend to ignore the documents they are given. See: https://arxiv.org/abs/2307.03172

To avoid this problem, you can reorder the documents after retrieving them, avoiding the performance degradation.”

By: LangChain

Lost in the Middle

Libraries

from operator import itemgetter

from dotenv import load_dotenv
from langchain.chat_models import ChatOpenAI
from langchain.document_transformers import LongContextReorder
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

from src.langchain_docs_loader import LangchainDocsLoader, num_tokens_from_string

load_dotenv()
True

Loading the data

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=350,  # measured in tokens, since num_tokens_from_string is used as the length function
    chunk_overlap=10,
    length_function=num_tokens_from_string,
)

documents = LangchainDocsLoader().load()
documents = text_splitter.split_documents(documents)
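
The length function comes from the project's own src.langchain_docs_loader module, which is not shown here. A minimal sketch of what num_tokens_from_string might look like (an assumption about that helper, not its actual code), counting tokens with tiktoken:

```python
import tiktoken


def num_tokens_from_string(text: str, model_name: str = "gpt-3.5-turbo") -> int:
    """Return the number of tokens the target OpenAI model would see for `text`."""
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(text))
```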

Creating the retriever

retriever = Chroma.from_documents(documents, embedding=OpenAIEmbeddings()).as_retriever(
    search_type="mmr",  # Maximal Marginal Relevance: balances relevance and diversity
    search_kwargs={
        "k": 10,       # documents returned to the chain
        "fetch_k": 50,  # candidates fetched before MMR re-ranking
    },
)

Querying the retriever

relevant_docs = retriever.get_relevant_documents(
    "How to use LCEL ainvoke with a retriever?"
)
relevant_docs
[Document(page_content='## LLMRails as a Retriever\u200b\n\nLLMRails, as all the other LangChain vectorstores, is most often used as a LangChain Retriever:\n\n```python\nretriever = llm_rails.as_retriever()\nretriever\n```\n\n```text\n    LLMRailsRetriever(tags=None, metadata=None, vectorstore=<langchain.vectorstores.llm_rails.LLMRails object at 0x107b9c040>, search_type=\'similarity\', search_kwargs={\'k\': 5})\n```\n\n```python\nquery = "What is your approach to national defense"\nretriever.get_relevant_documents(query)[0]\n```', metadata={'description': 'LLMRails is a API platform for building GenAI applications. It provides an easy-to-use API for document indexing and querying that is managed by LLMRails and is optimized for performance and accuracy.', 'language': 'en', 'source': 'https://python.langchain.com/docs/integrations/vectorstores/llm_rails', 'title': 'LLMRails | 🦜️🔗 Langchain'}),
 Document(page_content="QdrantTranslator\\n6. WeaviateTranslator\\n\\nAnd remote retrievers like:\\n\\n1. RemoteLangChainRetriever'}", metadata={'description': "In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT to analyze the code base of the LangChain itself.", 'language': 'en', 'source': 'https://python.langchain.com/docs/use_cases/question_answering/how_to/code/code-analysis-deeplake', 'title': "Use LangChain, GPT and Activeloop's Deep Lake to work with code base | 🦜️🔗 Langchain"}),
 Document(page_content='# Retrieve as you generate with FLARE\n\nThis notebook is an implementation of Forward-Looking Active REtrieval augmented generation (FLARE).\n\nPlease see the original repo [here](https://github.com/jzbjyb/FLARE/tree/main).\n\nThe basic idea is:\n\n- Start answering a question\n- If you start generating tokens the model is uncertain about, look up relevant documents\n- Use those documents to continue generating\n- Repeat until finished\n\nThere is a lot of cool detail in how the lookup of relevant documents is done.\nBasically, the tokens that model is uncertain about are highlighted, and then an LLM is called to generate a question that would lead to that answer. For example, if the generated text is `Joe Biden went to Harvard`, and the tokens the model was uncertain about was `Harvard`, then a good generated question would be `where did Joe Biden go to college`. This generated question is then used in a retrieval step to fetch relevant documents.\n\nIn order to set up this chain, we will need three things:\n\n- An LLM to generate the answer\n- An LLM to generate hypothetical questions to use in retrieval\n- A retriever to use to look up answers for\n\nThe LLM that we use to generate the answer needs to return logprobs so we can identify uncertain tokens. For that reason, we HIGHLY recommend that you use the OpenAI wrapper (NB: not the ChatOpenAI wrapper, as that does not return logprobs).\n\nThe LLM we use to generate hypothetical questions to use in retrieval can be anything. In this notebook we will use ChatOpenAI because it is fast and cheap.', metadata={'description': 'This notebook is an implementation of Forward-Looking Active REtrieval augmented generation (FLARE).', 'language': 'en', 'source': 'https://python.langchain.com/docs/use_cases/question_answering/how_to/flare', 'title': 'Retrieve as you generate with FLARE | 🦜️🔗 Langchain'}),
 Document(page_content='# Self-querying\n\nA self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.\n\n![](https://drive.google.com/uc?id=1OQUN-0MJcDUxmPXofgS7MqReEs720pqS)\n\n## Get started\u200b\n\nWe\'ll use a Pinecone vector store in this example.\n\nFirst we\'ll want to create a `Pinecone` vector store and seed it with some data. We\'ve created a small demo set of documents that contain summaries of movies.\n\nTo use Pinecone, you need to have `pinecone` package installed and you must have an API key and an environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n\n**Note:** The self-query retriever requires you to have `lark` package installed.\n\n```python\n# !pip install lark pinecone-client\n```\n\n```python\nimport os\n\nimport pinecone\n\npinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment=os.environ["PINECONE_ENV"])\n```\n\n```python\nfrom langchain.schema import Document\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nfrom langchain.vectorstores import Pinecone', metadata={'description': 'A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.', 'language': 'en', 'source': 'https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/', 'title': 'Self-querying | 🦜️🔗 Langchain'}),
 Document(page_content="- [Parent Document Retriever](/docs/modules/data_connection/retrievers/parent_document_retriever): This allows you to create multiple embeddings per parent document, allowing you to look up smaller chunks but return larger context.\n- [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query): User questions often contain a reference to something that isn't just semantic but rather expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the _semantic_ part of a query from other _metadata filters_ present in the query.\n- [Ensemble Retriever](/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from multiple different sources, or using multiple different algorithms. The ensemble retriever allows you to easily do this.\n- And more!", metadata={'description': "Many LLM applications require user-specific data that is not part of the model's training set.", 'language': 'en', 'source': 'https://python.langchain.com/docs/modules/data_connection/', 'title': 'Retrieval | 🦜️🔗 Langchain'}),
 Document(page_content='Again, we can use the [LangSmith trace](https://smith.langchain.com/public/18460363-0c70-4c72-81c7-3b57253bb58c/r) to explore the prompt structure.\n\n### Going deeper\u200b\n\n- Agents, such as the [conversational retrieval agent](/docs/use_cases/question_answering/how_to/conversational_retrieval_agents), can be used for retrieval when necessary while also holding a conversation.', metadata={'description': 'Open In Collab', 'language': 'en', 'source': 'https://python.langchain.com/docs/use_cases/chatbots', 'title': 'Chatbots | 🦜️🔗 Langchain'}),
 Document(page_content='for Zep - A long-term memory store for LLM applications.](/docs/integrations/retrievers/zep_memorystore)', metadata={'language': 'en', 'source': 'https://python.langchain.com/docs/integrations/retrievers', 'title': 'Retrievers | 🦜️🔗 Langchain'}),
 Document(page_content='- ⛓ [How to use BGE Embeddings for LangChain](https://youtu.be/sWRvSG7vL4g?si=85jnvnmTCF9YIWXI)\n- ⛓ [How to use Custom Prompts for RetrievalQA on LLaMA-2 7B](https://youtu.be/PDwUKves9GY?si=sMF99TWU0p4eiK80)', metadata={'description': 'Below are links to tutorials and courses on LangChain. For written guides on common use cases for LangChain, check out the use cases guides.', 'language': 'en', 'source': 'https://python.langchain.com/docs/additional_resources/tutorials', 'title': 'Tutorials | 🦜️🔗 Langchain'}),
 Document(page_content='```python\nllm_with_tools = llm.bind(\n    functions=[\n        # The retriever tool\n        format_tool_to_openai_function(retriever_tool), \n        # Response schema\n        convert_pydantic_to_openai_function(Response)\n    ]\n)\n```\n\n```python\nagent = {\n    "input": lambda x: x["input"],\n    # Format agent scratchpad from intermediate steps\n    "agent_scratchpad": lambda x: format_to_openai_functions(x[\'intermediate_steps\'])\n} | prompt | llm_with_tools | parse\n```\n\n```python\nagent_executor = AgentExecutor(tools=[retriever_tool], agent=agent, verbose=True)\n```\n\n## Run the agent\u200b\n\nWe can now run the agent! Notice how it responds with a dictionary with two keys: `answer` and `sources`\n\n```python\nagent_executor.invoke({"input": "what did the president say about kentaji brown jackson"}, return_only_outputs=True)\n```', metadata={'description': 'This notebook covers how to have an agent return a structured output.', 'language': 'en', 'source': 'https://python.langchain.com/docs/modules/agents/how_to/agent_structured', 'title': 'Returning Structured Output | 🦜️🔗 Langchain'}),
 Document(page_content='This time the answer is correct, since the self-querying retriever created a filter on the landlord attribute of the metadata, correctly filtering to document that specifically is about the DHA Group landlord. The resulting source chunks are all relevant to this landlord, and this improves answer accuracy even though the landlord is not directly mentioned in the specific chunk that contains the correct answer.', metadata={'description': 'This notebook covers how to load documents from Docugami. It provides the advantages of using this system over alternative data loaders.', 'language': 'en', 'source': 'https://python.langchain.com/docs/integrations/document_loaders/docugami', 'title': 'Docugami | 🦜️🔗 Langchain'})]

Reordering the documents

reordering = LongContextReorder()
reordered_docs = list(reordering.transform_documents(relevant_docs))
reordered_docs
[Document(page_content="QdrantTranslator\\n6. WeaviateTranslator\\n\\nAnd remote retrievers like:\\n\\n1. RemoteLangChainRetriever'}", metadata={'description': "In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT to analyze the code base of the LangChain itself.", 'language': 'en', 'source': 'https://python.langchain.com/docs/use_cases/question_answering/how_to/code/code-analysis-deeplake', 'title': "Use LangChain, GPT and Activeloop's Deep Lake to work with code base | 🦜️🔗 Langchain"}),
 Document(page_content='# Self-querying\n\nA self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.\n\n![](https://drive.google.com/uc?id=1OQUN-0MJcDUxmPXofgS7MqReEs720pqS)\n\n## Get started\u200b\n\nWe\'ll use a Pinecone vector store in this example.\n\nFirst we\'ll want to create a `Pinecone` vector store and seed it with some data. We\'ve created a small demo set of documents that contain summaries of movies.\n\nTo use Pinecone, you need to have `pinecone` package installed and you must have an API key and an environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n\n**Note:** The self-query retriever requires you to have `lark` package installed.\n\n```python\n# !pip install lark pinecone-client\n```\n\n```python\nimport os\n\nimport pinecone\n\npinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment=os.environ["PINECONE_ENV"])\n```\n\n```python\nfrom langchain.schema import Document\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nfrom langchain.vectorstores import Pinecone', metadata={'description': 'A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.', 'language': 'en', 'source': 'https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/', 'title': 'Self-querying | 🦜️🔗 Langchain'}),
 Document(page_content='Again, we can use the [LangSmith trace](https://smith.langchain.com/public/18460363-0c70-4c72-81c7-3b57253bb58c/r) to explore the prompt structure.\n\n### Going deeper\u200b\n\n- Agents, such as the [conversational retrieval agent](/docs/use_cases/question_answering/how_to/conversational_retrieval_agents), can be used for retrieval when necessary while also holding a conversation.', metadata={'description': 'Open In Collab', 'language': 'en', 'source': 'https://python.langchain.com/docs/use_cases/chatbots', 'title': 'Chatbots | 🦜️🔗 Langchain'}),
 Document(page_content='- ⛓ [How to use BGE Embeddings for LangChain](https://youtu.be/sWRvSG7vL4g?si=85jnvnmTCF9YIWXI)\n- ⛓ [How to use Custom Prompts for RetrievalQA on LLaMA-2 7B](https://youtu.be/PDwUKves9GY?si=sMF99TWU0p4eiK80)', metadata={'description': 'Below are links to tutorials and courses on LangChain. For written guides on common use cases for LangChain, check out the use cases guides.', 'language': 'en', 'source': 'https://python.langchain.com/docs/additional_resources/tutorials', 'title': 'Tutorials | 🦜️🔗 Langchain'}),
 Document(page_content='This time the answer is correct, since the self-querying retriever created a filter on the landlord attribute of the metadata, correctly filtering to document that specifically is about the DHA Group landlord. The resulting source chunks are all relevant to this landlord, and this improves answer accuracy even though the landlord is not directly mentioned in the specific chunk that contains the correct answer.', metadata={'description': 'This notebook covers how to load documents from Docugami. It provides the advantages of using this system over alternative data loaders.', 'language': 'en', 'source': 'https://python.langchain.com/docs/integrations/document_loaders/docugami', 'title': 'Docugami | 🦜️🔗 Langchain'}),
 Document(page_content='```python\nllm_with_tools = llm.bind(\n    functions=[\n        # The retriever tool\n        format_tool_to_openai_function(retriever_tool), \n        # Response schema\n        convert_pydantic_to_openai_function(Response)\n    ]\n)\n```\n\n```python\nagent = {\n    "input": lambda x: x["input"],\n    # Format agent scratchpad from intermediate steps\n    "agent_scratchpad": lambda x: format_to_openai_functions(x[\'intermediate_steps\'])\n} | prompt | llm_with_tools | parse\n```\n\n```python\nagent_executor = AgentExecutor(tools=[retriever_tool], agent=agent, verbose=True)\n```\n\n## Run the agent\u200b\n\nWe can now run the agent! Notice how it responds with a dictionary with two keys: `answer` and `sources`\n\n```python\nagent_executor.invoke({"input": "what did the president say about kentaji brown jackson"}, return_only_outputs=True)\n```', metadata={'description': 'This notebook covers how to have an agent return a structured output.', 'language': 'en', 'source': 'https://python.langchain.com/docs/modules/agents/how_to/agent_structured', 'title': 'Returning Structured Output | 🦜️🔗 Langchain'}),
 Document(page_content='for Zep - A long-term memory store for LLM applications.](/docs/integrations/retrievers/zep_memorystore)', metadata={'language': 'en', 'source': 'https://python.langchain.com/docs/integrations/retrievers', 'title': 'Retrievers | 🦜️🔗 Langchain'}),
 Document(page_content="- [Parent Document Retriever](/docs/modules/data_connection/retrievers/parent_document_retriever): This allows you to create multiple embeddings per parent document, allowing you to look up smaller chunks but return larger context.\n- [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query): User questions often contain a reference to something that isn't just semantic but rather expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the _semantic_ part of a query from other _metadata filters_ present in the query.\n- [Ensemble Retriever](/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from multiple different sources, or using multiple different algorithms. The ensemble retriever allows you to easily do this.\n- And more!", metadata={'description': "Many LLM applications require user-specific data that is not part of the model's training set.", 'language': 'en', 'source': 'https://python.langchain.com/docs/modules/data_connection/', 'title': 'Retrieval | 🦜️🔗 Langchain'}),
 Document(page_content='# Retrieve as you generate with FLARE\n\nThis notebook is an implementation of Forward-Looking Active REtrieval augmented generation (FLARE).\n\nPlease see the original repo [here](https://github.com/jzbjyb/FLARE/tree/main).\n\nThe basic idea is:\n\n- Start answering a question\n- If you start generating tokens the model is uncertain about, look up relevant documents\n- Use those documents to continue generating\n- Repeat until finished\n\nThere is a lot of cool detail in how the lookup of relevant documents is done.\nBasically, the tokens that model is uncertain about are highlighted, and then an LLM is called to generate a question that would lead to that answer. For example, if the generated text is `Joe Biden went to Harvard`, and the tokens the model was uncertain about was `Harvard`, then a good generated question would be `where did Joe Biden go to college`. This generated question is then used in a retrieval step to fetch relevant documents.\n\nIn order to set up this chain, we will need three things:\n\n- An LLM to generate the answer\n- An LLM to generate hypothetical questions to use in retrieval\n- A retriever to use to look up answers for\n\nThe LLM that we use to generate the answer needs to return logprobs so we can identify uncertain tokens. For that reason, we HIGHLY recommend that you use the OpenAI wrapper (NB: not the ChatOpenAI wrapper, as that does not return logprobs).\n\nThe LLM we use to generate hypothetical questions to use in retrieval can be anything. In this notebook we will use ChatOpenAI because it is fast and cheap.', metadata={'description': 'This notebook is an implementation of Forward-Looking Active REtrieval augmented generation (FLARE).', 'language': 'en', 'source': 'https://python.langchain.com/docs/use_cases/question_answering/how_to/flare', 'title': 'Retrieve as you generate with FLARE | 🦜️🔗 Langchain'}),
 Document(page_content='## LLMRails as a Retriever\u200b\n\nLLMRails, as all the other LangChain vectorstores, is most often used as a LangChain Retriever:\n\n```python\nretriever = llm_rails.as_retriever()\nretriever\n```\n\n```text\n    LLMRailsRetriever(tags=None, metadata=None, vectorstore=<langchain.vectorstores.llm_rails.LLMRails object at 0x107b9c040>, search_type=\'similarity\', search_kwargs={\'k\': 5})\n```\n\n```python\nquery = "What is your approach to national defense"\nretriever.get_relevant_documents(query)[0]\n```', metadata={'description': 'LLMRails is a API platform for building GenAI applications. It provides an easy-to-use API for document indexing and querying that is managed by LLMRails and is optimized for performance and accuracy.', 'language': 'en', 'source': 'https://python.langchain.com/docs/integrations/vectorstores/llm_rails', 'title': 'LLMRails | 🦜️🔗 Langchain'})]
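
LongContextReorder applies the "lost in the middle" heuristic: the top-ranked documents are pushed to the extremes of the list and the lowest-ranked ones to the middle, which is where the model pays the least attention. A rough sketch of that reordering (an illustration of the idea, not necessarily the library's exact code):

```python
from langchain.schema import Document


def lost_in_the_middle_reorder(docs: list[Document]) -> list[Document]:
    """Place the most relevant documents at the extremes and the least relevant in the middle."""
    reordered: list[Document] = []
    # Walk the documents from least to most relevant, alternating front and back of the result.
    for i, doc in enumerate(reversed(docs)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered
```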

Using the reorderer in our Retrieval Augmented Generation pipeline

def combine_documents(documents: list[Document]) -> str:
    return "\n\n".join([doc.page_content for doc in documents])


prompt = PromptTemplate.from_template(
    """Given the following text extracts:
-----
{context}
-----

Answer the following question. If you don't know the answer, just write "I don't know".

Question: {question}"""
)

llm = ChatOpenAI(temperature=0)

stuff_chain = (
    {
        "context": itemgetter("question")
        | retriever
        | reordering.transform_documents
        | combine_documents,
        "question": itemgetter("question"),
    }
    | prompt
    | llm
)
response = stuff_chain.invoke(input={"question": "How to create a chain using LCEL?"}).content
print(response)
To create a chain using LCEL, you can follow these steps:

1. Import the necessary modules and classes from the LangChain library.
2. Define the components of your chain, such as LLMs, prompts, and tools.
3. Use the LCEL syntax to chain together the components in the desired order.
4. Invoke the chain with the input data to get the output.

Here is an example of creating a chain using LCEL:

```python
from langchain.prompts.prompt import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.anonymizers import PresidioReversibleAnonymizer

# Define the components
anonymizer = PresidioReversibleAnonymizer()
prompt = PromptTemplate.from_template(template="{anonymized_text}")
llm = ChatOpenAI(temperature=0)

# Chain the components together
chain = {"anonymized_text": anonymizer.anonymize} | prompt | llm

# Invoke the chain with input data
text = "This is a sample text."
response = chain.invoke(text)

# Get the output
output = response.content
print(output)
```

In this example, the chain starts with the `anonymizer.anonymize` function, which anonymizes the input text. The anonymized text is then passed to the `prompt` component, which generates a prompt using the anonymized text. Finally, the prompt is passed to the `llm` component, which generates the output response.

Note that this is just a basic example, and you can customize the chain by adding more components or modifying the existing ones according to your requirements.
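
Because LCEL chains also expose an async API, the same pipeline can be awaited, for example to answer several questions concurrently. A minimal usage sketch (assuming it runs as a plain Python script; inside a notebook you would simply await the call):

```python
import asyncio


async def main() -> None:
    # ainvoke is the asynchronous counterpart of invoke on LCEL runnables
    response = await stuff_chain.ainvoke({"question": "How to use LCEL ainvoke with a retriever?"})
    print(response.content)


asyncio.run(main())
```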