Concepts learned

Libraries

from functools import partial
from operator import itemgetter
from typing import Sequence

from dotenv import load_dotenv
from langchain.base_language import BaseLanguageModel
from langchain.chat_models import ChatOpenAI
from langchain.document_transformers import LongContextReorder
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import SQLRecordManager, index
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.schema import BaseRetriever, Document, StrOutputParser
from langchain.schema.messages import BaseMessageChunk
from langchain.schema.runnable import Runnable, RunnableMap
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

from src.langchain_docs_loader import LangchainDocsLoader, num_tokens_from_string

load_dotenv()
True

Data processing

# Token-aware splitter: chunk sizes are measured in tokens rather than characters.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=50,
    length_function=num_tokens_from_string,
)

# Load the LangChain documentation, keeping output cells and header links.
docs = LangchainDocsLoader(
    include_output_cells=True,
    include_links_in_header=True,
).load()

splitted_docs = text_splitter.split_documents(docs)

# Discard chunks that contain nothing but an opening code fence.
filtered_docs = [
    doc
    for doc in splitted_docs
    if doc.page_content not in ("```", "```text", "```python")
]

len(filtered_docs)
2867
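
num_tokens_from_string comes from the project-local src.langchain_docs_loader module and is used as the splitter's length function, so chunk sizes above are counted in tokens. The module is not shown in this notebook; a minimal sketch of an equivalent counter, assuming it is backed by tiktoken with the cl100k_base encoding used by OpenAI's embedding and chat models, could look like this:

import tiktoken


def num_tokens_from_string(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens the way OpenAI models do (sketch of the project helper)."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))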

Indexing

Storing the documents in the vector store

record_manager = SQLRecordManager(
    db_url="sqlite:///:memory:",
    namespace="langchain",
)

record_manager.create_schema()

embeddings = OpenAIEmbeddings()

vectorstore = Chroma(collection_name="langchain", embedding_function=embeddings)

indexing_result = index(
    docs_source=filtered_docs,
    record_manager=record_manager,
    vector_store=vectorstore,
    batch_size=1000,
    cleanup="full",
    source_id_key="source",
)
(Several RateLimitError retry messages from the OpenAI embeddings endpoint omitted; the embedding calls were retried automatically with backoff and the indexing run completed.)
indexing_result
{'num_added': 2851, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
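
The record manager is what makes this call idempotent. As a sanity check (a hypothetical follow-up cell, not part of the original run), re-running the same index call with an unchanged document set should skip every document, because their hashes are already recorded:

# Hypothetical re-run: with an unchanged document set, the record manager
# detects the existing hashes and skips re-embedding everything.
second_pass = index(
    docs_source=filtered_docs,
    record_manager=record_manager,
    vector_store=vectorstore,
    batch_size=1000,
    cleanup="full",
    source_id_key="source",
)
# Expected to report roughly {'num_added': 0, 'num_updated': 0, 'num_skipped': 2851, 'num_deleted': 0}
second_pass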

Retrieving the stored documents

If the initial document collection contained duplicates, they were removed during indexing, so the number of documents stored in the vector store can be smaller than the number of documents in the initial collection.

By fetching the documents stored in the vector store we obtain a faithful, deduplicated copy of the initial collection. This copy can be used to build a new index or to initialize a retriever.

# Pull every indexed document (content and metadata) back out of Chroma,
# using the keys tracked by the record manager.
vector_keys = vectorstore.get(
    ids=record_manager.list_keys(), include=["documents", "metadatas"]
)

# Rebuild Document objects from the raw Chroma payload.
docs_in_vectorstore = [
    Document(page_content=page_content, metadata=metadata)
    for page_content, metadata in zip(
        vector_keys["documents"], vector_keys["metadatas"]
    )
]
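
A quick consistency check (not in the original notebook) is to compare the number of recovered documents with the number reported as added during indexing:

# Both figures should match if deduplication worked as described above.
assert len(docs_in_vectorstore) == indexing_result["num_added"]
len(docs_in_vectorstore)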

Initializing the retrievers

# Keyword (sparse) retriever over the deduplicated corpus.
keyword_retriever = BM25Retriever.from_documents(docs_in_vectorstore)
keyword_retriever.k = 5

# Semantic (dense) retriever using Maximal Marginal Relevance to diversify results.
semantic_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,
        "fetch_k": 50,
        "lambda_mult": 0.3,
    },
)

# Hybrid retriever: fuse both result lists, weighting the semantic retriever higher.
retriever = EnsembleRetriever(
    retrievers=[keyword_retriever, semantic_retriever],
    weights=[0.3, 0.7],
)

Building the RAG chain

Prompts

CONDENSE_QUESTION_TEMPLATE = """\
Given the following conversation and a follow up question, rephrase the follow up \
question to be a standalone question.

Chat History:
====================
{chat_history}
====================

Follow Up Input: {question}
Standalone Question:"""
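
To make the condensing step concrete, the template can be rendered with a toy conversation (illustrative values only):

print(
    PromptTemplate.from_template(CONDENSE_QUESTION_TEMPLATE).format(
        chat_history="Human: What is LCEL?\nAI: LangChain Expression Language, a way to compose runnables.",
        question="Does it support streaming?",
    )
)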

SYSTEM_ANSWER_QUESTION_TEMPLATE = """\
You are an expert programmer and problem-solver, tasked with answering any question \
about 'Langchain' with high quality answers and without making anything up.

Generate a comprehensive and informative answer of 80 words or less for the \
given question based solely on the provided search results (URL and content). You must \
only use information from the provided search results. Use an unbiased and \
journalistic tone. Combine search results together into a coherent answer. Do not \
repeat text. Cite search results using [${{number}}] notation. Only cite the most \
relevant results that answer the question accurately. Place these citations at the end \
of the sentence or paragraph that reference them - do not put them all at the end. If \
different results refer to different entities within the same name, write separate \
answers for each entity.

If there is nothing in the context relevant to the question at hand, just say "Hmm, \
I'm not sure.". Don't try to make up an answer. This is not a suggestion. This is a rule.

Anything between the following `context` html blocks is retrieved from a knowledge \
bank, not part of the conversation with the user.

<context>
    {context}
</context>

REMEMBER: If there is no relevant information within the context, just say "Hmm, \
I'm not sure.". Don't try to make up an answer. This is not a suggestion. This is a rule. \
Anything between the preceding 'context' html blocks is retrieved from a knowledge bank, \
not part of the conversation with the user.

Take a deep breath and relax. You are an expert programmer and problem-solver. You can do this.
You can cite all the relevant information from the search results. Let's go!"""

Building the retrieval chain

def create_retriever_chain(
    llm: BaseLanguageModel[BaseMessageChunk],
    retriever: BaseRetriever,
    use_chat_history: bool,
) -> Runnable:
    CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(CONDENSE_QUESTION_TEMPLATE)
    if not use_chat_history:
        # Without history, route the raw question straight into the retriever.
        initial_chain = (itemgetter("question")) | retriever
        return initial_chain
    else:
        # With history, first condense the conversation into a standalone question,
        # then retrieve with the reformulated question.
        condense_question_chain = (
            {
                "question": itemgetter("question"),
                "chat_history": itemgetter("chat_history"),
            }
            | CONDENSE_QUESTION_PROMPT
            | llm
            | StrOutputParser()
        )
        conversation_chain = condense_question_chain | retriever
        return conversation_chain
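
Either branch returns a runnable that ends in the retriever, so it can be exercised on its own (illustrative sketch; the variable names here are hypothetical and the output depends on the indexed corpus):

standalone_retrieval = create_retriever_chain(llm, retriever, use_chat_history=False)
docs_for_question = standalone_retrieval.invoke(
    {"question": "How do I combine a prompt and a model?"}
)
len(docs_for_question)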

Truncating the retrieved documents to at most k documents

def get_k_or_less_documents(documents: list[Document], k: int) -> list[Document]:
    # The ensemble retriever may return more than k documents; keep only the first k.
    if len(documents) <= k:
        return documents
    else:
        return documents[:k]

Reordering the retrieved documents

def reorder_documents(documents: list[Document]) -> Sequence[Document]:
    # Place the most relevant documents at the beginning and end of the context
    # to mitigate the "lost in the middle" effect of long prompts.
    reorder = LongContextReorder()
    return reorder.transform_documents(documents)
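
LongContextReorder assumes the input is sorted from most to least relevant and moves the least relevant documents to the middle of the list. A small illustration with placeholder documents (not from the original notebook):

demo_docs = [Document(page_content=f"doc {rank}") for rank in range(1, 6)]
# After reordering, the highest-ranked documents sit at the edges of the list,
# where long-context models pay the most attention.
[doc.page_content for doc in reorder_documents(demo_docs)]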

Formatting the retrieved documents

def format_docs(docs: Sequence[Document]) -> str:
    # Wrap each document in a numbered <doc> tag so the model can cite it by id.
    formatted_docs: list[str] = []
    for i, doc in enumerate(docs):
        doc_string = f"<doc id='{i}'>{doc.page_content}</doc>"
        formatted_docs.append(doc_string)
    return "\n".join(formatted_docs)

Building the answer chain

def create_answer_chain(
    llm: BaseLanguageModel[BaseMessageChunk],
    retriever: BaseRetriever,
    use_chat_history: bool,
    k: int = 5,
) -> Runnable:
    retriever_chain = create_retriever_chain(llm, retriever, use_chat_history)

    _get_k_or_less_documents = partial(get_k_or_less_documents, k=k)

    # Build the prompt inputs in parallel: retrieve, truncate to k, reorder and
    # format the documents, while passing the question and history straight through.
    context = RunnableMap(
        {
            "context": (
                retriever_chain
                | _get_k_or_less_documents
                | reorder_documents
                | format_docs
            ),
            "question": itemgetter("question"),
            "chat_history": itemgetter("chat_history"),
        }
    )

    prompt = ChatPromptTemplate.from_messages(
        messages=[
            ("system", SYSTEM_ANSWER_QUESTION_TEMPLATE),
            MessagesPlaceholder(variable_name="chat_history"),
            ("human", "{question}"),
        ]
    )

    # Synthesize the final answer from the formatted context and the question.
    response_synthesizer = prompt | llm | StrOutputParser()
    response_chain = context | response_synthesizer

    return response_chain

User interaction

Initializing the chatbot

llm = ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=0.0)

answer_chain = create_answer_chain(
    llm=llm, retriever=retriever, use_chat_history=False, k=6
)
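
Because the resulting chain is itself a Runnable that ends in StrOutputParser, it can also be streamed token by token (sketch, not executed in this notebook):

for chunk in answer_chain.stream(
    {"question": "What is LCEL?", "chat_history": []}
):
    print(chunk, end="", flush=True)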

Example 1

question = "How to use .stream method in my chain with code example?"
print(
    answer_chain.invoke(  # type: ignore
        {
            "question": question,
            "chat_history": [],
        }
    )
)
To use the `.stream` method in your chain, you can follow these steps:

1. Import the necessary classes:
```python
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
```

2. Create an instance of the chat model:
```python
model = ChatOpenAI()
```

3. Define a prompt template using the `ChatPromptTemplate` class:
```python
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
```

4. Combine the prompt and model into a chain:
```python
chain = prompt | model
```

5. Use the `.stream` method to iterate over the streamed response:
```python
for s in chain.stream({"topic": "bears"}):
    print(s.content, end="", flush=True)
```

This will stream back chunks of the response, allowing you to process the output as it becomes available. In the example above, it will print a bear-themed joke.

Please note that this is just a basic example, and you can customize the prompt and model according to your specific use case.

keyword_docs = keyword_retriever.get_relevant_documents(
    query=question,
)

for i, doc in enumerate(keyword_docs, start=1):
    print(f"{i}. {doc.metadata['source']}: {doc.metadata.get('description', '')}")
1. https://python.langchain.com/docs/modules/memory/adding_memory: This notebook goes over how to use the Memory class with an LLMChain.
2. https://python.langchain.com/docs/integrations/providers/langchain_decorators: lanchchain decorators is a layer on the top of LangChain that provides syntactic sugar 🍭 for writing custom langchain prompts and chains
3. https://python.langchain.com/docs/use_cases/question_answering/how_to/code/twitter-the-algorithm-analysis-deeplake: In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT4 to analyze the code base of the twitter algorithm.
4. https://python.langchain.com/docs/integrations/memory/motorhead_memory: Motörhead is a memory server implemented in Rust. It automatically handles incremental summarization in the background and allows for stateless applications.
5. https://python.langchain.com/docs/guides/safety/moderation: This notebook walks through examples of how to use a moderation chain, and several common ways for doing so. Moderation chains are useful for detecting text that could be hateful, violent, etc. This can be useful to apply on both user input, but also on the output of a Language Model. Some API providers, like OpenAI, specifically prohibit you, or your end users, from generating some types of harmful content. To comply with this (and to just generally prevent your application from being harmful) you may often want to append a moderation chain to any LLMChains, in order to make sure any output the LLM generates is not harmful.

semantic_docs = semantic_retriever.get_relevant_documents(
    query=question,
)

for i, doc in enumerate(semantic_docs, start=1):
    print(f"{i}. {doc.metadata['source']}: {doc.metadata.get('description', '')}")
1. https://python.langchain.com/docs/expression_language/interface: In an effort to make it as easy as possible to create custom chains, we've implemented a "Runnable" protocol that most components implement. This is a standard interface with a few different methods, which makes it easy to define custom chains as well as making it possible to invoke them in a standard way. The standard interface exposed includes:
2. https://python.langchain.com/docs/use_cases/apis: Open In Collab
3. https://python.langchain.com/docs/modules/callbacks/: Head to Integrations for documentation on built-in callbacks integrations with 3rd-party tools.
4. https://python.langchain.com/docs/guides/deployments/template_repos: So, you've created a really cool chain - now what? How do you deploy it and make it easily shareable with the world?
5. https://python.langchain.com/docs/modules/chains/document/map_reduce: The map reduce documents chain first applies an LLM chain to each document individually (the Map step), treating the chain output as a new document. It then passes all the new documents to a separate combine documents chain to get a single output (the Reduce step). It can optionally first compress, or collapse, the mapped documents to make sure that they fit in the combine documents chain (which will often pass them to an LLM). This compression step is performed recursively if necessary.

ensemble_docs = retriever.get_relevant_documents(
    query=question,
)

for i, doc in enumerate(ensemble_docs, start=1):
    print(f"{i}. {doc.metadata['source']}: {doc.metadata.get('description', '')}")
1. https://python.langchain.com/docs/expression_language/interface: In an effort to make it as easy as possible to create custom chains, we've implemented a "Runnable" protocol that most components implement. This is a standard interface with a few different methods, which makes it easy to define custom chains as well as making it possible to invoke them in a standard way. The standard interface exposed includes:
2. https://python.langchain.com/docs/use_cases/apis: Open In Collab
3. https://python.langchain.com/docs/modules/callbacks/: Head to Integrations for documentation on built-in callbacks integrations with 3rd-party tools.
4. https://python.langchain.com/docs/guides/deployments/template_repos: So, you've created a really cool chain - now what? How do you deploy it and make it easily shareable with the world?
5. https://python.langchain.com/docs/modules/chains/document/map_reduce: The map reduce documents chain first applies an LLM chain to each document individually (the Map step), treating the chain output as a new document. It then passes all the new documents to a separate combine documents chain to get a single output (the Reduce step). It can optionally first compress, or collapse, the mapped documents to make sure that they fit in the combine documents chain (which will often pass them to an LLM). This compression step is performed recursively if necessary.
6. https://python.langchain.com/docs/modules/memory/adding_memory: This notebook goes over how to use the Memory class with an LLMChain.
7. https://python.langchain.com/docs/integrations/providers/langchain_decorators: lanchchain decorators is a layer on the top of LangChain that provides syntactic sugar 🍭 for writing custom langchain prompts and chains
8. https://python.langchain.com/docs/use_cases/question_answering/how_to/code/twitter-the-algorithm-analysis-deeplake: In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT4 to analyze the code base of the twitter algorithm.
9. https://python.langchain.com/docs/integrations/memory/motorhead_memory: Motörhead is a memory server implemented in Rust. It automatically handles incremental summarization in the background and allows for stateless applications.
10. https://python.langchain.com/docs/guides/safety/moderation: This notebook walks through examples of how to use a moderation chain, and several common ways for doing so. Moderation chains are useful for detecting text that could be hateful, violent, etc. This can be useful to apply on both user input, but also on the output of a Language Model. Some API providers, like OpenAI, specifically prohibit you, or your end users, from generating some types of harmful content. To comply with this (and to just generally prevent your application from being harmful) you may often want to append a moderation chain to any LLMChains, in order to make sure any output the LLM generates is not harmful.

Example 2

question = "How to use .batch method in my chain with code example?"
print(
    answer_chain.invoke(  # type: ignore
        {
            "question": question,
            "chat_history": [],
        }
    )
)
To use the `.batch` method in your chain, you can follow the code example below:

```python
results = agent_executor.batch([{"input": x} for x in inputs], return_exceptions=True)
```

In this example, `agent_executor` is the instance of your chain, and `inputs` is a list of input questions or queries that you want to pass to the chain. The `.batch` method allows you to process multiple inputs in parallel, which can be more efficient than processing them one by one. The `return_exceptions=True` parameter ensures that any exceptions raised during the processing of inputs are returned instead of raising an error.

Please note that this code example assumes you have already set up your chain and have the necessary inputs ready.

keyword_docs = keyword_retriever.get_relevant_documents(
    query=question,
)

for i, doc in enumerate(keyword_docs, start=1):
    print(f"{i}. {doc.metadata['source']}: {doc.metadata.get('description', '')}")
1. https://python.langchain.com/docs/modules/memory/adding_memory: This notebook goes over how to use the Memory class with an LLMChain.
2. https://python.langchain.com/docs/integrations/providers/langchain_decorators: lanchchain decorators is a layer on the top of LangChain that provides syntactic sugar 🍭 for writing custom langchain prompts and chains
3. https://python.langchain.com/docs/use_cases/question_answering/how_to/code/twitter-the-algorithm-analysis-deeplake: In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT4 to analyze the code base of the twitter algorithm.
4. https://python.langchain.com/docs/integrations/memory/motorhead_memory: Motörhead is a memory server implemented in Rust. It automatically handles incremental summarization in the background and allows for stateless applications.
5. https://python.langchain.com/docs/guides/safety/moderation: This notebook walks through examples of how to use a moderation chain, and several common ways for doing so. Moderation chains are useful for detecting text that could be hateful, violent, etc. This can be useful to apply on both user input, but also on the output of a Language Model. Some API providers, like OpenAI, specifically prohibit you, or your end users, from generating some types of harmful content. To comply with this (and to just generally prevent your application from being harmful) you may often want to append a moderation chain to any LLMChains, in order to make sure any output the LLM generates is not harmful.

semantic_docs = semantic_retriever.get_relevant_documents(
    query=question,
)

for i, doc in enumerate(semantic_docs, start=1):
    print(f"{i}. {doc.metadata['source']}: {doc.metadata.get('description', '')}")
1. https://python.langchain.com/docs/use_cases/question_answering/how_to/conversational_retrieval_agents: This is an agent specifically optimized for doing retrieval when necessary and also holding a conversation.
2. https://python.langchain.com/docs/use_cases/qa_structured/integrations/sqlite: This example demonstrates the use of the SQLDatabaseChain for answering questions over a SQL database.
3. https://python.langchain.com/docs/integrations/vectorstores/chroma: Chroma is a AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.
4. https://python.langchain.com/docs/guides/langsmith/walkthrough: Open In Collab
5. https://python.langchain.com/docs/modules/callbacks/: Head to Integrations for documentation on built-in callbacks integrations with 3rd-party tools.

ensemble_docs = retriever.get_relevant_documents(
    query=question,
)

for i, doc in enumerate(ensemble_docs, start=1):
    print(f"{i}. {doc.metadata['source']}: {doc.metadata.get('description', '')}")
1. https://python.langchain.com/docs/use_cases/question_answering/how_to/conversational_retrieval_agents: This is an agent specifically optimized for doing retrieval when necessary and also holding a conversation.
2. https://python.langchain.com/docs/use_cases/qa_structured/integrations/sqlite: This example demonstrates the use of the SQLDatabaseChain for answering questions over a SQL database.
3. https://python.langchain.com/docs/integrations/vectorstores/chroma: Chroma is a AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.
4. https://python.langchain.com/docs/guides/langsmith/walkthrough: Open In Collab
5. https://python.langchain.com/docs/modules/callbacks/: Head to Integrations for documentation on built-in callbacks integrations with 3rd-party tools.
6. https://python.langchain.com/docs/modules/memory/adding_memory: This notebook goes over how to use the Memory class with an LLMChain.
7. https://python.langchain.com/docs/integrations/providers/langchain_decorators: lanchchain decorators is a layer on the top of LangChain that provides syntactic sugar 🍭 for writing custom langchain prompts and chains
8. https://python.langchain.com/docs/use_cases/question_answering/how_to/code/twitter-the-algorithm-analysis-deeplake: In this tutorial, we are going to use Langchain + Activeloop's Deep Lake with GPT4 to analyze the code base of the twitter algorithm.
9. https://python.langchain.com/docs/integrations/memory/motorhead_memory: Motörhead is a memory server implemented in Rust. It automatically handles incremental summarization in the background and allows for stateless applications.
10. https://python.langchain.com/docs/guides/safety/moderation: This notebook walks through examples of how to use a moderation chain, and several common ways for doing so. Moderation chains are useful for detecting text that could be hateful, violent, etc. This can be useful to apply on both user input, but also on the output of a Language Model. Some API providers, like OpenAI, specifically prohibit you, or your end users, from generating some types of harmful content. To comply with this (and to just generally prevent your application from being harmful) you may often want to append a moderation chain to any LLMChains, in order to make sure any output the LLM generates is not harmful.
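
The two examples above run with an empty chat_history. A sketch of the conversational path (illustrative messages, not executed here) would build the chain with use_chat_history=True so that follow-up questions are first condensed into standalone questions before retrieval:

from langchain.schema.messages import AIMessage, HumanMessage

conversational_chain = create_answer_chain(
    llm=llm, retriever=retriever, use_chat_history=True, k=6
)

print(
    conversational_chain.invoke(  # type: ignore
        {
            "question": "And how do I stream its output?",
            "chat_history": [
                HumanMessage(content="How do I compose a prompt and a model with LCEL?"),
                AIMessage(content="Pipe them together, e.g. chain = prompt | model."),
            ],
        }
    )
)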