Self Retrievers

A self-querying retriever can parse and understand queries written in natural language, then search and filter relevant information from its database or stored documents based on those queries. It does this by converting each query into a structured form that it can interpret and process efficiently. As a result, besides comparing the user's query with the documents to find matches, it can also filter the results by specific criteria extracted from that query.
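As a rough illustration of this idea, the sketch below splits a request into a search string and a metadata filter, filters first, and then ranks what is left. Everything in it (`StructuredQuery`, `retrieve`, the word-overlap score, the sample documents) is hypothetical and written by hand; a real self-querying retriever would use an LLM to build the structured query and a vector store to rank by embedding similarity.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredQuery:
    """Hypothetical container: free-text part plus exact-match metadata filters."""

    query: str
    filters: dict = field(default_factory=dict)


def word_overlap(query: str, text: str) -> int:
    """Toy relevance score: number of distinct query words present in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(structured: StructuredQuery, documents: list, k: int = 2) -> list:
    # 1) Keep only the documents whose metadata satisfies every filter.
    candidates = [
        doc
        for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in structured.filters.items())
    ]
    # 2) Rank the survivors by similarity to the free-text part of the query.
    ranked = sorted(
        candidates,
        key=lambda doc: word_overlap(structured.query, doc["text"]),
        reverse=True,
    )
    return ranked[:k]


documents = [
    {"text": "how to build a retriever", "metadata": {"code_snippet": True}},
    {"text": "retriever overview without code", "metadata": {"code_snippet": False}},
    {"text": "chains and agents", "metadata": {"code_snippet": True}},
]

# "retriever docs that include code" -> search text + filter (translated by hand here).
structured = StructuredQuery(query="retriever", filters={"code_snippet": True})
results = retrieve(structured, documents)
print([doc["text"] for doc in results])
```

The second document mentions retrievers but is dropped before ranking because its metadata does not satisfy the filter.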

Libraries

from pprint import pprint

from dotenv import load_dotenv
from langchain.chains import create_tagging_chain_pydantic
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import SQLRecordManager, index
from langchain.retrievers import SelfQueryRetriever
from langchain.schema import Document
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from pydantic import BaseModel, Field

from src.langchain_docs_loader import LangchainDocsLoader, num_tokens_from_string

load_dotenv()

True

Data loading

text_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.MARKDOWN,
    chunk_size=400,
    chunk_overlap=50,
    length_function=num_tokens_from_string,
)

loader = LangchainDocsLoader(include_output_cells=False)
docs = loader.load()
docs = text_splitter.split_documents(docs)
len(docs)

docs = [doc for doc in docs if doc.page_content != "```"]
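To make the `chunk_size` and `chunk_overlap` parameters more concrete, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter breaks text recursively on separators and measures length with the supplied `length_function`); the function name `split_with_overlap` and the pre-tokenized input are assumptions made for illustration.

```python
def split_with_overlap(tokens: list, chunk_size: int, chunk_overlap: int) -> list:
    """Cut a token list into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts with the last token of the previous one, so context that straddles a boundary appears in both chunks.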

Language model initialization

llm = ChatOpenAI(temperature=0.1)

Document tagging

Documents are useful on their own, but tagging them with additional information makes them more useful. For example, if we tag documents with their language, we can filter out those that are not in the language we care about. If we tag them with their topic, we can filter out those unrelated to the topic we care about. This reduces the search space and lets us get better results.

Creating the tag schema

class Tags(BaseModel):
    completness: str = Field(
        description="Describes how useful is the text in terms of self-explanation. It is critical to excel.",
        enum=["Very", "Quite", "Medium", "Little", "Not"],
    )
    code_snippet: bool = Field(
        default=False,
        description="Whether the text fragment includes a code snippet. Code snippets are valid markdown code blocks.",
    )
    description: bool = Field(
        default=False, description="Whether the text fragment includes a description."
    )
    talks_about_vectorstore: bool = Field(
        default=False,
        description="Whether the text fragment talks about a vectorstore.",
    )
    talks_about_retriever: bool = Field(
        default=False, description="Whether the text fragment talks about a retriever."
    )
    talks_about_chain: bool = Field(
        default=False, description="Whether the text fragment talks about a chain."
    )
    talks_about_expression_language: bool = Field(
        default=False,
        description="Whether the text fragment talks about an langchain expression language.",
    )
    contains_markdown_table: bool = Field(
        default=False,
        description="Whether the text fragment contains a markdown table.",
    )


pprint(Tags.schema())

{'properties': {'code_snippet': {'default': False,
                                 'description': 'Whether the text fragment '
                                                'includes a code snippet. Code '
                                                'snippets are valid markdown '
                                                'code blocks.',
                                 'title': 'Code Snippet',
                                 'type': 'boolean'},
                'completness': {'description': 'Describes how useful is the '
                                               'text in terms of '
                                               'self-explanation. It is '
                                               'critical to excel.',
                                'enum': ['Very',
                                         'Quite',
                                         'Medium',
                                         'Little',
                                         'Not'],
                                'title': 'Completness',
                                'type': 'string'},
                'contains_markdown_table': {'default': False,
                                            'description': 'Whether the text '
                                                           'fragment contains '
                                                           'a markdown table.',
                                            'title': 'Contains Markdown Table',
                                            'type': 'boolean'},
                'description': {'default': False,
                                'description': 'Whether the text fragment '
                                               'includes a description.',
                                'title': 'Description',
                                'type': 'boolean'},
                'talks_about_chain': {'default': False,
                                      'description': 'Whether the text '
                                                     'fragment talks about a '
                                                     'chain.',
                                      'title': 'Talks About Chain',
                                      'type': 'boolean'},
                'talks_about_expression_language': {'default': False,
                                                    'description': 'Whether '
                                                                   'the text '
                                                                   'fragment '
                                                                   'talks '
                                                                   'about an '
                                                                   'langchain '
                                                                   'expression '
                                                                   'language.',
                                                    'title': 'Talks About '
                                                             'Expression '
                                                             'Language',
                                                    'type': 'boolean'},
                'talks_about_retriever': {'default': False,
                                          'description': 'Whether the text '
                                                         'fragment talks about '
                                                         'a retriever.',
                                          'title': 'Talks About Retriever',
                                          'type': 'boolean'},
                'talks_about_vectorstore': {'default': False,
                                            'description': 'Whether the text '
                                                           'fragment talks '
                                                           'about a '
                                                           'vectorstore.',
                                            'title': 'Talks About Vectorstore',
                                            'type': 'boolean'}},
 'required': ['completness'],
 'title': 'Tags',
 'type': 'object'}
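The schema lists the allowed values for `completness`, but it may be worth checking the tagger's output against it explicitly, since whether the `enum` keyword is enforced when a `Tags` object is built can depend on the Pydantic version. The helper below is a hypothetical, standard-library-only sketch of such a check; `validate_tags` is not part of LangChain or Pydantic.

```python
ALLOWED_COMPLETNESS = {"Very", "Quite", "Medium", "Little", "Not"}
BOOLEAN_TAGS = {
    "code_snippet",
    "description",
    "talks_about_vectorstore",
    "talks_about_retriever",
    "talks_about_chain",
    "talks_about_expression_language",
    "contains_markdown_table",
}


def validate_tags(tags: dict) -> list:
    """Return a list of problems found in a tag dictionary (empty if none)."""
    problems = []
    # `completness` is the only required field and must be one of the enum values.
    if tags.get("completness") not in ALLOWED_COMPLETNESS:
        problems.append(f"completness has unexpected value: {tags.get('completness')!r}")
    # Every other known tag, when present, must be a real boolean.
    for name in sorted(BOOLEAN_TAGS):
        if name in tags and not isinstance(tags[name], bool):
            problems.append(f"{name} should be a bool, got {type(tags[name]).__name__}")
    return problems


print(validate_tags({"completness": "Very", "code_snippet": True}))
print(validate_tags({"completness": "High", "code_snippet": "yes"}))
```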

Creating the tag generation chain (tagger)

tagging_prompt = """Extract the desired information from the following passage.

Only extract the properties mentioned in the 'information_extraction' function.
Completness should involve more than one sentence.
To consider that a passage talks about a property, it is enough that it mentions it once.
If there is no mention of a property, set it to False. This only applies to the talks_about_* properties.

For instance,
To set `talks_about_vectorstore` to True, document should contain the word 'vectorstore' at least once.
To set `talks_about_retriever` to True, document should contain the word 'retriever' at least once.
To set `talks_about_chain` to True, document should contain the word 'chain' at least once.
To set `talks_about_expression_language` to True, document should contain the word 'expression language' or 'LCEL' at least once.

Passage:
{input}
"""

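# Note: `tagging_prompt` is defined above but is not passed to the chain here,
# so the chain runs with the library's default tagging prompt.
tagging_chain = create_tagging_chain_pydantic(Tags, llm)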
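The rules spelled out in `tagging_prompt` for the `talks_about_*` properties are literal keyword checks, so a deterministic baseline can serve as a sanity check on the tags the LLM returns. The sketch below is one possible version of that baseline; `KEYWORDS` and `keyword_tags` are hypothetical names, and matching whole words (with an optional plural "s") is a design choice made here so that "langchain" does not count as a mention of "chain".

```python
import re

# Keywords taken from the rules in the tagging prompt (lowercased for matching).
KEYWORDS = {
    "talks_about_vectorstore": ["vectorstore"],
    "talks_about_retriever": ["retriever"],
    "talks_about_chain": ["chain"],
    "talks_about_expression_language": ["expression language", "lcel"],
}


def keyword_tags(passage: str) -> dict:
    """Set each talks_about_* tag to True if any of its keywords appears as a whole word."""
    text = passage.lower()
    return {
        tag: any(re.search(rf"\b{re.escape(word)}s?\b", text) for word in words)
        for tag, words in KEYWORDS.items()
    }


print(keyword_tags("This notebook goes over how to use DynamoDB to store chat message history."))
print(keyword_tags("LCEL lets you pipe a retriever into a chain."))
```

Comparing these flags with the LLM's output for the same passage is a cheap way to spot tags that were set without any supporting mention in the text.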

Tagger usage examples

A fragment that contains only a list of links to other fragments that are also indexed is probably not very useful. It could cause us to retrieve a document that is not relevant to the query, while the document that is relevant fails to appear near the top of the results.

idx = 0

result = tagging_chain.invoke(input={"input": docs[idx].page_content})
print(result.get("input"))
pprint(result.get("text").dict())

[📄️ DependentsDependents stats for langchain-ai/langchain](/docs/additional_resources/dependents)[📄️ TutorialsBelow are links to tutorials and courses on LangChain. For written guides on common use cases for LangChain, check out the use cases guides.](/docs/additional_resources/tutorials)[📄️ YouTube videos⛓ icon marks a new addition [last update 2023-09-21]](/docs/additional_resources/youtube)[🔗 Gallery](https://github.com/kyrolabs/awesome-langchain)
{'code_snippet': False,
 'completness': 'Not',
 'contains_markdown_table': False,
 'description': True,
 'talks_about_chain': True,
 'talks_about_expression_language': True,
 'talks_about_retriever': True,
 'talks_about_vectorstore': True}

A fragment with a link to its documentation and a usage example would be more useful.

idx = 1000

result = tagging_chain.invoke(input={"input": docs[idx].page_content})
print(result.get("input"))
pprint(result.get("text").dict())

# AWS DynamoDB

[Amazon AWS DynamoDB](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/dynamodb/index.html) is a fully managed `NoSQL` database service that provides fast and predictable performance with seamless scalability.

This notebook goes over how to use `DynamoDB` to store chat message history.

First make sure you have correctly configured the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html). Then make sure you have installed `boto3`.

```bash
pip install boto3
```

Next, create the `DynamoDB` Table where we will be storing messages:

```python
import boto3

# Get the service resource.
dynamodb = boto3.resource("dynamodb")

# Create the DynamoDB table.
table = dynamodb.create_table(
    TableName="SessionTable",
    KeySchema=[{"AttributeName": "SessionId", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "SessionId", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)

# Wait until the table exists.
table.meta.client.get_waiter("table_exists").wait(TableName="SessionTable")

# Print out some data about the table.
print(table.item_count)
```

## DynamoDBChatMessageHistory

```python
from langchain.memory.chat_message_histories import DynamoDBChatMessageHistory

history = DynamoDBChatMessageHistory(table_name="SessionTable", session_id="0")

history.add_user_message("hi!")

history.add_ai_message("whats up?")
```

> **API Reference:**
> - [DynamoDBChatMessageHistory](https://api.python.langchain.com/en/latest/memory/langchain.memory.chat_message_histories.dynamodb.DynamoDBChatMessageHistory.html)

```python
history.messages
```
{'code_snippet': True,
 'completness': 'Very',
 'contains_markdown_table': True,
 'description': True,
 'talks_about_chain': True,
 'talks_about_expression_language': True,
 'talks_about_retriever': True,
 'talks_about_vectorstore': True}

idx = 1400

result = tagging_chain.invoke(input={"input": docs[idx].page_content})
print(result.get("input"))
pprint(result.get("text").dict())

text content from PubMed Central and publisher web sites.](/docs/integrations/retrievers/pubmed)[📄️ RePhraseQueryRetrieverSimple retriever that applies an LLM between the user input and the query pass the to retriever.](/docs/integrations/retrievers/re_phrase)[📄️ SEC filings dataSEC filings data powered by Kay.ai and Cybersyn.](/docs/integrations/retrievers/sec_filings)[📄️ SVMSupport vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.](/docs/integrations/retrievers/svm)[📄️ TF-IDFTF-IDF means term-frequency times inverse document-frequency.](/docs/integrations/retrievers/tf_idf)[📄️ VespaVespa is a fully featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query.](/docs/integrations/retrievers/vespa)[📄️ Weaviate Hybrid SearchWeaviate is an open source vector database.](/docs/integrations/retrievers/weaviate-hybrid)[📄️ WikipediaWikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.](/docs/integrations/retrievers/wikipedia)[📄️ ZepRetriever Example for Zep - A long-term memory store for LLM applications.](/docs/integrations/retrievers/zep_memorystore)
{'code_snippet': False,
 'completness': 'Not',
 'contains_markdown_table': False,
 'description': False,
 'talks_about_chain': False,
 'talks_about_expression_language': False,
 'talks_about_retriever': True,
 'talks_about_vectorstore': False}

Document tagging

tagging_results = tagging_chain.batch(
    inputs=[{"input": doc.page_content} for doc in docs[:200]],
    return_exceptions=True,
    config={
        "max_concurrency": 50,
    },
)

docs_with_tags = [
    Document(
        page_content=doc.page_content,
        metadata={
            **doc.metadata,
            **result.get("text").dict(),
        },
    )
    for doc, result in zip(docs, tagging_results)
    if not isinstance(result, Exception)
]

f"Documents with tags: {len(docs_with_tags)}"

'Documents with tags: 184'
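In the run above, 184 of the 200 inputs ended up with tags; the rest came back as exceptions and were filtered out. The pattern of collecting exceptions in place of results, then keeping only the successful pairs, can be sketched with the standard library alone. `run_batch` and `fake_tagger` are hypothetical stand-ins for `tagging_chain.batch(..., return_exceptions=True)` and the tagging chain itself.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(func, inputs, max_concurrency=4):
    """Apply `func` to every input concurrently; return results (or the raised exception) in input order."""

    def safe_call(item):
        try:
            return func(item)
        except Exception as error:  # captured instead of aborting the whole batch
            return error

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(safe_call, inputs))


def fake_tagger(text: str) -> dict:
    """Stand-in for the tagging chain: fails on empty input."""
    if not text:
        raise ValueError("empty passage")
    return {"code_snippet": "```" in text}


passages = ["plain text", "", "```python\nprint(1)\n```"]
results = run_batch(fake_tagger, passages)

# Pair each passage with its result and drop the ones that failed.
tagged = [
    {"text": text, **result}
    for text, result in zip(passages, results)
    if not isinstance(result, Exception)
]
print(f"Documents with tags: {len(tagged)}")
```

Because results come back in input order, zipping them with the original inputs keeps each set of tags attached to the right passage even when some calls fail.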

Document indexing

vectorstore = Chroma(
    collection_name="langchain_docs",
    embedding_function=OpenAIEmbeddings(),
)

record_manager = SQLRecordManager(
    db_url="sqlite:///:memory:",
    namespace="chroma/langchain_docs",
)

record_manager.create_schema()

index(
    docs_source=docs_with_tags,
    record_manager=record_manager,
    vector_store=vectorstore,
    cleanup="full",
    source_id_key="source",
)

{'num_added': 184, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
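The counters in that result come from bookkeeping done by the record manager. A much simplified model of the idea, assuming content hashes as keys and a "full" cleanup that deletes anything not seen in the current run, might look like the sketch below. `doc_hash` and `index_docs` are hypothetical; the real `index` function also tracks timestamps, sources, and updates, none of which is modeled here.

```python
import hashlib
import json


def doc_hash(doc: dict) -> str:
    """Stable fingerprint of a document's text and metadata."""
    payload = json.dumps({"text": doc["text"], "metadata": doc["metadata"]}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def index_docs(docs: list, records: set, store: dict) -> dict:
    """Toy 'full cleanup' indexing run: add new docs, skip known ones, delete stale ones."""
    seen = set()
    stats = {"num_added": 0, "num_skipped": 0, "num_deleted": 0}
    for doc in docs:
        key = doc_hash(doc)
        if key in seen:
            continue  # duplicate inside this run
        seen.add(key)
        if key in records:
            stats["num_skipped"] += 1  # already indexed, nothing to write
        else:
            records.add(key)
            store[key] = doc
            stats["num_added"] += 1
    # Full cleanup: whatever was recorded before but is absent from this run is removed.
    for stale in records - seen:
        del store[stale]
        stats["num_deleted"] += 1
    records.intersection_update(seen)
    return stats


records, store = set(), {}
docs = [
    {"text": "a", "metadata": {"source": "x"}},
    {"text": "b", "metadata": {"source": "y"}},
]
first = index_docs(docs, records, store)
second = index_docs(docs[:1], records, store)
print(first)
print(second)
```

Running the same documents twice writes nothing the second time, which is the main benefit of keeping a record of what has already been indexed.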

Document retrieval with a `Self Retriever`

Creating the interface for the metadata available in the index

metadata_field_info = [
    AttributeInfo(
        name="completness",
        description="Describes how useful is the text in terms of self-explanation. It is critical to excel.",
        type='enum=["Very", "Quite", "Medium", "Little", "Not"]',
    ),
    AttributeInfo(
        name="code_snippet",
        description="Whether the text fragment includes a code snippet. Code snippets are valid markdown code blocks.",
        type="bool",
    ),
    AttributeInfo(
        name="description",
        description="Whether the text fragment includes a description.",
        type="bool",
    ),
    AttributeInfo(
        name="talks_about_vectorstore",
        description="Whether the text fragment talks about a vectorstore.",
        type="bool",
    ),
    AttributeInfo(
        name="talks_about_retriever",
        description="Whether the text fragment talks about a retriever.",
        type="bool",
    ),
    AttributeInfo(
        name="talks_about_chain",
        description="Whether the text fragment talks about a chain.",
        type="bool",
    ),
    AttributeInfo(
        name="contains_markdown_table",
        description="Whether the text fragment contains a markdown table.",
        type="bool",
    ),
]

document_content_description = "Langchain documentation"
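The list above repeats the descriptions from the `Tags` schema by hand, and `talks_about_expression_language` does not appear in it. One way to keep the two in sync is to derive the entries from the schema dictionary. The sketch below assumes a schema shaped like the `Tags.schema()` output printed earlier; `attributes_from_schema`, the type mapping, and the shortened sample schema are illustrative assumptions.

```python
# Assumed mapping from JSON-schema type names to the short names used in the list above.
JSON_TO_ATTRIBUTE_TYPE = {"boolean": "bool", "string": "string"}


def attributes_from_schema(schema: dict) -> list:
    """Build one {name, description, type} entry per property in the schema."""
    attributes = []
    for name, spec in schema["properties"].items():
        attribute_type = JSON_TO_ATTRIBUTE_TYPE.get(spec["type"], spec["type"])
        if "enum" in spec:
            # Spell out the allowed values so the query constructor can see them.
            attribute_type = f"{attribute_type}, one of {spec['enum']}"
        attributes.append(
            {"name": name, "description": spec["description"], "type": attribute_type}
        )
    return attributes


# Shortened sample in the same shape as the schema printed earlier.
schema = {
    "properties": {
        "completness": {
            "description": "Describes how useful is the text in terms of self-explanation.",
            "enum": ["Very", "Quite", "Medium", "Little", "Not"],
            "type": "string",
        },
        "talks_about_expression_language": {
            "description": "Whether the text fragment talks about an langchain expression language.",
            "type": "boolean",
        },
    }
}

for attribute in attributes_from_schema(schema):
    print(attribute)
```

Each dictionary carries the same three fields that `AttributeInfo` is given above (`name`, `description`, `type`), so the entries could be unpacked into it.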

Creating the `retriever`

llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents=document_content_description,
    metadata_field_info=metadata_field_info,
    enable_limit=True,
    verbose=True,
)

Retrieving documents with the `retriever`

relevant_documents = retriever.get_relevant_documents(
    "useful documents that talk about expression language and retrievers"
)
relevant_documents

query='expression language retrievers' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='completness', value='Very'), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='talks_about_retriever', value=True)]) limit=None

[Document(page_content='```\n\n```python\nchain = (\n    {"context": retriever, "question": RunnablePassthrough()} \n    | prompt \n    | model \n    | StrOutputParser()\n)\n```\n\n```python\nchain.invoke("where did harrison work?")\n```\n\n```python\ntemplate = """Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n\nAnswer in the following language: {language}\n"""\nprompt = ChatPromptTemplate.from_template(template)\n\nchain = {\n    "context": itemgetter("question") | retriever, \n    "question": itemgetter("question"), \n    "language": itemgetter("language")\n} | prompt | model | StrOutputParser()\n```\n\n```python\nchain.invoke({"question": "where did harrison work", "language": "italian"})\n```', metadata={'code_snippet': True, 'completness': 'Very', 'contains_markdown_table': False, 'description': True, 'language': 'en', 'source': 'https://python.langchain.com/docs/expression_language/cookbook/retrieval', 'talks_about_chain': True, 'talks_about_expression_language': True, 'talks_about_retriever': True, 'talks_about_vectorstore': False, 'title': 'RAG | 🦜️🔗 Langchain'}),
 Document(page_content='# RAG\n\nLet\'s look at adding in a retrieval step to a prompt and LLM, which adds up to a "retrieval-augmented generation" chain\n\n```bash\npip install langchain openai faiss-cpu tiktoken\n```\n\n```python\nfrom operator import itemgetter\n\nfrom langchain.prompts import ChatPromptTemplate\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.schema.output_parser import StrOutputParser\nfrom langchain.schema.runnable import RunnablePassthrough\nfrom langchain.vectorstores import FAISS\n```\n\n> **API Reference:**\n> - [ChatPromptTemplate](https://api.python.langchain.com/en/latest/prompts/langchain.prompts.chat.ChatPromptTemplate.html)\n> - [ChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain.chat_models.openai.ChatOpenAI.html)\n> - [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.openai.OpenAIEmbeddings.html)\n> - [StrOutputParser](https://api.python.langchain.com/en/latest/schema/langchain.schema.output_parser.StrOutputParser.html)\n> - [RunnablePassthrough](https://api.python.langchain.com/en/latest/schema/langchain.schema.runnable.passthrough.RunnablePassthrough.html)\n> - [FAISS](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.faiss.FAISS.html)\n\n```python\nvectorstore = FAISS.from_texts(["harrison worked at kensho"], embedding=OpenAIEmbeddings())\nretriever = vectorstore.as_retriever()\n\ntemplate = """Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n"""\nprompt = ChatPromptTemplate.from_template(template)\n\nmodel = ChatOpenAI()', metadata={'code_snippet': False, 'completness': 'Very', 'contains_markdown_table': False, 'description': False, 'language': 'en', 'source': 'https://python.langchain.com/docs/expression_language/cookbook/retrieval', 'talks_about_chain': True, 'talks_about_expression_language': True, 
'talks_about_retriever': True, 'talks_about_vectorstore': False, 'title': 'RAG | 🦜️🔗 Langchain'}),
 Document(page_content='## Manipulating outputs/inputs\u200b\n\nMaps can be useful for manipulating the output of one Runnable to match the input format of the next Runnable in a sequence.\n\n```python\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.schema.output_parser import StrOutputParser\nfrom langchain.schema.runnable import RunnablePassthrough\nfrom langchain.vectorstores import FAISS\n\nvectorstore = FAISS.from_texts(["harrison worked at kensho"], embedding=OpenAIEmbeddings())\nretriever = vectorstore.as_retriever()\ntemplate = """Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n"""\nprompt = ChatPromptTemplate.from_template(template)\n\nretrieval_chain = (\n    {"context": retriever, "question": RunnablePassthrough()} \n    | prompt \n    | model \n    | StrOutputParser()\n)\n\nretrieval_chain.invoke("where did harrison work?")', metadata={'code_snippet': True, 'completness': 'Very', 'contains_markdown_table': False, 'description': True, 'language': 'en', 'source': 'https://python.langchain.com/docs/expression_language/how_to/map', 'talks_about_chain': True, 'talks_about_expression_language': True, 'talks_about_retriever': True, 'talks_about_vectorstore': True, 'title': 'Use RunnableMaps | 🦜️🔗 Langchain'}),
 Document(page_content='## Without References\u200b\n\nWhen references aren\'t available, you can still predict the preferred response.\nThe results will reflect the evaluation model\'s preference, which is less reliable and may result\nin preferences that are factually incorrect.\n\n```python\nfrom langchain.evaluation import load_evaluator\n\nevaluator = load_evaluator("pairwise_string")\n```\n\n> **API Reference:**\n> - [load_evaluator](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.loading.load_evaluator.html)\n\n```python\nevaluator.evaluate_string_pairs(\n    prediction="Addition is a mathematical operation.",\n    prediction_b="Addition is a mathematical operation that adds two numbers to create a third number, the \'sum\'.",\n    input="What is addition?",\n)\n```', metadata={'code_snippet': True, 'completness': 'Very', 'contains_markdown_table': False, 'description': True, 'language': 'en', 'source': 'https://python.langchain.com/docs/guides/evaluation/comparison/pairwise_string', 'talks_about_chain': True, 'talks_about_expression_language': True, 'talks_about_retriever': True, 'talks_about_vectorstore': False, 'title': 'Pairwise String Comparison | 🦜️🔗 Langchain'})]

relevant_documents = retriever.get_relevant_documents(
    "useful documents that talk about expression language and retrievers or vectorstores"
)
relevant_documents

query='expression language' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='completness', value='Very'), Operation(operator=<Operator.OR: 'or'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='talks_about_retriever', value=True), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='talks_about_vectorstore', value=True)])]) limit=None

[Document(page_content='# Code writing\n\nExample of how to use LCEL to write Python code.\n\n```python\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate\nfrom langchain.schema.output_parser import StrOutputParser\nfrom langchain.utilities import PythonREPL\n```\n\n> **API Reference:**\n> - [ChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain.chat_models.openai.ChatOpenAI.html)\n> - [ChatPromptTemplate](https://api.python.langchain.com/en/latest/prompts/langchain.prompts.chat.ChatPromptTemplate.html)\n> - [SystemMessagePromptTemplate](https://api.python.langchain.com/en/latest/prompts/langchain.prompts.chat.SystemMessagePromptTemplate.html)\n> - [HumanMessagePromptTemplate](https://api.python.langchain.com/en/latest/prompts/langchain.prompts.chat.HumanMessagePromptTemplate.html)\n> - [StrOutputParser](https://api.python.langchain.com/en/latest/schema/langchain.schema.output_parser.StrOutputParser.html)\n> - [PythonREPL](https://api.python.langchain.com/en/latest/utilities/langchain.utilities.python.PythonREPL.html)\n\n```python\ntemplate = """Write some python code to solve the user\'s problem. 
\n\nReturn only python code in Markdown format, e.g.:\n\n```python\n....\n```"""\nprompt = ChatPromptTemplate.from_messages(\n    [("system", template), ("human", "{input}")]\n)\n\nmodel = ChatOpenAI()\n```\n\n```python\ndef _sanitize_output(text: str):\n    _, after = text.split("```python")\n    return after.split("```")[0]\n```\n\n```python\nchain = prompt | model | StrOutputParser() | _sanitize_output | PythonREPL().run\n```\n\n```python\nchain.invoke({"input": "whats 2 plus 2"})\n```', metadata={'code_snippet': True, 'completness': 'Very', 'contains_markdown_table': True, 'description': True, 'language': 'en', 'source': 'https://python.langchain.com/docs/expression_language/cookbook/code_writing', 'talks_about_chain': True, 'talks_about_expression_language': True, 'talks_about_retriever': True, 'talks_about_vectorstore': True, 'title': 'Code writing | 🦜️🔗 Langchain'}),
 Document(page_content='```\n\n```python\nchain = (\n    {"context": retriever, "question": RunnablePassthrough()} \n    | prompt \n    | model \n    | StrOutputParser()\n)\n```\n\n```python\nchain.invoke("where did harrison work?")\n```\n\n```python\ntemplate = """Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n\nAnswer in the following language: {language}\n"""\nprompt = ChatPromptTemplate.from_template(template)\n\nchain = {\n    "context": itemgetter("question") | retriever, \n    "question": itemgetter("question"), \n    "language": itemgetter("language")\n} | prompt | model | StrOutputParser()\n```\n\n```python\nchain.invoke({"question": "where did harrison work", "language": "italian"})\n```', metadata={'code_snippet': True, 'completness': 'Very', 'contains_markdown_table': False, 'description': True, 'language': 'en', 'source': 'https://python.langchain.com/docs/expression_language/cookbook/retrieval', 'talks_about_chain': True, 'talks_about_expression_language': True, 'talks_about_retriever': True, 'talks_about_vectorstore': False, 'title': 'RAG | 🦜️🔗 Langchain'}),
 Document(page_content='## Using Constitutional Principles\u200b\n\nCustom rubrics are similar to principles from [Constitutional AI](https://arxiv.org/abs/2212.08073). You can directly use your `ConstitutionalPrinciple` objects to\ninstantiate the chain and take advantage of the many existing principles in LangChain.\n\n```python\nfrom langchain.chains.constitutional_ai.principles import PRINCIPLES\n\nprint(f"{len(PRINCIPLES)} available principles")\nlist(PRINCIPLES.items())[:5]\n```\n\n```python\nevaluator = load_evaluator(\n    EvaluatorType.CRITERIA, criteria=PRINCIPLES["harmful1"]\n)\neval_result = evaluator.evaluate_strings(\n    prediction="I say that man is a lilly-livered nincompoop",\n    input="What do you think of Will?",\n)\nprint(eval_result)\n```\n\n## Configuring the LLM\u200b\n\nIf you don\'t specify an eval LLM, the `load_evaluator` method will initialize a `gpt-4` LLM to power the grading chain. Below, use an anthropic model instead.\n\n```python\n# %pip install ChatAnthropic\n# %env ANTHROPIC_API_KEY=<API_KEY>\n```\n\n```python\nfrom langchain.chat_models import ChatAnthropic\n\nllm = ChatAnthropic(temperature=0)\nevaluator = load_evaluator("criteria", llm=llm, criteria="conciseness")\n```\n\n> **API Reference:**\n> - [ChatAnthropic](https://api.python.langchain.com/en/latest/chat_models/langchain.chat_models.anthropic.ChatAnthropic.html)\n\n```python\neval_result = evaluator.evaluate_strings(\n    prediction="What\'s 2+2? That\'s an elementary question. 
The answer you\'re looking for is that two and two is four.",\n    input="What\'s 2+2?",\n)\nprint(eval_result)\n```', metadata={'code_snippet': True, 'completness': 'Very', 'contains_markdown_table': True, 'description': True, 'language': 'en', 'source': 'https://python.langchain.com/docs/guides/evaluation/string/criteria_eval_chain', 'talks_about_chain': True, 'talks_about_expression_language': True, 'talks_about_retriever': True, 'talks_about_vectorstore': True, 'title': 'Criteria Evaluation | 🦜️🔗 Langchain'}),
 Document(page_content='# You can load by enum or by raw python string\nevaluator = load_evaluator(\n    "embedding_distance", distance_metric=EmbeddingDistance.EUCLIDEAN\n)\n```\n\n## Select Embeddings to Use\u200b\n\nThe constructor uses `OpenAI` embeddings by default, but you can configure this however you want. Below, use huggingface local embeddings\n\n```python\nfrom langchain.embeddings import HuggingFaceEmbeddings\n\nembedding_model = HuggingFaceEmbeddings()\nhf_evaluator = load_evaluator("embedding_distance", embeddings=embedding_model)\n```\n\n> **API Reference:**\n> - [HuggingFaceEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.huggingface.HuggingFaceEmbeddings.html)\n\n```python\nhf_evaluator.evaluate_strings(prediction="I shall go", reference="I shan\'t go")\n```\n\n```python\nhf_evaluator.evaluate_strings(prediction="I shall go", reference="I will go")\n```\n\n[](None)_1. Note: When it comes to semantic similarity, this often gives better results than older string distance metrics (such as those in the [StringDistanceEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.string_distance.base.StringDistanceEvalChain.html#langchain.evaluation.string_distance.base.StringDistanceEvalChain)), though it tends to be less reliable than evaluators that use the LLM directly (such as the [QAEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.QAEvalChain.html#langchain.evaluation.qa.eval_chain.QAEvalChain) or [LabeledCriteriaEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.criteria.eval_chain.LabeledCriteriaEvalChain.html#langchain.evaluation.criteria.eval_chain.LabeledCriteriaEvalChain)) _', metadata={'code_snippet': True, 'completness': 'Very', 'contains_markdown_table': True, 'description': True, 'language': 'en', 'source': 'https://python.langchain.com/docs/guides/evaluation/string/embedding_distance', 
'talks_about_chain': True, 'talks_about_expression_language': True, 'talks_about_retriever': True, 'talks_about_vectorstore': True, 'title': 'Embedding Distance | 🦜️🔗 Langchain'})]
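The structured queries printed above are small trees: `eq` comparisons at the leaves, combined by `and` and `or` operations. The sketch below shows how such a tree could be evaluated against a document's metadata. The tuple encoding and the `matches` function are assumptions made for illustration; in the notebook the vector store performs this filtering after the retriever translates the filter into its own syntax.

```python
def matches(node: tuple, metadata: dict) -> bool:
    """Evaluate ("eq", attribute, value), ("and", *children) or ("or", *children)."""
    operator = node[0]
    if operator == "eq":
        _, attribute, value = node
        return metadata.get(attribute) == value
    if operator == "and":
        return all(matches(child, metadata) for child in node[1:])
    if operator == "or":
        return any(matches(child, metadata) for child in node[1:])
    raise ValueError(f"unsupported operator: {operator}")


# Same shape as the second structured query: Very AND (retriever OR vectorstore).
query_filter = (
    "and",
    ("eq", "completness", "Very"),
    (
        "or",
        ("eq", "talks_about_retriever", True),
        ("eq", "talks_about_vectorstore", True),
    ),
)

samples = [
    {"completness": "Very", "talks_about_retriever": True, "talks_about_vectorstore": False},
    {"completness": "Very", "talks_about_retriever": False, "talks_about_vectorstore": False},
    {"completness": "Not", "talks_about_retriever": True, "talks_about_vectorstore": True},
]
print([matches(query_filter, metadata) for metadata in samples])
```

Only the first sample passes: the second fails the `or` branch and the third fails the `completness` comparison.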
