In the evolving landscape of AI and natural language processing, Retrieval-Augmented Generation (RAG) has become a powerful technique for building applications that combine retrieval capabilities with generative models. By integrating RAG with Google’s Gemini models and LangChain, developers can create robust systems capable of handling complex queries, extracting information from vast datasets, and generating insightful responses based on retrieved data.
In this tutorial, I will guide you through the process of setting up your environment and tools needed for working with RAG, Gemini, and LangChain. Whether you’re developing a conversational AI, a knowledge extraction system, or document-based question-answering capabilities, this guide will help you get started quickly and effectively.
Prerequisites
Before diving into the code, you need to ensure that your environment is ready. In this guide, we will cover the installation of the essential tooling: utilities for handling PDFs and optical character recognition (OCR), along with the Python packages required for working with generative AI and retrieval mechanisms.
We’ll also configure access to Google’s Gemini models using LangChain, which will empower you to build and customize intelligent retrieval and generation pipelines.
Step 1: Setting Up Dependencies
To get started, you’ll need to install several key dependencies that support working with PDFs, image processing, and OCR, as well as the core LangChain and Google Generative AI packages.
Run the following commands in your terminal to install the necessary tools and libraries:
!apt-get install -y poppler-utils
!apt-get install -y tesseract-ocr
!pip install langchain_experimental langchain_core
!pip install google-generativeai
!pip install google-ai-generativelanguage
!pip install langchain-google-genai
!pip install "langchain[docarray]"
!pip install pydantic==1.10.8
!pip install pdfminer.six
!pip install -U docarray
!pip install unstructured
!pip install pillow-heif
!pip install pdf2image
!pip install unstructured_inference
!pip install pytesseract
!pip install unstructured-pytesseract
Next, you’ll need to set up your Google API key. Visit the Google Cloud Console, create a project if you don’t have one, enable the Generative Language API, and create an API key in the credentials section. Once you have your key, use it in your code as follows:
import os
import google.generativeai as genai
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
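Optionally, you can confirm that the key is being picked up correctly by listing the models it can access. This is a quick check, assuming the key above is valid:
# Optional sanity check: list models that support content generation.
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)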
Step 2: Initializing Models
With your environment set up, you can now initialize the necessary models. You’ll need to set up both a chat model and an embeddings model. Use the following code to accomplish this:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
model = ChatGoogleGenerativeAI(model="gemini-pro")
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
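If you want a quick sanity check before wiring these into a pipeline, you can call each model once; the prompt and query strings below are just placeholders:
# Optional smoke test: one chat completion and one embedding call.
print(model.invoke("Say hello in one short sentence.").content)
vector = embeddings.embed_query("hello world")
print(len(vector))  # length of the embedding vector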
Step 3: Processing Your Document
The next step is to process your document. This involves chunking your PDF into manageable pieces and creating a vector store for efficient retrieval. Here’s how to do it:
from unstructured.partition.pdf import partition_pdf
from langchain.vectorstores import DocArrayInMemorySearch
file_path = "PATH_TO_YOUR_PDF.pdf"
raw_elements = partition_pdf(
    file_path,
    chunking_strategy="by_title",
    infer_table_structure=True,
    max_characters=1000,
    new_after_n_chars=1500,
    combine_text_under_n_chars=250,
    strategy="hi_res"
)
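Each element returned by partition_pdf exposes the text extracted for that chunk, which is what we index next. If you want to preview what the chunking settings produced before building the vector store, you can print a few elements (an optional check):
# Optional: preview the first few chunks produced by partition_pdf.
for element in raw_elements[:5]:
    print(type(element).__name__, "->", element.text[:80])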
vectorstore = DocArrayInMemorySearch.from_texts(
    [element.text for element in raw_elements],
    embedding=embeddings
)
retriever = vectorstore.as_retriever()
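At this point you can already query the vector store directly, which is a handy way to sanity-check retrieval before adding generation. The query string below is only an illustration:
# Optional: inspect what the retriever returns for a sample query.
docs = retriever.get_relevant_documents("main topic of the document")
for doc in docs:
    print(doc.page_content[:200])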
Step 4: Building Your LangChain Pipeline
Now it’s time to build your LangChain pipeline. This involves defining a prompt template, setting up an output parser, and constructing the chain that will process your queries. Here’s the code to accomplish this:
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableMap
template = """Answer the question as a full sentence based only on the
following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
output_parser = StrOutputParser()
chain = RunnableMap({
    "context": lambda x: retriever.get_relevant_documents(x["question"]),
    "question": lambda x: x["question"]
}) | prompt | model | output_parser
Step 5: Using Your RAG System
With everything set up, you can now use your RAG system to answer questions based on your document. Simply invoke the chain with a question, like this:
result = chain.invoke({"question": "What is this thingamajig?"})
print(result)
You can repeat this process with different questions as needed, allowing you to extract information from your document efficiently and accurately.
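For example, a small batch of questions can be run in one loop (the questions below are placeholders for your own):
# Ask several questions against the same document.
questions = [
    "What is the document about?",
    "What are the key findings?",
]
for q in questions:
    print(q)
    print(chain.invoke({"question": q}))
    print("-" * 40)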
By following these steps, you’ve created a powerful RAG system using Gemini and LangChain. This system can process documents, understand context, and provide informed answers to your questions.