RAG is one of those patterns that sounds more complicated than it has to be.
At its core, retrieval-augmented generation is just:
- Store some documents
- Embed the user’s question
- Find the most relevant docs
- Send those docs to the model as context
- Return an answer with sources
I built a small Python example that shows that flow end to end with Telnyx AI Inference.
Repo: https://github.com/team-telnyx/telnyx-code-examples/tree/main/build-rag-with-telnyx-inference-python
What it does
The app exposes a Flask API for asking questions against a tiny in-memory knowledge base.
You send a question like:
{
  "question": "Production signup broke after rotating an API key. Logs show 401 errors. What should we check first?"
}
The app then:
- creates an embedding for the question
- compares it against embeddings for the sample documents
- retrieves the most relevant sources
- sends those sources to a chat model
- returns a grounded answer plus source titles
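The "compare and retrieve" steps can be sketched in a few lines. This is a minimal illustration, assuming cosine similarity as the scoring function (a common choice, though the repo may score differently). The vectors and the document labels in the comments are made up for the example; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(question_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(question_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional vectors standing in for real embeddings.
doc_vecs = [
    [1.0, 0.0, 0.0],  # e.g. a doc about API key rotation
    [0.0, 1.0, 0.0],  # e.g. a billing FAQ
    [0.7, 0.0, 0.7],  # e.g. a doc about debugging 401 errors
]
question_vec = [0.9, 0.0, 0.4]

ranked = top_k(question_vec, doc_vecs, k=2)
for index, score in ranked:
    print(index, round(score, 3))
```

Here the third and first documents rank highest because their vectors point in nearly the same direction as the question vector, while the billing vector is orthogonal to it.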
Why this pattern is useful
A normal LLM call only knows what is in the prompt and the model’s training data. RAG lets your app answer with your own docs, policies, product information, support notes, or internal knowledge base. That makes it useful for things like:
- support assistants
- internal docs search
- onboarding copilots
- product Q&A
- troubleshooting workflows
- agent tools that need source-grounded answers
How the example works
The example keeps the moving parts intentionally small.
There is an in-memory DOCUMENTS list. On the first request, the app creates embeddings for those documents and caches them. When a user asks a question, the app embeds the question, compares it to the document embeddings, and sends the best matches to the model.
The answer response includes source titles, so you can see what context the app used instead of treating the model like a black box.
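To make the caching and the source titles concrete, here is a rough sketch of that flow. Everything in it is an assumption for illustration: `fake_embed` and `fake_generate` are stand-ins for the real embedding and chat calls, the two documents are invented, and the `answer` / `sources` field names are a guess at the response shape rather than the repo's actual schema.

```python
DOCUMENTS = [
    {"title": "API key rotation", "text": "After rotating a key, update every service that uses it."},
    {"title": "Billing FAQ", "text": "Invoices are issued monthly."},
]

_doc_embeddings = None  # filled lazily on the first request
embed_calls = 0  # counts embedding calls so the caching is visible


def fake_embed(text):
    """Stand-in for a real embedding call: counts a few keyword prefixes."""
    global embed_calls
    embed_calls += 1
    words = text.lower().split()
    return [float(sum(w.startswith(p) for w in words)) for p in ("key", "invoice", "rotat")]


def fake_generate(prompt):
    """Stand-in for a chat completion call."""
    return f"(model reply to a {len(prompt)}-character prompt)"


def get_doc_embeddings():
    """Embed the documents once, then reuse the cached vectors."""
    global _doc_embeddings
    if _doc_embeddings is None:
        _doc_embeddings = [fake_embed(doc["text"]) for doc in DOCUMENTS]
    return _doc_embeddings


def build_response(question, generate, k=1):
    """Retrieve the k best documents and return an answer with its sources."""
    question_vec = fake_embed(question)
    doc_vecs = get_doc_embeddings()
    # Plain dot product keeps the sketch short; a real app would likely normalize.
    scores = [sum(a * b for a, b in zip(question_vec, vec)) for vec in doc_vecs]
    best = sorted(range(len(DOCUMENTS)), key=lambda i: scores[i], reverse=True)[:k]
    sources = [DOCUMENTS[i] for i in best]
    context = "\n\n".join(f"[{doc['title']}]\n{doc['text']}" for doc in sources)
    prompt = f"Answer using only these sources.\n\n{context}\n\nQuestion: {question}"
    return {"answer": generate(prompt), "sources": [doc["title"] for doc in sources]}


first = build_response("Signup broke after rotating an API key", fake_generate)
second = build_response("Where are my invoices?", fake_generate)
print(first["sources"], second["sources"], embed_calls)
```

The document embeddings are computed only during the first call, so the second question costs a single extra embedding call instead of re-embedding the whole knowledge base.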
Try it
Clone the repo:
git clone https://github.com/team-telnyx/telnyx-code-examples.git
cd telnyx-code-examples/build-rag-with-telnyx-inference-python
Install dependencies and run the app:
pip install -r requirements.txt
cp .env.example .env
python app.py
Ask a question:
curl -X POST http://localhost:5000/rag/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Production signup broke after rotating an API key. Logs show 401 errors. What should we check first?"
  }'
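If you would rather call the endpoint from Python, the same request can be built with the standard library. This is a sketch under two assumptions: the app is running locally on port 5000 as in the curl command, and the response body is JSON. The helper names `build_ask_request` and `ask` are hypothetical, not part of the repo.

```python
import json
import urllib.request


def build_ask_request(question, base_url="http://localhost:5000"):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rag/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:5000", timeout=30):
    """Send the question and decode the reply, assuming it is JSON."""
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the Flask app running, you could call:
# print(ask("Production signup broke after rotating an API key. What should we check first?"))
```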
Why I like this example
It is deliberately small, but it gives you the core pieces of a real RAG workflow:
- embeddings
- retrieval
- source grounding
- chat completion
- a clean API surface
From there, you could swap the in-memory docs for a vector database, pull content from product docs, or turn it into a support assistant.
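One way to keep that swap cheap is to put retrieval behind a small interface. The sketch below is an assumption about how you might structure it, not how the repo is organized: `VectorStore`, `InMemoryStore`, and `retrieve` are hypothetical names, and a real vector database client would need its own adapter class with the same two methods.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """Minimal interface the retrieval step depends on."""

    def add(self, doc_id: str, vector: list[float]) -> None: ...

    def search(self, vector: list[float], k: int) -> list[str]: ...


class InMemoryStore:
    """Dictionary-backed store, good enough for a handful of documents."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._vectors[doc_id] = vector

    def search(self, vector: list[float], k: int) -> list[str]:
        def score(other: list[float]) -> float:
            # Cosine similarity; returns 0.0 if either vector is all zeros.
            dot = sum(a * b for a, b in zip(vector, other))
            norms = math.hypot(*vector) * math.hypot(*other)
            return dot / norms if norms else 0.0

        ranked = sorted(self._vectors, key=lambda doc_id: score(self._vectors[doc_id]), reverse=True)
        return ranked[:k]


def retrieve(store: VectorStore, question_vec: list[float], k: int = 2) -> list[str]:
    """Retrieval code only sees the interface, so the backend can change."""
    return store.search(question_vec, k)


store = InMemoryStore()
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
store.add("c", [1.0, 1.0])
print(retrieve(store, [1.0, 0.1], k=2))
```

Because `retrieve` depends only on `add` and `search`, moving to a hosted vector database would mean writing one adapter class rather than touching the request handling.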
The Telnyx code examples repo is also structured to be agent-readable, so coding agents can inspect these examples and help you extend them into fuller applications.
Resources
- Code example
- Telnyx AI repo with skills/toolkits
- Telnyx AI Inference docs