I had a 158,000-row product catalog. Each row: a free-text description. I needed structured attributes out of it: brand, category, material, sentiment score.
So I wrote the obvious thing. A for-loop, one GPT call per row, dump results into a list. Ran it Friday evening. Monday morning: $400 spent, crashed at row 91K, no checkpoint. Cool.
Started over. That’s when I decided to build the tool properly.
The gap nobody’s filling
There are plenty of LLM frameworks out there. LangChain does chains and agents. LlamaIndex does RAG. DSPy optimizes prompts. They’re all good at what they do.
But here’s the thing. None of them answers a much dumber question:
I have a CSV. I want to run an LLM on every row. Give me back new columns.
No agents. No retrieval graph. Just “map this prompt over tabular data and collect structured results.” Data engineers do this all the time (classification, entity extraction, scoring) and every single one writes the same brittle glue code from scratch.
So I built Ondine
Ondine is a Python SDK. You give it a DataFrame (pandas or Polars), a prompt, and a model. It gives you back new columns.
from ondine import QuickPipeline

result = QuickPipeline(
    source="products.csv",
    prompt="Extract brand, category, and sentiment from the description",
    model="gpt-4o-mini",
    budget_limit=25.00,
).run()

print(result[["brand", "category", "sentiment"]].head())
That’s the whole API for 80% of use cases. The other 20% uses the full pipeline builder, but I’ll skip that here.
The features I built because they burned me
Checkpoint/resume was the first thing I implemented. Obviously. I’d already lost $400 to one crash. Ondine saves state to disk continuously: crash at row 50K of 200K, restart, and it picks up at 50K. The number of runs this has saved me from restarting is genuinely embarrassing.
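The pattern itself is nothing exotic. This is a minimal sketch of the idea, not Ondine’s actual on-disk format (the file name and JSON shape here are made up for illustration):

```python
import json
from pathlib import Path

CHECKPOINT = Path("run_state.json")  # hypothetical checkpoint file

def load_start_row() -> int:
    """Return the row to resume from, or 0 on a fresh run."""
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())["next_row"]
    return 0

def save_progress(next_row: int) -> None:
    """Persist progress after every committed batch, so a crash
    costs at most one batch of work."""
    CHECKPOINT.write_text(json.dumps({"next_row": next_row}))
```

The important part is that progress is written after every batch, not at the end. The end is exactly where the crash happens.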
Cost control came second. Before any API calls happen, Ondine estimates what the full run will cost. You set a hard budget cap in dollars. When it hits the cap, it stops. Not “sends you a warning email.” Stops. I wanted something I could leave running overnight without checking my bank account in the morning.
pipeline = QuickPipeline(
    source="big_dataset.csv",
    prompt="Classify this product",
    model="gpt-4o-mini",
    budget_limit=50.00,  # hard stop
)
Structured output is the one people underestimate until they hit it. You ask GPT to return JSON, and maybe 95% of the time it works. But at 100K rows, that 5% means 5,000 broken responses. Ondine enforces Pydantic schemas and re-prompts automatically on malformed output. You define the shape; it guarantees it.
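The mechanism is roughly this. A simplified sketch, not Ondine’s internals: `call_llm` stands in for whatever client function you use, and the retry prompt wording is invented for the example (assumes Pydantic v2):

```python
from pydantic import BaseModel, ValidationError

class ProductAttrs(BaseModel):
    brand: str
    category: str
    sentiment: float

def parse_with_retry(call_llm, prompt: str, max_retries: int = 2) -> ProductAttrs:
    """Validate the raw LLM response against the schema; re-prompt on failure."""
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            return ProductAttrs.model_validate_json(raw)
        except ValidationError as err:
            # Feed the validation error back so the model can correct itself
            prompt = f"{prompt}\n\nYour last reply was invalid: {err}. Return only valid JSON."
    raise ValueError("Schema validation failed after retries")
```

At scale this is the difference between 5,000 rows of garbage and 5,000 rows that got one extra API call.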
Multi-row batching is where it gets interesting. Instead of one API call per row, Ondine packs N rows into a single call. The LLM processes all N at once. Same total tokens, but 200 HTTP calls instead of 10,000 for a 10K-row dataset at batch_size=50. The throughput difference is massive.
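The arithmetic is simple: 10,000 rows at batch_size=50 is exactly 200 requests. The chunking itself is a few lines; this is an illustration of the concept, not Ondine’s code:

```python
from itertools import islice

def batched(rows, batch_size: int):
    """Yield successive chunks of rows; each chunk becomes one API call."""
    it = iter(rows)
    while chunk := list(islice(it, batch_size)):
        yield chunk

# 10,000 rows at batch_size=50 -> 200 calls instead of 10,000
calls = sum(1 for _ in batched(range(10_000), 50))
```

The hard part Ondine handles on top of this is mapping each answer in the batched response back to its source row, and that’s where the schema enforcement pays off again.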
Then there’s the anti-hallucination layer. I built this one later, after I realized the LLM was occasionally inventing brand names that didn’t appear anywhere in the source text. A post-processing context store (Rust + SQLite + FTS5) checks whether outputs are grounded in the input. It catches contradictions between duplicate rows too. Adds maybe 3% overhead to the pipeline.
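The core question the layer asks is groundedness: does the extracted value actually appear in the source row? A crude pure-Python version of that check, just to show the concept (the real thing uses FTS5 full-text matching, not substring search, and the example data here is invented):

```python
def is_grounded(value: str, source_text: str) -> bool:
    """Flag outputs that never appear in the input row."""
    return value.lower() in source_text.lower()

row = "Handmade oak dining table by Woodgrain Co., rustic finish."
is_grounded("Woodgrain Co.", row)  # extracted brand exists in the text
is_grounded("IKEA", row)           # hallucinated brand gets flagged
```

A substring check like this is too strict for paraphrased fields like sentiment, which is part of why the real implementation leans on full-text indexing instead.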
What Ondine is not
Not an agent framework. Not a RAG pipeline builder (though it has a knowledge store if you need to inject reference docs into the prompt). Not a prompt engineering tool.
It’s a batch processor for tabular data. If your job is “I have a spreadsheet and need AI to fill in columns,” that’s the use case. Anything more exotic and you probably want LangChain or LlamaIndex.
Provider support
Switching models is one line:
model="gpt-4o-mini" # OpenAI
model="groq/llama-3.1-70b" # Groq
model="ollama/mistral" # Local via Ollama
model="mlx/meta-llama/..." # Apple Silicon, no server needed
100+ providers through LiteLLM. Your pipeline code doesn’t change.
Try it
pip install ondine
from ondine import QuickPipeline

result = QuickPipeline(
    source="your_data.csv",
    prompt="Your task here",
    model="gpt-4o-mini",
).run()
GitHub: github.com/ptimizeroracle/ondine
Website: ondine.dev
Docs: docs.ondine.dev
MIT licensed. If you run into rough edges, I’m on GitHub Issues — I respond to everything.