Cloud and DevOps work is not hard because the commands are hard.
It is hard because your context is fragmented.
- Terminals, dashboards, logs, CI, rollouts
- Different auth models per system
- Different failure modes per workflow
- No shared audit trail of what happened and why
Skyflo.ai is my attempt to reduce that fragmentation.
Skyflo is an open-source AI agent for Cloud and DevOps.
It unifies Kubernetes operations and CI/CD behind a natural language interface, with approvals built in.
The important part is not “chat with prod”.
The important part is control.
What Skyflo is
Skyflo is a command center, an agent engine, and a standardized tool layer.
- Command Center (UI): a chat interface that streams every step in real time
- Engine: a service that runs a LangGraph workflow and turns intent into safe tool calls
- MCP Server: a tool server exposing standardized integrations for Kubernetes and CI/CD
You express intent in plain English.
What executes is always a validated tool call.
What Skyflo is not
Skyflo is not “give an LLM kubectl and pray”.
If you want autonomous mutation in production without approvals, you are optimizing for demos.
You are not optimizing for reliability.
Skyflo is designed for operators who want automation without losing control.
- SREs
- DevOps engineers
- cloud architects
- platform teams
- security-minded operators
The safety model
Skyflo is built around a simple policy:
The agent can propose and prepare. You approve mutations.
That policy shows up in four places.
1) Human in the loop for mutations
Any WRITE operation requires explicit approval.
Examples:
- apply, rollout promote or cancel
- Helm upgrade or rollback
- actions that stop or cancel builds
Read-only workflows can run end to end without approval.
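A minimal sketch of that gate, in Python. This is not Skyflo's actual implementation. `ToolCall`, `run_with_approval`, and the `is_write` flag are hypothetical names chosen for illustration, and the approval callback stands in for whatever the UI does when it asks the operator.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    name: str                 # hypothetical tool name, e.g. "helm_upgrade"
    is_write: bool            # True if the call mutates cluster or CI state
    args: dict = field(default_factory=dict)


def run_with_approval(
    call: ToolCall,
    execute: Callable[[ToolCall], str],
    approve: Callable[[ToolCall], bool],
) -> str:
    """Run read-only calls directly; run writes only after explicit approval."""
    if call.is_write and not approve(call):
        # The operator said no, so the tool is never invoked.
        return f"denied: {call.name}"
    return execute(call)
```

The point of the shape is that the check sits in front of execution, not inside the model prompt. A denied write never reaches the tool.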
2) Plan → Execute → Verify
This is the only loop that matters in operations.
Skyflo runs an iterative workflow:
- Plan: interpret intent and perform lightweight discovery when needed
- Execute: call tools, then pause for approval if the next step is a write
- Verify: validate outcomes against intent, then continue or stop
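The loop above can be sketched in plain Python. This is an illustration under stated assumptions, not the Engine's LangGraph workflow: the `plan`, `execute`, and `verify` callables and the `max_iterations` budget are hypothetical, and the approval pause is left out to keep the loop itself visible.

```python
from typing import Callable, List


def plan_execute_verify(
    intent: str,
    plan: Callable[[str, List[str]], List[str]],
    execute: Callable[[str], str],
    verify: Callable[[str, List[str]], bool],
    max_iterations: int = 3,
) -> List[str]:
    """Repeat plan -> execute -> verify until verified or the budget runs out."""
    results: List[str] = []
    for _ in range(max_iterations):
        # Plan: decide the next steps from the intent and what is known so far.
        steps = plan(intent, results)
        # Execute: run each planned step and record its outcome.
        for step in steps:
            results.append(execute(step))
        # Verify: stop as soon as the outcomes satisfy the intent.
        if verify(intent, results):
            break
    return results
```

The iteration budget matters. A loop that cannot verify its outcome should stop and report, not keep trying.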
3) Everything streams in real time
Operators do not trust black boxes.
Skyflo streams what it is doing as it does it:
- model output
- tool progress
- tool results
- workflow events
- approvals and decisions
This is operational safety.
It is also how you debug the agent.
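One way to picture the stream is as a sequence of typed events that the UI renders as they arrive. The sketch below uses a Python generator; the `Event` class, the `kind` strings, and `run_step` are assumptions made for illustration and do not describe Skyflo's actual event schema or transport.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Event:
    kind: str      # e.g. "workflow", "tool_progress", "tool_result"
    payload: str


def run_step(tool_name: str) -> Iterator[Event]:
    """Yield events as work happens instead of returning one final answer."""
    yield Event("workflow", f"starting {tool_name}")
    yield Event("tool_progress", f"{tool_name}: running")
    yield Event("tool_result", f"{tool_name}: ok")


# A consumer (here, just print) sees each event as soon as it is produced.
for event in run_step("get_logs"):
    print(f"[{event.kind}] {event.payload}")
```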
4) Standardized tool execution via MCP
Integration work is where most agent projects fail.
Skyflo uses MCP so tools are:
- discoverable
- typed
- validated
- documented
- separable from the agent logic
The Engine does not know kubectl.
It knows tools.
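To make "typed and validated" concrete, here is a small registry sketch in Python. It does not use the MCP SDK and is not how Skyflo's MCP Server is written. `ToolSpec`, `register`, `call_tool`, and the `tail_logs` tool are all hypothetical. It only shows the idea that the caller names a tool and passes arguments, and validation happens before anything runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolSpec:
    name: str
    description: str
    params: Dict[str, type]          # parameter name -> expected Python type
    handler: Callable[..., Any]


REGISTRY: Dict[str, ToolSpec] = {}


def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec


def call_tool(name: str, args: Dict[str, Any]) -> Any:
    """Validate the tool name and arguments, then dispatch to the handler."""
    spec = REGISTRY.get(name)
    if spec is None:
        raise KeyError(f"unknown tool: {name}")
    if set(args) != set(spec.params):
        raise ValueError(f"expected params {sorted(spec.params)}, got {sorted(args)}")
    for key, expected in spec.params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return spec.handler(**args)


# Hypothetical read-only tool used only for this example.
register(ToolSpec(
    name="tail_logs",
    description="Return the last N log lines for a deployment",
    params={"deployment": str, "lines": int},
    handler=lambda deployment, lines: f"{lines} lines from {deployment}",
))
```

The agent side never builds a shell command. It picks a name from the registry and supplies arguments that either pass validation or fail loudly.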
Supported tools (today)
Skyflo ships with standardized tools for:
- Kubernetes: discovery, get and describe, logs and exec, safe apply and diff flows
- Argo Rollouts: status, pause and resume, promote and cancel, progressive delivery visibility
- Helm: search, install, upgrade, rollback, with dry-run and diff-first safety
- Jenkins: jobs, builds, logs, SCM context, identity, secure auth, CSRF handling
Write operations always require approval.
What real workflows look like
These prompts map to real on-call muscle memory.
Debug a production issue
Show me the last 200 lines of logs for checkout in production. If there are errors, summarize them. Then check if any rollout is in progress.
What you should see:
- discovery of namespace, deployment, and pods
- logs and events
- rollout state inspection
- a structured summary
Progressive delivery with guardrails
Canary rollout auth-backend in dev through 10/25/50/100 steps. Pause at 25% if error rate increases.
What you should see:
- a rollout plan and read-only checks first
- an approval gate before any mutation
- verification after each step
Jenkins investigation
Pull logs for the last failed build of job X. Extract the first failing stage and tell me what changed since the last green build.
What you should see:
- build discovery
- log retrieval
- structured extraction
- a concrete next step
Quick start
Install Skyflo.ai into a Kubernetes cluster:
curl -sL https://raw.githubusercontent.com/skyflo-ai/skyflo/main/deployment/install.sh -o install.sh
chmod +x install.sh
./install.sh
Skyflo supports multiple LLM providers via LiteLLM, including self-hosted models.
Contributing: step into the operator’s seat
The best way to contribute to Skyflo is to first use it the way it is meant to be used.
Not by reading issues.
Not by scanning PRs.
By running it, observing it, and tracing how decisions flow through the system.
Think of this as stepping into the contributor’s shoes.
1) Run Skyflo locally and watch it work
Start by cloning the repo and running Skyflo on your own machine or cluster.
Do not rush to change anything yet.
Use it as an operator would:
- issue a few realistic prompts
- watch how intent becomes a plan
- observe where approvals are enforced
- see how tools are discovered and executed
- follow the streamed events in the UI
At this stage, the goal is intuition, not contribution.
You should be able to explain what the agent is doing at each step without reading the code.
2) Trace the agentic loop end to end
Once you are comfortable with the surface, go deeper.
Add logs and traces to the engine.
Follow a single request through the entire lifecycle:
- intent parsing
- planning and discovery
- tool selection and execution
- approval gates
- verification and termination
This is where most understanding is built.
You will see where state lives, how LangGraph nodes transition, and why certain steps are deliberately slow or blocked.
You will also see why “just let it run” is not acceptable in production.
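If you are not sure where to start with tracing, a small decorator goes a long way. The sketch below uses only the Python standard library. The `traced` decorator, the `request_id` convention, and the `plan` function are hypothetical and are not taken from the Skyflo codebase; adapt the idea to whatever functions the engine actually exposes.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("trace")


def traced(stage: str):
    """Log the start, end, and duration of one lifecycle stage per request."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(request_id, *args, **kwargs):
            start = time.perf_counter()
            log.info("request=%s stage=%s start", request_id, stage)
            try:
                return fn(request_id, *args, **kwargs)
            finally:
                # Runs on success and on failure, so every stage gets a closing line.
                elapsed_ms = (time.perf_counter() - start) * 1000
                log.info("request=%s stage=%s done in %.1f ms", request_id, stage, elapsed_ms)
        return wrapper
    return decorator


@traced("planning")
def plan(request_id: str, intent: str) -> list:
    # Placeholder body standing in for a real planning step.
    return [f"discover resources for: {intent}"]
```

Grepping the logs for a single request id then gives you the full path of one request through the stages listed above.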
3) Study closed issues and merged PRs
Before picking something new, look at what has already shipped.
Read closed issues and their corresponding PRs.
Focus on:
- what problem was being solved
- what safety constraints shaped the solution
- how tools were extended or restricted
- how streaming, approvals, and verification were preserved
This gives you a strong signal of project standards.
You will quickly see what kinds of changes are welcomed and which ones are rejected.
4) Pick a good first issue, or create one
At this point, picking an issue becomes straightforward.
There are almost always good first issues available.
If you cannot find one that matches your understanding, create one yourself.
Good issues usually come from observations like:
- a workflow that feels clunky when used
- a missing verification step
- a tool that exposes too much power
- a safety check that should exist but does not
Open the issue.
Propose the shape of the solution.
Then start implementing it.
If you follow this path once, you will not just contribute to Skyflo.
You will understand how production-grade agent systems are built, debugged, and kept safe.
That experience is far more valuable than shipping another isolated feature.
Get involved
Pick an issue and start contributing. Or create a new issue and start a discussion.
- GitHub: https://github.com/skyflo-ai/skyflo
- Issues: https://github.com/skyflo-ai/skyflo/issues
- Discord: https://discord.gg/kCFNavMund
Connect with me
I am Karan Jagtiani, founder of Skyflo.ai.
You can find me at karan.social
Drop a comment if you have a good first issue or runbook idea for Skyflo.



