
Skyflo: AI agent for Cloud & DevOps

Marisa Tranchina

December 31, 2025

Cloud and DevOps work is not hard because the commands are hard.

It is hard because your context is fragmented.

Terminals, dashboards, logs, CI, rollouts
Different auth models per system
Different failure modes per workflow
No shared audit trail of what happened and why

Skyflo.ai is my attempt to reduce that fragmentation.

Skyflo is an open-source AI agent for Cloud and DevOps.
It unifies Kubernetes operations and CI/CD behind a natural language interface, with approvals built in.

The important part is not “chat with prod”.

The important part is control.

What Skyflo is

Skyflo is a command center, an agent engine, and a standardized tool layer.

Command Center (UI): a chat interface that streams every step in real time
Engine: a service that runs a LangGraph workflow and turns intent into safe tool calls
MCP Server: a tool server exposing standardized integrations for Kubernetes and CI/CD

You express intent in plain English.

What executes is always a validated tool call.
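To make "validated tool call" concrete, here is a minimal Python sketch of the idea. The model's output becomes a structured call, and that call is checked against a known signature before anything runs. The tool names (`k8s_get_pods`, `helm_rollback`), their parameters, and the `validate` helper are hypothetical illustrations, not Skyflo's actual code.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical tool signatures (parameter name -> expected type), for illustration only.
TOOL_SIGNATURES: dict[str, dict[str, type]] = {
    "k8s_get_pods": {"namespace": str},
    "helm_rollback": {"release": str, "namespace": str, "revision": int},
}


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict[str, Any]


def validate(call: ToolCall) -> list[str]:
    """Return every problem found; an empty list means the call may proceed."""
    signature = TOOL_SIGNATURES.get(call.name)
    if signature is None:
        return [f"unknown tool: {call.name}"]
    problems = []
    for param, expected in signature.items():
        if param not in call.arguments:
            problems.append(f"missing argument: {param}")
        elif not isinstance(call.arguments[param], expected):
            problems.append(f"{param} must be {expected.__name__}")
    # Sorted so the report is deterministic when several extras are present.
    for extra in sorted(call.arguments.keys() - signature.keys()):
        problems.append(f"unexpected argument: {extra}")
    return problems


# A model that emits a string where an integer is expected gets rejected, not executed.
print(validate(ToolCall("helm_rollback", {"release": "checkout", "namespace": "production", "revision": "3"})))
# ['revision must be int']
```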

What Skyflo is not

Skyflo is not “give an LLM kubectl and pray”.

If you want autonomous mutation in production without approvals, you are optimizing for demos.
You are not optimizing for reliability.

Skyflo is designed for operators who want automation without losing control.

SREs
DevOps engineers
cloud architects
platform teams
security-minded operators

The safety model

Skyflo is built around a simple policy:

The agent can propose and prepare. You approve mutations.

That policy shows up in four places.

1) Human in the loop for mutations

Any WRITE operation requires explicit approval.

Examples:

apply, rollout promote or cancel
Helm upgrade or rollback
actions that stop or cancel builds

Read-only workflows can run end to end without approval.
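A minimal sketch of the approval gate, assuming each tool is tagged READ or WRITE. The tool names, the `TOOL_ACCESS` table, and the `approve` callback are hypothetical. Here `approve` is a plain Python function standing in for the operator's decision.

```python
from enum import Enum
from typing import Callable


class Access(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical tool names and classification; real metadata would come from the tool layer.
TOOL_ACCESS: dict[str, Access] = {
    "k8s_get_logs": Access.READ,
    "rollout_status": Access.READ,
    "k8s_apply": Access.WRITE,
    "rollout_promote": Access.WRITE,
    "helm_upgrade": Access.WRITE,
}


def run_tool(name: str, execute: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Run READ tools directly; run WRITE tools only after the operator approves."""
    # Unclassified tools are treated as writes, so the default is to ask rather than act.
    access = TOOL_ACCESS.get(name, Access.WRITE)
    if access is Access.WRITE and not approve(name):
        return f"{name}: denied by operator"
    return execute()


def deny_all(tool: str) -> bool:
    # Stand-in operator who rejects every mutation.
    return False


print(run_tool("k8s_get_logs", lambda: "200 log lines", deny_all))  # 200 log lines
print(run_tool("helm_upgrade", lambda: "upgraded", deny_all))  # helm_upgrade: denied by operator
```

Treating unclassified tools as writes is a deliberate choice in this sketch: a missing label should cost an extra click, not an unreviewed mutation.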

2) Plan → Execute → Verify


This is the only loop that matters in operations.

Skyflo runs an iterative workflow:

Plan: interpret intent and perform lightweight discovery when needed
Execute: call tools, then pause for approval if the next step is a write
Verify: validate outcomes against intent, then continue or stop
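The Plan → Execute → Verify loop can be sketched in plain Python. This is a stand-in for the LangGraph workflow the Engine actually runs; the `plan`, `execute`, and `verify` callables and the `max_iterations` budget are assumptions for illustration. In this shape, approval pauses would live inside `execute`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    intent: str
    history: list[str] = field(default_factory=list)
    done: bool = False


def run_loop(
    state: LoopState,
    plan: Callable[[LoopState], str],
    execute: Callable[[str], str],
    verify: Callable[[LoopState, str], bool],
    max_iterations: int = 5,
) -> LoopState:
    """Plan a step, execute it, verify the result; stop when verified or out of budget."""
    for _ in range(max_iterations):
        step = plan(state)
        result = execute(step)
        state.history.append(f"{step} -> {result}")
        if verify(state, result):
            state.done = True
            break
    return state


state = run_loop(
    LoopState("is checkout healthy?"),
    plan=lambda s: f"check pods (attempt {len(s.history) + 1})",
    execute=lambda step: "healthy" if "attempt 2" in step else "restarting",
    verify=lambda s, result: result == "healthy",
)
print(state.done, state.history)
# True ['check pods (attempt 1) -> restarting', 'check pods (attempt 2) -> healthy']
```

The iteration budget matters: a loop that cannot verify its outcome should stop and report, not keep acting.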

3) Everything streams in real time

Operators do not trust black boxes.

Skyflo streams what it is doing as it does it:

model output
tool progress
tool results
workflow events
approvals and decisions

This is operational safety.
It is also how you debug the agent.
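One common way to stream events like these to a browser is Server-Sent Events (SSE). The sketch below assumes SSE and uses hypothetical event type names; Skyflo's actual transport and event schema may differ. Each event becomes an `event:` line plus a single `data:` line of compact JSON, terminated by a blank line.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    # Hypothetical types mirroring the list above, e.g. "token", "tool.progress",
    # "tool.result", "workflow", "approval".
    type: str
    payload: dict[str, Any]


def to_sse(events: Iterable[Event]) -> Iterator[str]:
    """Encode each event as one Server-Sent Events message."""
    for event in events:
        data = json.dumps(event.payload, separators=(",", ":"))
        yield f"event: {event.type}\ndata: {data}\n\n"


stream = [
    Event("tool.progress", {"tool": "k8s_get_logs", "status": "running"}),
    Event("approval", {"tool": "helm_upgrade", "decision": "pending"}),
]
for message in to_sse(stream):
    print(message, end="")
```

Compact `json.dumps` output never contains a raw newline, so each payload fits on one `data:` line.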

4) Standardized tool execution via MCP

Integration work is where most agent projects fail.

Skyflo uses MCP so tools are:

discoverable
typed
validated
documented
separable from the agent logic

The Engine does not know kubectl.
It knows tools.
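MCP messages are JSON-RPC 2.0. A client discovers tools with a `tools/list` request and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The sketch below builds those requests on the client side. The `get_pod_logs` tool and its schema are hypothetical, and a real client would use an MCP SDK and a transport rather than bare dictionaries.

```python
import itertools
from typing import Any, Optional

_request_ids = itertools.count(1)


def make_request(method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request object, the framing MCP uses for its messages."""
    request: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def call_tool(advertised: list[dict[str, Any]], name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a tools/call request, refusing any tool the server did not advertise."""
    if name not in {tool["name"] for tool in advertised}:
        raise ValueError(f"server does not advertise tool {name!r}")
    return make_request("tools/call", {"name": name, "arguments": arguments})


# Discovery first: the server answers tools/list with descriptors like this hypothetical one.
list_request = make_request("tools/list")
advertised_tools = [
    {
        "name": "get_pod_logs",
        "description": "Fetch recent log lines for a pod",
        "inputSchema": {
            "type": "object",
            "properties": {"namespace": {"type": "string"}, "pod": {"type": "string"}},
            "required": ["namespace", "pod"],
        },
    }
]
print(call_tool(advertised_tools, "get_pod_logs", {"namespace": "production", "pod": "checkout-0"}))
```

The Engine side only ever deals in tool names and argument objects; what `get_pod_logs` does internally stays behind the server.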

Supported tools (today)

Skyflo ships with standardized tools for:

Kubernetes: discovery, get and describe, logs and exec, safe apply and diff flows
Argo Rollouts: status, pause and resume, promote and cancel, progressive delivery visibility
Helm: search, install, upgrade, rollback with dry-run and diff-first safety
Jenkins: jobs, builds, logs, SCM context, identity, secure auth, CSRF handling

Write operations always require approval.
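The Helm "dry run and diff first" ordering can be sketched as a plan where both previews come first and only the mutating step needs approval. This only builds argument lists and runs nothing. The release and chart names are hypothetical, the `helm diff` step assumes the helm-diff plugin is installed, and Skyflo's own Helm tools may not shell out to the CLI at all.

```python
def helm_upgrade_plan(release: str, chart: str, namespace: str) -> list[tuple[str, list[str], bool]]:
    """Return (label, argv, needs_approval) steps: both previews first, the mutation last."""
    target = [release, chart, "--namespace", namespace]
    return [
        # Simulates the upgrade without changing the release.
        ("dry run", ["helm", "upgrade", *target, "--dry-run"], False),
        # Assumes the helm-diff plugin is installed and honors --namespace; shows what would change.
        ("diff", ["helm", "diff", "upgrade", *target], False),
        # The only step that mutates the release, so the only one gated on approval.
        ("upgrade", ["helm", "upgrade", *target], True),
    ]


# Hypothetical release and chart, printed rather than executed.
for label, argv, needs_approval in helm_upgrade_plan("checkout", "./charts/checkout", "production"):
    print(f"{label:8} approval={needs_approval}  {' '.join(argv)}")
```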

What real workflows look like

These prompts map to real on-call muscle memory.

Debug a production issue

Show me the last 200 lines of logs for checkout in production. If there are errors, summarize them. Then check if any rollout is in progress.

What you should see:

discovery of namespace, deployment, and pods
logs and events
rollout state inspection
a structured summary
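For the first prompt, the raw material is a read-only call such as `kubectl logs deployment/checkout -n production --tail=200`. Turning those lines into a summary can start with grouping error lines by shape. The sketch below is a hypothetical heuristic (the regular expressions and the `summarize_errors` name are assumptions), not Skyflo's implementation.

```python
import re
from collections import Counter

# Heuristic, not Skyflo's logic: an ERROR/FATAL level or an exception-like name marks an error.
ERROR_LINE = re.compile(r"\b(ERROR|FATAL)\b|\w*(Exception|Error)\b")
# Numbers and hex ids vary between otherwise identical errors, so mask them before grouping.
VOLATILE = re.compile(r"0x[0-9a-fA-F]+|\d+")


def summarize_errors(log_text: str, tail: int = 200, top: int = 3) -> list[tuple[str, int]]:
    """Group error lines among the last `tail` lines; return the `top` most frequent shapes."""
    recent = log_text.splitlines()[-tail:]
    shapes = Counter(VOLATILE.sub("N", line.strip()) for line in recent if ERROR_LINE.search(line))
    return shapes.most_common(top)


sample = "\n".join([
    "2025-12-31 INFO started",
    "2025-12-31 ERROR payment timeout after 3000ms",
    "2025-12-31 ERROR payment timeout after 2950ms",
    "ValueError: bad cart id 42",
])
print(summarize_errors(sample))
# [('N-N-N ERROR payment timeout after Nms', 2), ('ValueError: bad cart id N', 1)]
```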

Progressive delivery with guardrails

Canary rollout auth-backend in dev through 10/25/50/100 steps. Pause at 25% if error rate increases.

What you should see:

a rollout plan and read-only checks first
an approval gate before any mutation
verification after each step
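The guardrail in the canary prompt boils down to a small decision rule. In Argo Rollouts terms, the weights correspond to `setWeight` steps in the Rollout's canary strategy. The sketch assumes a hypothetical `error_rate_at` lookup and a pre-rollout baseline error rate; it simulates the walk rather than calling Argo Rollouts.

```python
from typing import Callable


def progress_canary(
    weights: list[int],
    error_rate_at: Callable[[int], float],
    baseline: float,
    pause_at: int = 25,
) -> tuple[str, int]:
    """Walk canary weights in order; stop at `pause_at` if the error rate rose above baseline."""
    if not weights:
        raise ValueError("no canary steps")
    for weight in weights:
        # Each weight change is a mutation, so in Skyflo it would sit behind an approval gate.
        observed = error_rate_at(weight)  # hypothetical metric lookup
        if weight == pause_at and observed > baseline:
            return ("paused", weight)
    return ("completed", weights[-1])


print(progress_canary([10, 25, 50, 100], lambda w: 0.04 if w >= 25 else 0.01, baseline=0.01))
# ('paused', 25)
```

As written, the prompt only checks at 25%. Checking at every weight, which matches "verification after each step", means dropping the `weight == pause_at` condition.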

Jenkins investigation

Pull logs for the last failed build of job X. Extract the first failing stage and tell me what changed since the last green build.

What you should see:

build discovery
log retrieval
structured extraction
a concrete next step
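Two pieces of the Jenkins prompt are mechanical: fetching the console log of the last failed build, and locating the stage where it broke. Jenkins exposes build permalinks such as `lastFailedBuild` and `lastSuccessfulBuild`, and each build's plain-text log at `consoleText`. The stage detection below is a heuristic that assumes Declarative Pipeline console markers; the server URL and job name are hypothetical, and jobs inside folders would need nested `/job/` segments.

```python
import re
from typing import Optional
from urllib.parse import quote

# Declarative pipelines print "[Pipeline] { (Stage name)" when a stage starts.
STAGE_START = re.compile(r"^\[Pipeline\] \{ \((?P<name>.+)\)$")


def console_url(base_url: str, job: str, build: str = "lastFailedBuild") -> str:
    """Plain-text console URL for a build permalink of a top-level (non-folder) job."""
    return f"{base_url.rstrip('/')}/job/{quote(job, safe='')}/{build}/consoleText"


def first_failing_stage(console: str) -> Optional[str]:
    """Heuristic: the stage that was open when the first 'ERROR:' line appeared."""
    stage: Optional[str] = None
    for raw in console.splitlines():
        line = raw.strip()
        match = STAGE_START.match(line)
        if match:
            stage = match.group("name")
        elif line.startswith("ERROR:"):
            return stage or "(outside any stage)"
    return None


print(console_url("https://jenkins.example.com", "checkout-api"))
# https://jenkins.example.com/job/checkout-api/lastFailedBuild/consoleText
```

Passing `build="lastSuccessfulBuild"` gives the "last green build" baseline for the what-changed question.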

Quick start

Install Skyflo.ai into a Kubernetes cluster:

curl -sL https://raw.githubusercontent.com/skyflo-ai/skyflo/main/deployment/install.sh -o install.sh
chmod +x install.sh
./install.sh

Skyflo supports multiple LLM providers via LiteLLM, including self-hosted models.

Contributing: step into the operator’s seat

The best way to contribute to Skyflo is to first use it the way it is meant to be used.

Not by reading issues.
Not by scanning PRs.
By running it, observing it, and tracing how decisions flow through the system.

Think of this as stepping into the contributor’s shoes.

1) Run Skyflo locally and watch it work

Start by cloning the repo and running Skyflo on your own machine or cluster.

Do not rush to change anything yet.

Use it as an operator would:

issue a few realistic prompts
watch how intent becomes a plan
observe where approvals are enforced
see how tools are discovered and executed
follow the streamed events in the UI

At this stage, the goal is intuition, not contribution.
You should be able to explain what the agent is doing at each step without reading the code.

2) Trace the agentic loop end to end

Once you are comfortable with the surface, go deeper.

Add logging and tracing to the Engine.

Follow a single request through the entire lifecycle:

intent parsing
planning and discovery
tool selection and execution
approval gates
verification and termination

This is where most understanding is built.

You will see where state lives, how LangGraph nodes transition, and why certain steps are deliberately slow or blocked.
You will also see why “just let it run” is not acceptable in production.
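Adding traces does not require touching node logic. LangGraph nodes take the current state and return a partial update, so a decorator can log entry, exit, duration, and which keys changed. This sketch assumes dict-shaped state; the `plan` node and the `engine.trace` logger name are hypothetical. Decorating the node function, or wrapping it when you register it with `add_node`, both work.

```python
import functools
import logging
import time
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]

logger = logging.getLogger("engine.trace")  # hypothetical logger name


def traced(node: Node) -> Node:
    """Log when a node starts and finishes, how long it took, and which keys it updated."""
    @functools.wraps(node)
    def wrapper(state: State) -> State:
        start = time.perf_counter()
        logger.info("enter %s keys=%s", node.__name__, sorted(state))
        try:
            update = node(state)
        except Exception:
            logger.exception("fail %s", node.__name__)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("exit %s updates=%s %.1fms", node.__name__, sorted(update), elapsed_ms)
        return update

    return wrapper


@traced
def plan(state: State) -> State:
    # Hypothetical planning node: a real one would call the model and discovery tools.
    return {"plan": [f"get logs for {state['intent']}"]}


logging.basicConfig(level=logging.INFO)
plan({"intent": "checkout"})
```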

3) Study closed issues and merged PRs

Before picking something new, look at what has already shipped.

Read closed issues and their corresponding PRs.

Focus on:

what problem was being solved
what safety constraints shaped the solution
how tools were extended or restricted
how streaming, approvals, and verification were preserved

This gives you a strong signal of project standards.
You will quickly see what kinds of changes are welcomed and which ones are rejected.

4) Pick a good first issue, or create one

At this point, picking an issue becomes straightforward.

There are almost always good first issues available.
If you cannot find one that matches your understanding, create one yourself.

Good issues usually come from observations like:

a workflow that feels clunky when used
a missing verification step
a tool that exposes too much power
a safety check that should exist but does not

Open the issue.
Propose the shape of the solution.
Then start implementing it.

If you follow this path once, you will not just contribute to Skyflo.

You will understand how production-grade agent systems are built, debugged, and kept safe.
That experience is far more valuable than shipping another isolated feature.

Get involved

Pick an issue and start contributing. Or create a new issue and start a discussion.

GitHub: https://github.com/skyflo-ai/skyflo
Issues: https://github.com/skyflo-ai/skyflo/issues
Discord: https://discord.gg/kCFNavMund

Connect with me

I am Karan Jagtiani, founder of Skyflo.ai.
You can find me at karan.social

Drop a comment if you have a good first issue or runbook idea for Skyflo.
