Skyflo: AI agent for Cloud & DevOps

Cloud and DevOps work is not hard because the commands are hard.

It is hard because your context is fragmented.

  • Terminals, dashboards, logs, CI, rollouts
  • Different auth models per system
  • Different failure modes per workflow
  • No shared audit trail of what happened and why

Skyflo.ai is my attempt to reduce that fragmentation.

Skyflo is an open-source AI agent for Cloud and DevOps.
It unifies Kubernetes operations and CI/CD behind a natural language interface, with approvals built in.

The important part is not “chat with prod”.

The important part is control.

What Skyflo is

Skyflo is a command center, an agent engine, and a standardized tool layer.

  • Command Center (UI): a chat interface that streams every step in real time
  • Engine: a service that runs a LangGraph workflow and turns intent into safe tool calls
  • MCP Server: a tool server exposing standardized integrations for Kubernetes and CI/CD

You express intent in plain English.

What executes is always a validated tool call.

What Skyflo is not

Skyflo is not “give an LLM kubectl and pray”.

If you want autonomous mutation in production without approvals, you are optimizing for demos.
You are not optimizing for reliability.

Skyflo is designed for operators who want automation without losing control.

  • SREs
  • DevOps engineers
  • cloud architects
  • platform teams
  • security-minded operators

The safety model

[Figure: Skyflo’s safety model. The agent workflow moves through Plan, Execute, and Verify, with feedback loops and explicit verification to enforce controlled, human-approved operations.]

Skyflo is built around a simple policy:

The agent can propose and prepare. You approve mutations.

That policy shows up in four places.

1) Human in the loop for mutations

Any WRITE operation requires explicit approval.

Examples:

  • apply, rollout promote or cancel
  • Helm upgrade or rollback
  • actions that stop or cancel builds

Read-only workflows can run end to end.
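To make the gate concrete, here is a minimal sketch of the idea, not Skyflo's actual implementation. The `ToolCall` shape, the `WRITE_VERBS` set, and the `approve` callback are all assumptions made for illustration. Reads pass straight through, and writes run only when the approver returns true.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of verbs treated as mutations; not Skyflo's actual list.
WRITE_VERBS = {"apply", "promote", "cancel", "upgrade", "rollback", "stop"}


@dataclass(frozen=True)
class ToolCall:
    tool: str    # e.g. "helm"
    verb: str    # e.g. "upgrade"
    target: str  # e.g. "checkout"


def is_write(call: ToolCall) -> bool:
    """Classify a call as a mutation based on its verb."""
    return call.verb in WRITE_VERBS


def run_with_gate(call: ToolCall, approve: Callable[[ToolCall], bool]) -> str:
    """Run reads immediately; run writes only if the approver says yes."""
    if is_write(call) and not approve(call):
        return f"denied: {call.tool} {call.verb} {call.target}"
    # A real system would dispatch to the tool here; this sketch only reports.
    return f"executed: {call.tool} {call.verb} {call.target}"


read = ToolCall("kubernetes", "get", "pods")
write = ToolCall("helm", "upgrade", "checkout")

# The approver below rejects everything, so only the read goes through.
print(run_with_gate(read, approve=lambda c: False))
print(run_with_gate(write, approve=lambda c: False))
```

The design point is that the classification happens outside the model: the agent can propose any call, but whether it needs approval is decided by plain code.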

2) Plan → Execute → Verify

This is the only loop that matters in operations.

Skyflo runs an iterative workflow:

  1. Plan: interpret intent and perform lightweight discovery when needed
  2. Execute: call tools, then pause for approval if the next step is a write
  3. Verify: validate outcomes against intent, then continue or stop
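The three phases above can be sketched as a plain loop. This is an illustration under stated assumptions, not the Engine's LangGraph workflow: `run_workflow`, its callbacks, and the `max_iterations` budget are hypothetical names, and the approval pause is left out to keep the loop readable.

```python
from typing import Callable, List


def run_workflow(
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
    verify: Callable[[str, List[str]], bool],
    intent: str,
    max_iterations: int = 3,
) -> List[str]:
    """Repeat plan -> execute -> verify until verification passes
    or the iteration budget runs out."""
    results: List[str] = []
    for _ in range(max_iterations):
        steps = plan(intent)                         # Plan
        results = [execute(step) for step in steps]  # Execute
        if verify(intent, results):                  # Verify
            break
    return results


# Stub phases for demonstration only.
attempts = {"count": 0}


def demo_plan(intent: str) -> List[str]:
    return ["get pods", "get logs"]


def demo_execute(step: str) -> str:
    return f"ok: {step}"


def demo_verify(intent: str, results: List[str]) -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 2  # pretend the first pass fails verification


final = run_workflow(demo_plan, demo_execute, demo_verify, "debug checkout")
print(final, "after", attempts["count"], "iterations")
```

Bounding the loop matters: a verify step that never passes should end in a stop, not in an agent that keeps retrying against production.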

3) Everything streams in real time

Operators do not trust black boxes.

Skyflo streams what it is doing as it does it:

  • model output
  • tool progress
  • tool results
  • workflow events
  • approvals and decisions

This is operational safety.
It is also how you debug the agent.
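As a rough sketch of what "streams as it does it" can mean in code, the generator below yields events while a step runs instead of returning one final result. The `Event` shape, the event kinds, and the JSON-lines wire format are assumptions for illustration; the source does not specify Skyflo's event schema or transport.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterator


@dataclass
class Event:
    # Hypothetical kinds: "model", "tool_progress", "tool_result",
    # "workflow", "approval".
    kind: str
    payload: str


def run_step(tool: str) -> Iterator[Event]:
    """Yield events as the step progresses, rather than one result at the end."""
    yield Event("workflow", f"starting {tool}")
    yield Event("tool_progress", f"{tool}: running")
    yield Event("tool_result", f"{tool}: done")


def to_wire(event: Event) -> str:
    """Serialize one event as a single JSON line."""
    return json.dumps(asdict(event))


for event in run_step("get_logs"):
    print(to_wire(event))
```

Because every event is a self-contained record, the same stream that drives the UI can be stored and replayed later when you need to debug a run.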

4) Standardized tool execution via MCP

Integration work is where most agent projects fail.

Skyflo uses MCP so tools are:

  • discoverable
  • typed
  • validated
  • documented
  • separable from the agent logic

The Engine does not know kubectl.
It knows tools.
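The sketch below shows the general shape of a typed, validated tool layer in plain Python. It does not use the MCP SDK and is not Skyflo's code: `Tool`, `REGISTRY`, `register`, `call_tool`, and the `k8s_get_logs` tool name are all hypothetical. The point is only that the caller sees names, descriptions, and parameter types, never the underlying command.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tool:
    name: str
    description: str
    params: Dict[str, type]      # parameter name -> expected Python type
    handler: Callable[..., Any]  # the only place that knows the real command


REGISTRY: Dict[str, Tool] = {}


def register(tool: Tool) -> None:
    """Make a tool discoverable by name."""
    REGISTRY[tool.name] = tool


def call_tool(name: str, args: Dict[str, Any]) -> Any:
    """Validate arguments against the declared parameters, then dispatch."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    tool = REGISTRY[name]
    if set(args) != set(tool.params):
        raise ValueError(f"{name} expects parameters {sorted(tool.params)}")
    for key, expected in tool.params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{name}.{key} must be {expected.__name__}")
    return tool.handler(**args)


register(Tool(
    name="k8s_get_logs",  # hypothetical tool name
    description="Return the last N log lines for a workload.",
    params={"workload": str, "tail": int},
    handler=lambda workload, tail: f"last {tail} lines of {workload}",
))

print(call_tool("k8s_get_logs", {"workload": "checkout", "tail": 200}))
```

A malformed call such as `{"workload": "checkout", "tail": "200"}` fails validation before any handler runs, which is the property the bullet list above is describing.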

Supported tools (today)

[Figure: Skyflo’s MCP server exposing standardized integrations for Kubernetes, Jenkins, Helm, and Argo, so a single agent can safely operate across core Cloud and DevOps tools.]

Skyflo ships with standardized tools for:

  • Kubernetes: discovery, get and describe, logs and exec, safe apply and diff flows
  • Argo Rollouts: status, pause and resume, promote and cancel, progressive delivery visibility
  • Helm: search, install, upgrade, and rollback, with dry-run and diff-first safety
  • Jenkins: jobs, builds, logs, SCM context, identity, secure auth, CSRF handling

Write operations always require approval.

What real workflows look like

These prompts map to real on-call muscle memory.

Debug a production issue

Show me the last 200 lines of logs for checkout in production. If there are errors, summarize them. Then check if any rollout is in progress.

What you should see:

  • discovery of namespace, deployment, and pods
  • logs and events
  • rollout state inspection
  • a structured summary

Progressive delivery with guardrails

Canary rollout auth-backend in dev through 10/25/50/100 steps. Pause at 25% if error rate increases.

What you should see:

  • a rollout plan and read-only checks first
  • an approval gate before any mutation
  • verification after each step

Jenkins investigation

Pull logs for the last failed build of job X. Extract the first failing stage and tell me what changed since the last green build.

What you should see:

  • build discovery
  • log retrieval
  • structured extraction
  • a concrete next step

Quick start

Install Skyflo.ai into a Kubernetes cluster:

curl -sL https://raw.githubusercontent.com/skyflo-ai/skyflo/main/deployment/install.sh -o install.sh
chmod +x install.sh
./install.sh

Skyflo supports multiple LLM providers via LiteLLM, including self-hosted models.

Contributing: step into the operator’s seat

The best way to contribute to Skyflo is to first use it the way it is meant to be used.

Not by reading issues.
Not by scanning PRs.
By running it, observing it, and tracing how decisions flow through the system.

Think of this as stepping into the operator’s shoes first.

1) Run Skyflo locally and watch it work

Start by cloning the repo and running Skyflo on your own machine or cluster.

Do not rush to change anything yet.

Use it as an operator would:

  • issue a few realistic prompts
  • watch how intent becomes a plan
  • observe where approvals are enforced
  • see how tools are discovered and executed
  • follow the streamed events in the UI

At this stage, the goal is intuition, not contribution.
You should be able to explain what the agent is doing at each step without reading the code.

2) Trace the agentic loop end to end

Once you are comfortable with the surface, go deeper.

Add logs and traces to the engine.

Follow a single request through the entire lifecycle:

  • intent parsing
  • planning and discovery
  • tool selection and execution
  • approval gates
  • verification and termination

This is where most understanding is built.

You will see where state lives, how LangGraph nodes transition, and why certain steps are deliberately slow or blocked.
You will also see why “just let it run” is not acceptable in production.
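If you are unsure where to start with tracing, one generic approach is a decorator that logs entry and exit for each lifecycle stage, tagged with a request id. This is a standalone sketch using only the standard library. The stage names and the `plan` and `execute` functions here are stand-ins, not functions from the Skyflo engine.

```python
import functools
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("trace")


def traced(stage: str):
    """Log entry and exit of a lifecycle stage, tagged with the request id."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(request_id: str, *args, **kwargs):
            log.info("[%s] enter %s", request_id, stage)
            result = fn(request_id, *args, **kwargs)
            log.info("[%s] exit %s -> %r", request_id, stage, result)
            return result
        return wrapper
    return decorator


@traced("planning")
def plan(request_id: str, intent: str) -> list:
    # Stand-in for the real planning stage.
    return [f"inspect {intent}"]


@traced("execution")
def execute(request_id: str, steps: list) -> list:
    # Stand-in for the real tool execution stage.
    return [f"ok: {s}" for s in steps]


request_id = uuid.uuid4().hex[:8]
results = execute(request_id, plan(request_id, "checkout logs"))
```

Filtering the log output by one request id gives you the single-request view described above, without touching the logic of any stage.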

3) Study closed issues and merged PRs

Before picking something new, look at what has already shipped.

Read closed issues and their corresponding PRs.

Focus on:

  • what problem was being solved
  • what safety constraints shaped the solution
  • how tools were extended or restricted
  • how streaming, approvals, and verification were preserved

This gives you a strong signal of project standards.
You will quickly see what kinds of changes are welcomed and which ones are rejected.

4) Pick a good first issue, or create one

At this point, picking an issue becomes straightforward.

There are almost always good first issues available.
If you cannot find one that matches your understanding, create one yourself.

Good issues usually come from observations like:

  • a workflow that feels clunky when used
  • a missing verification step
  • a tool that exposes too much power
  • a safety check that should exist but does not

Open the issue.
Propose the shape of the solution.
Then start implementing it.

If you follow this path once, you will not just contribute to Skyflo.

You will understand how production-grade agent systems are built, debugged, and kept safe.
That experience is far more valuable than shipping another isolated feature.

Get involved

Pick an issue and start contributing. Or create a new issue and start a discussion.

Connect with me

I am Karan Jagtiani, founder of Skyflo.ai.
You can find me at karan.social

Drop a comment if you have a good first issue or runbook idea for Skyflo.
