If you give an AI system the ability to act, you give it risk.
In earlier posts, I covered how to secure MCP servers and agentic AI systems. This post focuses on a narrower but more dangerous layer: AI skills. These are the tools that let models touch the real world.
Once a model can call an API, run code, or move data, it stops being just a reasoning engine. It becomes an operator.
That is where most security failures happen.
Terminology
In generative AI, “skills” describe the interfaces that allow a model to perform actions outside its own context.
Different vendors use different names:
- Tools: Function calling and MCP-based interactions
- Plugins: Web-based extensions used by chatbots
- Actions: OpenAI GPT Actions and AWS Bedrock Action Groups
- Agents: Systems that reason and execute across multiple steps
A base LLM predicts text; A skill gives it hands.
Skills are pre-defined interfaces that expose code, APIs, or workflows. When a model decides that text alone is not enough, it triggers a skill.
Anthropic treats skills as instruction-and-script bundles loaded at runtime.
OpenAI uses modular functions inside Custom GPTs and agents.
AWS implements the same idea through Action Groups.
Microsoft applies the term across Copilot and Semantic Kernel.
NVIDIA uses skills in its digital human platforms.
In the reference high-level architecture below, we can see the relations between the components:
Why Skills Are Dangerous
Every skill expands the attack surface. The model sits in the middle, deciding what to call and when. If it is tricked, the skill executes anyway.
The most common failure modes:
- Excessive agency: Skills often have broader permissions than they need. A file-management skill with system-level access is a breach waiting to happen.
- The consent gap: Users approve skills as a bundle. They rarely inspect the exact permissions. Attackers hide destructive capability inside tools that appear harmless.
- Procedural and memory poisoning: Skills that retain instructions or memory can be slowly corrupted. This does not cause an immediate failure. It changes behavior over time.
- Privilege escalation through tool chaining: Multiple tools can be combined to bypass intended boundaries. A harmless read operation becomes a write. A write becomes execution.
- Indirect prompt injection: Malicious instructions are placed in content that the model reads: emails, web pages, documents. The model follows them using its own skills.
- Data exfiltration: Skills often require access to sensitive systems. Once compromised, they can leak source code, credentials, or internal records.
- Supply chain risk: Skills rely on third-party APIs and libraries. A poisoned update propagates instantly.
- Agent-to-agent spread: In multi-agent systems, one compromised skill can affect others. Failures cascade.
- Unsafe execution and RCE: Any skill that runs code without isolation is exposed to remote code execution.
- Insecure output handling: Raw outputs passed directly to users can cause data leaks or client-side exploits.
- SSRF: Fetch-style skills can be abused to probe internal networks.
How to Secure Skills (What Actually Works)
Treat skills like production services. Because they are.
Identity and Access Management
Each skill must have its own identity. No shared credentials. No broad roles.
Permissions should be minimal and continuously evaluated. This directly addresses OWASP LLM06: Excessive Agency.
Reference: OWASP LLM06:2025 Excessive Agency
AWS Bedrock
Assign granular IAM roles per agent. Restrict regions and models with SCPs. Limit Action Groups to specific Lambda functions.
References:
- Security and governance for generative AI platforms on AWS
- Execute code and analyze data using Amazon Bedrock AgentCore Code Interpreter
OpenAI
Never expose API keys client-side. Use project-scoped keys and backend proxies.
Reference: Best Practices for API Key Safety
Input and Output Guardrails
Prompt injection is not theoretical. It is the default attack.
Map OWASP LLM risks directly to controls.
Reference: OWASP Top 10 for Large Language Model Applications
AWS Bedrock
Use Guardrails with prompt-attack detection and PII redaction.
Reference: Amazon Bedrock Guardrails
OpenAI
Use zero-retention mode for sensitive workflows.
Reference: Data controls in the OpenAI platform
Anthropic
Use constitutional prompts, but still enforce external moderation.
Reference: Building safeguards for Claude
Adversarial Testing
Red-team your agents.
Test prompt injection, RAG abuse, tool chaining, and data poisoning during development. Not after launch.
Threat modeling frameworks from OWASP, NIST, and Google apply here with minimal adaptation.
References:
DevSecOps Integration
Every endpoint a skill calls is part of your attack surface.
Run SAST and DAST on the skill code. Scan dependencies. Fail builds when violations appear.
References:
Isolation and Network Controls
Code-executing skills must run in ephemeral, sandboxed environments.
No host access. No unrestricted outbound traffic.
Use private networking wherever possible:
Logging, Monitoring, and Privacy
If you cannot audit skill usage, you cannot secure it.
Enable full invocation logging and integrate with existing SIEM tools.
Ensure provider data-handling terms match your risk profile. Not all plans are equal.
References:
- Monitor Amazon Bedrock API calls using CloudTrail
- OpenAI Audit Logs
- Claude Agent Skills – Security Considerations
Incident Response and Human Oversight
Update incident response plans to include AI-specific failures.
For high-risk actions, require human approval. This is the simplest and most reliable control against runaway agents.
References:
- Understand the threat landscape
- Implement human-in-the-loop confirmation with Amazon Bedrock Agents
- OpenAI Safety best practices
Summary
AI skills are the execution layer of generative systems. They turn models from advisors into actors.
That shift introduces real security risk: excessive permissions, prompt injection, data leakage, and cascading agent failures.
Secure skills the same way you secure production services. Strong identity. Least privilege. Isolation. Guardrails. Monitoring. Human oversight.
There is no final state. Platforms change. Attacks evolve. Continuous testing is the job.
About the Author
Eyal Estrin is a cloud and information security architect and AWS Community Builder, with more than 25 years in the industry. He is the author of Cloud Security Handbook and Security for Cloud Native Applications.
The views expressed are his own.
