🚀 Executive Summary
TL;DR: Organizations monitoring hybrid cloud (AWS, GCP) and on-premise infrastructure face a dilemma: extend a traditional tool like Centreon, or adopt the cloud-native Prometheus/AlertManager stack. The article compares both approaches, plus a hybrid model, to guide selection based on infrastructure dynamics and operational needs.
🎯 Key Takeaways
- Centreon, using API-based Plugin Packs, offers a unified dashboard and leverages existing IT skillsets for monitoring stable hybrid environments, but can face latency and scalability issues with ephemeral cloud resources.
- Prometheus and AlertManager are designed for dynamic, cloud-native, and containerized workloads, featuring powerful service discovery, PromQL for flexible querying, and an efficient time-series data model.
- A hybrid strategy, where Prometheus collects cloud-native metrics and forwards critical alerts via AlertManager webhooks to Centreon, allows leveraging Prometheus’s flexibility while maintaining Centreon’s mature centralized alert management.
Choosing between Centreon and Prometheus with AlertManager for cloud monitoring in AWS and GCP requires a deep dive into architecture, scalability, and integration. This guide compares both solutions, provides configuration examples, and outlines a hybrid approach to help you select the right toolset for your cloud and on-premise infrastructure.
The Challenge: Cloud Monitoring Crossroads
You’re managing a hybrid infrastructure with critical workloads on-premise and across multiple cloud providers like AWS and GCP. Your existing monitoring stack, perhaps built around a traditional tool like Centreon, is robust for your servers and network gear. However, as you scale in the cloud, you face a new set of challenges:
- Ephemeral Infrastructure: Cloud resources (VMs, containers, functions) are created and destroyed dynamically. Traditional host-based, static monitoring struggles to keep up.
- Service-Oriented Metrics: You need to monitor managed services like RDS, S3, BigQuery, and Pub/Sub, which don’t have an “agent” you can install. Monitoring is done via APIs (e.g., CloudWatch, Google Cloud Monitoring).
- Metric Volume and Cardinality: Cloud-native applications, especially those using microservices and containers, generate a massive volume of high-cardinality metrics (e.g., metrics per container ID).
- Tooling Mismatch: The question arises—do you extend your existing, trusted tool (Centreon) to the cloud, or adopt a cloud-native stack like Prometheus and AlertManager?
This decision impacts everything from team skillset requirements to the reliability of your alerting. Let’s explore three practical solutions to this common problem.
Solution 1: The Centreon-Centric Approach
For organizations with a significant investment in Centreon, extending it to monitor the cloud is a logical first step. This approach leverages Centreon’s powerful framework and connects it to cloud provider APIs, treating cloud services as just another set of resources to be monitored.
How It Works
Centreon integrates with cloud platforms primarily through its “Plugin Packs” and the underlying Nagios-style plugins. The workflow is typically:
- Connectors: You use specific monitoring plugins (like centreon-plugin-Cloud-Aws-Api or centreon-plugin-Cloud-Gcp-Api) that query the cloud provider’s monitoring API (e.g., AWS CloudWatch, Google Cloud Monitoring).
- Authentication: The Centreon poller is configured with secure credentials (e.g., an AWS IAM user with specific permissions or a GCP Service Account key) to authenticate against the API (see the example policy after this list).
- Service Checks: You define service checks in Centreon that execute these plugins. For example, a check for an AWS RDS instance would call the plugin, which in turn queries the CloudWatch API for metrics like CPUUtilization or FreeableMemory.
- State-Based Alerting: Centreon evaluates the returned metrics against predefined WARNING and CRITICAL thresholds and generates alerts based on state changes (OK, WARNING, CRITICAL, UNKNOWN).
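The IAM user (or role) used by the poller only needs read access to the monitoring APIs it queries. A minimal sketch of such a policy, assuming EC2/CloudWatch checks only; adjust the actions to the services you actually monitor:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "ec2:DescribeInstances"
      ],
      "Resource": "*"
    }
  ]
}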
Example: Monitoring an AWS EC2 Instance’s CPU
First, you install the necessary AWS plugin on your Centreon poller. Then, within the Centreon UI, you would configure a new host and a service check. The underlying command might look something like this:
/usr/lib/centreon/plugins/centreon_aws_ec2_api.pl \
    --plugin=cloud::aws::ec2::plugin \
    --mode=cpu \
    --aws-secret-key='SECRET_KEY' \
    --aws-access-key='ACCESS_KEY' \
    --region='eu-west-1' \
    --dimension-name='InstanceId' \
    --dimension-value='i-0123456789abcdef0' \
    --warning-cpu-utilization='80' \
    --critical-cpu-utilization='95'
This command checks the CPU utilization for a specific EC2 instance (i-0123456789abcdef0) and will change state if the utilization exceeds 80% (Warning) or 95% (Critical).
Pros & Cons
- Pros:
  - Unified Dashboard: Provides a single pane of glass for both on-premise and cloud resources.
  - Existing Skillset: Your team can leverage their existing Centreon expertise.
  - Mature Alerting: Benefits from Centreon’s robust notification, escalation, and dependency logic.
- Cons:
  - API Polling Latency: Relies on periodic polling of cloud APIs, which can have delays (e.g., CloudWatch metrics can have a 1-5 minute lag).
  - Scalability Concerns: Can become cumbersome and slow if you are polling thousands of cloud resources, potentially hitting API rate limits.
  - Less Suited for Ephemeral Resources: Auto-discovery of resources is possible but often requires more complex configuration compared to cloud-native solutions.
Solution 2: The Prometheus & AlertManager Stack
This approach embraces the cloud-native ecosystem. Prometheus is a pull-based monitoring system designed for the dynamic, service-oriented world of containers and microservices, making it a natural fit for monitoring cloud environments.
How It Works
The Prometheus stack uses a different paradigm:
- Exporters & Service Discovery: Instead of agents, Prometheus “scrapes” metrics from HTTP endpoints. For cloud services, you use specialized exporters (e.g., stackdriver_exporter for GCP, cloudwatch_exporter for AWS) that query the cloud APIs and expose the metrics in a Prometheus-compatible format. Crucially, Prometheus has built-in service discovery for AWS (EC2) and GCP (GCE), automatically finding new instances to monitor (a minimal exporter config sketch follows this list).
- Time-Series Database (TSDB): Prometheus stores all data as time-series, which is highly efficient for the high volume of metrics from cloud applications.
- PromQL: You query and analyze data using the powerful Prometheus Query Language (PromQL), which allows for complex aggregations and calculations on the fly.
- AlertManager: Alerting rules are defined in Prometheus based on PromQL expressions. When an alert fires, it is sent to AlertManager, which handles deduplication, grouping, silencing, and routing of notifications to different receivers (Slack, PagerDuty, email, etc.).
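On the AWS side, cloudwatch_exporter is driven by a YAML file listing the CloudWatch namespaces, metrics, and dimensions to pull. A minimal sketch, assuming you only want average EC2 CPU utilization (the namespace, metric, and dimension names are standard CloudWatch identifiers; tune statistics and regions to your environment):

# cloudwatch_exporter.yml (sketch)
region: eu-west-1
metrics:
  # Pull average CPU utilization for every EC2 instance in the region.
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]

The exporter then exposes these metrics on its own HTTP endpoint, which you add to Prometheus as a regular scrape target.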
Example: Scraping GCP Metrics and Alerting on High CPU
Your prometheus.yml configuration would use service discovery to find and scrape metrics from all GCE instances in a project:
# prometheus.yml
scrape_configs:
  - job_name: 'gcp-gce-instances'
    gce_sd_configs:
      - project: 'your-gcp-project-id'
        zone: 'europe-west1-b'
        port: 9100  # Assuming node_exporter is running on this port
    relabel_configs:
      - source_labels: [__meta_gce_instance_name]
        target_label: instance
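The AWS counterpart uses ec2_sd_configs. A minimal sketch, assuming the Prometheus host has instance-profile or environment credentials and that node_exporter also listens on port 9100:

# prometheus.yml (AWS counterpart, sketch)
scrape_configs:
  - job_name: 'aws-ec2-instances'
    ec2_sd_configs:
      - region: eu-west-1
        port: 9100
    relabel_configs:
      # Use the EC2 instance ID as the instance label.
      - source_labels: [__meta_ec2_instance_id]
        target_label: instance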
Next, you would define an alerting rule in a separate file (e.g., gce_alerts.yml):
# gce_alerts.yml
groups:
  - name: gce_instance_alerts
    rules:
      - alert: HighCpuUtilization
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High CPU utilization on {{ $labels.instance }}"
          description: "{{ $labels.instance }} has had a CPU utilization above 90% for the last 10 minutes."
This rule will fire if any instance’s CPU utilization (calculated from the node_exporter metric) remains above 90% for 10 minutes. AlertManager would then take over to route the notification.
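For completeness, Prometheus also needs to know where the rule file lives and where AlertManager is reachable. A minimal sketch (the AlertManager address below is an assumption; adjust it to your deployment):

# prometheus.yml (excerpt)
rule_files:
  - 'gce_alerts.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.internal:9093']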
Pros & Cons
- Pros:
  - Cloud-Native Design: Built for dynamic, ephemeral environments with powerful service discovery.
  - Powerful Query Language: PromQL is extremely flexible for slicing and dicing metrics.
  - Vibrant Ecosystem: A huge number of official and community-built exporters, integrations, and dashboards (e.g., Grafana).
- Cons:
  - Steeper Learning Curve: Requires learning PromQL and a new operational model (pull vs. push/check).
  - Not a Complete Solution: Prometheus focuses on metrics. For logs (Loki) and traces (Tempo), you often need to add other components. Centreon offers a more all-in-one experience.
  - Long-Term Storage: Requires a separate solution like Thanos or Cortex for long-term, highly available metric storage (a remote_write sketch follows this list).
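If you later bolt on a long-term store that accepts the Prometheus remote_write protocol (Cortex, Mimir, or Thanos Receive, for example), the change on the Prometheus side is small. A minimal sketch with a placeholder URL:

# prometheus.yml (excerpt, sketch)
remote_write:
  - url: 'https://metrics-store.example.internal/api/v1/push'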
Head-to-Head Comparison: Centreon vs. Prometheus/AlertManager for Cloud
| Feature | Centreon | Prometheus & AlertManager |
| --- | --- | --- |
| Architecture | Centralized pollers executing checks (push/active check model). State-based (OK, WARN, CRIT). | Decentralized scrapers pulling metrics from endpoints. Stores data as time-series. |
| Cloud Integration | Via API-based plugins (Plugin Packs). Requires manual or semi-automated configuration of hosts/services. | Native service discovery for major cloud providers. Uses exporters to query cloud APIs (e.g., cloudwatch_exporter). |
| Dynamic Environments | Can be challenging. Relies on auto-discovery modules or API scripts to keep configuration in sync. | Excellent. Service discovery automatically detects and removes targets as they are created and destroyed. |
| Alerting | Mature and powerful. Features complex dependencies, acknowledgements, scheduled downtime, and escalation chains built-in. | Highly flexible rules via PromQL. AlertManager handles grouping, silencing, and routing but lacks Centreon’s deep dependency logic out-of-the-box. |
| Data Model | Stores performance data (RRDtool) and state. Less suited for high-cardinality metrics. | Time-series with labels. Optimized for high-volume, high-cardinality data from sources like containers. |
| Best For | Hybrid environments with a strong on-premise footprint. Teams invested in a traditional ITIL/NOC workflow. | Cloud-native, containerized, and microservice-based workloads. DevOps teams that value flexibility and integration. |
Solution 3: The Hybrid Approach – Best of Both Worlds?
You don’t always have to choose. A hybrid approach can be a powerful strategy, especially during a transition period or in complex environments where each tool plays to its strengths.
How It Works
The goal is to integrate the two systems. A common and effective pattern is to use Prometheus for what it does best (collecting cloud-native metrics) and feed critical alerts into Centreon to leverage its powerful notification engine.
- Prometheus scrapes metrics from cloud services and applications.
- Alerting rules are defined in Prometheus.
- When an alert fires, Prometheus sends it to AlertManager.
- AlertManager is configured with a webhook_config receiver that forwards the alert to a custom script or API endpoint on the Centreon side.
- This script then uses the Centreon API (or a passive check mechanism like NSCA/Gorgone) to create/update a service status within Centreon.
This way, your Network Operations Center (NOC) can still use Centreon as their single source of truth for alerts, while your DevOps teams can leverage the power and flexibility of Prometheus for cloud monitoring.
Example: Forwarding Prometheus Alerts to Centreon
In your alertmanager.yml, you would define a receiver that points to a webhook listener on your Centreon server:
# alertmanager.yml
route:
  receiver: 'centreon-webhook'

receivers:
  - name: 'centreon-webhook'
    webhook_configs:
      - url: 'http://your-centreon-server/path/to/webhook-listener.php'
        send_resolved: true
The webhook-listener.php script would be responsible for parsing the JSON payload from AlertManager and translating it into a passive check result for a corresponding service in Centreon. For example, it could extract the alert’s status (‘firing’ or ‘resolved’) and map it to a Centreon state (CRITICAL or OK).
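Purely as an illustration of that translation logic (not the PHP script itself), here is a minimal Python sketch that receives the AlertManager payload and writes a Nagios-style PROCESS_SERVICE_CHECK_RESULT passive result to Centreon Engine’s external command file. The command-file path, listener port, and label-to-host mapping are assumptions you would adapt to your installation:

# webhook_listener.py - illustrative sketch only; the article's
# webhook-listener.php would implement the same translation in PHP.
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: Centreon Engine external command pipe; verify the path on your poller.
COMMAND_FILE = "/var/lib/centreon-engine/rw/centengine.cmd"

# Map AlertManager alert status to a Nagios/Centreon service state code.
STATE_MAP = {"firing": 2, "resolved": 0}  # 2 = CRITICAL, 0 = OK


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))

        for alert in payload.get("alerts", []):
            labels = alert.get("labels", {})
            host = labels.get("instance", "unknown-host")       # assumed mapping
            service = labels.get("alertname", "prometheus-alert")
            state = STATE_MAP.get(alert.get("status"), 3)        # 3 = UNKNOWN
            output = alert.get("annotations", {}).get("summary", "No summary")

            # Submit a passive check result via the external command interface.
            cmd = "[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{svc};{state};{out}\n".format(
                ts=int(time.time()), host=host, svc=service, state=state, out=output
            )
            with open(COMMAND_FILE, "w") as f:
                f.write(cmd)

        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    # Assumption: port 9099 is free; front this with auth/TLS in production.
    HTTPServer(("0.0.0.0", 9099), AlertHandler).serve_forever()

For this to work, the host/service pairs must already exist in Centreon and be configured to accept passive check results.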
When to Use This Approach
- You have a mature Centreon deployment with complex on-call schedules, escalations, and reporting that you cannot easily replicate.
- Your DevOps teams need the flexibility of Prometheus and PromQL for monitoring dynamic cloud applications.
- You are in a multi-year transition from traditional infrastructure to the cloud and need a bridge between the two monitoring worlds.
Conclusion: Making the Right Choice
The choice between Centreon and Prometheus/AlertManager is not just about technology; it’s about matching the tool to your architecture, your team, and your operational model.
- Go with Centreon if your primary focus is on providing a unified view of a stable, hybrid infrastructure and you value its mature, all-in-one feature set for traditional IT operations.
- Choose Prometheus & AlertManager if your infrastructure is heavily cloud-native, containerized, and dynamic. This stack is built for the scale and ephemerality of modern cloud environments.
- Consider a Hybrid approach to leverage the strengths of both platforms, using Prometheus for cloud data collection and Centreon for centralized alert management and reporting. This offers a pragmatic path forward for complex organizations.
Ultimately, the best solution is one that provides clear, actionable insights into the health of your systems, regardless of where they run.
👉 Read the original article on TechResolve.blog
☕ Support my work
If this article helped you, you can buy me a coffee:
