Methodology
Cloud Stability Coverage
5 Platforms
Source: AgentsPop first-party collection · As of: 2026-03-12

Incident data from GitHub, Cloudflare, GCP, Azure, and AWS — each with different public data availability, scope, and history depth.

What this section measures

The Cloud Platform Stability section tracks publicly reported incidents across five major cloud and infrastructure platforms: GitHub, Cloudflare, Google Cloud, Azure, and AWS. For each platform, the dashboard shows the number of incidents over the available history window, the share of incidents that were change-related, and a stability index derived from each platform's own historical baseline.

The goal is to give a factual, first-party-sourced signal of operational reliability — not a ranking, but a transparent view of what each provider publicly reports about its own incidents. No third-party estimates, synthetic benchmarks, or inferred data are used.

Data sources and per-platform coverage

Each platform is scraped from its own public status channel. Coverage quality varies substantially by provider:

GitHub and Cloudflare — Atlassian Statuspage API

Both platforms publish incidents via the Atlassian Statuspage JSON API at githubstatus.com/api/v2/incidents.json and cloudflarestatus.com/api/v2/incidents.json. The API returns all incident fields in structured JSON with precise start and end timestamps.

Hard API cap: 50 incidents maximum

The Atlassian Statuspage public API is hard-limited to the 50 most recently resolved incidents, regardless of any pagination or limit parameters passed. The page and limit query parameters are silently ignored — every call returns the same 50 incidents. This is an upstream API constraint, not a collection bug.

The pipeline works around this limit by accumulating incidents across runs. Each time the pipeline runs, newly resolved incidents are merged into the local archive using a stable incident ID. Over time, this archive grows into a genuine multi-month history. The window label on the dashboard reflects the actual span of accumulated data, not an intended target.
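As a sketch, the fetch-and-merge step described above might look like the following. The endpoint URLs are real Statuspage routes, but the function names and the archive shape are illustrative, not the pipeline's actual implementation:

```python
import json
import urllib.request

# Real Atlassian Statuspage routes; each returns at most the 50 most
# recently resolved incidents regardless of query parameters.
STATUSPAGE_URLS = {
    "github": "https://www.githubstatus.com/api/v2/incidents.json",
    "cloudflare": "https://www.cloudflarestatus.com/api/v2/incidents.json",
}

def fetch_incidents(url: str) -> list[dict]:
    """Fetch the (capped) list of recently resolved incidents."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["incidents"]

def merge_into_archive(archive: dict[str, dict], incidents: list[dict]) -> dict[str, dict]:
    """Merge newly seen incidents into the local archive, keyed by the
    stable incident id; a re-fetched incident overwrites its old copy."""
    for inc in incidents:
        archive[inc["id"]] = inc
    return archive
```

Because the archive is keyed by incident ID, repeated runs are idempotent: re-seeing the same 50 incidents changes nothing, while each newly resolved incident extends the history by one entry.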

Google Cloud — incidents.json endpoint

GCP publishes a complete public incident history as a single JSON array at status.cloud.google.com/incidents.json. This returns all historical incidents in one request with structured fields including begin/end times and severity. It is the most complete data source of the five platforms — all available history is fetched and cached on each run.

Azure — HTML scrape of status.microsoft.com

Azure publishes Post Incident Reviews (PIRs) at azure.status.microsoft/en-us/status/history/ as server-rendered HTML. The scraper parses the page structure and extracts timestamps from the PIR body text using pattern matching against the formats Azure uses in its write-ups.
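A hedged sketch of that timestamp extraction: the regex below matches one common phrasing of PIR impact windows ("Between HH:MM UTC on D Month YYYY and …"). Real write-ups vary, so a production scraper needs several fallback patterns; this single pattern is illustrative only:

```python
import re
from datetime import datetime, timezone

# Illustrative pattern only; actual PIR phrasing varies between write-ups.
WINDOW_RE = re.compile(
    r"[Bb]etween (\d{1,2}:\d{2}) UTC on (\d{1,2} \w+ \d{4}) "
    r"and (\d{1,2}:\d{2}) UTC on (\d{1,2} \w+ \d{4})"
)

def extract_window(pir_text: str):
    """Return (start, end) UTC datetimes from a PIR body, or None
    when the text does not match this particular phrasing."""
    m = WINDOW_RE.search(pir_text)
    if not m:
        return None
    fmt = "%H:%M %d %B %Y"
    start = datetime.strptime(f"{m.group(1)} {m.group(2)}", fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(f"{m.group(3)} {m.group(4)}", fmt).replace(tzinfo=timezone.utc)
    return start, end
```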

Widespread incidents only

Azure's public status history page shows only widespread incidents that affected multiple services or regions at scale — those significant enough to warrant a formal PIR. Localized service degradations, single-region incidents, and events below Azure's internal disclosure threshold are not reflected. Azure incident counts will therefore appear lower than platforms that report all incidents publicly.

Additionally, Azure's history page is paginated via JavaScript, meaning a plain HTTP scraper can only access the incidents visible on the first page load. As with GitHub and Cloudflare, each pipeline run merges newly visible PIRs into the local archive to extend coverage over time.

AWS — per-service RSS feeds

AWS publishes per-service RSS feeds at status.aws.amazon.com/rss/{service}-{region}.rss. The collector fetches feeds for 17 tier-1 service/region combinations, drawn from EC2, S3, Lambda, RDS, DynamoDB, CloudFront, ELB, Route53, SQS, SNS, IAM, EKS, and ECS across us-east-1, eu-west-1, us-west-2, and ap-southeast-1 (a curated subset, not the full cross product).
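The feed URL construction and a minimal RSS fetch can be sketched as follows. The service/region pairs shown are an illustrative subset of the 17 monitored combinations, and fetch_active_items is a hypothetical helper, not the collector's actual code:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Illustrative subset of the monitored tier-1 service/region pairs.
FEEDS = [
    ("ec2", "us-east-1"),
    ("s3", "us-east-1"),
    ("lambda", "eu-west-1"),
]

def feed_url(service: str, region: str) -> str:
    """Build the per-service RSS URL following AWS's naming scheme."""
    return f"https://status.aws.amazon.com/rss/{service}-{region}.rss"

def fetch_active_items(service: str, region: str) -> list[str]:
    """Return titles of items currently in the feed. An empty list means
    the service is healthy now, not that it has no incident history."""
    with urllib.request.urlopen(feed_url(service, region), timeout=30) as resp:
        root = ET.fromstring(resp.read())
    return [item.findtext("title", "") for item in root.iter("item")]
```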

No historical archive

AWS RSS feeds only contain active or very recent events. There is no public historical incident archive for AWS — the older all.rss feed no longer exists and the new AWS Health Dashboard is a JavaScript SPA with private S3 data backends (HTTP 403). AWS incident counts of zero on the dashboard mean all monitored services were healthy at collection time, not that AWS has no incident history. The dashboard reflects this by showing "active only" for the AWS window label.

Coverage windows and how they grow

The window label on each dashboard card (e.g. ~6 wks, ~5 mo) reflects the actual span of data in the local archive — from the oldest recorded incident to the most recent — not an intended collection target. This label is computed fresh on every pipeline run.
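A sketch of how such a label could be computed from the archive span. The unit thresholds and rounding below are assumptions for illustration, not the dashboard's actual cutoffs:

```python
from datetime import datetime

def window_label(oldest: datetime, newest: datetime) -> str:
    """Format the span from oldest to newest archived incident in the
    style of the dashboard cards (e.g. '~6 wks', '~5 mo').
    Thresholds are illustrative assumptions."""
    days = (newest - oldest).days
    if days < 14:
        return f"~{max(days, 1)} d"
    if days < 90:
        return f"~{round(days / 7)} wks"
    if days < 365:
        return f"~{round(days / 30.44)} mo"  # mean Gregorian month length
    return f"~{days / 365.25:.1f} yr"
```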

| Platform | API / scrape coverage | Archive growth | Scope |
|---|---|---|---|
| GitHub | Last 50 incidents per call (~4–8 weeks) | Accumulates with each run; reaches ~12 mo after ≈12 months of monthly runs | All incidents |
| Cloudflare | Last 50 incidents per call (~3–6 weeks) | Accumulates with each run; reaches ~12 mo after ≈12 months of monthly runs | All incidents |
| Google Cloud | Full public history in one request | Already spans years; window grows naturally | All incidents |
| Azure | First page of PIRs (~5 entries) | Accumulates new PIRs each run; window grows as PIRs are published | Widespread only |
| AWS | Active incidents only (no archive) | Cannot accumulate; no historical data available publicly | Unknown |

Because incident rates vary by platform, the same 50-incident API cap produces very different time windows. A platform with many small incidents may only span weeks; one with few incidents may span many months. This means incident counts are not directly comparable across platforms — a lower count on GitHub does not mean GitHub is more stable than Cloudflare; it may simply mean fewer incidents fit in the collection window.

As the pipeline matures, GitHub and Cloudflare window labels will progress from weeks toward months and eventually reach 12 months once a year of monthly runs have accumulated. GCP already has multi-year history available. AWS will show "active only" until AWS provides a public historical archive.

The stability index

The stability index is a per-platform z-score that normalizes each month's incident count, total outage duration, and change-related incident share against that platform's own history. Higher values indicate a calmer period relative to the platform's norm.

The formula:

z_inc = (incidents - mean(incidents)) / std(incidents)
z_dur = (duration  - mean(duration))  / std(duration)
z_chg = (chg_share - mean(chg_share)) / std(chg_share)

stability_index = 100 − (0.4 × z_inc + 0.4 × z_dur + 0.2 × z_chg)

An index near 100 means the current period is close to the platform's own average. Above 100 means fewer/shorter incidents than normal; below 100 means more/longer incidents than normal. The index requires at least 3 months of history to compute — platforms with shallow data will show N/A until sufficient history accumulates in the archive.

What the index does not measure

The index is normalized to each platform's own baseline, not an absolute standard. A platform that has frequent short incidents will score 100 during its typical period, as will a platform that has rare but long incidents. Cross-platform comparison of index values is only meaningful once each platform has accumulated months of history in the pipeline.

Data & methodology

GitHub
Atlassian Statuspage API — last 50 incidents (API hard cap); all incidents
Cloudflare
Atlassian Statuspage API — last 50 incidents (API hard cap); all incidents
GCP
status.cloud.google.com/incidents.json — full history; all incidents
Azure
status.microsoft.com HTML scrape — current page only; widespread/PIR incidents only
AWS
Per-service RSS feeds — active incidents only; no historical archive; scope unknown
Stability index
Per-platform z-score: 100 − (0.4·z_incidents + 0.4·z_duration + 0.2·z_change_share). Requires ≥3 months of history.
12 mo* caveat
Window is designed to be 12 months, but actual coverage is bounded by each provider's public data availability.
