This is a career engagement, anonymized to protect client confidentiality; full case details and references are shared under NDA.

US Telco · Telecommunications · 8 weeks · 300+ monitors across 40 regions

Synthetic Monitoring Reset

Reduced synthetic false-positive alerts by 71% through location regionalization and dependency-aware alert correlation across Catchpoint and Dynatrace.

Outcome at a Glance

  • 71%: False-positive reduction
  • 40: Regions tuned
  • ~4 hr/wk: On-call time saved

The Challenge

A US telco was running 300+ Catchpoint synthetic monitors across 40 regions, but their on-call engineers had stopped trusting the page. Too many false positives. Real customer-facing outages were getting buried under regional noise from carrier-network conditions outside the company's control.

  • Regional noise: Single-location failures triggering global alerts, even when most users were unaffected.
  • No dependency awareness: Synthetic alerts firing simultaneously when an upstream service degraded, masking the real incident.
  • On-call disengagement: Engineers were acknowledging synthetic pages within seconds and then ignoring them, eroding the value of the entire monitoring layer.
  • Drift-prone configuration: Hundreds of monitors hand-edited in the UI, with no source of truth and no review process.

Our Approach

Location regionalization

Reorganized 40 monitoring locations into geographic clusters with consensus thresholds. A single bad node no longer pages globally.

  • 5-cluster topology (NA-East, NA-West, EU, APAC, LATAM)
  • Cluster-consensus thresholds (3-of-5)
  • Carrier-aware location grouping
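
To make the consensus idea concrete, here is a minimal sketch of a cluster-consensus check. It is illustrative only, not the client's configuration: the function names, the tuple layout, and the reading of "3-of-5" as "at least 3 failing locations out of 5 sampled in one cluster" are all assumptions.

```python
from collections import defaultdict

# Assumption: "3-of-5" means at least 3 distinct failing locations among
# the 5 sampled in a single cluster. The engagement's exact rule may differ.
MIN_FAILURES = 3


def cluster_breaches(results, min_failures=MIN_FAILURES):
    """Return the set of clusters where enough distinct locations failed.

    `results` is an iterable of (cluster, location, passed) tuples
    (a hypothetical shape for one round of synthetic test runs).
    """
    failures = defaultdict(set)
    for cluster, location, passed in results:
        if not passed:
            failures[cluster].add(location)
    return {c for c, locs in failures.items() if len(locs) >= min_failures}


def should_page(results):
    """Page only when at least one cluster reaches consensus."""
    return bool(cluster_breaches(results))


# One failing node in NA-East: below the threshold, so no page.
one_bad_node = [("NA-East", f"loc-{i}", i != 0) for i in range(5)]
# Three failing nodes in EU: consensus reached, so the alert pages.
regional_outage = [("EU", f"loc-{i}", i >= 3) for i in range(5)]
```

With this shape, `should_page(one_bad_node)` is false and `should_page(regional_outage)` is true, which is the behavior the regionalization was meant to produce.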

Dependency-aware correlation

Wired synthetic alerts into Dynatrace's service topology so that downstream synthetic failures are suppressed and attached to the upstream root-cause incident instead of paging on their own.

  • Shared problem context across tools
  • Suppression rules during upstream incidents
  • Single-incident view for on-call
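
The suppression logic can be sketched as a walk over a dependency graph. This is a simplified illustration under stated assumptions: the topology is a plain dictionary rather than data pulled from Dynatrace, and the service names and function names are hypothetical.

```python
def upstream_services(service, depends_on):
    """Collect every transitive upstream dependency of `service`.

    `depends_on` maps a service name to the services it calls
    (a hypothetical stand-in for an exported service topology).
    """
    seen = set()
    stack = list(depends_on.get(service, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(depends_on.get(current, ()))
    return seen


def correlate(alert_service, open_problem_services, depends_on):
    """Return upstream services with an open problem, sorted by name.

    A non-empty result means the synthetic alert can be attached to the
    upstream incident; an empty list means it should stand on its own.
    """
    upstream = upstream_services(alert_service, depends_on)
    return sorted(upstream & set(open_problem_services))


# Hypothetical topology: checkout-flow -> payments-api -> auth-service
topology = {
    "checkout-flow": ["payments-api"],
    "payments-api": ["auth-service"],
}
```

Here `correlate("checkout-flow", {"auth-service"}, topology)` returns `["auth-service"]`, so the checkout synthetic failure would fold into the auth incident rather than open a second page.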

Monitors-as-code

Migrated all monitor definitions out of the UI into a git-backed source of truth. Changes go through pull request review.

  • Catchpoint API + Terraform provider
  • PR-based change review
  • Automated drift detection
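
Drift detection comes down to comparing what git declares with what is actually live. The sketch below assumes both sides have already been loaded into dictionaries keyed by monitor name; fetching the live side from the Catchpoint API is out of scope, and the field names are hypothetical.

```python
def detect_drift(declared, live):
    """Compare declared (git) and live monitor definitions.

    Both arguments map monitor name -> dict of settings.
    Returns monitors missing from live, monitors present only in live,
    and per-field differences as (declared_value, live_value) pairs.
    """
    missing = sorted(set(declared) - set(live))
    unmanaged = sorted(set(live) - set(declared))
    changed = {}
    for name in sorted(set(declared) & set(live)):
        keys = sorted(set(declared[name]) | set(live[name]))
        diff = {
            key: (declared[name].get(key), live[name].get(key))
            for key in keys
            if declared[name].get(key) != live[name].get(key)
        }
        if diff:
            changed[name] = diff
    return {"missing": missing, "unmanaged": unmanaged, "changed": changed}


# Hypothetical example: someone edited "login" in the UI and created
# "adhoc-test" by hand, while "search" was never applied.
declared = {
    "login": {"frequency_min": 5, "tier": 1},
    "search": {"frequency_min": 15},
}
live = {
    "login": {"frequency_min": 1, "tier": 1},
    "adhoc-test": {"frequency_min": 5},
}
report = detect_drift(declared, live)
```

A scheduled job could run a check like this and fail, or open a ticket, whenever any of the three buckets is non-empty.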

Alert tier reset

Reclassified every monitor by customer impact. Only monitors mapped to a real user-facing outcome remained pageable.

  • 3-tier severity model
  • Page-only on customer-facing flows
  • Slack-route for warnings
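
A tiered routing rule of this kind might look like the following sketch. The tier names, the dashboard-only third tier, and the `customer_flow` field are assumptions for illustration; the source only states that there are three tiers, that customer-facing flows page, and that warnings go to Slack.

```python
from enum import Enum


class Tier(Enum):
    CUSTOMER_FACING = 1  # pages on-call
    WARNING = 2          # routed to Slack
    INFORMATIONAL = 3    # assumption: dashboard only, no notification


ROUTES = {
    Tier.CUSTOMER_FACING: "page",
    Tier.WARNING: "slack",
    Tier.INFORMATIONAL: "dashboard",
}


def route(monitor):
    """Return the notification route for a monitor definition.

    `monitor` is a dict with a numeric "tier" and, for tier 1, a
    "customer_flow" naming the user-facing flow it protects
    (both field names are hypothetical).
    """
    tier = Tier(monitor["tier"])
    # A tier-1 monitor with no mapped customer flow is downgraded to
    # Slack, so nothing pages without a user-facing outcome behind it.
    if tier is Tier.CUSTOMER_FACING and not monitor.get("customer_flow"):
        return "slack"
    return ROUTES[tier]
```

For example, `route({"tier": 1, "customer_flow": "checkout"})` returns `"page"`, while a tier-1 monitor without a mapped flow falls back to `"slack"`.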

Detailed Results

Before

  • 71% false-positive rate on synthetic pages
  • Single-location failures paging globally
  • Hand-edited monitor configuration
  • On-call ignoring synthetic alerts

After

  • 71% reduction in false positives
  • Regional consensus before global page
  • Git-backed monitor definitions
  • ~4 hours/week on-call time recovered

Got synthetic alert fatigue?

If your on-call team has stopped trusting synthetic pages, the fix is rarely "more monitoring." Book the audit and we'll find the leverage points.
