What is xMatters and what are its top alternatives?
Top Alternatives to xMatters
PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from your monitoring tools, gives you an overall view of all of your monitoring alarms, and alerts an on-duty engineer if there's a problem. ...
OpsGenie is a cloud-based service for dev & ops teams, providing reliable alerts, on-call schedule management, and escalations. OpsGenie integrates with monitoring tools & services and ensures that the right people are notified at the right time. ...
VictorOps is a real-time incident management platform that combines the power of people and data to embolden DevOps teams so they can handle incidents as they occur and prepare for the next one. ...
Healthchecks.io is a monitoring service for your cron jobs, background services and scheduled tasks. It works by listening for HTTP "pings" from your services. You can set up various alert methods: email, Slack, Telegram, PagerDuty, etc. ...
BigPanda helps you manage and respond to ops incidents faster. All your alerts: organized, assignable, trackable, snoozeable, and updated in real-time. ...
Monitoring systems are often complex and require a strong sysadmin background to properly configure and maintain. Cronitor replaces all this with a simple service that anyone can set up. Receive email/sms notifications if your jobs don't run, run too slow, or finish too quickly. ...
It is an alert aggregation and incident management service for IT and DevOps teams. It is a real-time SaaS platform that combines collaboration with alert management so you can handle critical incidents as they occur. With our quick escalations, the right alerts are delivered to the right people, giving your team greater agility. Our mobile app and integrations allow you to get alerts through SMS, push notifications, and email so you never again miss a critical alert. ...
It manages security incidents by deeply integrating with existing tools used throughout an organization (Slack, GSuite, Jira, etc.). It leverages the existing familiarity of these tools to provide orchestration instead of introducing another tool. ...
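The Healthchecks.io entry above describes monitoring by listening for HTTP "pings" from a job. A minimal Python sketch of that idea follows. It assumes the hosted ping endpoint at hc-ping.com and its "/fail" suffix for reporting a failed run; the check UUID, the helper names, and the injectable `send` parameter are hypothetical and exist only for illustration.

```python
import urllib.request
from typing import Callable

# Assumption: the hosted Healthchecks.io ping endpoint.
PING_BASE = "https://hc-ping.com"


def ping_url(check_uuid: str, suffix: str = "") -> str:
    """Build the ping URL for a check; suffix is "" (success) or "fail"."""
    url = f"{PING_BASE}/{check_uuid}"
    return f"{url}/{suffix}" if suffix else url


def http_get(url: str) -> None:
    """Send the ping; the timeout keeps monitoring from stalling the job."""
    with urllib.request.urlopen(url, timeout=10):
        pass


def run_with_ping(
    check_uuid: str,
    job: Callable[[], None],
    send: Callable[[str], None] = http_get,
) -> None:
    """Run job(), then report success or failure for the given check."""
    try:
        job()
    except Exception:
        # Report the failure, then let the original error propagate.
        send(ping_url(check_uuid, "fail"))
        raise
    send(ping_url(check_uuid))
```

A scheduled task would call something like `run_with_ping("your-check-uuid", nightly_backup)`, where `nightly_backup` is a hypothetical job function. If no success ping arrives within the expected window, the service raises an alert through whichever channel is configured.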
xMatters alternatives & related posts
related PagerDuty posts
Our primary source of monitoring and alerting is Datadog. We’ve got prebuilt dashboards for every scenario and integration with PagerDuty to manage routing any alerts. We’ve definitely scaled past the point where managing dashboards is easy, but we haven’t had time to invest in using features like Anomaly Detection. We’ve started using Honeycomb for some targeted debugging of complex production issues and we are liking what we’ve seen. We capture any unhandled exceptions with Rollbar and, if we realize one will keep happening, we quickly convert the metrics to point back to Datadog, to keep Rollbar as clean as possible.
We use Segment to consolidate all of our trackers, the most important of which goes to Amplitude to analyze user patterns. However, if we need a more consolidated view, we push all of our data to our own data warehouse running PostgreSQL; this is available for analytics and dashboard creation through Looker.
Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.
Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”
There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications to the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests, and triggers the executor to execute those tasks.
Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, each associated with a Celery queue.
Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signals.
Datadog, StatsD, Grafana, and PagerDuty are all used to monitor the Airflow system.
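To give a sense of how a StatsD-based monitoring path like the one mentioned above works, here is a minimal Python sketch of the StatsD plain-text wire format sent over UDP. The post does not say which metrics Lyft emits, so the metric name is hypothetical; the host and port are assumptions (8125 is the conventional StatsD port).

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Encode one metric in StatsD line format: <name>:<value>|<type>.

    Common types: "c" (counter), "g" (gauge), "ms" (timer).
    """
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric as a UDP datagram (fire-and-forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric name, for illustration only.
payload = format_metric("pipeline.task_failures", 1, "c")
```

Calling `send_metric(payload)` would hand the counter to a local StatsD-compatible agent, which can forward it to a dashboarding or alerting backend. Because the transport is UDP, a missing agent does not block or fail the job that emits the metric.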