This case study examines how a leading biotechnology company improved its operational visibility and performance by implementing an observability solution.
Landscape
300+ applications
1800+ servers
Process
Current State Assessment
Observability Platform
Incident Management
ITSM Toolset
Tool Evaluation
Monitoring Rollout (Top 10 Apps)
Time
Due Diligence: 1 Month
Implementation: 11 Months
Maintenance: Ongoing
Team Size: 3
Business Challenge
Lack Of Monitoring Implementation
20% of Servers Monitored in Infra Mode.
Insufficient Licenses to Support Monitoring.
Delayed Incident Response
Prolonged Unplanned Downtime during Critical Incidents.
Ops teams working in silos led to increased MTTR.
Inefficient Event Management
Manual Process with no SOPs in place.
The MIM team was out of the loop in most Major Incidents.
Lack of CMDB integration further caused inefficiencies in routing incidents to the right teams and calculating the true impact of Incidents.
What We Did
Assessment Report & Roadmap for Observability
Report Detailing the impact of lack of Monitoring.
Current State of Monitoring, Incident Response & ITSM Tool.
Tool Evaluation (Dynatrace vs Datadog vs OpenTelemetry & PagerDuty vs xMatters).
License & Module requirements for Dynatrace, PagerDuty & ServiceNow.
Foundation For AI-based Observability Platform
Upgraded ServiceNow (SNOW) to the Latest Release.
Enabled Rundeck capabilities in PagerDuty for Self-healing, Event Correlation, Deduplication & Alert Suppression.
Updated CMDB for 10 In-Scope Applications.
Rolled Out Agents to over 1800 Servers.
Integrated Dynatrace, PagerDuty, ServiceNow, MS Teams & Webex for automated Incident Response.
Formalized automated MIM workflow.
Implemented Full Stack monitoring for top 10 applications.
User Adoption & Training.
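To illustrate the event deduplication and alert suppression mentioned above, here is a minimal Python sketch. It is not how PagerDuty or Rundeck implement these features; the Alert fields, the (host, check) deduplication key, the 300-second window and the suppressed-host list are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str         # hypothetical field: server that raised the alert
    check: str        # hypothetical field: name of the failing check
    timestamp: float  # seconds since epoch


def deduplicate(alerts, window_seconds=300.0, suppressed_hosts=frozenset()):
    """Return the alerts that should be forwarded to incident response.

    - Alerts from hosts in suppressed_hosts (for example, servers under
      planned maintenance) are dropped.
    - An alert with the same (host, check) key as a previously kept alert,
      arriving less than window_seconds after it, is treated as a duplicate
      and dropped.
    """
    last_kept = {}  # (host, check) -> timestamp of the last forwarded alert
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if alert.host in suppressed_hosts:
            continue
        key = (alert.host, alert.check)
        previous = last_kept.get(key)
        if previous is not None and alert.timestamp - previous < window_seconds:
            continue
        last_kept[key] = alert.timestamp
        kept.append(alert)
    return kept
```

In this sketch a repeated CPU alert from the same server within five minutes is forwarded only once, which is the kind of noise reduction that deduplication and suppression aim for.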
Value Delivered
Increased Infrastructure Monitoring to 100%.
Implemented Full Stack Monitoring for the top 10 Apps.
Implemented Self-healing & other AI-based solutions to improve the maturity of Operations.
Formalized a MIM process with fully automated notification & Escalation mechanisms.
Secured License Capacity for the Next 5 Years.
Trained over 15 teams with ~400 resources in using the platform.
Real-time SLA Reports for both internal Ops teams & Vendor Managed platforms such as SAP, DBA, Infra, and more.
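As a rough illustration of the figures behind MTTR and SLA reporting, the Python sketch below computes mean time to resolve and the share of incidents resolved within a target. It is not taken from the engagement's tooling; the function names, the (opened, resolved) pair format and any target value are assumptions.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Average resolution time over a list of (opened, resolved) datetime pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def sla_compliance(incidents, target):
    """Fraction of incidents resolved within the target timedelta."""
    if not incidents:
        raise ValueError("at least one incident is required")
    met = sum(1 for opened, resolved in incidents if resolved - opened <= target)
    return met / len(incidents)
```

A real report would pull incident timestamps from the ITSM tool and break the figures down by team or vendor.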