Kyndryl adds AI to prevent IT outages before they happen
Fri, 8th May 2026
Kyndryl has introduced a new agentic AI feature in Kyndryl Bridge designed to detect and resolve IT risks before they become outages. The feature is already in use across the Kyndryl Bridge customer base.
Described by Kyndryl as patented, the function sits within the company's open integration platform and is intended to spot patterns that tend to appear before systems fail. It analyses signals from applications and infrastructure, then uses AI agents to trigger actions aimed at preventing disruption rather than responding after an incident has occurred.
The feature has been deployed to more than 1,400 Kyndryl Bridge customers. Across that installed base, the platform generates more than 16 million AI insights each month, according to Kyndryl.
Kyndryl said customers using the system have recorded up to 50% fewer IT incidents, while aggregate annual savings from avoided outages and lower maintenance costs total US$3 billion. In some early deployments, mission-critical production outages fell by as much as 90%.
How it works
The system carries out AI-assisted root cause analysis across more than 200,000 customer devices. The process is designed to identify conditions that often lead to outages, including application slowdowns, infrastructure contention, configuration changes and operational events that may seem minor in isolation but become more serious when combined.
By correlating those signals across different layers of an IT estate, Kyndryl is aiming to address a problem many large organisations face as systems spread across hybrid environments and multiple suppliers. In those settings, finding the source of a fault can take days or weeks and often depends on manual investigation by technical teams working across separate tools.
The new feature is intended to shorten that process by surfacing likely causes and suggested interventions earlier. Kyndryl said its experts review and validate the generated insights to ensure they fit each customer environment before operational decisions are made.
That review step is significant because many businesses remain cautious about handing control of production systems entirely to automated agents. The use of AI in IT operations has grown as companies seek to cut downtime and reduce support costs, but concerns remain over false positives, poor recommendations and limited visibility into how systems reach certain decisions.
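Kyndryl has not published how its correlation logic works, so the following is only a minimal Python sketch of the general idea described in this section: signals that look minor in isolation are grouped by time window, and a group is flagged for expert review only when its combined weight is high and it spans more than one layer of the estate. The `Signal` type, the `correlated_risks` function, the weights, the 15-minute window and the threshold are all hypothetical and are not taken from Kyndryl Bridge.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Signal:
    """One observed event. All fields are illustrative assumptions."""
    timestamp: datetime
    layer: str     # e.g. "application", "infrastructure", "change"
    kind: str      # e.g. "slowdown", "contention", "config_change"
    weight: float  # hypothetical risk weight; small when taken alone


def correlated_risks(signals, window=timedelta(minutes=15),
                     threshold=1.0, min_layers=2):
    """Flag groups of signals that are risky in combination.

    A group is flagged when the signals inside a sliding time window
    reach `threshold` in total weight and span at least `min_layers`
    distinct layers. Every finding is marked as needing human review.
    """
    ordered = sorted(signals, key=lambda s: s.timestamp)
    findings = []
    start = 0
    for end, current in enumerate(ordered):
        # Drop signals that have fallen out of the time window.
        while current.timestamp - ordered[start].timestamp > window:
            start += 1
        group = ordered[start:end + 1]
        score = sum(s.weight for s in group)
        layers = {s.layer for s in group}
        if score >= threshold and len(layers) >= min_layers:
            findings.append({
                "signals": group,
                "score": score,
                "layers": sorted(layers),
                "needs_review": True,  # an expert validates before any action
            })
            start = end + 1  # begin a fresh window after a finding
    return findings


# Example: three minor events within ten minutes, plus one isolated event.
t0 = datetime(2026, 5, 8, 9, 0)
signals = [
    Signal(t0, "application", "slowdown", 0.4),
    Signal(t0 + timedelta(minutes=4), "infrastructure", "contention", 0.3),
    Signal(t0 + timedelta(minutes=9), "change", "config_change", 0.4),
    Signal(t0 + timedelta(hours=3), "application", "slowdown", 0.4),
]
findings = correlated_risks(signals)
for finding in findings:
    print(finding["layers"], round(finding["score"], 2))
```

In this sketch no single event crosses the threshold, but the three that cluster together across layers do, while the isolated slowdown three hours later is ignored. The `needs_review` flag mirrors the validation step Kyndryl describes, in which experts check insights before operational decisions are made.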
Customer pressure
The launch comes as large enterprises face growing pressure to keep digital systems available while managing increasingly complex estates. Many now run applications across on-premises infrastructure, public cloud services and outsourced platforms, creating a web of dependencies that can make failures difficult to predict.
Traditional monitoring tools can generate huge volumes of alerts without clearly showing which warning signs matter most. Kyndryl's approach is to identify combinations of events that tend to precede disruption and act before those conditions turn into a business-impacting outage.
The feature can handle early detection at scale for more than 10 million incidents a year, according to Kyndryl. The company also said the tool helps organisations complete major incident reports in hours rather than weeks by speeding up root cause analysis.
For Kyndryl, the addition expands the role of Kyndryl Bridge beyond observability and support into more direct intervention in customer operations. It reflects a broader shift in the IT services market, where providers are trying to move from advising customers after problems occur to preventing faults before they affect users or revenue.
Kyndryl, which specialises in infrastructure services and systems management, serves thousands of customers in more than 60 countries. It positions Kyndryl Bridge as a central layer that links operational data across customer environments and turns it into recommendations for technical teams.
The latest feature adds a stronger automation element to that model by using AI agents to take action once risks have been identified. Kyndryl did not detail which remediation steps are automated by default and which require human approval, but said the system is intended to support earlier intervention and reduce operational disruption.
Xerxes Cooper, Global Leader, Kyndryl Delivery, outlined the company's rationale for the launch. "By embedding AI agents in Kyndryl Bridge for proactive risk detection, we are transforming IT operations from reactive outage recovery to proactive, evidence-based prevention," he said. "Correlating millions of observability signals across applications and deep infrastructure helps our customers see and resolve issues before they ever feel them."