Global Microsoft Outage Caused by CrowdStrike Update Underscores the Benefits of Microsoft 365 Monitoring
A global IT outage significantly impacted Microsoft 365 and Microsoft Azure services and countless...
A global IT outage affected Microsoft 365 and Azure services on July 30, 2024. The issue was traced back to a DDoS attack, which caused performance issues with Azure Front Door (AFD) and Azure Content Delivery Network (CDN).
On July 30, 2024, Microsoft experienced a widespread outage that disrupted Azure and Microsoft 365 services for users worldwide. It came less than two weeks after the enormous outage caused by a faulty CrowdStrike update, which affected 8.5 million Windows devices and is estimated to have cost US Fortune 500 companies $5.4 billion.
Microsoft reported this outage on the Microsoft 365 Service Health Page as MO842351.
Figures 1 and 2: Microsoft’s Incident Reporting of Incident MO842351 on the Microsoft 365 Service Health Page.
Figure 3: Microsoft’s Incident Reporting of Tracking ID KTY1-HW8 on the Azure Service Health Page.
An unexpected surge in usage caused performance problems with Azure Front Door (AFD) and Azure Content Delivery Network (CDN). This led to errors, delays, and latency spikes. Microsoft later identified the root cause as a Distributed Denial-of-Service (DDoS) attack, which triggered Microsoft's DDoS protection systems. However, an error in implementing these systems worsened the attack's impact rather than mitigating it.
Outage Duration: Between approximately 11:45 UTC and 19:43 UTC on July 30, 2024, a subset of customers encountered issues connecting to various Microsoft services. The affected services included:
ENow's Microsoft 365 Monitoring tool checks that URLs are active and accepting traffic. During the outage, our monitoring showed that the network path from the ENow web server to the Microsoft MFA endpoints was down. One URL remained in an error state for several hours before clearing. The disruption was visible on ENow's OneLook dashboard, which lets you drill down from a top-level summary to granular endpoint details.
Figure 4: ENow Microsoft 365 Monitoring Dashboard showing that Multi-Factor Authentication is in a ‘critical state.’
Figure 5: ENow Microsoft 365 Monitoring Dashboard showing affected Networks.
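To illustrate how "an error state for several hours" can be measured, here is a minimal Python sketch that turns a series of probe results into contiguous error windows. It is not ENow's implementation. The function names `error_windows` and `total_error_time` and the `(timestamp, ok)` sample format are assumptions made for this example.

```python
from datetime import datetime, timedelta


def error_windows(samples: list[tuple[datetime, bool]]) -> list[tuple[datetime, datetime]]:
    """Return (start, end) pairs for each contiguous run of failed probes.

    `samples` is a list of (timestamp, ok) tuples sorted by timestamp.
    A window ends at the first successful probe after a failure. A run that
    is still failing at the last sample ends at that sample's timestamp.
    """
    windows = []
    start = None
    last_ts = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe opens a window
        elif ok and start is not None:
            windows.append((start, ts))  # first success closes it
            start = None
        last_ts = ts
    if start is not None:
        windows.append((start, last_ts))  # still failing at end of data
    return windows


def total_error_time(windows: list[tuple[datetime, datetime]]) -> timedelta:
    """Add up the length of every error window."""
    return sum((end - start for start, end in windows), timedelta())
```

Because a window is only closed when a later probe succeeds, the measured duration is accurate to within one probe interval, so shorter intervals give tighter estimates.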
The ENow Microsoft 365 Monitoring solution reported a connection error, pinpointing the exact location of the failure.
A screenshot from our Microsoft 365 Monitoring tool showed that the Microsoft endpoint at IP address 13.107.246.69 was non-responsive and not accepting connections on port 443.
Figure 6: ENow Microsoft 365 Monitoring Dashboard showing the granular endpoint data.
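The kind of check described above, whether an endpoint accepts connections on port 443, can be approximated with a plain TCP connection attempt. The following is a rough Python sketch using only the standard library, not ENow's actual probe. The function name `check_tcp_endpoint`, its return format, and the 5-second timeout are assumptions chosen for illustration.

```python
import socket


def check_tcp_endpoint(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Try to open a TCP connection and report whether the endpoint accepted it."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return {"host": host, "port": port, "reachable": True, "error": None}
    except OSError as exc:
        # Covers refused connections, timeouts, DNS failures, and unreachable networks.
        return {"host": host, "port": port, "reachable": False, "error": str(exc)}
```

Calling `check_tcp_endpoint("13.107.246.69")` would probe the address shown in the screenshot. A successful TCP handshake only shows that the port is accepting connections; it does not confirm that TLS negotiation or the service behind it is healthy.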
Quickly identifying the source of a Microsoft 365 outage is essential. If the problem is within your control, early detection lets IT teams start resolving it immediately and keep it from worsening. For instance, immediate alerts about server failures enable rapid traffic rerouting and failover, minimizing user impact.
If it's an external issue, as it was in this instance, it allows you to pivot and promptly notify users of the problem, affected services, and workarounds, if there are any. Sharing expectations around updates and resolutions will reduce support tickets. It also saves IT pros from unnecessary troubleshooting and interruptions to other projects and priorities.
Effective communication is crucial during a significant IT outage, especially one impacting Microsoft 365 services. Once an outage is detected, whether internal or external, promptly informing users is vital. Transparency and timely updates, even about negative news, are appreciated. Notifying users through email, social media, and in-app messages about issues can reduce support tickets and build confidence and trust in the IT department.
Acting swiftly during Microsoft 365 outages is crucial for maintaining a positive IT department image. Fast issue identification, prompt user communication, and decisive recovery actions highlight reliability and competence. Effective communication can transform a potential crisis into an opportunity to strengthen user trust.
Effective outage communication and handling require real-time insights. ENow's proactive Microsoft 365 Monitoring provides enhanced visibility, resulting in faster issue resolution, reduced business impact, and increased user confidence in your IT team.
Learn about ENow's Microsoft 365 Monitoring and Reporting Platform, or contact us for a Microsoft 365 Monitoring Demo.
Outages are bound to happen in a cloud world. While Microsoft is responsible for restoring service during outages, IT needs to take ownership of its environment and user experience. It is crucial to have greater visibility into business impacts during a service outage the moment it happens.
ENow’s Microsoft 365 Monitoring and Reporting solution enables IT Pros to pinpoint the exact services affected and the root cause of the issues an organization is experiencing during a service outage by providing:
Identify the scope of Microsoft 365 service outage impacts and restore workplace productivity with ENow’s Microsoft 365 Monitoring and Reporting solution. Access your free 14-day trial today!