Office 365 Monitoring: Microsoft Forms Service Incident (Feb. 24, 22)
On February 24, 2022, at ~01:31 am UTC, Microsoft communicated via tweet (@MSFT365status) that they...
In the wake of the embarrassing and widely publicized Midnight Blizzard app breach, are you aware that Microsoft also suffered a massive Teams outage on 1/26/24 that lasted almost 13 hours and substantially impacted global communication abilities?! And then again on 1/29/24?!
It would have been hard for many out there NOT to have felt the pain at some point…
Many sources reported the issue began around 8:45am EST, but Microsoft did not publicly acknowledge the issue until approximately 11:45am EST:
However, the ENow Microsoft 365 Monitoring and Reporting solution picked up on the Teams outage around 9:09am EST, as seen in the first email alert captured below. The alert reflects the connectivity issues at the time, with the test failing to even connect:
Below is another (actual) screenshot that shows an alert from ENow’s monitoring solution, after detecting the outage:
This second screenshot speaks to the delay effects of the outage. You can see the latency experienced once a message could finally be sent:
The difference in messaging times on the latency chart during the outage should be clear.
And true to form, the Microsoft Teams X account didn’t publicly post about the issue on X until much later in the day, at approximately 5:46 pm EST:
That’s a slow response time considering the outage’s size, which affected North and South America, Europe, the Middle East, and Africa. No biggie. The functionality of Teams was also severely impacted, hampering thousands of users on the mobile and desktop apps with a plethora of connectivity issues, login problems (freezing on the loading screen), missing attachments and messages, and message delays. Microsoft’s statement essentially said that a networking issue had impacted a portion of Teams and that it had to fail over the affected datacenters to resolve it, which then led to even longer delays. Let’s just say ‘a good time was not had by all.’
…That is, unless you are an ENow client 😊 Our Teams Chat test caught the issue, as it was unable to connect to the service using the MS Graph endpoint and the connection kept timing out. For a client using our service, Microsoft’s network issue would’ve prevented the Teams chat test from making a successful connection from the management server as well as from any deployed remote probes. The ENow Monitoring and Reporting solution would have alerted at the first sign of trouble, providing IT Support with the affected locations and enabling them to proactively send out an advisory.
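To make the idea of a connectivity test concrete, here is a minimal sketch of a synthetic probe that tries to reach an HTTPS endpoint and reports a failure when the connection errors out or times out. This is not ENow's implementation: the function name `probe`, the result fields, and the 10-second timeout are all assumptions made for illustration, and the URL you pass in is up to you.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout_s=10.0, opener=urllib.request.urlopen):
    """Attempt one connection and classify the outcome.

    Hypothetical helper for illustration only. `opener` is injectable so the
    probe can be exercised without touching the network.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, so the network path works even though
        # the response carried an error status.
        return {"ok": False, "reason": f"http_{exc.code}",
                "elapsed_s": time.monotonic() - start}
    except OSError:
        # Covers URLError, TimeoutError, and lower-level socket failures:
        # the probe could not connect at all.
        return {"ok": False, "reason": "connect_failed",
                "elapsed_s": time.monotonic() - start}
    return {"ok": 200 <= status < 400, "reason": "connected",
            "elapsed_s": time.monotonic() - start}
```

Running a probe like this on a schedule from several locations, and alerting after a few consecutive `connect_failed` results, is one plausible way to get the kind of early signal described above.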
Even if the service was able to connect, that doesn’t mean the service delivery wouldn’t have suffered. The latency chart shows Teams chats taking anywhere from 18 to almost 20 times longer than usual. Again, EMS AI alerts kept support teams ahead of the game with the multiple alert mechanisms available.
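The "times longer than usual" comparison can be sketched as a ratio of the current send latency to a baseline. The snippet below is an illustrative assumption, not how the ENow product computes it: the function names, the choice of the median as the baseline, and the 5x alert factor are all hypothetical, and the sample numbers in the usage note are made up.

```python
from statistics import median


def latency_ratio(baseline_ms, current_ms):
    """Return how many times slower the current sample is than the baseline median."""
    if not baseline_ms:
        raise ValueError("baseline_ms must contain at least one sample")
    base = median(baseline_ms)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return current_ms / base


def is_degraded(baseline_ms, current_ms, factor=5.0):
    """Flag the sample when it is at least `factor` times the baseline (assumed threshold)."""
    return latency_ratio(baseline_ms, current_ms) >= factor
```

For example, with made-up baseline samples of 400, 500, and 600 ms, a 9,000 ms message send gives a ratio of 18.0 and would be flagged as degraded, while a 550 ms send would not.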
Sort of like the smaller aftershocks that are triggered by an initial earthquake, Microsoft had another outage on Monday, 1/29/24, the second in 3 days. The incident was tagged TM710900.
Microsoft made a post on the company’s official Microsoft 365 status account on X around 9:45 am EST:
Essentially, it seemed to be a continuation of the previous Friday’s outage, which had seemed to be resolved. This disruption affected users in North and South America, Canada, and Brazil, with users reporting connectivity issues along with messaging delays impacting the Teams mobile and desktop clients.
The Microsoft Incident report states, "This Service Health post is in response to some external customer reports and will be updated with further details as we confirm the service's operational health."
Microsoft has yet to update its service health page for the Teams consumer service, which says, "Everything is up and running."
The moral of the story: having a monitoring and reporting system in place for instances like this is paramount to minimizing workplace productivity disruptions. Proper Microsoft 365 monitoring puts your IT and support teams on offense, keeping them ahead of the game and minimizing the disruption and downtime that come when an organization is defensively mitigating an issue after the fact.
As illustrated by these outages, it should be noted that these unforeseen, long, and widespread Teams outages are often not communicated in a timely or detailed manner by Microsoft. Do yourself and your organization a favor and let ENow take care of the Microsoft 365 monitoring for you, because an early warning system will keep you ahead of the curve and minimize the impact to your end users through faster troubleshooting and proactive communication alerts during such a critical time!
This is another instance that underscores the importance of Microsoft 365 monitoring from an end-user experience point of view in order to understand how organizations are impacted during vendor outages. This degree of vagueness leaves administrators wondering what is working and what is not working for employees within their organization. Is Outlook, Outlook mobile, or Outlook on the web having an issue or is it EWS?
With ENow’s Microsoft 365 Monitoring & Reporting Solution, IT Pros are able to visually monitor their entire environment from a single pane of glass. When an outage occurs, the visual breadcrumb trail enables IT Pros to pinpoint the root cause with confidence.
When an outage does occur, IT Pros can take the guesswork out of the impact and are able to see which services and subset of services are affected with ENow’s remote probes. The end-user experience monitoring probes can be installed where your end users are and cover a wide range of Microsoft 365 apps and other cloud-based collaboration systems (OneDrive, Teams, Zoom, Salesforce).
Lastly, the ENow dashboard provides visibility into the main places IT Pros typically go to learn more about outages, namely the Service Health Dashboard and Twitter, providing a single place to obtain information on service outages.
In a cloud world, outages are bound to happen. While Microsoft is responsible for restoring service during outages, IT needs to take ownership of its environment and user experience. It is crucial to have greater visibility into business impacts during a service outage the moment it happens.
ENow’s Microsoft 365 Monitoring and Reporting solution enables IT Pros to pinpoint the exact services affected and root cause of the issues an organization is experiencing during a service outage by providing:
Identify the scope of Microsoft 365 service outage impacts and restore workplace productivity with ENow’s Microsoft 365 Monitoring and Reporting solution. Access your free 14-day trial today!