Office 365 Outages: Practical tips and resources to prepare for next time
Microsoft’s Office 365 has had a rough couple of months when it comes to service outages. While...
Microsoft's latest outage highlights several areas that need attention.
This past Thursday, May 2nd, 2019, Microsoft suffered another outage on parts of its cloud services. It follows a series of outages earlier this year that affected a variety of online services, including Azure, SharePoint Online, OneDrive, Intune, and Microsoft Teams.
According to the recently published post-incident report, the root of the issue was a faulty DNS update, leaving thousands of users unable to connect to said services for a period of roughly two hours:
“A configuration issue occurred during planned maintenance activity related to a name server delegation change within Azure Domain Name Services (DNS). Specifically, an issue in the update to one of the name servers for DNS zones caused server records to point to a DNS server that contained blank zone data. As a result, the affected DNS infrastructure returned negative responses and users encountered connectivity issues when attempting to access Microsoft services.”
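To make the failure mode in the report concrete, here is a minimal Python sketch of how a client might notice that a set of service hostnames has stopped resolving. It is an illustration under stated assumptions, not Microsoft's or ENow's tooling: the function name `check_resolution` and the injectable `resolver` parameter are hypothetical, and the sketch treats any `socket.gaierror` as a negative response without distinguishing the underlying DNS error.

```python
import socket
from typing import Callable, Dict, Iterable, List


def check_resolution(
    hostnames: Iterable[str],
    resolver: Callable = socket.getaddrinfo,  # injectable so the check can be tested offline
) -> Dict[str, List[str]]:
    """Map each hostname to the sorted list of addresses it resolves to.

    An empty list means the lookup failed, which is what a client would
    observe when DNS returns a negative response for the name.
    """
    results: Dict[str, List[str]] = {}
    for host in hostnames:
        try:
            # getaddrinfo returns 5-tuples; the address is the first
            # element of the sockaddr entry at index 4.
            infos = resolver(host, 443)
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # Resolution failed: record the host as unresolved.
            results[host] = []
    return results


def unresolved(results: Dict[str, List[str]]) -> List[str]:
    """Return the hostnames that did not resolve to any address."""
    return [host for host, addresses in results.items() if not addresses]
```

Passing the list of hostnames your users depend on and alerting when `unresolved` returns anything gives an early signal that is independent of a vendor's status page. The resolver is a parameter mainly so the logic can be exercised without a live network.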
If anything, the outage shows that Microsoft has several areas for improvement. For example, the lack of accurate communication left many customers wondering what was wrong, an issue that has recurred across multiple outages.
During the early stages of the outage, Microsoft’s various health dashboards showed no issues, forcing customers to turn to Twitter to find out more about the problem.
ENow Software is the leading provider of Office 365 management solutions that help you save money and increase end-user productivity.
Let’s take a look at how ENow’s Office 365 monitoring solution surfaces problems in real time and allows our customers to diagnose and troubleshoot tricky outages like the SharePoint Online and OneDrive problems.
Shortly after the DNS problem began taking effect, we received some visual indications on the OneLook Dashboard that pointed us in the direction of the problem. The screenshot below shows that there are critical issues for Office 365 Network connectivity as well as a problem with Teams and SharePoint Online. This helps us understand immediately that there is a problem with the Office 365 service.
During the May 2nd outage, ENow customers saw failed status notifications for OneDrive, SharePoint Online, and Teams.
Drilling down into the SharePoint Online indicator shows that we are not able to connect to the SharePoint Online service.
Additionally, ENow’s Office 365 monitoring solution performs synthetic transactions that test the functionality specific to your tenant. We can see from the image below that because we are not able to connect to the SharePoint Online service, our upload/download test fails.
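For readers unfamiliar with the term, a synthetic transaction is a scripted user action run on a schedule so that failures show up before users report them. The Python sketch below shows the general shape of an upload/download round-trip probe. It is not ENow's implementation: `run_round_trip_probe`, `ProbeResult`, and the `upload` and `download` callables are hypothetical names, and the callables stand in for whatever client code talks to your own tenant.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    ok: bool
    elapsed_seconds: float
    error: Optional[str] = None


def run_round_trip_probe(
    upload: Callable[[bytes], str],    # hypothetical: stores the payload, returns a handle
    download: Callable[[str], bytes],  # hypothetical: fetches the payload by handle
    payload: bytes = b"synthetic-probe",
) -> ProbeResult:
    """Upload a small payload, download it again, and compare the two."""
    start = time.monotonic()
    try:
        handle = upload(payload)
        returned = download(handle)
    except Exception as exc:
        # Any failure to reach the service counts as a failed probe.
        elapsed = time.monotonic() - start
        return ProbeResult(False, elapsed, f"{type(exc).__name__}: {exc}")
    elapsed = time.monotonic() - start
    if returned != payload:
        return ProbeResult(False, elapsed, "content mismatch")
    return ProbeResult(True, elapsed)
```

In this sketch a connection failure during upload, as happened during the outage, surfaces as a result with `ok` set to `False` and the exception text in `error`, while the elapsed time can be tracked to spot slowdowns that stop short of an outright failure.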
Users who rely on the Microsoft Service Health dashboard didn’t get a concrete update for several hours. This frustration can be avoided by utilizing ENow’s OneLook Dashboard to save precious time when there is an outage.
ENow customers like Barclays, Facebook and VMware were able to quickly identify and drill down to the root cause of the problem as it was happening.
Watch the video below to see how this took place in real time!
In a cloud world, outages are bound to happen. While Microsoft is responsible for restoring service during outages, IT needs to take ownership of its environment and user experience. It is crucial to have visibility into the business impact of a service outage the moment it happens.
ENow’s Office 365 Monitoring and Reporting solution enables IT pros to pinpoint the exact services affected and the root cause of the issues an organization is experiencing during a service outage.
Identify the scope of Office 365 service outage impacts and restore workplace productivity with ENow’s Office 365 Monitoring and Reporting solution. Access your free 14-day trial today!
Michael Van Horenbeeck is a Microsoft Certified Solutions Master (MCSM) and Exchange Server MVP from Belgium, with a strong focus on Microsoft Exchange, Office 365, Active Directory, and a bit of Lync. Michael has been active in the industry for about 12 years and developed a love for Exchange back in 2000. He is a frequent blogger and a member of the Belgian Unified Communications User Group Pro-Exchange. Besides writing about technology, Michael is a regular contributor to The UC Architects podcast and speaker at various conferences around the world.