In Exchange 2013, Microsoft introduced a new built-in Exchange monitoring system called Managed Availability, which automatically takes recovery actions for unhealthy services within the Exchange organization.
Microsoft has been operating a cloud version of Exchange since 2007 and has put all of its knowledge into Managed Availability monitoring. Managed Availability is a cloud-trained system based on the end user’s experience and on recovery-oriented computing.
Managed Availability doesn’t mean you no longer have to monitor your on-premises or hybrid Exchange environment; in fact, just the opposite is true. However, the long and complex Exchange monitoring PowerShell cmdlets (which we will look at in more detail later) are not the most effective way to do so.
Exchange 2013, or more precisely the Exchange Diagnostics Service (EDS), collects a lot of performance data by default: more than 3,000 performance counters, kept for seven days. The Microsoft Exchange Diagnostics service collects the data in the folder %Exchange Install Path%\Logging\Diagnostics\PerformanceLogsToBeProcessed and merges it into the daily performance log on a regular basis. The daily logs are stored under %Exchange Install Path%\Logging\Diagnostics\DailyPerformanceLogs as PerfMon .blg files. Managed Availability uses these files, among others, to track the health of system components. By default, the performance counters are kept for 7 days or until 5 GB of data is reached. You can change these settings in the file Microsoft.Exchange.Diagnostics.Service.exe.config, located in the bin directory of your Exchange installation path:
<add Name="DailyPerformanceLogs" LogDataLoss="True" MaxSize="5120" MaxSizeDatacenter="2048" MaxAge="7.00:00:00" CheckInterval="08:00:00" />
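To check what the retention values in that line actually mean, you can read the attributes with a few lines of script. The following is a minimal Python sketch, not part of Exchange: the function names are made up for illustration, and it assumes that MaxSize is given in megabytes (which matches the 5 GB default mentioned above) and that MaxAge and CheckInterval use the [d.]hh:mm:ss notation.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta

# The element copied verbatim from Microsoft.Exchange.Diagnostics.Service.exe.config.
CONFIG_LINE = (
    '<add Name="DailyPerformanceLogs" LogDataLoss="True" MaxSize="5120" '
    'MaxSizeDatacenter="2048" MaxAge="7.00:00:00" CheckInterval="08:00:00" />'
)


def parse_timespan(text):
    """Parse a [d.]hh:mm:ss string into a timedelta.

    Hypothetical helper; fractional seconds are not handled.
    """
    days = 0
    if "." in text.split(":")[0]:
        # A leading "7." means seven days.
        day_part, text = text.split(".", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def read_retention(xml_line):
    """Return the retention settings of one <add .../> element as a dict."""
    element = ET.fromstring(xml_line)
    return {
        "name": element.get("Name"),
        "max_size_mb": int(element.get("MaxSize")),  # assumed to be megabytes
        "max_age": parse_timespan(element.get("MaxAge")),
        "check_interval": parse_timespan(element.get("CheckInterval")),
    }


retention = read_retention(CONFIG_LINE)
print(retention["name"])
print(retention["max_size_mb"] / 1024, "GB")
print(retention["max_age"].days, "days")
print(retention["check_interval"])
```

With the default line, this reports a limit of 5 GB, a maximum age of 7 days, and a check interval of 8 hours.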
Managed Availability has multiple HealthSet models that are responsible for different services, such as:
Probes run every few minutes against different services, check their health, and collect data from the server. These results flow into the monitoring component of Managed Availability. An Exchange 2013 multi-role server has hundreds of Probes defined, and in most cases these Probes are not directly discoverable: most of them are defined within the Exchange program code and cannot be changed. For example, customers reported that the AutoDiscoverSelfTestProbe failed when the ExternalUrl for the EWS virtual directory wasn’t set, and there was no way to change the probe settings. Microsoft resolved this issue in Cumulative Update 6. Each Probe writes an informational event to the Microsoft.Exchange.ActiveMonitoring\ProbeResult crimson channel with one of the following result types:
1 = Timeout
2 = Poisoned
3 = Succeeded
4 = Failed
5 = Quarantined
6 = Rejected
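If you export probe results and want to read them without looking up the numbers each time, a small lookup table helps. The following Python sketch turns the six codes listed above into names and counts how often each one occurs. The class and function names are hypothetical, and the sample codes are invented for illustration; they are not real probe output.

```python
from collections import Counter
from enum import IntEnum


class ProbeResultType(IntEnum):
    """Result types as listed above (the class name is made up)."""
    TIMEOUT = 1
    POISONED = 2
    SUCCEEDED = 3
    FAILED = 4
    QUARANTINED = 5
    REJECTED = 6


def summarize(result_codes):
    """Count how many results of each type appear in a list of numeric codes."""
    return Counter(ProbeResultType(code).name for code in result_codes)


# Invented sample data, standing in for codes exported from the event log.
sample_codes = [3, 3, 4, 3, 1, 4]
summary = summarize(sample_codes)

for name, count in summary.most_common():
    print(name, count)
```

An unknown code raises a ValueError, which is deliberate here: a value outside the list above would mean the table is out of date.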
Probes are divided into three categories:
Many Monitors have high thresholds and only become Unhealthy after multiple probe failures, so that Managed Availability and its Responders don’t take unnecessary recovery actions. For problems that require manual intervention, take a look at the Microsoft.Exchange.ManagedAvailability\Monitoring crimson channel.
If you would like to review all recovery actions taken by the Managed Availability Responders, view the Microsoft.Exchange.ManagedAvailability\RecoveryActionResults crimson channel.
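The idea of a threshold can be illustrated with a toy model. The Python sketch below is not how Exchange implements its Monitors; it is a simplified assumption in which a monitor only turns Unhealthy after a fixed number of consecutive probe failures, and a single success resets the count. The class name, the threshold of 3, and the sample sequence are all invented.

```python
class ThresholdMonitor:
    """Toy monitor: Unhealthy only after N consecutive probe failures.

    Illustrative only; the real Monitor logic in Exchange is not shown here.
    """

    def __init__(self, failure_threshold):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def state(self):
        if self.consecutive_failures >= self.failure_threshold:
            return "Unhealthy"
        return "Healthy"

    def record(self, succeeded):
        """Feed one probe result into the monitor and return the new state."""
        if succeeded:
            self.consecutive_failures = 0  # one success resets the streak
        else:
            self.consecutive_failures += 1
        return self.state


monitor = ThresholdMonitor(failure_threshold=3)
# Invented probe results: True = succeeded, False = failed.
probe_results = [True, False, False, True, False, False, False]
states = [monitor.record(result) for result in probe_results]
print(states)
```

In this sequence, the two isolated failures do not change the state; only the final run of three failures in a row makes the monitor Unhealthy, which is the kind of noise filtering a high threshold is meant to provide.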
This concludes part one of this article. In the second part, we will take a more practical approach to Managed Availability and show you how to use PowerShell to retrieve useful information from the massive amounts of data that Managed Availability collects about your environment.
Part 2 goes over how to check, protect, and maintain Exchange Server, and in Part 3 we dive into local monitoring and overrides.