
What is the Health Checker?


Overview

Many of our customers use Peer Software's technologies to ensure availability of data in multiple locations.  Hardware and software outages, as well as unforeseen spikes in activity, can impact availability of data and Service-Level Agreements (SLAs).  Health Checker from Peer Software is a standalone service designed to track the overall performance of your replication environment and to alert you to outages and backlog spikes.

Health Checker independently monitors activity when installed on a separate Windows server in your environment.  Jobs and shares to monitor can be fed to it on a scheduled basis from the Peer Management Center (PMC).  On a fixed schedule (every 60 minutes by default), Health Checker creates control files (files with user-specified extensions) across all shares in your jobs, and then monitors all participants to ensure that file events (such as adds, modifications, renames, and deletes) are being replicated throughout the system in a timely fashion.  If a file operation is not replicated within a window of time (30 minutes by default), alerts can be emailed.  In addition, Health Checker can be configured to generate reports on a daily, weekly, or monthly basis that show overall replication performance.

Where Can Health Checker Be Installed?

Health Checker is a standalone service that can be installed on the same server as Peer Management Center or on a separate server.  You will get the most benefit by installing Health Checker on a separate server, where it can actively monitor your replication environment for outages and issues.  If you want to use Health Checker to track only the time that replication takes, you can install Health Checker directly on the PMC server.

If you are going to use Proactive Monitoring, you can install Health Checker on the PMC server as part of the process of setting up Proactive Monitoring.  See the User Guide for more information about Proactive Monitoring.

How Health Checker Works

Health Checker is based on the concepts of cycles and checks to provide statistical reports and alerts. 

By default, cycles are started each hour and involve Health Checker writing a unique control file of each specified extension to each participant.  Health Checker then begins checking (every 60 seconds by default) all participants for the existence of these files. 

Once Health Checker determines that these files were replicated everywhere, it performs a sequence of further operations on the original files (modifications, renames, and deletes) and, after each one, checks (every 60 seconds by default) whether the change was replicated to all participants.

Once all modifications are determined to have been replicated properly, Health Checker logs times for each operation of the cycle to a stats log that will be used to generate reports.  These cycles are repeated every hour so you can compare performance numbers throughout the day. 
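
The first step of a cycle can be pictured with a minimal Python sketch. This is not Health Checker's implementation: the function names, the file naming scheme, and the ".hctest" extension are hypothetical, and participants are modeled as plain directories. It only illustrates the idea of writing a unique control file to each participant and then checking which copies are still missing.

```python
import uuid
from pathlib import Path


def start_cycle(participants, extension=".hctest"):
    """Write one uniquely named control file into each participant directory.

    The ".hctest" extension and the naming scheme are hypothetical.
    """
    names = []
    for directory in participants:
        name = f"control-{uuid.uuid4().hex}{extension}"
        (Path(directory) / name).write_text("control file")
        names.append(name)
    return names


def missing_copies(names, participants):
    """Return (file name, participant) pairs that have not been replicated yet."""
    return [
        (name, str(directory))
        for name in names
        for directory in participants
        if not (Path(directory) / name).exists()
    ]
```

In this sketch, a monitoring loop would call missing_copies at each check interval (60 seconds by default in Health Checker) until it returns an empty list, and then record the elapsed time for the operation.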

If Health Checker performs an operation during a cycle and that operation takes more than 30 minutes (configurable) to replicate, Health Checker can send out an alert via email.  Alerts can be combined into single emails sent once per specified interval (e.g., per hour, 8 hours, 1 day).  Once an alert is sent out, Health Checker continues to check all participants until the operation is replicated everywhere.  This is important for measuring and comparing the time to replicate an operation with other cycles throughout the day.
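
The alerting rule described above can be sketched as follows. This is an illustrative model only, assuming a hypothetical PendingOperation record and hypothetical function names; the 30-minute threshold and the once-per-interval batching come from the description, while everything else is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PendingOperation:
    """A control-file operation that has not yet replicated everywhere (hypothetical)."""
    file_name: str
    operation: str  # "add", "modify", "rename", or "delete"
    started_at: datetime


def overdue_operations(pending, now, threshold=timedelta(minutes=30)):
    """Return operations that have been waiting longer than the alert threshold."""
    return [op for op in pending if now - op.started_at > threshold]


def should_send_digest(last_sent, now, interval=timedelta(hours=1)):
    """Return True when a combined alert email is due for the chosen interval."""
    return last_sent is None or now - last_sent >= interval
```

Note that an overdue operation stays in the pending list after an alert is sent, which mirrors the behavior described above: checking continues until the operation is replicated everywhere so that its total replication time can still be recorded.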

Health Checker Output

Health Checker operations result in three types of output: 

  • Reports - Health Checker can email reports based on the stats log.  A report contains Excel charts that show replication performance over time.  Reports can be configured to be sent once a day, once a week, or once a month.  A report also includes a configurable number of previous days for comparison purposes.
  • Alerts - Health Checker can email failure alerts to specified contacts.  An alert identifies the files that have failed to replicate within the cycle.  Alerts can be configured to be sent at specific intervals, ranging from every 5 minutes to once a week.
  • Analytics Dashboard - If you are using Health Checker in conjunction with the Analytics Dashboard, replication performance numbers are displayed on the dashboard.
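
As a rough illustration of how per-operation timings could be summarized for a report, the sketch below averages replication times by operation type. The record layout, a list of (operation, seconds) pairs, is an assumption made for this example; the actual format of the Health Checker stats log is not described here.

```python
from collections import defaultdict
from statistics import mean


def average_seconds_by_operation(records):
    """Average replication time per operation type.

    `records` is an iterable of (operation, seconds) pairs; this layout is
    hypothetical and not the real stats log format.
    """
    grouped = defaultdict(list)
    for operation, seconds in records:
        grouped[operation].append(seconds)
    return {operation: mean(values) for operation, values in grouped.items()}
```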
