ChkServd alert field guide: reading cPanel service alerts

Decode the chkservd alert subject lines and bodies cPanel sends when a service flaps, with notes on common false positives and how to tune sensitivity.

ChkServd is the bit of cPanel's tailwatchd that watches services and emails you when one looks unhappy. Most teams treat its alerts as noise. They are not noise; they have a stable grammar, and once you can read them at a glance you can sort signal from flap in two seconds.

What ChkServd is

ChkServd runs inside tailwatchd and polls the registered service list every ~5 minutes. The list lives in /etc/chkserv.d/chkservd.conf and each service has its own driver file under /etc/chkserv.d/<service>. A driver tells ChkServd what port to probe, what banner to expect, and what command to run if the banner is wrong.

ls /etc/chkserv.d/
# apache_php_fpm  cpsrvd  exim     ftpd       imap     mailman  mysql
# named           nscd    pop      queueprocd spamd    sshd
cat /etc/chkserv.d/mysql
# service[mysql]=x,x,x,connect,/etc/init.d/mysql restart,mysql,root
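To see which services are actually switched on, you can read the list file directly. This is a minimal sketch that assumes chkservd.conf holds one name:flag entry per line, with flag 1 meaning monitored; the function name monitored_services is ours, not a cPanel tool.

```shell
# List services marked as monitored in a chkservd.conf-style file.
# Assumption: one "name:flag" entry per line, where flag 1 means monitored.
monitored_services() {
  awk -F: '$2 == 1 { print $1 }' "$1"
}

conf=/etc/chkserv.d/chkservd.conf
if [ -r "$conf" ]; then
  monitored_services "$conf"
fi
```

Check the output against the driver files in /etc/chkserv.d/ before relying on it; a service with a driver but a 0 flag will never alert.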

The alert format

Every alert has the same anatomy. Subject:

[chkservd] Service check on cpanel-host -- FAILED: <service> ([reason])

Body, in order:

  1. The service name and the failure timestamp.
  2. The probe ChkServd ran (port, expected banner).
  3. What it actually got (banner mismatch, refused connection, timeout).
  4. The recovery action, if any (Notice: TailWatchd has restarted <service>).
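Because the subject line is this regular, it can be split mechanically. The sketch below pulls the service and the reason out of a FAILED subject; it assumes the subject matches the shape shown above exactly, with an optional "on <host>" part, and parse_failed_subject is a hypothetical helper name.

```shell
# Print "service|reason" for a FAILED subject; return non-zero if it does not match.
# Assumption: the reason is wrapped as ([...]) exactly as in the subject shown above.
parse_failed_subject() {
  parsed=$(printf '%s\n' "$1" | sed -n -E \
    's/^\[chkservd\] Service check( on [^ ]+)? -- FAILED: ([^ ]+) \(\[(.*)\]\)$/\2|\3/p')
  [ -n "$parsed" ] && printf '%s\n' "$parsed"
}

parse_failed_subject '[chkservd] Service check on cpanel-host -- FAILED: mysql ([connect failed])'
# prints: mysql|connect failed
```

A subject that does not fit the pattern yields no output and a non-zero exit status, so a mail filter can fall through to a human instead of guessing.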

The five alerts you will see most often

[chkservd] Service check -- FAILED: mysql ([connect failed])

MariaDB or MySQL is not accepting connections on 127.0.0.1:3306. Either the daemon is dead (check /var/log/mariadb/mariadb.log for a crash) or it is alive but stuck (check mysqladmin processlist).

[chkservd] Service check -- FAILED: <service> is unable to detect a connection on port <N>

The service is running according to systemctl but the TCP probe times out. Three usual causes: a firewall rule blocking 127.0.0.1, the daemon stuck on a long-running query, or a full iptables state (connection tracking) table. Tail /var/log/messages and run ss -ltnp first.
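The state-table cause is quick to rule in or out. This sketch assumes the table in question is the netfilter conntrack table and that the nf_conntrack module exposes its counters under /proc/sys/net/netfilter; conntrack_pct is a hypothetical helper.

```shell
# Integer percentage of the conntrack table in use: count * 100 / max.
conntrack_pct() {
  echo $(( $1 * 100 / $2 ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
  echo "conntrack: $(conntrack_pct "$(cat "$count_file")" "$(cat "$max_file")")% used"
else
  echo "conntrack: nf_conntrack not loaded or counters not readable"
fi
```

A value near 100 points at the state table; a low value sends you back to the firewall rules and the daemon itself.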

Notice: TailWatchd has restarted <service>

Informational, not a failure. ChkServd ran the recovery action from the driver file and the service came back. If you see this every hour for the same service, the underlying cause is unresolved and you need to investigate, not silence the alert.
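To spot the "every hour for the same service" pattern, count the notices per service. This is a rough sketch that assumes you have the notice lines as plain text, one per line (for example, subjects exported from the alert mailbox); count_restarts is a hypothetical name.

```shell
# Read notice lines on stdin and print "service=count" for each restarted service.
# Assumption: each line contains the phrase "TailWatchd has restarted <service>".
count_restarts() {
  sed -n -E 's/.*TailWatchd has restarted ([^ ]+).*/\1/p' \
    | sort | uniq -c | awk '{ print $2 "=" $1 }'
}
```

Feed it a day of notices; any service with a count well above one or two is being restarted in a loop and deserves a real investigation.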

[chkservd] /var/lib/mysql is over <threshold> full

Disk-full alarm scoped to the MySQL data directory. The threshold is configurable in WHM > Server Configuration > Tweak Settings > "Maximum percentage of space used by MySQL"; the default is 95%. When this fires, MariaDB stops accepting writes long before the partition itself runs out.
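You can reproduce the check by hand to confirm the alert. The sketch below measures the filesystem that holds the data directory and compares it with the threshold; it assumes "over" means strictly greater than, and both helper names are ours.

```shell
# Print the used-space percentage (without the % sign) for the filesystem holding a path.
used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is strictly above the threshold (assumed comparison).
is_over_threshold() {
  [ "$1" -gt "$2" ]
}

datadir=/var/lib/mysql   # path named in the alert
threshold=95             # default quoted in the article
if [ -d "$datadir" ]; then
  pct=$(used_pct "$datadir")
  if is_over_threshold "$pct" "$threshold"; then
    echo "$datadir filesystem at ${pct}% (over ${threshold}%)"
  fi
fi
```

Note that df reports the whole filesystem, so if /var/lib/mysql shares a partition with other data, the number reflects all of it.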

[chkservd] SSL certificate is expiring on <domain>

This is the cPanel service certificate for the cPanel hostname, not a Let's Encrypt certificate. The default warning window is two weeks. If AutoSSL is failing for the cPanel host, this is the only alert that will tell you before things break.
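If you want to confirm the expiry yourself, openssl can answer the "within N days" question directly. A minimal sketch, assuming you know where the certificate file lives; the path below is our assumption about a typical install and cert_expires_within is a hypothetical helper.

```shell
# Succeed (exit 0) when the PEM certificate in $1 expires within $2 days.
# Returns 2 if the file is not readable, so a missing file is not read as "expiring".
cert_expires_within() {
  [ -r "$1" ] || return 2
  # openssl x509 -checkend exits non-zero when the cert expires inside the window.
  ! openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null
}

cert=/var/cpanel/ssl/cpanel/cpanel.pem   # assumed location, verify on your host
if cert_expires_within "$cert" 14; then
  echo "certificate at $cert expires within 14 days"
fi
```

The 14 here mirrors the two-week warning window; widen it if you want earlier notice than the alert gives you.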

When ChkServd is right vs wrong

It is almost always right when the alert is connect failed on a real port: the service is dead or the listener is bound wrong. It is often wrong when:

  • The probe times out at exactly the same time cron.daily is running. The host is alive; ChkServd just lost the race.
  • The service runs on an odd port and the driver still expects the default. (Same shape of bug as the Imunify360 custom SSH port issue.)
  • The service accepts the connection but takes >5s to send a banner. ChkServd's TCP probe is short-fused.

Tuning ChkServd

WHM > Service Manager lets you toggle which services ChkServd monitors at all and whether it auto-restarts on failure. For per-service probe overrides, edit the driver file under /etc/chkserv.d/. The format is comma-delimited; the cPanel docs linked from the WHM page are the only reliable reference.

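# Disable auto-restart for a specific service while keeping monitoring.
# Using '|' as the sed delimiter avoids escaping every slash in the paths.
sed -i 's|connect,/etc/init.d/mysql restart|connect,/bin/true|' \
  /etc/chkserv.d/mysql
# Restart tailwatchd so ChkServd picks up the edited driver file.
/scripts/restartsrv_tailwatchd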

ChkServd alerts are the entry point for several incident types we write up in detail:

For the slow log permission flap that often follows a mysql connect failed alert, see the MariaDB slow log permissions quickref.

How ServerGuard uses this

We parse the ChkServd alert subject line into a structured event (service, kind, port, threshold) and route it into the matching use case before paging a human. Most ChkServd alerts resolve themselves in our triage pass before anyone wakes up.
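To give a feel for the routing step, here is an illustrative sketch that sorts the five subjects covered in this guide into kinds. It is not the ServerGuard implementation; the kind names and the classify_alert function are hypothetical, and the patterns assume the subject wording quoted earlier.

```shell
# Map an alert subject to a coarse kind. Order matters: the port-probe wording
# is checked before the generic "connect failed" pattern.
classify_alert() {
  case "$1" in
    *"TailWatchd has restarted"*)               echo restarted ;;
    *"unable to detect a connection on port"*)  echo port_timeout ;;
    *"FAILED:"*"connect failed"*)               echo connect_failed ;;
    *" is over "*" full"*)                      echo disk_threshold ;;
    *"SSL certificate is expiring"*)            echo ssl_expiry ;;
    *)                                          echo unknown ;;
  esac
}
```

Anything that lands in unknown is worth a look by a human, since it means the subject grammar has changed or a new alert type has appeared.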
