SSH brute force on cPanel: the 8,127-attempt night and the fix
A postmortem on 8,127 failed SSH logins to a cPanel server in six hours from rotating /24s, why lfd alone could not see the pattern, and the layered fix.
The first lfd alert landed at 02:14. Five failed root logins from a
single address in Bulgaria, blocked at the 5/300s threshold, business as
usual. By 02:31 the inbox had nine more of the same alert from nine
different addresses, all in the same /24. By 03:00 the count was over
eight hundred attempts, the lfd alerts had stopped being readable as
individual events, and the SSH daemon on cpanel-host was
spending more CPU on auditd rejections than on legitimate sessions.
By the time the on-call engineer finished the morning audit at 08:00,
/var/log/secure had recorded 8,127 failed login attempts across six
hours from two coordinated /24 ranges, with rotating usernames, never more than ten
attempts from any single source address.
This is the postmortem of why a default CSF + lfd install does not stop that kind of attack, the layered fix we deploy now, and the one guardrail we will not remove no matter what the attack pattern looks like: the admin IP allowlist is non-negotiable.
The night the alerts started
The first alert was unremarkable. lfd's default LF_SSHD threshold is
five failed logins from a single IP in 300 seconds, and the address in
that alert had crossed it cleanly. lfd added a csf.deny entry, sent
the email, and moved on. The first nine minutes were ordinary.
By 02:31 we had fifty failed logins recorded in /var/log/secure
across twelve usernames (root, admin, oracle, postgres,
ubuntu, deploy, git, jenkins, centos, ftpuser, test,
user) and ten source addresses. No single address had hit the five-in-
300-seconds threshold yet. They were rotating fast enough that lfd's
per-IP counters never tripped. Twelve of the alerts that did fire came
from inside the same /24. There was no pattern in the lfd email
summary because each address was being scored independently.
By 03:00 the count was eight hundred and the rate was climbing. csf -t
showed only eleven temporary blocks in place. The attacker was paying
the cost of one IP getting blocked every minute or so and was getting
the other ninety percent of attempts through. The defence was per-IP;
the attack was per-subnet.
That is the realisation that defines this incident: lfd is a
single-attacker tool. It was designed when "brute force" meant one
machine in someone's basement running hydra against your server.
Modern coordinated SSH brute force runs from rented /24s and never
exceeds the per-IP threshold of any one address. The defence has to
move up one octet.
What the log looks like
This is the redacted shape of /var/log/secure during the worst
fifteen minutes. The hostnames have been replaced per our anonymisation
glossary; the attacker addresses below are illustrative — the shape of
the log matters, not the specific source IPs.
Mar 22 02:47:13 server sshd[18421]: Failed password for root from 203.0.113.12 port 51234 ssh2
Mar 22 02:47:14 server sshd[18421]: Connection closed by authenticating user root 203.0.113.12 port 51234
Mar 22 02:47:18 server sshd[18443]: Failed password for invalid user oracle from 203.0.113.47 port 39112 ssh2
Mar 22 02:47:22 server sshd[18445]: Failed password for invalid user postgres from 203.0.113.91 port 44820 ssh2
Mar 22 02:47:25 server sshd[18447]: Failed password for invalid user ubuntu from 203.0.113.118 port 41003 ssh2
Mar 22 02:47:29 server sshd[18451]: Failed password for invalid user deploy from 198.51.100.22 port 60112 ssh2
Mar 22 02:47:33 server sshd[18454]: Failed password for invalid user git from 198.51.100.74 port 41884 ssh2
Mar 22 02:47:37 server sshd[18458]: Failed password for invalid user jenkins from 198.51.100.119 port 38222 ssh2
Mar 22 02:47:40 server sshd[18460]: Failed password for invalid user admin from 203.0.113.203 port 52001 ssh2
Mar 22 02:47:43 server sshd[18463]: Failed password for invalid user centos from 198.51.100.201 port 33910 ssh2
# ... 138 more lines in the same fifteen-minute window ...

Two things stand out. The first is that no single address appears more
than nine times in the entire six-hour window, well under any per-IP
threshold a sensible operator would set. The second is that if you
group the lines by the first three octets of the source address,
the first attacker subnet accounts for 4,318 of the 8,127 attempts
and the second accounts for the remaining 3,809. The pattern is
invisible per-IP and obvious per-subnet.
The awk one-liner that turns the log into that summary is short enough to memorise and worth memorising:
grep "Failed password" /var/log/secure \
| awk '{for(i=1;i<=NF;i++) if($i=="from") print $(i+1)}' \
| awk -F. '{print $1"."$2"."$3".0/24"}' \
| sort | uniq -c | sort -rn | head -20

Run it during any suspected attack. If the top two entries account for more than half of the failed-login traffic, the attack is distributed and the fix is not per-IP.
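To make the grouping concrete, here is the same pipeline run against a handful of lines shaped like the redacted sample above. The addresses are the illustrative ones from the log excerpt, and the sample variable stands in for the real /var/log/secure:

```shell
# Hypothetical demonstration of the subnet-grouping one-liner; the lines
# mirror the redacted log shape above, not real attacker data.
sample='Mar 22 02:47:13 server sshd[18421]: Failed password for root from 203.0.113.12 port 51234 ssh2
Mar 22 02:47:18 server sshd[18443]: Failed password for invalid user oracle from 203.0.113.47 port 39112 ssh2
Mar 22 02:47:22 server sshd[18445]: Failed password for invalid user postgres from 203.0.113.91 port 44820 ssh2
Mar 22 02:47:29 server sshd[18451]: Failed password for invalid user deploy from 198.51.100.22 port 60112 ssh2
Mar 22 02:47:33 server sshd[18454]: Failed password for invalid user git from 198.51.100.74 port 41884 ssh2'

summary=$(printf '%s\n' "$sample" \
  | grep "Failed password" \
  | awk '{for(i=1;i<=NF;i++) if($i=="from") print $(i+1)}' \
  | awk -F. '{print $1"."$2"."$3".0/24"}' \
  | sort | uniq -c | sort -rn)
echo "$summary"
```

The output groups five attempts into two /24 counts, which is exactly the per-subnet view the per-IP counters never produce.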
Why CSF and lfd alone are not enough
CSF can block CIDR ranges. lfd can read /var/log/secure in real time.
The defaults do not put those two things together because the defaults
are tuned for the threat model of fifteen years ago. The relevant
limits are:
- LF_SSHD (default 5) counts failed logins per IP, not per subnet.
- LF_TRIGGER_PERM controls whether a tripped IP is blocked permanently or temporarily, not how broadly.
- DENY_IP_LIMIT (default 200) caps how many entries csf.deny can hold before old ones get rotated out. A distributed attack will fill this in an hour and start evicting genuine blocks.
There is no built-in LF_SSHD_SUBNET. The subnet check has to be added
in a cron script, and the script has to know how to talk to CSF.
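The DENY_IP_LIMIT eviction risk is easy to watch for by comparing the live deny-list size against the configured cap. A sketch using the default CSF paths, with fallbacks for a box where the files are not readable:

```shell
# Compare csf.deny usage against DENY_IP_LIMIT (default CSF paths assumed).
limit=$(awk -F'"' '/^DENY_IP_LIMIT/ {print $2}' /etc/csf/csf.conf 2>/dev/null)
limit=${limit:-200}   # fall back to CSF's shipped default if unreadable
used=$(grep -cv '^[[:space:]]*#' /etc/csf/csf.deny 2>/dev/null)
used=${used:-0}       # treat a missing deny file as empty
echo "csf.deny: ${used} of ${limit} entries in use"
```

When the first number approaches the second, the firewall is already forgetting older blocks to make room for new ones.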
We layer the defence:
- Admin IP allowlist first, before any other change.
- Move SSH off port 22.
- Add a subnet-level detection script that calls csf -d on /24 ranges exceeding a threshold.
- Disable password authentication entirely.
- Optionally add 2FA for admin accounts that still need shell access.
Each step is independently useful. Together they take the attack surface from "anyone on the internet can probe SSH" to "the attacker has to find a non-port-22 service, brute force keys instead of passwords, and clear a 2FA challenge". The cost of probing goes up by several orders of magnitude. The cost to a legitimate admin goes up by about thirty seconds the first time and zero seconds after that.
The fix we deploy now (in this order)
Step 1: whitelist your admin IP first
The catastrophic scenario for the rest of this post is that one of the
changes blocks the operator running it. Before touching anything, we
add the admin's static IP to csf.allow with a comment that explains
why it is there.
csf -a 198.51.100.10 "Permanent admin: do not remove"
grep 198.51.100.10 /etc/csf/csf.allow

The second line is not optional. We verify the entry exists in the file, not just that the command returned cleanly. CSF will write to the allow list even when the address is malformed; the verification grep is the only thing that catches a typo before it matters.
If the admin works from more than one static address, every address goes in before step 2. If the admin works from a dynamic address, this step is "set up a jump host on a static IP first" and step 2 waits.
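Since CSF accepts malformed entries without complaint, it is also worth sanity-checking the address shape before it ever reaches csf -a. A minimal sketch; ADMIN_IP is the illustrative address used throughout this post:

```shell
ADMIN_IP="198.51.100.10"
# Crude IPv4 shape check: four dot-separated groups of 1-3 digits.
# It catches transposed dots and fat-fingered octets before the
# address lands in csf.allow; it does not validate octet ranges.
if echo "$ADMIN_IP" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
  verdict="ok"
else
  verdict="malformed"
fi
echo "address ${ADMIN_IP}: ${verdict}"
```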
Step 2: move SSH off port 22
This is the change that produces the biggest single drop in attack volume. Most opportunistic SSH scanners only hit port 22. Moving the service to a high non-standard port reduces the noise by more than ninety percent in our measurements. It is not security (anyone who wants to find your SSH port will find it) but it is signal-to-noise relief that lets the rest of the defence breathe.
Pick a port high enough that scanners do not bother. Avoid the
"obvious" alternates (2222, 22222, 2200) because the scanners
that know about port changes know about those. Pick something in the
15600 to 64000 range that no other service on the box uses.
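Before editing sshd_config, confirm nothing else is already listening on the candidate port. A sketch assuming iproute2's ss is installed; 15672 is the example port used in the steps below:

```shell
PORT=15672
# List listening TCP sockets and look for the candidate port. If ss is
# unavailable the check degrades to "free", so treat it as a hint, not proof.
if ss -ltn 2>/dev/null | awk '{print $4}' | grep -q ":${PORT}$"; then
  state="in use"
else
  state="free"
fi
echo "port ${PORT} looks ${state}"
```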
The cPanel-specific change is in three places, in this order:
# 1. Edit sshd config to listen on both old and new during the cutover.
vi /etc/ssh/sshd_config
# Add: Port 15672
# Keep: Port 22
sshd -t # syntax check before restart
systemctl restart sshd
# 2. Open the new port in CSF.
vi /etc/csf/csf.conf
# Edit TCP_IN to include 15672
csf -r # restart CSF
# 3. Test the new port from the admin host BEFORE killing port 22.
ssh -p 15672 admin@cpanel-host

Once the new port is verified working from every admin workstation
(not "I tested it once and it worked", but "every operator who needs
shell access has logged in successfully via the new port") port 22
gets removed from sshd_config and from CSF's TCP_IN. Imunify360
needs the same change if it is in the firewall path; see our Tier 3
reference on the Imunify360 custom-port settings.
This step is the one most cPanel operators resist because cPanel itself keeps documentation that assumes port 22. The documentation is wrong for any production server.
Step 3: add subnet-level detection
The script below reads the last hour of /var/log/secure, groups
failed-login source addresses into /24 ranges, and calls csf -d on
any range that crossed our threshold. Threshold for our environment is
fifty failed attempts from a single /24 in the last hour; this catches
the kind of distributed attack the 8,127-attempt night was and produces
roughly zero false positives in our weekly review.
#!/bin/bash
# /usr/local/sbin/csf-block-subnets.sh
# Block /24 subnets with more than 50 failed SSH logins in the last hour.
set -euo pipefail
THRESHOLD=50
WINDOW_MIN=60
LOG=/var/log/secure
ALLOWLIST=/etc/csf/csf.allow
# Build the time prefix to grep for the last $WINDOW_MIN minutes.
# NB: the comparison below is lexical on syslog timestamps; it is fine
# within a month but will misfire across a month boundary.
SINCE=$(date -d "${WINDOW_MIN} minutes ago" "+%b %_d %H:%M")
awk -v since="$SINCE" '
$0 >= since && /Failed password/ {
for (i=1;i<=NF;i++) if ($i=="from") print $(i+1)
}
' "$LOG" \
| awk -F. '{print $1"."$2"."$3".0/24"}' \
| sort | uniq -c | sort -rn \
| awk -v t="$THRESHOLD" '$1 >= t {print $2}' \
| while read -r subnet; do
# Refuse to block any subnet that overlaps the admin allowlist.
prefix="${subnet%.0/24}"   # e.g. 203.0.113
if grep -qE "^${prefix//./\\.}\." "$ALLOWLIST"; then
logger -t csf-block-subnets "skipping $subnet (overlaps allowlist)"
continue
fi
if ! csf -g "$subnet" | grep -q "DENY"; then
csf -d "$subnet" "ssh-bruteforce auto $(date +%F)" >/dev/null
logger -t csf-block-subnets "blocked $subnet"
fi
done

Two details are load-bearing. The first is the allowlist check before
the block call. The script refuses to block a /24 that overlaps the
admin allowlist even if the data says it should. Locking the operator
out of their own server is the failure mode that matters most.
The second is the csf -g check before the csf -d call. CSF's
csf.deny file is finite (default 200 entries), and blocking the same
subnet twice in quick succession rotates legitimate older blocks out of
the list. We only add a new block if the subnet is not already in the
deny rules.
Cron schedule, every five minutes, with output silenced because the script logs to syslog already:
*/5 * * * * /usr/local/sbin/csf-block-subnets.sh >/dev/null 2>&1

Step 4: disable password authentication
Password auth is the entire reason brute force exists. Disabling it turns the 8,127 attempts of this incident into 8,127 attempts that cannot succeed regardless of which password was tried, because there is no password prompt. The cost is a one-time key-distribution exercise.
We generate keys for every admin who needs shell access, copy each
public key into the appropriate ~/.ssh/authorized_keys on the server,
and verify each operator can log in with their key while password auth
is still enabled. Only after every admin has confirmed working key auth
do we change sshd_config:
# /etc/ssh/sshd_config
PasswordAuthentication no
ChallengeResponseAuthentication no
UsePAM yes
PermitRootLogin prohibit-password

The order matters. Disabling password auth before every admin has working key auth is the second catastrophic scenario, and it has cost people their weekend more than once. We will not flip the flag until the operator running the change can show us a list of every admin user on the box paired with a confirmed key-auth login from each.
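The "every admin has a key installed" half of that check can be scripted. This sketch runs against a throwaway fixture so it is demonstrable off-server; on a real box, point it at /home and your actual admin list (ADMINS here is hypothetical):

```shell
# Fixture for demonstration: alice has a key installed, bob does not.
tmp=$(mktemp -d)
mkdir -p "$tmp/alice/.ssh" "$tmp/bob/.ssh"
echo "ssh-ed25519 AAAAC3Nza... alice@laptop" > "$tmp/alice/.ssh/authorized_keys"

ADMINS="alice bob"   # hypothetical admin list; enumerate your real one
missing=0
for u in $ADMINS; do
  keys="$tmp/$u/.ssh/authorized_keys"
  # A missing or empty authorized_keys means this admin gets locked out
  # the moment PasswordAuthentication goes to "no".
  if [ ! -s "$keys" ]; then
    echo "no key installed for $u"
    missing=$((missing + 1))
  fi
done
echo "admins still without keys: $missing"
rm -rf "$tmp"
```

A non-zero count is a hard stop: the flag does not get flipped until it reads zero and each admin has also confirmed a live key-auth login.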
Step 5: optional 2FA via Google Authenticator
cPanel servers can run PAM-based 2FA for SSH with
google-authenticator-libpam. Worth it for any account with sudo
access on a server that handles client data. Not worth it for
read-only operator accounts that already have hardware key-auth and a
narrow IP allowlist; the extra friction does not buy proportionate
security on those.
When we do enable it, the setup walks each admin through
google-authenticator -t -d -f -r 3 -R 30 -W and the QR code, the
emergency codes go into the operator's password manager, and the
auth required pam_google_authenticator.so line goes into
/etc/pam.d/sshd after auth substack password-auth.
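For reference, the two files involved end up shaped roughly like this. This is a sketch, not a drop-in: exact directive names vary by OpenSSH version (newer releases spell it KbdInteractiveAuthentication), and note that PAM-based OTP requires re-enabling the challenge-response path that step 4 turned off, scoped to key-plus-OTP:

```
# /etc/pam.d/sshd  (order matters: the OTP prompt comes after the stock stack)
auth       substack     password-auth
auth       required     pam_google_authenticator.so

# /etc/ssh/sshd_config  (2FA additions; key first, then the OTP prompt)
ChallengeResponseAuthentication yes
AuthenticationMethods publickey,keyboard-interactive
```

Test from a second session before closing the one you edited from; a PAM typo here locks out SSH entirely.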
What we never recommend
Several "solutions" come up in cPanel forum threads about SSH brute force. We do not recommend any of these.
Disabling SSH "to be safe". You will need it the moment something breaks that the WHM UI cannot fix, and the something-breaks moment will not wait for you to re-enable the service.
Port-knocking. Security through obscurity that breaks every automation tool we use, including the backup runners and the monitoring agents.
Custom one-off iptables rules outside CSF. They survive CSF restarts
inconsistently and they will be invisible to the next operator on the
team. If the rule is worth keeping, it goes in csf.allow, csf.deny,
or csfpre.sh.
"AI-powered SSH protection" SaaS that does not show you the rules it adds. We say this with a self-aware wink because we are building an AI-driven product, and we still think you should never deploy something that hides what it is doing from the operator. Every action our use case takes is logged and reversible. If a vendor will not show you the rules, the vendor is the threat.
Geolocation as a force multiplier
Most legitimate SSH on a single cPanel server comes from two or three
known countries. Blocking everything else via MaxMind GeoIP + CSF's
CC_DENY directive is a fast way to cut another order of magnitude off
attack volume:
# /etc/csf/csf.conf
CC_DENY = "CN,RU,BR,VN,IN,KR"
CC_ALLOW = ""
CC_ALLOW_FILTER = ""

The honest limit on this is that GeoIP is a heuristic, not a barrier.
Attackers route through VPN endpoints in your allowed countries and
through rented residential proxy pools that have addresses everywhere.
The 8,127-attempt night included traffic from at least four countries
that any sensible CC_ALLOW list would include. Geolocation reduces
the volume of background scanning; it does not stop a determined
attacker. We deploy it because the volume reduction is real and the
operational cost is near zero, not because we trust it as a primary
control.
The forensic data you should be collecting
When step 3 starts blocking subnets automatically, the post-incident question is always "did any of them succeed before we caught them?". The data that answers that question has to exist before the incident, not after.
What we keep, with retention:
- /var/log/secure rotated weekly, kept for 90 days, never compressed away during an active incident.
- lfd CT (connection tracking) logs at default rotation.
- Apache access logs around any successful SSH login from an unfamiliar address, cross-referenced for control-panel activity.
- Output of last and lastb captured nightly into a separate retention directory. last shows every successful login; lastb shows every failure. The two together answer "did anyone get in".
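The nightly last/lastb capture is one root crontab line plus a retention directory. A sketch, assuming /var/log/login-history as the retention path (any root-owned directory outside the normal logrotate scope works); note the escaped percent signs, which cron otherwise treats as newlines:

```
# root crontab: capture login history nightly at 00:05
5 0 * * * mkdir -p /var/log/login-history && last -F > /var/log/login-history/last-$(date +\%F) && lastb -F > /var/log/login-history/lastb-$(date +\%F) 2>/dev/null
```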
The mandatory follow-up: any time a login succeeds from an address that is not on the operator allowlist, we treat it as a potential compromise until proven otherwise. The proof is matching the timestamp to an explanation: a known operator on a known travel day, a known contractor on a known engagement. If there is no explanation, the account is locked and the keys are rotated before anyone goes back to sleep.
The wider firewall context for this (why CSF and Imunify360 sometimes fight each other in ways that complicate brute-force response) is in our postmortem on the CSF, lfd, and Imunify360 conflict. And the close cousin to SSH brute force, the WordPress xmlrpc and wp-login flood that hits at the same hours from the same kinds of attackers, is the subject of the 27-site xmlrpc abuse postmortem.
The 60-second audit
Four commands. Run them now. If any of them returns more than the expected output, you have ongoing SSH probing and you should not finish reading this post before starting on step 1.
# 1. How many failed SSH logins in the last hour?
grep "Failed password" /var/log/secure \
| awk -v cutoff="$(date -d '1 hour ago' '+%b %_d %H')" '$0 ~ cutoff' \
| wc -l
# 2. Which /24 subnets account for the most failed logins today?
grep "Failed password" /var/log/secure \
| awk '{for(i=1;i<=NF;i++) if($i=="from") print $(i+1)}' \
| awk -F. '{print $1"."$2"."$3".0/24"}' \
| sort | uniq -c | sort -rn | head -5
# 3. How full is csf.deny right now?
wc -l /etc/csf/csf.deny
# 4. Is the admin IP allowlist still in place?
grep -E "198\.51\.100\.10" /etc/csf/csf.allow

Healthy numbers for a quiet cPanel box: command one under fifty,
command two showing the top subnet well under twenty attempts, command
three under a hundred entries, command four returning exactly the
expected allowlist line. Numbers that look like the 8,127-attempt night
are command one over a thousand, command two showing two subnets each
above a hundred, command three at or near the DENY_IP_LIMIT cap, and
command four coming back empty. If it is, stop reading and run step
1 of the fix immediately.
How ServerGuard handles this
ServerGuard's use case for distributed SSH brute force is implemented today and is the canonical second scenario in the product hero animation. The detection and the autonomous Safe action are both live; the deeper migration work to key-only auth and 2FA stays in human hands by design.
What the use case does today:
- Detection. SGuard subscribes to lfd's event stream and tails /var/log/secure in real time. The threshold (fifty failed attempts from a single /24 in the last hour) is the same one the cron script in step 3 uses; the difference is that SGuard reacts inside thirty seconds rather than waiting for the next five-minute tick.
- Action 1, Safe, autonomous. Block the offending /24 via CSF. Before the block call, SGuard verifies the admin allowlist is intact and refuses to add any rule that overlaps an allowlisted address. The refusal is hard-coded and cannot be overridden from the dashboard.
- Action 2, Safe, autonomous. Increment a per-server daily counter of subnets blocked and post the summary to the operator's daily digest. The counter is one of the early signals we use to decide whether a server needs the deeper hardening of steps 2, 4, and 5.
- Action 3, Moderate, with approval. Suggest the port-change and the
key-only-auth migration when the daily counter crosses a tuneable
ceiling. SGuard will draft the
sshd_config diff and the CSF changes; it will not execute them without explicit operator sign-off via the approval flow, because the catastrophic-lockout scenarios in steps 2 and 4 are not risks we want the autonomous layer to take.
The non-negotiable: SGuard's admin IP allowlist is treated as a hard
constraint, not a soft preference. If the data says the attack pattern
is coming from the same /24 as an allowlisted admin address (and
this does happen, when an operator's ISP rotates their static address
into an adjacent block that has been compromised elsewhere) SGuard
refuses to block the subnet and alerts the operator to choose between
moving their static address or temporarily removing the allowlist.
There is no automated path through that decision because there should
not be one.
If you are running a cPanel server today and the 60-second audit above returned numbers that look like an attack, the first three steps of the fix are deployable in under an hour and they will hold against the shape of brute force we have seen consistently across the last three years. The ServerGuard use case automates the detection and the Safe block; it does not automate the parts of the response where being wrong locks you out of your own server.