A Bug 14 Years In The Making

This bug had existed since at least late 2011, and when it triggered, the impact on a single node was catastrophic: rm -rf /home. The good news: it is already patched, the fix is already being distributed, and triggering it required a lot of things to go wrong at once. Only one server was impacted.

IMPACT: A single seedbox server; only users on that server were affected. The user directories under /home were completely deleted.

The bug is now fully fixed in PMSS, and the exact failure path that allowed this to happen has been removed.

The story goes a bit like this

Users opened tickets about 502 Bad Gateway errors. Typical: lighttpd just crashed? But no, this was persistent. A few more tickets came in from the same node.
After logging in, /home showed almost no data. Directories gone. And no, it was not a mount problem either: /home actually was mounted. The data drives were intact too.

Here began a frantic hunt that took most of the day, presuming the worst possible case: a severe security issue. The very first instinct: security breach? Node rooted? The logs (auth, ssh, etc.) showed absolutely zero signs of external compromise. Nothing. Not a beep. Just crickets, and crickets in this case means the usual botnets trying to brute-force their way in, that's it. No unfamiliar login IPs or users, no suspicious commands, no user deletions.

Baffled as to how this could happen, we kept grepping logs and found that a system update had been in progress around the time of the incident, along with a weird flood of errors in the logs, but nothing that looked related. Some script was failing because a require_once() for a library failed. Okay, noted, but that still didn’t explain /home.

Wait, wait! Why is there a .quota file in /home?

That file is per-user: it belongs in each user's own directory and should never exist directly under /home. The file was empty, but it existed, and it should not have. Makes no sense.

cron/updateQuotas.php takes the user list and iterates over it. The loop contains rm -rf /home/{$thisUser}/.quota; quota -u {$thisUser}... Okay, nothing to worry about. Completely harmless, and it always has been.

It takes its data from our listUsers.php script, so all fine, right? NO. listUsers.php had started throwing PHP fatal errors with stack traces. We isolated those strings, worked out the resulting commands, and there it was: rm -rf /home/  thrown in ..... /.quota. That's it! Malformed input.

We took the exact error line from the logs, dropped it into $thisUser, and reconstructed the shell command. The result included the smoking gun:

rm -rf /home/ thrown in /scripts/lib/user/userFilesystem.php on line 95/.quota

In a shell, the spaces split that line into separate arguments, so it turns into rm -rf /home/ ... with /home/ as a target of its own. That’s the moment it became exactly clear what had happened.
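To make the mechanism concrete, here is a small Python sketch that rebuilds the command string the same way and then splits it roughly the way a POSIX shell would, using shlex.split as a stand-in for the shell's word splitting. It only covers the rm half of the command; the template and function names are illustrative, not taken from PMSS, and nothing is executed.

```python
import shlex

# Illustrative template for the rm half of the updateQuotas command;
# {user} plays the role of $thisUser. Nothing here is ever executed.
TEMPLATE = "rm -rf /home/{user}/.quota"


def build_command(user: str) -> str:
    """Interpolate a user name into the command string with no validation."""
    return TEMPLATE.format(user=user)


def rm_targets(command: str) -> list[str]:
    """Return the operands rm would receive after shell word splitting."""
    words = shlex.split(command)
    return words[2:]  # skip "rm" and "-rf"


print(rm_targets(build_command("alice")))
# ['/home/alice/.quota']

# A line of the same shape as the smoking gun above.
error_line = "  thrown in /scripts/lib/user/userFilesystem.php on line 95"
print(rm_targets(build_command(error_line)))
# ['/home/', 'thrown', 'in', '/scripts/lib/user/userFilesystem.php', 'on', 'line', '95/.quota']
```

The first operand is /home/ on its own, and another is the library file named in the error line, which lines up with both the wiped /home and the removed file. The leftover words are treated by rm as relative paths.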

We trusted our own scripts 100% to return valid data and handle all the sanitization, BUT this script had recently been changed, and something caused it to error on this server: an error about a missing file. Fed into updateQuotas, that error output resulted in rm -rf /home/, and also removed the file named in the error line. updateQuotas did not validate input from a trusted source. Why would it? Well, because that source might throw an error one day. And technically it is not internal code (a library call we could use) but external (a script executed through the shell): it returns a string that we only minimally parse, not a structured data array like an internal library call would.
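One way to stop treating that string as trusted is to validate every name before it reaches a command, and to build the command as an argument list instead of a shell string. The Python sketch below assumes a conventional lowercase Unix account-name pattern (an assumption, not PMSS's actual rule), uses rm -f instead of rm -rf since .quota is a single file, and only builds the argv; quota_file_argv is a hypothetical name.

```python
import re

# Assumed naming policy: conventional lowercase Unix account names.
# The real rule for PMSS accounts may differ.
USERNAME_RE = re.compile(r"[a-z_][a-z0-9_-]{0,31}")


def quota_file_argv(user: str) -> list[str]:
    """Return argv for deleting one user's .quota file, or raise ValueError."""
    # fullmatch() rejects empty strings, spaces, slashes, ".." and trailing newlines.
    if not USERNAME_RE.fullmatch(user):
        raise ValueError(f"refusing suspicious user name: {user!r}")
    # "-f" rather than "-rf": the target is a single file, not a directory tree.
    return ["rm", "-f", f"/home/{user}/.quota"]


print(quota_file_argv("alice"))
# ['rm', '-f', '/home/alice/.quota']
```

Passed to subprocess.run(argv) without shell=True, the path stays one argument and is never re-split by a shell, so a stray space could not turn /home/ into a target of its own even if validation were bypassed.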

The full incident report is in our repo if you want to read it: PMSS/docs/incidents/2025-12-08-home-wipe-updateQuotas-listUsers.md

Hardened all the inputs

We hardened all the inputs, and listUsers.php itself.
We also applied multiple other mitigations, including hardening the update process.
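Hardening the caller can also mean failing closed when the external script misbehaves: if the lister exits non-zero, writes to stderr, or prints even one line that is not a valid name, the safest response is to update nothing at all rather than skip the bad lines. A minimal Python sketch of that policy; load_users and the name pattern are assumptions, not the actual PMSS change.

```python
import re
import subprocess

# Assumed naming policy for account names; the real rule may differ.
USERNAME_RE = re.compile(r"[a-z_][a-z0-9_-]{0,31}")


def load_users(argv: list[str]) -> list[str]:
    """Run a user-listing command and return its names, failing closed."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"user list exited with status {result.returncode}")
    # Strict choice: any stderr output at all is treated as a failure.
    if result.stderr.strip():
        raise RuntimeError(f"user list wrote to stderr: {result.stderr.strip()[:200]!r}")
    users = result.stdout.splitlines()
    if not users:
        raise RuntimeError("user list is empty")
    bad = [line for line in users if not USERNAME_RE.fullmatch(line)]
    if bad:
        # One unexpected line makes the whole list suspect, so nothing is returned.
        raise RuntimeError(f"unexpected lines in user list: {bad[:3]!r}")
    return users
```

Called as, for example, load_users(["php", "listUsers.php"]) (a hypothetical invocation), error output from the lister stops the whole quota run before any rm command is built.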

Probabilities

The probability of this happening was very slim, as evidenced by the bug existing for 14 years before it triggered for the first time. The conditions required to trigger it were insanely specific, and the failure path has now been removed. We do not expect this to recur.

Patch Implemented

We have patched this and hardened similar code paths in many places. Many of the oldest servers have already received the update.
We might take a firmer stance on rolling out updates because of this, or just update the affected pieces with a bespoke hotfix on servers running older versions. A dozen or two nodes have already been updated.

 

What do you need to do?

Nothing. If you were affected, we will be contacting you shortly.
If you were not affected, your service is working normally with all your data in place.



Thursday, December 11, 2025
