CrowdStrike: There’s just no excuse


I believe invasive SW like CrowdStrike should rarely need to be installed.

It can make sense to install it on an employer-provided laptop because lots of people like to click on whatever link they receive and install whatever app they see. The employee is in full control of the laptop for long periods of time, and he or she regularly walks into office buildings with what could be a tampered laptop carrying a trojan horse, ready to be plugged into the company’s internal networks.

But it makes no sense to install it in:

  • POS (Point of Sale) terminals
  • ATMs
  • Medical equipment (!!!)
  • 911 emergency services
  • Airport ticket terminals
  • etc.

These systems are owned, managed and monitored by the company. It’s the job of IT to lock down these systems (e.g. limited or guest accounts only, with no admin/root access; and, if you will, regular rollback scripts). Lots of bonus points for locking out or deterring physical access to the machine or its ports.

Some systems, like medical equipment, are not supposed to be connected to the Internet. Airport terminals should only be connected to a LAN that is itself connected to the internet, never to the internet directly. And I don’t mean NAT. I mean terminal computers send and receive passenger & airplane info through a custom protocol to an internal, local server, and that server is the one that actually communicates with the internet. Terminals should not be physically capable of reaching an internet gateway.

But let’s forget about all that. Let’s say I’m 100% OK with Windows being used on these systems (I’m not; but at least it’s debatable) and that I also agree with CrowdStrike or similar rootkits being installed on these systems (which I absolutely do not).

Here’s what went wrong:

  1. No staging. You don’t just roll an update directly to users. You put it into a pipeline with several steps that will do thorough testing and validation before that same package is distributed to users.
  2. No gradual rollout. You don’t just roll out an update to 100% of users so quickly. It needs to be done in batches: 20%, then 40%, and so on until you reach 100%. And of course, monitor and pause the rollout if issues are found.
    • The speed of the rollout will depend on how urgent an update is. Most of the time you want a slow rollout unless you have a 0-day exploited in the wild where you want a fast rollout but still not instantaneous.
  3. No deploys on Friday (Thursday midnight is still the same thing) unless there’s a 0-day in the wild.
  4. Run a fuzzer. It’s blatantly obvious you’re not running a fuzzer. C’mon guys, security is your thing. You should know what a fuzzer is. Use it.
  5. SIGNATURE VALIDATION! A kernel mode driver must never, and I mean must NEVER load whatever random file it finds in a folder!!!
    • CrowdStrike must have a JSON/TOML/YAML file of all the files that are supposed to be loaded. It must not rely on enumerating all the files in a folder.
      • When updating such JSON file, make sure the operation is transactionally atomic. And leave backups.
    • CrowdStrike must check the file’s signature. By signature I don’t just mean a checksum. I mean a cryptographic digital signature that guarantees “this file was created by CrowdStrike”. C’mon guys, this is basic. You screwed up the most basic rule about Secure Boot!
    • If it succeeds, proceed to the next file until loading is complete.
    • If it fails, disable CrowdStrike and initiate recovery mode. A missing file must be considered a failure. Although this time it led to a happy ending, deleting a *.sys file must not be what restores the system to a bootable state. That is absolutely flabbergasting. It means you have no integrity checks of any kind, and no versioning system either to guarantee two ABI-incompatible *.sys files aren’t loaded simultaneously.
    • Recovery mode could either be a Windows Recovery rollback (this option must always be available), or an internet-enabled mode where an update can be downloaded to fix a broken update (this option should be optional).
    • Unit tests should always stress-test Recovery Mode by intentionally corrupting files (both with valid and invalid signatures) to ensure it always works when things go sideways.
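
To make point 5 concrete, here is a minimal Python sketch of the idea: load only what a manifest lists, check every file before using it, and fall back to recovery on any failure. It is an illustration under loud assumptions, not how CrowdStrike's driver works (a real one would be kernel-mode C/C++). Every name in it (manifest.json, demo_sign, demo_verify, write_manifest_atomically, load_all, boot, LoadFailure) is hypothetical. The "signature" is an HMAC stand-in so the example runs with the standard library alone; an HMAC is a symmetric MAC, not a digital signature, and real code would verify an asymmetric signature against a pinned vendor public key. The manifest itself would also need to be signed, which the sketch skips.

```python
import hashlib
import hmac
import json
import os
import tempfile
from pathlib import Path

# ASSUMPTION: HMAC stand-in so the sketch is stdlib-only. This is NOT a
# digital signature; real code would verify an asymmetric signature
# (e.g. Ed25519 or RSA) against a pinned vendor public key.
DEMO_KEY = b"demo-only-key"


def demo_sign(data: bytes) -> str:
    return hmac.new(DEMO_KEY, data, hashlib.sha256).hexdigest()


def demo_verify(data: bytes, signature: str) -> bool:
    return hmac.compare_digest(demo_sign(data), signature)


class LoadFailure(Exception):
    """A listed file is missing or failed verification."""


def write_manifest_atomically(path: Path, manifest: dict) -> None:
    """Swap in a new manifest in one step, keeping the old one as a backup."""
    if path.exists():
        backup = path.with_name(path.name + ".bak")
        backup.write_bytes(path.read_bytes())
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as tmp:
        json.dump(manifest, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    # os.replace is an atomic rename when source and target share a filesystem.
    os.replace(tmp_name, path)


def load_all(folder: Path, verify=demo_verify) -> list:
    """Load only the files named in the manifest; never enumerate the folder."""
    manifest = json.loads((folder / "manifest.json").read_text())
    loaded = []
    for entry in manifest["files"]:
        target = folder / entry["name"]
        if not target.is_file():
            raise LoadFailure("missing: " + entry["name"])
        if not verify(target.read_bytes(), entry["signature"]):
            raise LoadFailure("bad signature: " + entry["name"])
        loaded.append(entry["name"])
    return loaded


def boot(folder: Path, verify=demo_verify) -> str:
    """Fail closed: any problem with the manifest or a file means recovery."""
    try:
        load_all(folder, verify)
        return "running"
    except (LoadFailure, OSError, ValueError, KeyError):
        return "recovery"
```

With this shape, a stray file dropped into the folder is simply ignored because it is not in the manifest, while a tampered or missing listed file sends boot() to "recovery" instead of crashing. Those are exactly the cases the recovery tests in the last bullet should exercise.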

So many things went wrong that I’m convinced it’s full of exploits ready to be taken advantage of. Computers are likely much more secure by uninstalling the rootkit than keeping it installed.

Ah that’s it. It’s off my chest. As I said in a tweet, err I mean X post: CrowdStrike is a compendium of what not to do.

It wasn’t “just a fuck up”. It was a cascade of mistakes. And that includes customers installing such SW on devices that should not have to rely on CrowdStrike for security. Because unfortunately lots of mission-critical systems don’t work the way they should or the way I’d want. They work the way they do.