I think by definition an operating system won't boot if anything mandatory causes an error.
Are all mandatory drivers kernel-level drivers by definition? If not, would a user-level mandatory driver (without kernel-level privileges) triggering an NPE cause the same BSOD/boot loop?
The story I heard is that the channel file CrowdStrike tested was somehow damaged just before release and ended up being filled with zeros. Since this file contains something akin to executable p-code, the fact that it was full of zeros subsequently caused a null pointer exception. The driver was marked as mandatory for boot, so Windows got stuck at that point and couldn't finish booting.
I couldn't agree more.
Speaking as someone with significant experience in software test/QA, my suspicion is that they only did what I refer to as "programmer testing", i.e. checking that it works under ideal conditions, with no thought given to whether it works in less-than-ideal real-world conditions, edge cases, and corner cases.
Microsoft has claimed the fault lies with the EU insisting they document a particular kernel API designed for security monitoring. According to Fido, blaming that API is a simple example of "let no emergency go to waste." Any mandatory driver that generates a null pointer exception would cause the same boot problem, whether it accesses the security API or not.
The problem seems to be bad QA rather than the existence of an API that lets companies with bad QA write drivers.
As a different example, it's possible to place a mandatory mount line in fstab that later prevents Raspberry Pi OS from booting. At boot a user-level task reads the configuration file and stops the boot process if something goes wrong. Unix has worked this way since the early releases by Bell Labs in the 1970s. People have learned not to mess with fstab without checking for mistakes.
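As a rough illustration of checking fstab for mistakes before rebooting, here is a minimal Python sketch. It assumes the usual whitespace-separated layout (device, mount point, type, options, then two optional numeric fields) and treats any non-root entry without the `nofail` or `noauto` mount option as mandatory. The sample entries, export paths, and the `check_fstab` helper are made up for the example; this is not a replacement for testing the real file.

```python
def check_fstab(text):
    """Return (line_number, message) warnings for the given fstab text."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Assumption of this sketch: a usable entry has 4 to 6 fields.
        if len(fields) < 4 or len(fields) > 6:
            warnings.append((number, f"expected 4 to 6 fields, found {len(fields)}"))
            continue
        device, mount_point, fs_type, options = fields[:4]
        if mount_point == "/" or fs_type == "swap":
            continue  # the root filesystem is mandatory by nature
        opts = options.split(",")
        if "nofail" not in opts and "noauto" not in opts:
            warnings.append(
                (number, f"{mount_point} is mandatory: boot may stop if {device} is unavailable")
            )
    return warnings


# Hypothetical sample; the host names echo the story, the paths are invented.
SAMPLE = """\
# device            mount point  type  options           dump pass
PARTUUID=abcd-02    /            ext4  defaults,noatime  0    1
pythagoras:/export  /mnt/pyth    nfs   defaults          0    0
zeno:/export        /mnt/zeno    nfs   defaults,nofail   0    0
broken line
"""

if __name__ == "__main__":
    for number, message in check_fstab(SAMPLE):
        print(f"line {number}: {message}")
```

On the sample this flags line 3 (a network mount without `nofail`) and line 5 (a malformed entry). On a real system, running `mount -a` or `findmnt --verify` after editing is probably the more thorough check.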
There were two Sun SPARC stations, one named zeno and the other pythagoras. I configured zeno to mount a network filesystem from pythagoras and pythagoras to mount a filesystem from zeno. The fstab file was tested by rebooting each machine separately and everything worked. Then one day the power went out.
Weirdly, circular boot dependencies have been one of the main problems that prevented certain companies from quickly recovering from the CrowdStrike outage. Systems crashed simultaneously in all availability zones, but no zone could be brought up without at least one of the other zones already being online.
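A circular dependency like the zeno and pythagoras one can be found ahead of time if the boot dependencies are written down as a graph. The following is only a sketch: the `find_cycle` helper and the dictionary layout (each name maps to the things it needs before it can boot) are assumptions made for illustration, using a plain depth-first search.

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of names, or None if there is none."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # chain of nodes currently being explored

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in depends_on.get(node, ()):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current chain, so the chain loops back
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in depends_on:
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Each host needs a filesystem exported by the other before it can finish booting.
hosts = {"zeno": ["pythagoras"], "pythagoras": ["zeno"]}

if __name__ == "__main__":
    print(find_cycle(hosts))  # ['zeno', 'pythagoras', 'zeno']
```

Rebooting each machine separately never exercises the cycle because the other machine is always up; a check like this looks at the whole graph at once, which is the situation after a power cut.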
According to the dog developer, such disaster recovery and resilience mistakes are so important to discover before a real disaster that CrowdStrike should be charging a fee for the service. In my opinion, it wasn't such a good idea to tag that kernel driver as mandatory.
Statistics: Posted by ejolson — Wed Jul 24, 2024 2:42 pm