[Image: a stinging nettle]
Geoff Barrance, summarizing Dr. Koopman, has grasped the nettle: the documented and reported unsafe hardware architecture of the 2004 Toyota Camry. Unlike others, he has faced its sting for the sake of public safety.
Here is his report, freshly produced yesterday. Tom, Lisa, and Kevin, I dare you to read this. With your bare hands.
Geoff Barrance Extracts Some Nuggets from Dr. Koopman's Testimony at the Bookout vs. Toyota Trial
In an earlier email to Betsy that she quoted, I referred to Dr. Philip Koopman's testimony in the Bookout vs. Toyota trial. I said that Dr. Koopman's testimony was as important, and as damning of Toyota's throttle control system design, as that of Mr. Barr. But Mr. Barr's seems to have garnered more attention (and got Toyota's legal people all steamed up about what seems to me to be a trivial redaction in the slides that Mr. Barr used in his testimony). Now, I am not downplaying how damning Barr's testimony was of the quality of Toyota's ETCS software – indeed, I find their lack of adherence to state-of-the-practice quality for safety-critical system software quite astounding. But I think we should also be aware that, as I said before (in no uncertain terms!), the hardware on which that software runs also ignores the requirements for a safety-critical system. Even if they'd had perfect software it would be no good, because the hardware architecture is unsafe.
So is the ETCS a safety-critical system? It sure is. The stuck-at-wide-open-throttle condition is the most obvious case, and is in fact the case in point in the Bookout trial. So, as Dr. Koopman's testimony stresses, the ETCS needs to be designed to safety-critical standards, and it isn't.
It is a bit of a slog to read through all of Dr. Koopman's verbatim testimony, in its question-and-answer form, as he was led through his presentation to the court, and I don't have access to the slides he was using. But he is a university professor, so he says the things that I have been thinking rather well, probably a lot more understandably than my engineering approach conveys. Anyway, I will give some highly pertinent extracts; links to the full text are given below. His testimony starts on page 14 of the AM transcript, and all of this is from the AM part.
It starts with Dr. K giving several reasons why he thinks the ETCS design is unsafe.[1] I quote him with some condensation in places and some explanatory insertions by me, indicated by [ ]:
1. Random hardware and software faults are a fact of life. Random has a special meaning … it means even if you think it was designed perfectly, something always goes wrong anyway. The defective safety architecture has an obvious single point of failure. A single point of failure is a critical concept in safety-critical systems. I will explain where one is and why that is important. And reading the NASA report, they came to the same conclusion.
2. Toyota's methods to ensure safety were themselves defective. You have to exercise great care when you're doing safety-critical software [and hardware]. You can't just wing it. And Toyota exercised some care, but they did not reach the accepted practice in how you need to design safety-critical systems.
[Image: Gambling - with lives]
3. My third opinion is that the Toyota safety culture is defective. Safety culture is how the organization as a whole treats safety. Do they take it seriously, do they have professionals in place to make sure that even if you're having a bad day you will not make a mistake that day, that things are still going to work OK? And I saw several signs of a defective safety culture. One example that I will talk about is that when they're investigating an accident they don't seem to take the possibility that the software can be defective very seriously. They just say, No, you know, that can't be defective.
4. My next opinion is that Toyota's source code is of poor quality. … Even at a high level there are some tell-tale signs: you don't need to look at the individual lines of code to know that there are some severe problems here. One of them is 10,000 global variables. If you talk to a safety person and that number is above 100, they will right there say, you know, that's it; there is no way this can be safe. … [The] academic standard is that there should be zero. [A sketch just after this list illustrates the global-variable hazard.]
5. Toyota's approach to concurrency and timing is defective. That means that when you're driving a car and the engine is spinning around and the spark is firing to ignite the fuel, it has to happen on a very precise timeline. … And in a safety-critical system you have to meet deadlines. … If you miss those deadlines the system is generally considered unsafe. [A second sketch after this list shows one way a task can catch a missed deadline.]
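To put some engineering flesh on opinions 4 and 5, here is a minimal C sketch of the global-variable hazard. Everything in it is hypothetical (the names, the driver call, the interrupt intrinsics); none of it is Toyota's actual code. It just shows why a safety reviewer recoils at thousands of unprotected globals in an interrupt-driven controller:

    #include <stdint.h>

    /* One shared global standing in for the "10,000" (hypothetical code,
     * not Toyota's). On an 8- or 16-bit MCU a 16-bit load is not atomic:
     * if the ISR fires between the two byte reads, the main loop sees a
     * torn value that no task ever wrote. */
    static volatile uint16_t g_throttle_target;

    extern uint16_t read_pedal_adc(void);     /* assumed hardware driver */
    extern void disable_interrupts(void);     /* assumed platform intrinsics */
    extern void enable_interrupts(void);

    void pedal_isr(void)                      /* asynchronous writer */
    {
        g_throttle_target = read_pedal_adc();
    }

    /* Safer pattern: a single owner, with every read funnelled through a
     * function that briefly masks the interrupt so the copy is atomic.
     * With ~10,000 unprotected globals there is no practical way to audit
     * that every access site gets this right. */
    uint16_t get_throttle_target(void)
    {
        uint16_t copy;
        disable_interrupts();
        copy = g_throttle_target;
        enable_interrupts();
        return copy;
    }

And a similarly hedged sketch for the timing opinion, again with invented names (micros(), enter_limp_home()) standing in for whatever the real platform provides. The point is that a safety-critical task has to detect a missed deadline and fail safe, not just hope it ran fast enough:

    #include <stdint.h>

    #define SPARK_BUDGET_US 500u              /* illustrative budget only */

    extern uint32_t micros(void);             /* assumed monotonic timer */
    extern void compute_spark_timing(void);   /* the real work, stubbed */
    extern void enter_limp_home(void);        /* assumed failsafe hook */

    /* Measure the task against its deadline; a silent overrun means
     * spark and throttle commands land at the wrong time. */
    void spark_task(void)
    {
        uint32_t start = micros();
        compute_spark_timing();
        if ((uint32_t)(micros() - start) > SPARK_BUDGET_US) {
            enter_limp_home();                /* overrun detected: degrade safely */
        }
    }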
A bit later in his testimony, Dr. Koopman is asked about the defective safety architecture with an obvious single point of failure, which he mentioned in his first opinion. He said:
A single point of failure is one place that, if that has a problem, the system is unsafe. … This is probably the most important point of safety-critical design. If you have any single point of failure, the system is by definition unsafe. All the safety standards say you cannot have any single point of failure.

A single point of failure is some piece of hardware or software that has complete control over whether the system is safe or not. And so if it fails due to a random hardware event or a software bug, if it fails, then the system is unsafe. And it is kind of tricky, because you don't say, well, I can think of five ways for it to fail, and I protect against all those five; that is not good enough. It doesn't matter whether you're smart enough to think about how it is going to fail. When you have millions of vehicles on the road it will find a way to fail you didn't think about. So the rule is simply: you cannot have a single point of failure.
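To make that concrete, here is a minimal C sketch of the textbook two-channel cross-check, with invented names throughout (this is emphatically not the ETCS code). It shows both why the comparison catches a single bad wire and why it cannot catch a fault in a converter that both channels share:

    #include <stdint.h>

    #define PEDAL_TOLERANCE 8u                  /* illustrative ADC counts */

    extern uint16_t read_pedal_channel_a(void); /* assumed sensor reads */
    extern uint16_t read_pedal_channel_b(void);
    extern void enter_failsafe(void);

    /* A fault on EITHER wire makes the channels disagree, and the
     * mismatch is caught. But the protection is only as good as the
     * independence of the two paths: if both channels are digitized by
     * ONE A/D converter, a fault inside that converter can shift both
     * readings identically. They still agree, the check passes, and
     * the converter is exactly the single point of failure Koopman
     * describes. */
    uint16_t checked_pedal_position(void)
    {
        uint16_t a = read_pedal_channel_a();
        uint16_t b = read_pedal_channel_b();
        uint16_t diff = (a > b) ? (uint16_t)(a - b) : (uint16_t)(b - a);

        if (diff > PEDAL_TOLERANCE) {
            enter_failsafe();                   /* disagreement: detectable fault */
        }
        return a;
    }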
He then points to the Analog-to-Digital (A/D) converter that takes in (among other data) the two supposedly independent (but that's another story) values from the accelerator pedal. Note that he is not talking about the software here; the A/D converter is a piece of hardware. Again I quote in a condensed form:
So there are two voltages that indicate accelerator pedal fully depressed. This is not a fault mode right now; we're just talking about normal operation. In this case it goes into the A/D portion. [It is] converted to digital bits that say, Hey, the gas pedal is all the way down. [This information] is sent to both the sub-CPU and the main CPU. And it says the gas pedal is all the way down. Okay, let's get the throttle more open because the driver wants to speed up.

If one of these two wires goes bad, then you're okay, because there are two of them. And this [computer] will, if it's working properly, notice they don't match and invoke one of the failsafes. If there is a failure here, for some of the failures it will detect that it has failed. For some of the failures it will result in the voltages not matching. But whether we're smart enough to think about it or not, there is a single point of failure: there is always the possibility that something in here will cause the two voltages to be read as though the gas pedal is all the way down [when it isn't] without noticing there is a problem. I don't know of a failsafe that will catch all possible, all single-point faults in the A/D converter. My concern with that is that it makes the system unsafe. For example, there could be a fault where the A/D converter just decides to say, Do you know what, the gas pedal is all the way down, even though it's not.
So the failsafes [designed by Toyota] are based on this … analysis that basically says we are never going to have a situation in which these signals come through in a way that is wrong but undetectable. They're assuming that you can always detect that something is wrong. Making that assumption limits your fault model to only faults that are detectable, not any possible fault. So that falls short of the requirements of the safety standards. It could result in unintended acceleration if, for example, you have your foot on the accelerator and you release it and this [A/D converter] keeps shoving out stale data. It just stops updating and keeps reporting the old accelerator position that you used to have.

It could fail that way, but it can also fail by spitting out an arbitrary number. It is a single point of failure. And when you look at these [arbitrary failures], you say, What is the worst thing it could do? Well, the worst thing it could do is probably command wide open throttle. And there is no independent check and balance to stop it doing that, and that makes it unsafe. [Toyota's failsafes cannot catch those failures] because it is basically trusting that it will be able to detect any difference, and that's a restricted fault model.
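The restricted fault model is easy to see in a sketch. Continuing the hypothetical code above (the struct, the driver call, and all the names are invented), here is a checked read with a freshness counter added. The counter catches the frozen, stale-data case Koopman describes, which is one named fault; a converter that fails by emitting a fresh, wrong, matched pair still sails through, which is why enumerating checks never substitutes for removing the single point of failure:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint16_t chan_a;   /* pedal sensor 1, via the shared A/D converter */
        uint16_t chan_b;   /* pedal sensor 2, via the SAME converter */
        uint32_t seq;      /* increments once per fresh conversion */
    } adc_sample_t;

    extern bool adc_get_sample(adc_sample_t *out);  /* assumed driver call */
    extern void enter_failsafe(void);

    static uint32_t last_seq;

    /* A match-only check assumes every fault shows up as disagreement.
     * A frozen converter replaying its last sample keeps chan_a equal
     * to chan_b, so stale data passes a match-only check; the sequence
     * counter below closes that one hole and no more. */
    uint16_t pedal_position(void)
    {
        adc_sample_t s;

        if (!adc_get_sample(&s) || s.seq == last_seq) {
            enter_failsafe();               /* no data, or data not fresh */
        }
        last_seq = s.seq;

        uint16_t diff = (s.chan_a > s.chan_b) ? (uint16_t)(s.chan_a - s.chan_b)
                                              : (uint16_t)(s.chan_b - s.chan_a);
        if (diff > 8u) {                    /* illustrative tolerance */
            enter_failsafe();
        }
        return s.chan_a;
    }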
So there, in plain and unvarnished language, is why the ETCS in the 2004 Camry is unsafe. Somehow the NASA report could not bring itself to say this so forthrightly (and one certainly wonders why), but it did say that it could not prove there was no case that would result in unintended acceleration, though of course the DOT's boss at the time ignored that and said the opposite. And the NAS [National Academy of Sciences] report also failed to grasp the nettle and say what I had been telling them. Politics, I guess. Anyway, thank you, Dr. Koopman, for explaining it so well.
Links:
[Image: Can you grasp that?]
[emphasis added by Betsy]
[1] Remember that the present tense he is using refers to Toyota in the early 2000s. We may assume that much has improved since then. But there are still a lot of those 2004 vehicles in use today, and nothing has been done to bring them up to an acceptable standard.