Discussion:
Validation in non-regulated industries/markets
Don Y
2024-11-08 04:10:45 UTC
In *regulated* industries (FDA, aviation, etc.), products are
validated (hardware and software) in their "as sold" configurations.
This adds constraints to what can be tested, and how. E.g.,
invariants in code need to remain in the production configuration
if relied upon during validation.

But, *testing* (as distinct from validation) is usually more
thorough and benefits from test-specific changes to the
hardware and software. These changes allow for fault injection
and observation.

In *unregulated* industries (common in the US but not so abroad),
how much of a stickler is the validation process for this level
of "purity"?

E.g., I have "test" hardware that I use to exercise the algorithms
in my code to verify they operate as intended and detect the
faults against which they are designed to protect. So, I can inject
EDAC errors in my memory interface, SEUs, multiple row/column
faults, read/write disturb errors, pin/pad driver faults, etc.
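
In outline, the sort of hook involved -- a sketch only, with an invented
injection register and ISR counter standing in for whatever the test
silicon/board actually provides:

#include <stdint.h>
#include <stdbool.h>

/* Sketch only -- the register address and bit are invented.  On the
 * *test* build of the board, writing EDAC_INJ_SBE arms the memory
 * controller to flip one data bit on the next write; the production
 * part has no such hook, which is the crux of the question here.
 */
#define EDAC_INJECT   (*(volatile uint32_t *)0x4000F000u)  /* hypothetical */
#define EDAC_INJ_SBE  (1u << 0)

extern volatile uint32_t edac_sbe_count;   /* incremented by the EDAC ISR */

bool edac_sbe_detected(volatile uint32_t *addr)
{
    uint32_t before = edac_sbe_count;

    EDAC_INJECT = EDAC_INJ_SBE;    /* arm the injector (test hardware only)   */
    *addr = 0xA5A5A5A5u;           /* this write lands with one bit flipped   */
    (void)*addr;                   /* read back: ECC corrects and raises ISR  */

    return edac_sbe_count == before + 1;   /* exactly one single-bit error?   */
}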

These are useful (essential?) to proving the software can
detect these faults -- without having to wait for a "natural
occurrence". But, because they are verified/validated on
non-production hardware, they wouldn't "fly" in regulated
markets.

Do you "assume" your production hardware/software mimics
the "test" configuration, just by a thought exercise
governing the differences between the two situations?

Without specialty devices (e.g., bond-outs), how can you
address these issues, realistically?
David Brown
2024-11-08 09:53:56 UTC
Post by Don Y
In *regulated* industries (FDA, aviation, etc.), products are
validated (hardware and software) in their "as sold" configurations.
This adds constraints to what can be tested, and how.  E.g.,
invariants in code need to remain in the production configuration
if relied upon during validation.
But, *testing* (as distinct from validation) is usually more
thorough and benefits from test-specific changes to the
hardware and software.  These changes allow for fault injection
and observation.
In *unregulated* industries (common in the US but not so abroad),
how much of a stickler is the validation process for this level
of "purity"?
E.g., I have "test" hardware that I use to exercise the algorithms
in my code to verify they operate as intended and detect the
faults against which they are designed to protect.  So, I can inject
EDAC errors in my memory interface, SEUs, multiple row/column
faults, read/write disturb errors, pin/pad driver faults, etc.
These are useful (essential?) to proving the software can
detect these faults -- without having to wait for a "natural
occurrence".  But, because they are verified/validated on
non-production hardware, they wouldn't "fly" in regulated
markets.
Do you "assume" your production hardware/software mimics
the "test" configuration, just by a thought exercise
governing the differences between the two situations?
Without specialty devices (e.g., bond-outs), how can you
address these issues, realistically?
I think perhaps this is confusing systems testing with production testing.
You need to make a clear distinction between the two.

Systems testing is about checking that a /design/ is correct. Much of
that is usually about software testing, but it applies to hardware too.
This will often be done using modified hardware so that you can, for
example, inject /realistic/ faults and check that the hardware and
software function as expected. Depending on the application, you might
also run test boards at high temperatures or otherwise abuse them to
confirm the design.

Production testing is about ensuring that the products made are correct
according to the design. You don't check that the memory works, or the
ECC handler works - you check that you have correctly mounted and
soldered the memory chip and that the memory chip supplier has checked
for production faults.
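
For the memory example, the production check amounts to exercising the
connections rather than the silicon.  A sketch of the usual walking-ones
bus tests (sizes and addresses are whatever the board dictates):

#include <stdint.h>
#include <stdbool.h>

/* Production-test sketch: verify that the data and address lines to an
 * external RAM are correctly soldered (no opens, shorts or bridges).
 * It deliberately says nothing about the quality of the silicon --
 * that is the chip vendor's production test, not ours.
 */
static bool databus_ok(volatile uint32_t *ram)
{
    for (uint32_t walk = 1u; walk != 0u; walk <<= 1) {
        *ram = walk;              /* drive a single data line high     */
        if (*ram != walk)
            return false;         /* stuck, open or bridged data pin   */
    }
    return true;
}

static bool addrbus_ok(volatile uint32_t *ram, uint32_t span_words)
{
    /* Mark each power-of-two offset, then verify; a shorted, open or
     * stuck address line makes two of these cells alias to one place. */
    for (uint32_t off = 1u; off < span_words; off <<= 1)
        ram[off] = off;
    ram[0] = 0xDEADBEEFu;         /* exposes lines stuck high or low   */
    for (uint32_t off = 1u; off < span_words; off <<= 1)
        if (ram[off] != off)
            return false;
    return true;
}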


There are some products where the likelihood of developing partial
faults in the field is high and the consequences of that are serious,
but it is useful to be able to keep a partially failed system in action.
There are also products with user-serviceable parts. Then it is often
helpful to have some kind of self-test to identify failing subsystems.
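
In outline, such a self-test is little more than a table of per-subsystem
checks whose verdicts can be shown to the user or service technician.  A
sketch, with invented subsystem names and stub checks:

#include <stdbool.h>
#include <stdio.h>

/* Sketch of an on-demand self-test table: each field-replaceable
 * subsystem gets its own pass/fail verdict so a failing module can be
 * named in the service report.  The names and checks are stand-ins.
 */
static bool psu_rails_ok(void)    { return true; }  /* stand-in: would read the ADCs       */
static bool sensor_board_ok(void) { return true; }  /* stand-in: would ping the sensor bus */
static bool comms_link_ok(void)   { return true; }  /* stand-in: would loop back the link  */

struct selftest {
    const char *name;
    bool      (*check)(void);
};

static const struct selftest tests[] = {
    { "power supply", psu_rails_ok    },
    { "sensor board", sensor_board_ok },
    { "comms link",   comms_link_ok   },
};

int run_selftests(void)
{
    int failures = 0;
    for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        bool ok = tests[i].check();
        printf("%-13s %s\n", tests[i].name, ok ? "PASS" : "FAIL");
        if (!ok)
            failures++;
    }
    return failures;   /* nonzero: the FAIL line(s) name what to swap */
}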


Unfortunately, in some regulated markets, or for some types of "safety
certification", the rule-makers don't understand how this works. The
result is that they insist on extra fault-checking hardware and/or
software that actually decreases the total reliability of the system,
and introduces new parts that in themselves cannot be checked
(systematically, in production, and/or in the field).


How do you deal with it? You follow the rules, even though some of them
were written by muppets.
Niocláiſín Cóilín de Ġloſtéir
2024-11-12 23:27:34 UTC
On 8th November 2024, David Brown wrote:
"Unfortunately, in some regulated markets, or for some types of "safety
certification", the rule-makers don't understand how this works. The result is
that they insist on extra fault-checking hardware and/or software that actually
decreases the total reliability of the system, and introduces new parts that in
themselves cannot be checked (systematically, in production, and/or in the
field)."


Professor William H. Sanders from the University of Illinois at
Urbana-Champaign dishonestly boasted, in a lecture to us on what he
pretends to be "Validating computer system and network trustworthiness",
that he had solved a problem of guaranteeing success against faults --
a problem NASA had complained is not solvable. He showed us this sham
supposed solution, and I immediately objected that the proposal does
not succeed. He conceded the point when he said "Who checks the
checker?" Many papers exist with a similar title to this question;
they are supposedly named after a supposedly common question about
Ancient Roman soldiers.

Regards.
Don Y
2024-11-13 00:21:24 UTC
Post by Niocláiſín Cóilín de Ġloſtéir
"Unfortunately, in some regulated markets, or for some types of "safety
certification", the rule-makers don't understand how this works.  The result is
that they insist on extra fault-checking hardware and/or software that actually
decreases the total reliability of the system, and introduces new parts that in
themselves cannot be checked (systematically, in production, and/or in the
field)."
Professor William H. Sanders from the University of Illinois at
Urbana-Champaign dishonestly boasted, in a lecture to us on what he pretends to
be "Validating computer system and network trustworthiness", that he had solved
a problem of guaranteeing success against faults -- a problem NASA had
complained is not solvable. He showed us this sham supposed solution, and I
immediately objected that the proposal does not succeed. He conceded the point
when he said "Who checks the checker?" Many papers exist with a similar title
to this question; they are supposedly named after a supposedly common question
about Ancient Roman soldiers.
VALIDATION is simply ensuring the product DELIVERED meets the needs of the
customer to whom it is delivered.

TESTING simply ensures that a product meets its specification.

There can be -- and often is -- a disconnect between the specification
and "what the customer wants/needs". Because the spec author often
has incomplete domain knowledge OR makes assumptions that aren't
guaranteed by anything.

Because you are trying to prove the device meets the customer's needs,
you have to have THE product -- not a simulation of it or a bunch of
"test logs" where you threw test cases at the code and "proved" that
the system did what it should.

[I patched a message in a binary for a product many years ago. The
customer -- an IBM division -- aborted the lengthy validation test
when my message appeared ("That is not supposed to be the message
so we KNOW that this isn't the actual system that we contracted to
purchase!")]

Because you have to validate the actual product, the types of things
you can do to the hardware to facilitate fault injection are sorely
limited. "Are you SURE this doesn't have any secondary impact on
the product that changes WHAT we are testing?"

E.g., tablet presses produce up to 200 tablets per second. One of
the control systems is responsible for "ensuring" that the *weights*
of the tablets are correct. But, you can't weigh individual tablets
in 5ms. And, you can't alter the weight of a tablet once it is produced!

So, you have to watch the (mechanical) process and convince yourself that
the product coming off the press is /highly likely/ to meet the weight
constraints. You do this by watching the forces exerted on the "powder"
(granulation) in a fixed geometry cavity for each tablet. Obviously,
if more material was present, the force would go up; down if less.

[Alternatively, you can allow the geometry to vary and watch how
LARGE the resulting tablets are -- again, at 5ms intervals]
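
The per-tablet decision then looks something like this -- a sketch with
invented force limits and hardware hooks; in practice the limits come from
the calibrated force-vs-fill relationship for the granulation in use:

#include <stdint.h>

/* Sketch of the per-tablet accept/reject decision made every ~5 ms.
 * The limits and the two hardware hooks are invented for illustration.
 */
#define FORCE_MIN_N  4800u   /* hypothetical lower peak-force limit */
#define FORCE_MAX_N  5200u   /* hypothetical upper peak-force limit */

extern uint32_t read_peak_force(void);     /* one sample per compression event   */
extern void     divert_this_tablet(void);  /* fires the reject gate downstream   */

void on_compression_event(void)
{
    uint32_t force = read_peak_force();

    /* Fixed cavity geometry: more granulation shows up as higher peak
     * force, less as lower, so force stands in for tablet weight.      */
    if (force < FORCE_MIN_N || force > FORCE_MAX_N)
        divert_this_tablet();
}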

How do you simulate a PARTICULAR tablet being filled with too much -- or
too little -- granulation? You can't stop the press and add (or subtract)
a little material -- the dynamics of the process would be altered.

But, you can alter the geometry of a particular cavity so that it
takes on more (or less) material. And, further alter it so that
it leaves a visible mark on the affected tablets (e.g., put a score
line on the top surface of THOSE tablets and no score line on the
others).

Now, run the press and make sure ONLY scored tablets are in the "reject"
pile and NO scored tablets are in the "accept" pile. You've validated
the controller's ability to discriminate between good and bad tablet
weights -- without altering the software or hardware in the controller.

This lets issues that may have been omitted in the specification
percolate to a level of awareness that DOES affect the product's
applicability.

"What if the feeder runs out of material? Or, the material has
the wrong moisture content and 'clumps'?"

"What if the mechanism that is responsible for physically sorting
the tablets (at 5ms intervals) fails?"

"What if the tablet press has been improperly configured and
the geometry isn't guaranteed to be fixed (e.g., the process's
compression force is creeping into the "overload" setting
which changes the physical dimensions of each compression
event, dynamically -- so tablet #1 and tablet #2 experience
different compression conditions)?"

I was interviewed by a prospective client to work on an
"electronic door lock" system (think: hotels). During the
demo of their prototype, I reached over and unplugged a cable
(that I strongly suspected would inject a fault... that they
would NOT detect!). Sure as shit, their system didn't notice
what I had done and blindly allowed me to produce several
"master keys" while my host watched in horror. All without
the system having a record of my actions!

"Oooops! Wanna bet that wasn't part of the specification?!"

Validation exists for a reason -- separate from and subsequent to
product testing. Because these sorts of SNAFUs happen all the
time!
Don Y
2024-11-13 00:28:59 UTC
Post by Don Y
Validation exists for a reason -- separate from and subsequent to
product testing.  Because these sorts of SNAFUs happen all the
time!
[Sidetracked by my anecdotes... :< ]

Anyway, the question posed is how to address the "product as delivered"
(in terms of hardware) requirement that is inherent in -- and mandated
for -- validation, in those markets where there are no such "rules".

How much can you alter the hardware and still, in good conscience
(and, more practically, in having faith in your results), attest
to the fact that you have verified the product is what it SHOULD
be, despite any deficiencies in the specification(s)? When are
you rationalizing equivalence just because a true "as delivered"
environment is not possible?

[How do you test subsystems, on which you rely, inside an MCU
without a bond-out option? Or, do you simply say that anything
that can't be tested need NOT be tested -- and not even make
an attempt to do so? E.g., Why do we checksum internal FLASH?
Can you simulate a failure -- without altering the hardware -- to
be able to verify that you can detect it?]
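
The check itself is unremarkable -- a CRC over the image compared against
a value recorded at build time.  A sketch (the image bounds and stored CRC
are taken as parameters here; how they're obtained -- linker symbols, a
build step -- is project-specific):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch: integrity check of the application image in internal FLASH.
 * Standard CRC-32 (reflected, polynomial 0xEDB88320).
 */
static uint32_t crc32(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;
}

bool flash_image_ok(const uint8_t *image, size_t len, uint32_t stored_crc)
{
    return crc32(image, len) == stored_crc;
}

/* Exercising the *failure* path without touching the hardware means
 * feeding it a deliberately wrong image or stored CRC -- which is, of
 * course, no longer "the product as delivered".                       */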
David Brown
2024-11-13 08:58:03 UTC
Post by Don Y
Post by Niocláiſín Cóilín de Ġloſtéir
"Unfortunately, in some regulated markets, or for some types of "safety
certification", the rule-makers don't understand how this works.  The result is
that they insist on extra fault-checking hardware and/or software that actually
decreases the total reliability of the system, and introduces new parts that in
themselves cannot be checked (systematically, in production, and/or in the
field)."
VALIDATION is simply ensuring the product DELIVERED meets the needs of the
customer to which it is delivered.
By "validation", I would normally think that the /customer/ is accepting
the product as working well enough for their needs. This is roughly
what you wrote, but the emphasis is on who does the validation.
Post by Don Y
TESTING simply ensures that a product meets its specification.
Nope.

Testing only ever shows the presence of bugs - it can never show their
absence. Only in very limited circumstances can you use testing to show
that something meets its specifications - basically, you need to be able
to test the system for every possible valid input (i.e., every input
that is within the usage specifications). That is sometimes possible
for small software functions, but very rarely feasible for larger
functions or anything involving hardware.
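
To make the "every possible valid input" case concrete, a sketch of what
exhaustive testing looks like for a function with a 16-bit domain -- 65,536
cases is trivially sweepable, while even a pair of 32-bit arguments is not
(the function under test is an invented stand-in):

#include <stdint.h>
#include <assert.h>

/* Sketch: exhaustive test of a small pure function over its entire
 * input domain.  saturating_double() is an invented example.
 */
static uint16_t saturating_double(uint16_t x)
{
    return (x > UINT16_MAX / 2) ? UINT16_MAX : (uint16_t)(x * 2u);
}

int main(void)
{
    for (uint32_t x = 0; x <= UINT16_MAX; x++) {
        uint32_t expect = (2u * x > UINT16_MAX) ? UINT16_MAX : 2u * x;
        assert(saturating_double((uint16_t)x) == expect);
    }
    return 0;   /* reaching here demonstrates correctness over the whole domain */
}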

And even then, you are not "ensuring" it meets the specifications - you
are "demonstrating" it. The way you "ensure" you meet the
specifications is by having a rigorous enough development procedure
(which will include testing of many types) that you can be highly
confident of correct behaviour. Testing is vital to this, but it is not
remotely sufficient.
Post by Don Y
There can be -- and often is -- a disconnect between the specification
and "what the customer wants/needs".  Because the spec author often
has incomplete domain knowledge OR makes assumptions that aren't
guaranteed by anything.
Absolutely true. That's part of what makes development fun!
Post by Don Y
Because you are trying to prove the device meets the customer's needs,
you have to have THE product -- not a simulation of it or a bunch of
"test logs" where you threw test cases at the code and "proved" that
the system did what it should.
[I patched a message in a binary for a product many years ago.  The
customer -- an IBM division -- aborted the lengthy validation test
when my message appeared ("That is not supposed to be the message
so we KNOW that this isn't the actual system that we contracted to
purchase!")]
Because you have to validate the actual product, the types of things
you can do to the hardware to facilitate fault injection are sorely
limited.  "Are you SURE this doesn't have any secondary impact on
the product that changes WHAT we are testing?"
Like I said - you do fault injection and similar testing as a systematic
test of the design and fully replicable parts of the system (like
software). You don't do it on final products.

And customers may be involved in, or have insight into, such fault
injection tests.

You talk about "proving" the device meets the customer's needs, and
"ensuring" it works according to specification. That's just wrong.
This is not a pass/fail binary thing where you have proof of correctness
- it's a matter of risk reduction, failure mitigation, and confidence.
You are never trying to make something that is "perfect" in the true
sense - because you can never succeed. What you are trying to do is get
a high confidence that there is a low risk of the product not fulfilling
its specifications, and that the ill-effects of any failings are minimal
within the time and cost budgets.

You achieve this by combining many methods, each of which reduces the
risks. That starts with good specification methods, passes through
design and development phases, test phases (including fault injection
and other systematic tests), production tests, long-term tests,
artificial ageing tests, follow-up testing of deployed devices, and
whatever else suits for the type of project and budget.
Post by Don Y
I was interviewed by a prospective client to work on an
"electronic door lock" system (think: hotels).  During the
demo of their prototype, I reached over and unplugged a cable
(that I strongly suspected would inject a fault... that they
would NOT detect!).  Sure as shit, their system didn't notice
what I had done and blindly allowed me to produce several
"master keys" while my host watched in horror.  All without
the system having a record of my actions!
"Oooops!  Wanna bet that wasn't part of the specification?!"
Validation exists for a reason -- separate from and subsequent to
product testing.  Because these sorts of SNAFUs happen all the
time!
That's not validation. Validation /does/ exist for a reason - it's what
lets the developer get paid and not sued even if the customer later on
thinks of an extra clause that they should have had in their
specifications. "Validation" is the point when the customer takes on
the risks for the product not actually working according to their needs
(which they might not be fully aware of).
bitrex
2024-11-13 19:18:40 UTC
Post by Don Y
In *regulated* industries (FDA, aviation, etc.), products are
validated (hardware and software) in their "as sold" configurations.
This adds constraints to what can be tested, and how.  E.g.,
invariants in code need to remain in the production configuration
if relied upon during validation.
But, *testing* (as distinct from validation) is usually more
thorough and benefits from test-specific changes to the
hardware and software.  These changes allow for fault injection
and observation.
In *unregulated* industries (common in the US but not so abroad),
how much of a stickler is the validation process for this level
of "purity"?
<snip>

OK boss says i gotta build a self driving car huh... ok lets see...
java, that's a given.. alright... *starts typing* public class Car
extends Vehicle {...
Don Y
2024-11-13 19:59:32 UTC
Post by bitrex
Post by Don Y
In *regulated* industries (FDA, aviation, etc.), products are
validated (hardware and software) in their "as sold" configurations.
This adds constraints to what can be tested, and how.  E.g.,
invariants in code need to remain in the production configuration
if relied upon during validation.
But, *testing* (as distinct from validation) is usually more
thorough and benefits from test-specific changes to the
hardware and software.  These changes allow for fault injection
and observation.
In *unregulated* industries (common in the US but not so abroad),
how much of a stickler is the validation process for this level
of "purity"?
<snip>
OK boss says i gotta build a self driving car huh... ok lets see... java,
that's a given.. alright... *starts typing* public class Car extends Vehicle {...
Failure to recognize that there WILL be some need to validate your product
is often the undoing of what might otherwise be a successful product.

The effort is usually significant (in Pharma, it begins long before the
product development -- with audits of your firm, its processes and procedures,
the qualifications of the personnel tasked with the design/development,
etc.).

For a specific product, you must verify everything documented
behaves as stated: show me that you will not accept invalid input;
show me that the mechanism moves to a safe state when configured
(or accessed) improperly; show me that you can vouch for the information
that your sensors CLAIM and the actions that your actuators purport
to effect; etc. Just stating that a particular error message (or other
response) will be generated isn't proof that it will -- show me HOW you
sense that error condition, how you report it and then give me a real
exemplar to prove that you *can*.

The *customer* ultimately knows how the product will be (ab)used -- even
if he failed to communicate that to the developer at the time the
specification was written (a common problem is the impedance mismatch
between domains: what the customer takes for granted may not be evident
to the specification developer). He will hold the developer's feet to the fire
and refuse to accept the device for use in his application.

[We test drive cars for a reason -- we will eventually BE driving that car!]

So, for a self-driving car, how do you prove you can avoid running over
pedestrians? That you will heed the warning sounds from emergency vehicles?
That you won't drive off a cliff? That you can observe driving constraints
(minimum and maximum speeds, lane closures, etc.)? Because the purchaser/driver
of such a vehicle surely wouldn't want to find themselves in one of these
situations! "Ooops! You're entitled to a partial refund..."

E.g., we have one of the few roads in the US that is marked in metric
units (km/h). Will the designer have considered this in his optical
recognition of road signs?

We use blinking yellow arrows to indicate "turn if safe to do so".
We also make left turns AFTER the thru traffic has come to a complete
stop -- except for cases where the "leading left" is observed.
Will the car expect this? Will it know when it is safe to use the
center (reversible/"suicide") traffic lane?

[ISTR places in D.C. where the direction of one-way streets changes
based on time of day. Will the GPS know NOT to route you that way
based on time of day??]

Unless you've considered these issues and responsibilities, you will be
"surprised" at how a profitable DESIGN endeavor can become a money sink!
Don Y
2024-11-13 21:42:43 UTC
Post by Don Y
The effort is usually significant (in Pharma, it begins long before the
product development -- with audits of your firm, its processes and procedures,
the qualifications of the personnel tasked with the design/development,
etc.).
For a specific product, you must verify everything documented
behaves as stated:  show me that you will not accept invalid input;
show me that the mechanism moves to a safe state when configured
(or accessed) improperly; show me that you can vouch for the information
that your sensors CLAIM and the actions that your actuators purport
to effect; etc.  Just stating that a particular error message (or other
response) will be generated isn't proof that it will -- show me HOW you
sense that error condition, how you report it and then give me a real
exemplar to prove that you *can*.
E.g., from an FDA document:

"Qualification of utilities and equipment generally includes the following
activities:
• Selecting utilities and equipment construction materials, operating
principles, and performance characteristics based on whether they are
appropriate for their specific uses.
• Verifying that utility systems and equipment are built and installed in
compliance with the design specifications (e.g., built as designed with
proper materials, capacity, and functions, and properly connected and
calibrated).
• Verifying that utility systems and equipment operate in accordance with
the process requirements in all anticipated operating ranges. This should
include challenging the equipment or system functions while under load
comparable to that expected during routine production. It should also
include the performance of interventions, stoppage, and start-up as is
expected during routine production. Operating ranges should be shown
capable of being held as long as would be necessary during routine
production."

Note that this is in addition to validating the *process* to which the
equipment is applied: how do you procure your raw materials, ensure
THEY meet their respective standards, control access to them to prevent
contamination/loss of potency, combine them, store them, distribute them
on the factory floor, and assess the performance of the resulting product?
E.g., how long for the actives in this product to be present in the
patient's system? How long for a particular coating to dissolve in
the digestive tract? WHERE in the digestive tract will that occur
(some products are coated to survive the acidic environment in the
stomach for absorption in the intestines)? etc.
Post by Don Y
The *customer* ultimately knows how the product will be (ab)used -- even
if he failed to communicate that to the developer at the time the
specification was written (a common problem is the impedance mismatch
between domains:  what the customer takes for granted may not be evident
to the specification developer).  He will hold the developer's feet to the fire
and refuse to accept the device for use in his application.