COTS in space: the 100 percent testing risk

By Tomáš Zedníček

source: Intelligent Aerospace article

November 13, 2017 By Dan Friedlander, Retired following 44 years in component engineering

The current European Space Agency (ESA) and National Aeronautics and Space Administration (NASA) approach to the use of Electrical, Electronic, and Electromechanical (EEE) commercial off-the-shelf (COTS) components in space applications continues to rely on heavy testing requirements. The official methodology looks more like an addendum to the traditional approach than a new approach based on COTS.

The value of Statistical Process Control (SPC) is more or less disregarded. One hundred percent post-procurement testing is moved from the component manufacturer to the user, regardless of the difficulties caused by the higher complexity, integration, speed, and number of package terminations, and by the smaller package geometries, of COTS. These parameters make testing increasingly difficult and less efficient, and they put the electrical and/or mechanical integrity of the component at risk. Testing becomes more and more risky to the components, to the point where the risk outweighs the benefit.

I question the rationale behind the benefit of post-procurement 100 percent testing and screening in the different world of SPC. New thinking is required.

Application of SPC
What is Statistical Process Control (SPC)? SPC is a scientific, data-driven methodology for quality analysis and improvement. It is a preventive activity, performed in real-time during the manufacturing process.

Within the traditional MIL EEE components methodology, the quality of the component is “ensured” by inspection and testing of the product. The relevant acceptance/rejection criteria are established according to how well it meets its design specification. In contrast, SPC uses statistical tools to observe the performance of the production process and to detect significant variations before they result in the production of substandard components. SPC deals with manufacturing defects; testing deals with the failures that originate from those defects. In other words: “If you want to get rid of mosquitoes, drain the swamp that breeds them.”
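To make the contrast concrete, the following is a minimal sketch of one common SPC tool, a Shewhart individuals control chart. It is not taken from the article; the parameter readings and the function names are hypothetical. The control limits are derived from the process's own in-control history, not from the specification, so a drifting process can be flagged while every part is still within spec.

```python
from statistics import mean

def control_limits(baseline):
    """Return (LCL, center, UCL) for a Shewhart individuals chart.

    Sigma is estimated from the average moving range divided by 1.128,
    the standard d2 constant for subgroups of size 2.
    """
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat

def out_of_control(readings, lcl, ucl):
    """Return the indices of readings that fall outside the control limits."""
    return [i for i, x in enumerate(readings) if x < lcl or x > ucl]

# Hypothetical in-control history of a monitored process parameter
# (assumed specification limits: 4.90 to 5.10).
baseline = [5.01, 4.99, 5.02, 4.98, 5.00, 5.01, 4.99, 5.00]
lcl, center, ucl = control_limits(baseline)

# New production readings: 5.09 is still inside the assumed spec limits,
# but it lies outside the control limits, so the process is flagged.
new_readings = [5.00, 5.02, 5.09, 4.99]
flagged = out_of_control(new_readings, lcl, ucl)
print(f"LCL={lcl:.3f} center={center:.3f} UCL={ucl:.3f} flagged={flagged}")
```

In this illustration the flagged reading would still pass a test against the specification, which is the point of the preventive approach: the process is corrected before it produces a reject.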

The COTS EEE components methodology is based on SPC. The military and space (or MIL) EEE components methodology is based on testing. The statistics do not work for low-volume production. It does not mean, however, that testing by components manufacturers has no value in COTS production, to reach the targeted outgoing quality. At present, no method or combination of methods can guarantee zero defects or failures.

The bathtub curve phases
Traditionally, the bathtub curve is widely used in reliability engineering. The MIL approach describes a “typical” hazard function during the service life of EEE components. The elements of the bathtub curve are shown in the image below. The shape of the bathtub curve is a conceptual model; it depends on the inputs used to build it.
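As a rough illustration of how the inputs determine the shape, a bathtub-like hazard function is often sketched as the sum of three Weibull hazards: a decreasing one for early failures, a constant one for random failures, and an increasing one for wear-out. The sketch below uses that textbook construction; every shape and scale value is a hypothetical number chosen only to produce a readable curve, not data from any component.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    """Sum of three hypothetical Weibull hazards (t in years, t > 0)."""
    early = weibull_hazard(t, shape=0.5, scale=2000.0)    # decreasing: infant mortality
    constant = weibull_hazard(t, shape=1.0, scale=1000.0) # flat: random failures
    wear_out = weibull_hazard(t, shape=5.0, scale=20.0)   # increasing: wear-out
    return early + constant + wear_out

for years in (0.01, 1.0, 5.0, 15.0, 25.0):
    print(f"t = {years:5.2f} years  hazard = {bathtub_hazard(years):.5f} per year")
```

Changing the early-failure term models a process with fewer built-in defects, and changing the wear-out scale moves the right-hand wall of the tub, which is how the commercial and military curves discussed below differ.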

The 1994 Perry directive, or Perry memo on the use of COTS in defense applications by then U.S. Secretary of Defense William Perry, recognized the logic behind the commercial thinking. Reliability must be built into EEE components. One cannot test reliability into EEE components.

The space sector policymakers are extremely slow in adopting the rationale of using COTS in space applications and in creating a new, affordable methodology that is compliant with present and future global conditions. They find logical and less logical arguments to resist a real change in thinking. They ignore the fact that the high-reliability (hi-rel) sector accounts for about 0.3 percent of the whole market.

Some stronger arguments of those resistant to change can be seen from the “bathtub” curve below (graph sourced from Semelab).

We can observe a significant improvement in the early failure phase for commercial EEE components vs. the military EEE components. At the other end, we observe a significant deterioration in the wear out phase for “certain” commercial EEE components vs. the military EEE components. Obviously, the EEE COTS components with a five-year life are not suitable for space applications.

The wear-out phase
The Resistance to Change party members explain the wear-out deterioration obstacle by saying: Commercial demand is for more and more performance for less and less financial cost. Much of the end equipment that the components go into has quite a short life (mobile telephones, PCs, satellite receivers, etc.); it is, therefore, possible to reduce the long-term requirement in the design of any semiconductor and assume that a specific component wear-out mechanism is acceptable for the specific application.

Theoretically, the above scenario can be true for certain EEE commercial components. Nobody supporting the move to COTS suggests that all COTS components can be used for space applications. Nobody suggests that all space-level or military-level components can be used for space applications. Nobody suggests that selected space/MIL EEE components are not suitable for space applications.

EEE COTS components are not a homogeneous population. A wide range of requirements (e.g., subject service life, criticality, etc.) exists for different commercial applications. In addition, a wide range of component manufacturers are involved in the commercial components industry.

Information as to the robustness of specific EEE components is not publicly available to users. Although theoretically possible, a sane EEE components manufacturer cannot afford to play with the long-term requirement in the chip design and market it on the open market for general use. Anyway, herein, the suggested use of EEE COTS components in space applications is limited to industrial grade, -40 degrees Celsius to +85 degrees Celsius.

Following is lifetime information publicized by Texas Instruments (TI), a longtime semiconductor manufacturer in Dallas, Texas:

CMOS wear out mechanisms and IC design
The current generation of TI industrial-grade embedded processor products is designed to support a 10-year useful operating lifetime at 105°C junction temperature (TJ). The 10-year lifetime assumes a worst-case situation of 100 percent powered on and run at a constant 105 degrees Celsius TJ.

TI embedded processor (EP) products are designed for reliability so that the onset of the wear out mechanisms occurs beyond the useful life period. Robustness to prominent silicon wear-out mechanisms that are designed for include:

Gate oxide integrity (GOI)
Electro-migration (EM)
Time-dependent dielectric breakdown (TDDB)

In addition, mechanisms that cause parametric shift over lifetime, such as Negative Bias Temperature Instability (NBTI) and Channel Hot Carriers (CHC), are also considered within the product design. For most silicon technologies, the critical wear out mechanism is EM.

If the processor runs at 90 degrees C effective temperature instead of 105 degrees C, a 2x increase in useful lifetime can be projected. In other words, a 20-year useful lifetime of the silicon can be achieved provided the application manages the thermal performance to be at an ‘effective’ TJ (junction temperature) of 90 degrees C or below. (For more, visit http://www.ti.com/lit/an/sprabx4a/sprabx4a.pdf.)
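Temperature scaling of this kind is commonly modeled with the Arrhenius relation. The sketch below is a minimal illustration and not TI's published model: the activation energy of 0.55 eV is an assumption, picked only because it reproduces a factor of about two between 105 and 90 degrees C, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K

def arrhenius_lifetime_factor(ea_ev, t_use_c, t_ref_c):
    """Ratio of projected lifetime at t_use_c to lifetime at t_ref_c.

    Arrhenius model: lifetime is proportional to exp(Ea / (k * T)),
    with T in kelvin.
    """
    t_use_k = t_use_c + 273.15
    t_ref_k = t_ref_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_ref_k))

# ASSUMPTION: Ea = 0.55 eV is illustrative, not a value quoted by TI.
factor = arrhenius_lifetime_factor(ea_ev=0.55, t_use_c=90.0, t_ref_c=105.0)
projected_years = 10.0 * factor  # starting from the quoted 10-year design lifetime
print(f"lifetime factor = {factor:.2f}, projected lifetime = {projected_years:.1f} years")
```

Under that assumed activation energy the factor comes out close to 2, consistent with the quoted 20-year projection. A different dominant wear-out mechanism would have a different activation energy and therefore a different factor.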

The above information meets the requirements of a temperature-controlled space application.

To get a rough idea about industrial applications service-life requirements, TI states:

Many industrial systems require useful lifetimes of 10 years or less, but recent examples of reliability profiles modeled by TI that go above that include:

Telecommunication equipment: 15 years continuous operation.
Industrial controllers in factory electrical supply system: 15 years continuous operation.
Solar inverters: 15 years continuous operation.
Water meter: 15 years continuous operation.
Electronic Meter: 20 years continuous operation.

The bottom line is that, at present, the wear-out concern for selected industrial-grade EEE COTS components in space applications can be overcome by educated selection. Looking at the present post-procurement 100 percent testing requirements, I do not see their benefit in addressing the robustness of a given EEE component.

The early failure phase
Even a Resistance to Change party member cannot ignore the value of SPC in the EEE COTS components manufacturing process. Among other things, the process builds the reliability into an EEE component. Obviously, that should have some impact on the form of the relevant visual model, the bathtub curve (as seen above).

Early failures (“infant mortality”) are highly undesirable and are always caused by defects and blunders: material defects, design blunders, errors in assembly, etc.

The SPC’s purpose is to identify defects caused by assembly or material variations that can lead to failure and to take action to remove the root causes of these defects. Failures are driven primarily by defects.

The SPC approach, eliminating root causes, is generally the best approach and can significantly reduce infant mortalities. The SPC implementation results in better outgoing quality. In addition, the statistics tools allow the manufacturer to monitor variations of parameters relative to the specification limits. Consequently, corrective actions are implemented to readjust the process.

For example, Cpk is a process capability index that measures how close a process is running to its specification limits, relative to the natural variability of the process. The larger the index, the less likely it is that any item will be outside the specs, causing serious reliability problems. With good initial quality, the EEE components will generally continue to perform well even with expected drift over time; in other words, they will have high reliability.
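The usual definition is Cpk = min(USL - mean, mean - LSL) / (3 x sigma), where LSL and USL are the lower and upper specification limits. The short sketch below computes it from sample data; the measurements and specification limits are hypothetical.

```python
from statistics import mean, stdev

def cpk(process_mean, process_sigma, lsl, usl):
    """Process capability index: distance from the mean to the nearest
    specification limit, expressed in units of three standard deviations."""
    return min(usl - process_mean, process_mean - lsl) / (3.0 * process_sigma)

def cpk_from_samples(samples, lsl, usl):
    """Estimate Cpk from measured samples using the sample mean and stdev."""
    return cpk(mean(samples), stdev(samples), lsl, usl)

# Hypothetical parameter with specification limits 4.90 to 5.10.
centered = cpk(process_mean=5.00, process_sigma=0.02, lsl=4.90, usl=5.10)
shifted = cpk(process_mean=5.04, process_sigma=0.02, lsl=4.90, usl=5.10)
print(f"centered process: Cpk = {centered:.2f}")
print(f"shifted process:  Cpk = {shifted:.2f}")

measurements = [5.01, 4.99, 5.02, 4.98, 5.00, 5.01, 4.99, 5.00]
print(f"estimated from samples: Cpk = {cpk_from_samples(measurements, 4.90, 5.10):.2f}")
```

The same spread gives a lower index once the mean drifts toward a limit, which is why a parameter running close to its specification limit is a warning sign even when every part passes.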

As with quality, we see that process variability is the enemy of reliability. In the case of space applications the value of SPC is obvious. It has to be mentioned that in space applications, among other things, the radiation-withstanding capability of an EEE component may be affected by a parameter that is close to its specification limit.

For EEE COTS components, the value of 100 percent testing/screening is questionable once the value of SPC, the cost, and the risks involved (the risks will be dealt with separately) are taken into consideration.

The move from Qualified Product List (QPL) to Qualified Manufacturers List (QML) concept in the military EEE components system gave component manufacturers the authority to modify the MIL-SPEC (now MIL-PRF performance specifications) requirements for screening, based on their own data-supported engineering judgment. This is a step in the right direction.

It is assumed that there is consensus on the high professional level of TI as an EEE component manufacturer. Following is its educated decision to deviate from the original MIL requirements for military-grade, or MIL-grade, components.

For QML Class Q devices, Texas Instruments has qualified the modification and/or elimination of several screens and tests as approved by the Texas Instruments Technology Review Board (TRB). These include, but are not limited to:

Elimination of -55 degree C testing on multiple logic products including the TTL, LS, S, HC, AHC, AHCT, ALS, AS, F, ABT, AC, ACT, and BCT families.
Elimination of -55 degree C testing on specific mixed-signal products including the majority of complementary metal-oxide semiconductor (CMOS) technology-based product families.
Elimination of burn-in or replacing 100% burn-in with lot acceptance on multiple product technologies.
Elimination of Group A acceptance testing on specific DSP and logic product families.
Elimination of 100% temperature cycle and centrifuge for all low pin count ceramic devices (28 pin and below).
Replace 100X pre-cap inspection using the 100X criteria for all product families.

For QML Class V (space-level) devices, of particular note is the absence of read-and-record data collection and delta calculation. Texas Instruments warrants products to meet the DSCC SMD or JAN Slash Sheet as specified with respect to delta limits. Full characterization studies including drift analysis are performed at product release and after any major changes to assure shipped product meets these requirements.

Please note that these exceptions are considered non-value-added (NVA) as qualification performed in accordance with MIL-PRF-38535 has determined that form, fit, function, or reliability is not affected.

The above found non-value-added (NVA) eliminations are only one example of what new thinking can do, when liberated from the chains of a mandatory “cookbook” that is MIL-SPEC.

In many cases, the military dice are drawn from the commercial production line, which means that SPC is applied to them as well.

The 100 percent testing/screening risks
It has been shown above that the manufacturing process builds reliability into an EEE component. The heavy testing imposed by the military system is aimed at mitigating the lack of statistical tools due to the low production volume.

The burden of testing/screening (in the space/MIL components systems) is placed on the shoulders of the EEE component manufacturer. This is the right thing to do, because the manufacturer is the most knowledgeable about the products it designs and produces. In addition, the manufacturer possesses the relevant costly testing/screening infrastructure, including the highly skilled manpower, needed for the task.

In the commercial EEE components domain, the manufacturer is freed from strictly following a set of screening steps imposed by an outside entity (e.g., MIL/Department of Defense or DoD). It is up to the manufacturer to decide what is needed to design, produce, and test/screen to meet the targeted quality and reliability.

It is a myth that the commercial manufacturer is less concerned about quality and reliability than a military one. The commercial manufacturer has a Quality and Reliability Plan, containing the known activities of characterization, evaluation, qualification, testing, inspection, reliability monitoring, etc. In addition, thanks to the higher production volume, the manufacturer has a highly valuable SPC process.

Properly selected EEE COTS components have proved themselves over the past 20 years. Extensive design, production, and operational use of COTS (as procured) in military applications in harsh environments have resulted in successful experiences. Based on my experience with multiple satellites, the intensive use of EEE COTS components in space (within a new-thinking methodology for post-procurement testing) has proved to be successful.

Current official, COTS-penalizing policies place a heavy burden of post-procurement testing/screening on the shoulders of the user, following the traditional “cookbook” (MIL) way of thinking. The user may perform the relevant activities in-house or may outsource them.

It seems that the policymakers underestimate the importance of space/military EEE components availability security. They continue to resist new thinking. The COTS methodology is probably the best candidate to solve the availability challenge.

The “100 percent” post-procurement testing/screening requirement is the most questionable penalty on the selection of COTS components. Traditionally, any post-procurement activity is intended to raise the confidence level for using an EEE component in the given space application. The official requirement for most post-procurement activities is based on 100 percent testing/screening. Let us call these activities “upscreening”.

Usually, the risk of damaging the parts during testing is ignored by policymakers. Requirements of 100 percent environmental testing (e.g., temperature cycling, burn in) lead to requirements of 100 percent electrical testing. As mentioned above, EEE component manufacturers are doing whatever they consider necessary to reach their quality and reliability goal. Obviously, the selection of a specific manufacturer or component is an important cornerstone, perhaps the most important, to build the needed confidence to use a specific EEE COTS component for space applications. The selection process requires a high professional level.

The user or policymaker, especially a low-volume demander from the space sector, has little to no chance to change a commercial manufacturer’s COTS methodology. The old proverb, “If you can’t beat them, join them,” applies and the solution is: Adopt the selected manufacturer’s methodology as is.

Nevertheless, post-procurement activities can be shaped to fit reality. It is worth paying attention to the NASA warning in official document PEM-INST-001: “There are numerous data indicating that improper handling and testing of the parts can introduce more defects than are screened out.”
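The trade-off in that warning can be put into rough numbers. The sketch below is a deliberately simplified expected-value model, not a method from the article or from PEM-INST-001: it assumes a lot with a given latent defect fraction, a screen that catches a given share of those defects, and handling that leaves undetected damage in a given share of the good parts. All input values are hypothetical.

```python
def defect_fraction_after_screening(latent_fraction, screen_effectiveness, handling_damage):
    """Fraction of defective parts among those that pass a 100 percent screen.

    latent_fraction:      share of parts with a latent defect before screening
    screen_effectiveness: share of defective parts the screen removes
    handling_damage:      share of good parts left with undetected damage
    """
    escaped = latent_fraction * (1.0 - screen_effectiveness)
    damaged = (1.0 - latent_fraction) * handling_damage
    passed = 1.0 - latent_fraction * screen_effectiveness
    return (escaped + damaged) / passed

# Hypothetical high-volume, SPC-controlled process: 100 ppm latent defects.
cots_before = 1e-4
cots_after = defect_fraction_after_screening(cots_before, 0.9, 5e-4)

# Hypothetical low-volume process without statistical control: 1 percent.
low_volume_before = 1e-2
low_volume_after = defect_fraction_after_screening(low_volume_before, 0.9, 5e-4)

print(f"SPC-controlled lot: {cots_before:.2e} before, {cots_after:.2e} after screening")
print(f"low-volume lot:     {low_volume_before:.2e} before, {low_volume_after:.2e} after screening")
```

With these assumed inputs, the same screen improves the low-volume lot but makes the already clean lot worse, because the damage it introduces exceeds the few defects left to remove. The crossover depends entirely on the assumed rates, which in practice are hard to measure.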

The inevitable movement to COTS only enhances the criticality pointed out by this statement. Huge technological developments result in more mechanically delicate component packaging. Package lead pitch keeps shrinking, lead counts keep rising, lead coplanarity is critical for soldering to the PCB, packages keep getting smaller, and so on. The handling of such components without compromising their integrity is a risky activity. The handling required for 100 percent testing/screening is intensive. It is imperative to minimize the handling of the components.

In addition, the efficiency of electrical testing, especially for highly integrated components and especially when performed by other than the component manufacturer, is questionable. Burn-in is also a 100 percent screening step requiring handling, burn-in boards, special equipment, and electrical testing for verification.

MIL-STD-883, Method 1015, Burn-in Screen, states that “burn-in is performed for the purpose of eliminating marginal devices, those with inherent defects or defects resulting from manufacturing aberrations which are evidenced as time and stress dependent failures. In the absence of burn-in, these defective devices would be expected to result in infant mortality or early lifetime failures under use conditions.”

As seen above, failures originate in the manufacturing process of the die. Manufacturing defects may result in failures. Today’s COTS components are manufactured in a rigorous, statistical process-controlled, high-volume production regime – and that results in substantially better outgoing parts quality.

SPC targets failure prevention; burn-in targets failure detection. Both approaches are aimed at eliminating failures from the outgoing EEE components. In both cases, there is a high risk that the 100 percent testing/screening risk outweighs the benefit. In addition, as we have seen above in the TI case and many others, the burn-in requirement is slowly being revisited and found non-value-added (NVA).

Post-procurement testing flow
It is not within the scope of this article to go into detail about a proper post-procurement testing flow for the use of selected EEE COTS components in space applications. The general direction for COTS should be:

Eliminate 100% post-procurement testing/screening on the flight parts;
Focus on sampling, testing samples that are not actually flown;
Keep a revised Destructive Physical Analysis (DPA) flow to address process issues and package-related issues.
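The article does not specify a sampling plan. As one hedged illustration of what sampling can and cannot demonstrate, the sketch below uses the standard zero-failure (success-run) relation: if n samples are tested with no failures, a lot whose true defect fraction is p would have passed with probability (1 - p)^n. The function names and the numbers are hypothetical.

```python
import math

def zero_failure_sample_size(max_defect_fraction, confidence):
    """Smallest n such that a lot with the given defect fraction would
    pass a zero-failure sample test with probability <= 1 - confidence."""
    if not (0.0 < max_defect_fraction < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_defect_fraction))

def acceptance_probability(defect_fraction, sample_size):
    """Probability that a sample of the given size contains no defective part."""
    return (1.0 - defect_fraction) ** sample_size

# Hypothetical target: reject lots with 10 percent defectives at 90 percent confidence.
n = zero_failure_sample_size(max_defect_fraction=0.10, confidence=0.90)
print(f"samples needed: {n}")
print(f"chance a 10 percent defective lot still passes: {acceptance_probability(0.10, n):.3f}")
```

The sample size grows quickly as the target defect fraction shrinks, so sampling on non-flight parts is best read as a check on the lot and the manufacturer's process, not as a substitute for the statistical evidence the manufacturer already holds.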

In summary, Albert Einstein wrote: “We cannot solve our problems with the same thinking we used when we created them.”

Author biography
The author, Dan Friedlander, graduated from Engineering School/Tel Aviv University with a degree in physics (1965-1969). He has 44 years of experience in Component Engineering at MBT/Israeli Aerospace Industries (1969 to 2013), as Head of Components Engineering. As such, he was responsible for all aspects of EEE components, including policymaking, standardization at corporate level, approval, etc., for military and space applications. Now retired, Friedlander is an industry consultant (2013 to present).

featured image source: Intelligent Aerospace
