Fault Simulation for Structural Testing of Analogue Integrated Circuits

being a Thesis submitted for the Degree of

Doctor of Philosophy

in the University of Hull

by

Stephen James Spinks BEng. (Hons.)

February 1998

# Abstract

This thesis describes the ANTICS analogue fault simulation software, which provides a statistical approach to fault simulation for accurate analogue IC test evaluation. The traditional figure of fault coverage is replaced by the average probability of fault detection. This is later refined by considering the probability of fault occurrence to generate a more realistic, weighted test metric. Two techniques to reduce the fault simulation time are described, both of which show large reductions in simulation time with little loss of accuracy.

The final section of the thesis presents an accurate comparison of three test techniques and an evaluation of dynamic supply current monitoring. An increase in fault detection for dynamic supply current monitoring is obtained by removing the DC component of the supply current prior to measurement.

# Table of Contents

## Chapter 1 - Mixed-Signal and Analogue IC Testing Overview

1. Introduction  
2. The Problems in Testing Analogue ICs  
3. Conclusion

## Chapter 2 - Literature Review and State of the Art

1. Introduction  
2. Structural Testing of Analogue ICs  
   2.1 Introduction  
   2.2 A Review of Fault-Based Analogue Circuit Testing  
      2.2.1 Supply Current Monitoring Techniques  
         2.2.1.1 Introduction  
         2.2.1.2 Supply Current Monitoring of Analogue ICs  
         2.2.1.3 Built-In Current Sensors for Supply Current Monitoring  
         2.2.1.4 Other Test Techniques based on Supply Current Monitoring  
         2.2.1.5 Practical Application of Supply Current Monitoring  
         2.2.1.6 Supply Current Monitoring-Based Fault Diagnosis Techniques  
      2.2.2 Other Structural Test Techniques  
      2.2.3 Structural DFT and BIST Techniques  
3. Fault Modelling for Analogue IC Testing  
   3.1 Introduction  
   3.2 IC Failure Mechanisms and Defect Analysis  
   3.3 Bridging and Break Faults  
      3.3.1 Defect Analysis of Interconnect Faults  
      3.3.2 Analysis of Defect Analysis Results  
      3.3.3 A Review of Simulation Fault Models Used for Fault Simulation Based on Interconnect Defects  
         3.3.3.1 Fault Modelling Based on the Circuit Level  
         3.3.3.2 Gate Open Fault Modelling  
         3.3.3.3 Fault Modelling using IFA Results  
   3.4 Shorts Occurring due to Defects in the Gate Oxide Layer  
      3.4.1 Introduction  
      3.4.2 Gate-Channel GOS Fault Models  
      3.4.3 Gate-Diffusion GOS Fault Models  
   3.5 Hierarchical Fault Modelling  
      3.5.1 Aim  
      3.5.2 Literature Review  
4. Inductive Fault Analysis  
   4.1 Introduction  
   4.2 A Review of IFA Results for Fault Simulation  
   4.3 Alternative Approaches to IFA using Realistic Fault Mapping  
5. Analogue Fault Simulation  
   5.1 Introduction  
   5.2 Early Fault Simulation Work for Fault Diagnosis  
   5.3 A Review of Analogue Fault Simulation Systems for IC Testing  
6. Summary, Conclusions and Justification of Work  
7. Aim of Work and Structure of Thesis

## Chapter 3 - The ANTICS Analogue Fault Simulation Software

1. Introduction  
2. ANTICS - An Overview  
   2.1 Why Use a Commercial Simulator for Fault Simulation?  
   2.2 The HSPICE Circuit Simulator [HSPI96]  
3. Simulating Process Parameter Deviations  
   3.1 Introduction  
   3.2 Monte Carlo Simulation  
   3.3 Statistical Modelling Techniques  
   3.4 Monte Carlo Simulation using HSPICE  
   3.5 MCRAND  
4. Fault Injection  
   4.1 Introduction  
   4.2 ANAFINS  
5. Repeated Simulation  
   5.1 Introduction  
   5.2 ANAFAME  
6. Post-processing Analysis  
   6.1 Introduction  
   6.2 Test Equipment Modelling  
   6.3 Process Tolerance  
   6.4 ANACOV  
   6.5 Detection Algorithms  
      6.5.1 Fixed Mode  
      6.5.2 Threshold Mode  
      6.5.3 Digital Mode  
      6.5.4 Data Mode  
      6.5.5 Alldata Mode  
      6.5.6 Threshdata Mode  
   6.6 Measure Analysis  
   6.7 Fault Analysis and Detectability Measures  
7. Conclusions

## Chapter 4 - Probabilistic Fault Simulation

1. Introduction  
2. Probability of Detection Definition  
3. Hypothesis Testing  
4. Goodness of Fit Test  
   4.1 The Kolmogorov-Smirnov Goodness of Fit Test  
5. Setting a Suitable Test Limit Function  
6. Incorporation of Probabilistic Test Methods into ANACOV  
7. How Many Monte Carlo Runs are Sufficient?  
   7.1 Experimental Work  
   7.2 Discussion of Results

References

List of Publications

# Table of Figures

## Chapter 2 - Literature Review and State of the Art

Figure 2-1 - Catastrophic Fault Model from [Milo89]
Figure 2-2 - Catastrophic Fault Model from [Bell91][Camp92]
Figure 2-3 - Catastrophic Fault Model from [Silv96]
Figure 2-4 - “Soft” Short Fault Model from [Sach95]
Figure 2-5 - GOS Defect Locations

## Chapter 3 - The ANTICS Analogue Fault Simulation Software

Figure 3-1 - The ANTICS Software
Figure 3-2 - MCRAND
Figure 3-3 - MCRAND Distribution types
Figure 3-4 - ANAFINS
Figure 3-5 - Example Fault Model Definition
Figure 3-6 - ANAFAME
Figure 3-7 - ANACOV
Figure 3-8 - Fixed Mode
Figure 3-9 - Threshold Mode
Figure 3-10 - Alldata Mode

## Chapter 4 - Probabilistic Fault Simulation

Figure 4-1 - Partial Overlap Considerations
Figure 4-2 - Histogram of Fault-free and Fault Distributions for 1 Sample Point of Dynamic Supply Current
Figure 4-3 - Probability of Detection Definition
Figure 4-4 - The KS Test Statistic
Figure 4-5 - Test Input and Supply Current of Fault-Free Multiplier Circuit
Figure 4-6 - Catastrophic MOSFET fault model: Rs=1Ω, Ro=100MΩ, Co=1fF
Figure 4-7 - The Convergence of the Mean of Sample Point 1 from the Fault-free Circuit for IDDD Supply Current Monitoring
Figure 4-8 - The Convergence of the Standard Deviation of Sample Point 1 from the Fault-free Circuit for IDDD Supply Current Monitoring
Figure 4-9 - Convergence of Probability of Detection for a Gate-Source Short Fault on XA61.M9 for IDDD Supply Current Monitoring at 3 Sample Points

## Chapter 5 - Techniques for the Reduction of Fault Simulation Time

Figure 5-1 - Open Loop Opamp: Input, Output Voltage and Supply Current
Figure 5-2 - Closed-Loop Opamp: Input, Output Voltage and Supply Current
Figure 5-3 - Example of a Fault Detectable in the Multiplier Supply Current
Figure 5-4 - Percentage Misclassification of Sample Points
Figure 5-5 - Percentage Error in Average Distance Confidence Measure
Figure 5-6 - Fault Classification and Percentage Probability of Detection Error

## Chapter 6 - Improved Test Metrics for Fault Simulation

Figure 6-1 - Possible Test Outcomes
Figure 6-2 - Fault Classification Results
Figure 6-3 - Unweighted Average Probability of Detection Results
Figure 6-4 - Weighted Average Probability of Detection Results
Figure 6-5 - Fault Model Resistance Cumulative Density Function
Figure 6-6 - GSS fault on transistor XA57.XOP1.M25
Figure 6-7 - GDS fault on transistor XA57.XM9.M1

## Chapter 7 - An Evaluation of Structural Test Techniques

Figure 7-1 - Initial Probability of Detection Classifications
Figure 7-2 - Weighted Average Probability of Detection Results
Figure 7-3 - Absolute value circuit: a) input and output, b) supply current and c) strobe points
Figure 7-4 - Sample and Hold Circuit: a) Input and output, b) Supply current and c) Strobe points
Figure 7-5 - Initial Fault Classification Results
Figure 7-6 - Unweighted Average Probability of Detection Results
Figure 7-7 - Weighted Average Probability of Detection Results

## Appendix A - Circuit Schematics and Descriptions

Figure A-1 - Unbuffered CMOS Opamp
Figure A-2 - Open Loop Configuration
Figure A-3 - Closed Loop Configuration
Figure A-4 - IREF1 Current Reference Cell
Figure A-5 - VREF1 Voltage Reference Cell
Figure A-6 - OPA1/OPA2 Opamp Subcircuits
Figure A-7 - IOPAD Input/Output Cell
Figure A-8 - Top level Diagram of Multiplier
Figure A-9 - MULT1 Multiplier Cell
Figure A-10 - Sample and Hold Circuit
Figure A-11 - Top Level Absolute Value Circuit
Figure A-12 - ABSVAL Absolute Value Cell

# Chapter 1 - Mixed-Signal and Analogue IC Testing Overview

## 1. Introduction

Before the 1970s, electronic circuits were generally built using discrete components. With the advent of Integrated Circuits (ICs), digital ICs started to dominate; however, more recent advances in fabrication technology have enabled analogue and digital circuits to be integrated on the same silicon. In recent years there has been a large increase in these mixed-signal circuits, mainly due to demands for higher levels of integration, for example in the telecommunications market. The current trend is towards more complex devices offering high performance and reliability at lower unit costs, with the ultimate goal being entire systems fabricated on a single mixed-signal IC.

Economic factors are demanding higher quality levels for ICs; in particular, there is much pressure to reduce the number of defective ICs shipped. Further to this, ICs are increasingly being used in safety-critical systems. At the same time as higher quality levels are becoming increasingly important, the highly competitive nature of the semiconductor industry requires a shorter time to market for ICs. Testing has been shown to play a crucial part in both of these issues. In particular, although the analogue proportion of a mixed-signal device may be small in comparison to the digital section, the analogue section has been shown to dominate the overall device test time. Analogue testing has become a bottleneck in the manufacture of ICs and accounts for a high proportion of the total device manufacturing cost.

## 2. The Problems in Testing Analogue ICs

ICs are tested at many stages of the design and production process. For example, a design may be simulated to ensure that all specifications are satisfied, and process control checks may be made during manufacturing. However, this work is exclusively concerned with the verification of final silicon after the device has been fabricated. Currently, this is achieved using electrical testing, which is generally performed in two stages. Initial wafer probe testing is performed while the devices are still part of the wafer on which they were formed. The aim of this testing stage is to perform relatively simple tests on devices to eliminate as many faulty devices as possible prior to packaging. Those devices which pass the wafer probe stage are then packaged and subjected to a final electrical specification test.

Currently, analogue ICs and the analogue portions of mixed-signal ICs are tested in a different manner from digital ICs and the digital portions of mixed-signal ICs. Testing in the digital domain generally takes a structural approach; that is, tests are used to detect manufacturing defects directly rather than the functional error produced. Research into digital test has produced well-established fault models which are accepted as producing meaningful test quality measures using efficient fault simulation techniques. This has led to design and test automation providing established automatic test pattern generation (ATPG) methodologies and design for test (DFT) techniques.

The above is not true of the analogue test domain, due mainly to the continuous nature of analogue devices. Analogue ICs and the analogue parts of mixed-signal ICs are generally tested against a functional specification, which leads to many problems. Firstly, the complex nature of analogue circuits means that they must often be tested over a wide range of input magnitudes, frequencies and different functions. Secondly, a suitable set of specifications must be generated from the set of all possible specifications for a device. Test time may be reduced by minimising the specification set, but one consequence may be reduced device quality. Specification testing for analogue devices therefore requires very complex and diverse measurements, which in turn require a lengthy test time. Another consequence of specification testing is that analogue testers must perform a number of different functions to a high degree of accuracy over a wide range of signal conditions. This makes analogue test equipment very complicated and hence expensive.

One particular problem in IC testing is that although the size of ICs has increased in terms of the number of transistors, the number of pins has not increased by a corresponding amount. In the digital test field, DFT and Built-In Self Test (BIST) techniques have been established, such as scan-based techniques, which aid testability by improving observability and controllability. These structured techniques and tools have been well accepted as design methodologies. Analogue and mixed-signal designs often contain embedded analogue macros, which make the propagation of a test signal and test response to and from the macro difficult. Although some analogue DFT and BIST techniques have been developed, they are more ad hoc and applicable to only certain classes of circuits. For example, a DFT technique for testing active analogue filters has been presented [Soma90], but it is of limited applicability. Other problems are encountered with techniques used to improve the testability of analogue circuits; in particular, parasitics may be introduced which lower device performance. Including DFT and BIST schemes also generally requires additional input and output pins, which may not be available.

## 3. Conclusion

In this chapter some of the problems in testing analogue and mixed-signal ICs have been discussed. Testing these devices is a crucial part of the manufacturing process in terms of quality and time to market. Although structured methodologies and automation have been applied to digital testing, approaches to analogue testing have been more ad hoc due to the diverse and continuous nature of analogue circuits. Testing analogue ICs and the analogue parts of mixed-signal ICs is also much more expensive than testing their digital counterparts, due to the higher cost of the testers and the increased test time.

These issues have prompted much research interest into testing analogue and mixed-signal ICs. In particular the use of structural test techniques for analogue circuits has been investigated. Structural testing of analogue circuits is inherently tied to issues such as defect analysis, fault modelling and fault simulation. These topics are examined in the literature review in the next chapter.

# Chapter 2 - Literature Review and State of the Art

## 1. Introduction

In this chapter a literature review is presented on issues pertaining to structural testing of analogue and mixed-signal ICs. Section 2 reviews literature on structural or “fault-driven” test techniques investigated for analogue and mixed-signal IC testing. In particular, several test techniques use approaches based on monitoring the supply current through a device.

In order to evaluate structural test techniques via circuit simulation, one requirement is the modelling of faulty devices so that the effect of a fault can be examined and compared to the fault-free case to determine if it will be detected. There has been much research interest in the modelling of faults for simulation of faulty devices, which is reviewed in section 3. A review of techniques used for the analysis of IC defects and fault models used by the authors to measure test quality is also presented.

Section 4 provides a literature review of Inductive Fault Analysis (IFA) which is a means of generating a list of defects which are likely to occur from a device layout. Data from IFA on the relative occurrence of different fault classes have been published and are described here.

In order to evaluate structural test techniques via simulation, an approach to fault simulation is required. Literature on fault simulation of analogue circuits is reviewed in section 5 and several fault simulation approaches are described.

A summary of the literature review is presented in section 6, highlighting the main points which are particularly relevant to this work. These points are used in section 7 to justify the aims of the work presented in this thesis.

## 2. Structural Testing of Analogue ICs

### 2.1 Introduction

The high cost of testing analogue ICs and the analogue parts of mixed-signal ICs, due to the problems discussed in chapter 1, has led to interest in alternative approaches to the analogue test problem, and in particular structural testing of analogue circuits. A structural or fault-driven test of an IC differs from a functional test in that it aims to detect manufacturing defects directly rather than the specification error that is produced. By utilising tests which are outside those normally used for functional testing, structural test techniques may provide a more efficient test set with simpler and more easily applied tests. It is particularly advantageous to detect faults at the wafer-probe stage, to avoid the need for packaging and expensive specification tests on faulty devices.

Evaluating specification-based tests using structural test evaluation methodologies has also been shown to be cost-effective. In [Mil94] and [Chao92] a fault-driven evaluation of analogue testing has been shown to reduce specification test sets, by ordering tests according to their ability to detect faults, and hence to reduce average device testing time.
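
The idea of ordering tests by their ability to detect faults can be illustrated with a small sketch. This is not the procedure of [Mil94] or [Chao92]; it is a hypothetical greedy ordering in which each test is described by the set of faults it detects, and the next test chosen is the one that adds the most faults not yet detected. The test names and fault identifiers are invented for illustration only.

```python
def order_tests_by_detection(detects):
    """Greedy ordering: repeatedly pick the test that detects the most
    faults not already covered by earlier tests.

    `detects` maps a test name to the set of fault ids it detects
    (hypothetical data; in practice this would come from fault simulation).
    """
    remaining = dict(detects)
    covered = set()
    order = []
    while remaining:
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order


# Illustrative data only: four tests and five faults.
example = {
    "dc_gain": {"f1", "f2"},
    "offset": {"f2"},
    "supply_current": {"f1", "f2", "f3", "f4"},
    "slew_rate": {"f5"},
}
ordered = order_tests_by_detection(example)
```

With such an ordering, a faulty device tends to fail early in the test sequence, which is what shortens the average test time per device.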

A comprehensive review of structural analogue IC test methodologies is given in the next section. In particular, much interest has been shown in structural tests based on the supply current. Several structural DFT and BIST schemes have also been described.

### 2.2 A Review of Fault-Based Analogue Circuit Testing

#### 2.2.1 Supply Current Monitoring Techniques

##### 2.2.1.1 Introduction

Quiescent Supply Current Monitoring (SCM) has proved to be a useful test technique in digital CMOS processes. The main reason for its introduction over the standard output voltage single-stuck-at (s-s-a) fault testing was its ability to detect a wider class of faults, including delay faults, gate oxide short (GOS) faults and high impedance bridging faults, which do not manifest themselves as s-s-a faults [Bake90]. Results are presented in [Hawk86] which show that quiescent SCM (IDDQ) testing detected devices with GOS faults which initially passed a functional test but failed after temperature and voltage stresses. Therefore IDDQ testing can be used as a reliability indicator to detect devices likely to fail. SCM also has the advantage that there is no need to propagate the effect of a fault to the output of a device, since the power supply is a primary output. The success of IDDQ monitoring in the digital domain has led to research interest into supply current monitoring for analogue and mixed-signal ICs.

##### 2.2.1.2 Supply Current Monitoring of Analogue ICs

Supply current monitoring work is presented in [Bell91] [Camp92] [Ecke93a] [Ecke93b] for CMOS analogue and mixed-signal ICs. Initial work [Bell91] [Camp92] [Ecke93a] focused on simulation of simple static DC tests for CMOS macros (e.g. comparators, opamps) considering catastrophic defects. Examination of the relative changes in supply current between faulty and fault-free devices revealed that only a small percentage of faults exhibited order of magnitude changes. This prompted the use of dynamic supply current testing (IDDD testing). High fault coverage results are presented in [Ecke93a] for a 3-stage band pass filter and 2-bit analogue to digital converter (ADC) using a Pseudo-Random Binary Signal (PRBS) and complementary signal sets as input stimuli, chosen due to their rich frequency content. Later work included other fault models, in particular the gate oxide short (GOS) fault model (see section 3.4). In [Ecke93b] results are presented for the IDDQ testing of a 2-bit flash ADC showing higher fault coverage for the supply current monitoring than output voltage monitoring for all cases of GOS faults.

Other dynamic supply current simulation work is also described in the literature. In [Brac92], dynamic supply current monitoring is applied to a tightly coupled mixed-signal ASIC. The effect of MOS transistor catastrophic device faults in response to fixed amplitude pulses with variable temporal parameters was investigated. An example of the fault effects is given but there are no post-processing calculations and hence no figures of fault coverage.

[Harv93] presents an investigation into static SCM and output voltage measurement for Sallen-Key highpass and biquad filters. The investigation used catastrophic device faults with voltage and current detection tolerances of 100mV and 0.1mA to allow for process deviations and component tolerances. The results show that static SCM gave a lower fault coverage than the output voltage measurements for the given thresholds. Monitoring an additional internal voltage node as well as the output voltage gave a fault coverage which was in every case higher than that obtained by measuring the output voltage and the supply current. The paper concludes that although IDDQ testing may not produce a high fault coverage for catastrophic faults, it shows more sensitivity to oxide integrity, which was also investigated, consistent with [Hawk86] for digital circuits.

RMS supply current monitoring work is described in [Supa93], where sinusoidal input signals are applied to a CMOS band pass filter and 2-bit ADC and the RMS of the supply current monitored. In this example, only open and short faults in passive components are considered. An arbitrary threshold value of RMS supply current was chosen (with a ±10% tolerance window) and a binary decision taken at each point, which was then compared with the fault-free case. All faults considered were detectable using three input frequencies for the band pass filter, while two amplitudes were required to detect all faults for the 2-bit ADC. The work is extended in [Zwol96b], where results show high fault coverage for a 3-bit ADC. The validity of using a 10% tolerance window was investigated using Monte Carlo simulation of process parameters and found to be of a satisfactory magnitude.
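
A tolerance-window check on the RMS supply current can be sketched as follows. This is a simplified variant and not the exact scheme of [Supa93]: instead of comparing binary decisions against an arbitrary threshold, it assumes a fault is flagged when the faulty RMS value falls outside a ±10% window around the fault-free RMS value for at least one input condition. The waveforms and function names are illustrative assumptions.

```python
import math


def rms(samples):
    """Root-mean-square of a sequence of current samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detected(fault_free_runs, faulty_runs, tolerance=0.10):
    """Return True if, for any input condition, the faulty RMS supply
    current lies outside the +/- tolerance window around the fault-free
    RMS value for the same condition."""
    for good, bad in zip(fault_free_runs, faulty_runs):
        reference = rms(good)
        if abs(rms(bad) - reference) > tolerance * reference:
            return True
    return False


def sine(amplitude, n=100):
    """One period of a sinusoid sampled at n points (illustrative data)."""
    return [amplitude * math.sin(2 * math.pi * k / n) for k in range(n)]


# Two input conditions per circuit; amplitudes in amperes are invented.
fault_free = [sine(1.0e-3), sine(2.0e-3)]
small_shift = [sine(1.05e-3), sine(2.0e-3)]  # 5% change: inside the window
large_shift = [sine(1.0e-3), sine(2.6e-3)]   # 30% change: outside the window
```

Using several input conditions mirrors the observation above that more than one frequency or amplitude may be needed before every fault produces an out-of-window reading.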

In [Miur94] a structural test of a CMOS comparator is presented. Various tests were performed including a static SCM test, static output voltage test and a delay test using a threshold of 10% around the fault-free circuit. The investigation used MOS transistor catastrophic faults. It was concluded that IDDQ testing compared favourably to the other tests, providing 94.5% fault coverage. It was also noted that it was necessary to use both upper and lower limits for the measured variables to define test limits.

In [Papa94] [Papa94a], a comparison of fault coverage obtained using output voltage and supply current is described for a bipolar opamp simulated as a DC comparator, linear amplifier and a multivibrator. Both hard (short and open) and soft (transistor beta) device faults were considered. The increased variability of the supply current under process parameter deviations was considered by using a 20% threshold on the supply current and only a 10% threshold on the output voltage. The papers state that using DC tests (forward and reverse saturation) on the comparator, SCM was 20-40% more effective than output voltage monitoring. Using an AC signal as an input to the multivibrator, RMS SCM was 10-20% more effective.

##### 2.2.1.3 Built-In Current Sensors for Supply Current Monitoring

In practice, for a mixed-signal circuit, current changes in analogue portions of a device are masked by digital noise and high currents in I/O pad drivers. The use of Built-In Current (BIC) sensors as a DFT technique has been proposed to overcome these problems. In [Ecke93a] [Ecke93b], two BIC sensors are proposed based on MOS and bipolar transistor current mirrors. The application of these current mirrors to an ADC circuit and the resulting functional effect is presented in [Ecke93b]. A similar approach for a self-testing CMOS opamp is presented in [Roca92]. The circuit is designed such that a small supply current variation will produce a larger voltage variation at a circuit node. However, the effect of process parameter deviation is not discussed and this may be a drawback to this approach, since process variations may also produce large variations at the monitored circuit node and mask fault detection. [Argu94] describes a built-in dynamic current sensor which provides a 1-bit digital signature stream. The design may be used for analogue and digital blocks in the same circuits using different interfaces, since CMOS analogue circuitry has a higher quiescent current. In [Miur95] [Miur96] a BIC sensor based technique that measures the integral of the supply current during a clock period is described. Upper and lower threshold values are set as part of the sensors to produce a pass-fail decision. A BIC monitor for fully balanced analogue circuits based on current conveyors is proposed in [Sidi96].
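
The pass-fail decision based on the integral of the supply current over a clock period can be mimicked in software. The sketch below is not the sensor circuit of [Miur95] [Miur96]; it assumes uniformly sampled current values, uses a trapezoidal approximation of the integral, and takes hypothetical threshold values chosen only to exercise the code.

```python
def integrate_current(samples, dt):
    """Trapezoidal integral of current samples (A) spaced dt seconds apart,
    giving the charge (C) drawn over the sampled interval."""
    return sum((a + b) * dt / 2.0 for a, b in zip(samples, samples[1:]))


def passes(samples, dt, lower, upper):
    """Pass only if the integrated current lies between both thresholds."""
    charge = integrate_current(samples, dt)
    return lower <= charge <= upper


# Illustrative data: a trapezoidal current pulse sampled every microsecond.
pulse = [0.0, 1.0e-3, 1.0e-3, 0.0]
charge = integrate_current(pulse, 1.0e-6)
```

Using both an upper and a lower threshold means that faults which reduce the supply current are flagged as well as those which increase it.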

2.2.1.4 Other Test Techniques based on Supply Current Monitoring

Other techniques based on supply current monitoring have been described in the literature. [Beas93] presents a technique whereby the positive and negative supply rails are simultaneously pulsed to their mid-voltage point and the dynamic supply current measured. Temporal and spectral analysis is performed on the current measurements. The technique is applicable to analogue and digital CMOS circuits and two examples are given; a full adder and a folded cascode opamp. Results are presented for a selection of the total number of faults (opens, shorts and GOS faults) but process tolerance effects are not considered.

[Silv95] describes a novel test technique based on the cross-correlation between the output voltage and the supply current. Results for catastrophic and parametric faults are presented for a Sallen-Key filter, observing the supply current (IDDD), the output voltage (VOUT) and the cross-correlation of IDDD and VOUT. One advantage of using a cross-correlation is that it allows both signals to be processed simultaneously rather than separately. Fault coverage was found to be 75% considering IDDD, 65% for VOUT, 90% for both and 93.5% for the cross-correlation. Tolerance bounds were established using Monte Carlo simulations with 5% deviations in passive components and MOS transistor SPICE model parameters VTO, KP, TOX, XJ and RSH. The work is extended in [Silv96] to include a phase locked loop circuit and an investigation of different input stimuli. In [Silv96a] the use of polarised cross-correlation, where the signal is quantized to either a 0 or 1, is shown to detect a high proportion of catastrophic faults. The implementation of the cross-correlation technique within the IEEE P1149.4 mixed-signal test bus framework is described in [Silv97].

[Pova95] presents a comparison of output voltage and supply current using both temporal and frequency domain analysis obtained using the Fast Fourier Transform. Monte Carlo simulation with 10-20% deviations in model parameters was used to generate a tolerance bound for the fault-free circuit. Results are given for a unity gain configuration of a CMOS opamp; SCM in the frequency domain produced the highest detection (98% fault coverage of catastrophic faults). Although frequency analysis was
shown to increase the fault coverage of GOS faults with supply current monitoring, it was not found to be as high as output voltage measurements.
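
The tolerance-bound approach used in [Silv95] and [Pova95] can be illustrated with a minimal sketch. The circuit model (`measure`), the single `gain` parameter and the uniform 10% deviation are assumptions made purely for illustration; in practice each sample would be a full circuit simulation with deviated model parameters.

```python
import random

def measure(gain, vin=1.0):
    """Hypothetical stand-in for one circuit simulation: returns one measurement."""
    return gain * vin

def tolerance_band(nominal_gain, deviation, runs, seed=0):
    """Monte Carlo estimate of the fault-free measurement range."""
    rng = random.Random(seed)
    samples = [
        measure(nominal_gain * (1.0 + rng.uniform(-deviation, deviation)))
        for _ in range(runs)
    ]
    return min(samples), max(samples)

def is_detected(value, band):
    """A fault is flagged when the measurement falls outside the fault-free band."""
    low, high = band
    return value < low or value > high

band = tolerance_band(nominal_gain=1.0, deviation=0.10, runs=1000)
faulty = measure(gain=0.5)     # assumed faulty behaviour: gain halved
nominal = measure(gain=1.0)    # nominal circuit, expected to pass
```

A wider band (larger parameter deviations) lets more faulty circuits pass, which is why the choice of deviation directly affects the reported fault coverage.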

The concept of high observability of the supply current is used in [Binn94] [Binn95] for transient response testing of embedded analogue macros. A DFT technique is described whereby a low impedance load is switched to the output of the macro under test, which has the effect of amplifying the output voltage of the macro through the supply current. Similarly, the high observability of the supply current is used in [Robs96] for a test technique that uses the Wiener-Hopf equation to generate the impulse response of an analogue system from the supply current. The test circuit described is a three-opamp Tow-Thomas biquad circuit. By monitoring the supply current through each opamp and applying arithmetical operations with switched current test circuitry, the impulse response is obtained. Two points on the impulse response were found to be sufficient to detect faulty circuits and these are used as the inputs to a window comparator with upper and lower threshold limits.

2.2.1.5 Practical Application of Supply Current Monitoring

The practical application of $I_{\text{DD}}$ monitoring to a mixed-signal ASIC is described in [Tayl93] using dynamic SCM equipment. It was concluded that $I_{\text{DD}}$ testing detected some faults undetectable by voltage testing, reduced test time and increased fault coverage. Furthermore, it was found that certain devices failed a dynamic SCM test but passed an initial functional test. However, after burn-in testing these devices were found to fail, showing that $I_{\text{DD}}$ testing could be used to improve reliability by reducing early lifetime failures. This is consistent with work presented in [Hawk86] for digital circuits with GOS faults.

2.2.1.6 Supply Current Monitoring-Based Fault Diagnosis Techniques

SCM has also been applied to the analogue fault diagnosis problem, which can be considered a superset of pass-fail testing. In [Yu94] an Artificial Neural Network (ANN) approach is used to diagnose 100% of GOS faults in a CMOS opamp using $I_{\text{DD}}$ monitoring. A fault diagnosis technique is also presented in [Somay94] based on power supply ramping and the use of dynamic supply current as an input to an ANN.

2.2.2 Other Structural Test Techniques

Other structural test techniques have also been evaluated and are described in the literature. Early work using a DC voltage monitoring fault dictionary approach to testing is described in [Quat90], based on earlier fault diagnosis work presented in [Hoch79] and [Band85]. The aim of the work in [Quat90] is to reduce the number of test nodes required for high fault coverage since early work on diagnosis often assumed high accessibility to internal nodes.

Time domain testing of analogue and mixed-signal circuits has been investigated and although more complicated, has produced higher detection rates than simple DC testing. In particular, time domain techniques using digital inputs have the advantage that they may easily be produced either by an existing digital tester or a digital BIST scheme. The approach in [Cors93] uses complementary signals based on the circuit frequency
response as inputs for testing linear circuits. In the fault-free case, the circuit is driven through non-trivial states to zero in its final state. An example of the application of this technique to an opamp with feedback is given, with the output response sampled at one sample point, showing high fault coverage for parametric and catastrophic faults.

[A'ain94] describes another transient response technique with an AC input and output voltage measurement. A ramped power supply voltage was used to increase catastrophic fault coverage from 91% to 100% for a CMOS opamp circuit. Transient response testing using a digital pulse train is described in [Evan91]; however, low fault coverage is reported.

2.2.3 Structural DFT and BIST Techniques

Several structural test BIST schemes have also been described. A time domain monitoring technique using a PRBS input along with the Wiener-Hopf equation to obtain an estimate of the unit impulse response has been investigated and a full PRBS-based BIST scheme developed [Lear93] [Russ93] [Russ94]. Later work centered on reducing the number of correlation samples required for high fault coverage to reduce the BIST scheme overhead [Robs95]. The pass/fail decision is implemented using a window comparator, generating a digital test output. High fault coverage was obtained with this scheme for an opamp as the circuit under test.
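
As a rough illustration of why a PRBS stimulus is convenient for impulse response estimation, the sketch below cross-correlates a maximal-length sequence with the response of a discrete-time system. It is not the scheme of [Lear93] [Russ93] [Russ94]: the 4-bit LFSR, the three-tap impulse response `h_true` and the circular (periodic) formulation are assumptions chosen to keep the example short and exact.

```python
def prbs():
    """One period (15 samples) of a +/-1 maximal-length sequence from a 4-bit LFSR."""
    state = [1, 0, 0, 0]
    out = []
    for _ in range(15):
        out.append(1 if state[3] == 0 else -1)   # map bit 0 -> +1, bit 1 -> -1
        feedback = state[3] ^ state[2]
        state = [feedback] + state[:3]
    return out

def circular_xcorr(x, y):
    """r[k] = sum_i x[i] * y[(i + k) mod N]."""
    n = len(x)
    return [sum(x[i] * y[(i + k) % n] for i in range(n)) for k in range(n)]

def respond(x, h):
    """Periodic steady-state response y[i] = sum_j h[j] * x[(i - j) mod N]."""
    n = len(x)
    return [sum(h[j] * x[(i - j) % n] for j in range(len(h))) for i in range(n)]

x = prbs()
h_true = [0.5, 0.3, 0.1]      # assumed impulse response of the system under test
y = respond(x, h_true)
r_xy = circular_xcorr(x, y)

# For an m-sequence the autocorrelation is N at lag 0 and -1 elsewhere, so
# r_xy[k] = (N + 1) * h[k] - sum(h) and sum(r_xy) = sum(h).
n = len(x)
h_est = [(r + sum(r_xy)) / (n + 1) for r in r_xy]
```

Because the PRBS autocorrelation is close to an impulse, the input-output cross-correlation is (up to a scale and a small offset) the impulse response itself, so a few samples of `h_est` can be compared against window limits.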

In [Ohle91], Hybrid BIST (HBIST) is proposed by Ohletz for mixed-mode circuits with complex digital kernels and analogue peripheral subcircuits as input and output interfaces. The design is compatible with digital scan design and uses Built-In Logic-Block Observers (BILBOs) from the digital test scheme as part of the analogue test approach. During the analogue test, the digital BILBOs are reconfigured as linear feedback shift registers, both as signal generators that produce piecewise-constant stimuli with variable amplitudes and for response capture. High hard fault coverage results (>95%) were obtained for a bipolar and CMOS opamp. In order to interface the analogue test circuitry to the digital BILBOs, ADCs and digital to analogue converters (DACs) are required, which limits the applicability of this technique to circuits where these functions are already present. An improvement to the original HBIST technique is presented in [Damm95] which uses modulo-2 addition for response compaction rather than the standard BILBO approach. It is shown that this improves fault coverage for non-linear circuits under process parameter and temperature variations. One limitation of the HBIST method is overcome in [Damm96], where rectangular multifrequency test stimuli are used; these do not require a DAC and the method is thus applicable to more classes of circuits.

The BIST of analogue circuits produces the problem of on-chip test pattern generation, since on-chip generation of complicated test stimuli can require a prohibitive area overhead. Several BIST approaches have been considered which use simple test stimuli, in particular DC signals. In [Dufa96] a DC test approach is presented whereby circuit nodes are monitored using built-in voltage sensors. The voltage sensors have a programmable range of acceptability to provide a variable pass-fail window thus allowing for process parameter deviations. Initially this test approach is applied to an opamp, but this is later extended to a biquad filter which is reconfigured in test mode. High catastrophic fault coverage for both circuits was obtained.

As an alternative to on-chip test signal generation circuitry, the oscillation-test methodology has been proposed [Arab96]. Circuit poles of functional building blocks are relocated using additional circuitry in order to destabilise the circuit and produce oscillation. An example is given for single opamps, which is then extended to circuits with more than one opamp by forming them into an oscillating chain in test mode. The frequency of oscillation is used for testing and high fault coverage results were obtained considering a tolerance band using Monte Carlo simulation of process parameters. Structural testing of a sigma-delta modulator using oscillation test is presented in [Arab97].

Fault-based concurrent test schemes have also been proposed. In [Russ94] residual multi-frequency testing is proposed, whereby 2 frequencies just outside the bandwidth of a circuit are applied during operation. The test signals are extracted by filters on the circuit output and monitored. In [Wrig93] a fault-based BIST scheme is described for switched current circuits.

3. Fault Modelling for Analogue IC Testing

3.1 Introduction

In order to investigate the fault detection properties of structural test techniques using circuit simulation, simulation fault models of physical defects are required. Currently, there is no standard fault model or fault modeling approach used by researchers for fault simulation of analogue and mixed-signal ICs. This has led to the use of a diverse range of simulation fault models with different parameters and component values. One of the main problems in analogue and mixed signal testing is that of mapping physical defects onto a circuit level description, hence this has been the subject of research interest.

The structure of this section is as follows. Firstly failure mechanisms for analogue ICs and the defect analysis used to obtain data on IC failures are discussed. Next, defect analysis information is used as a basis of an explanation for the structure and the limitations of fault models which have been proposed and used by several authors. This is done firstly for bridging and break defects and then for defects occurring in the oxide layer. Finally, hierarchical fault modelling literature is examined.

3.2 IC Failure Mechanisms and Defect Analysis

There are two main mechanisms that cause devices on a wafer to fail: global, systematic defects and local, random defects. The relative number of devices failing due to these effects is not well defined and may depend on the process used.

Global defects are caused by systematic errors in manufacture, that is, they affect a large area of a wafer. They can be caused by global effects such as dopant densities, mask misalignment and oxide gradients. The systematic nature of these defects means that they are easier to detect using Process Evaluation Monitors (PEMs). These are generally fabricated on each wafer along with the product ICs and will thus be affected by global
defects. Hence these defects can be detected using relatively simple measurements on
the PEMs, avoiding device testing.

Local or “spot” defects occur almost randomly across an IC (although some clustering
of local defects has been observed e.g. see [Brul91]). The cause of these defects is
generally contamination of some kind, for example dust particles. Their random nature
means that every device must be tested for the presence of these defects. Currently the
main method for this type of defect detection is electrical testing. From production
experience it has been observed that the “back end” processing steps (those involving
gate-oxide and polysilicon/metal interconnection) are affected the most by local defects
[Brul95].

The detection of local defects during electrical testing is considered more important
than that of global defects: local defects occur randomly and so may affect any device,
whereas global defects affect a large number of chips on a wafer. For this reason, the
majority of researchers have chosen to use fault models of local defects for structural
test evaluation, although global parametric fault effects have also been studied [Miln97].

In order to allow the construction of simulation fault models, information on the
properties and occurrence of defects is required. The method known as defect analysis is
used to obtain information on IC defects occurring during production. More specifically,
defect analysis produces information on the type, size, shape and density (defects per
area) of spot defects, providing a geometric model. Data is obtained either from analysis
of previously fabricated devices or from analysis of special test structures fabricated on
a wafer, and provides information for fault modelling and process statistics for IFA.

3.3 Bridging and Break Faults

3.3.1 Defect Analysis of Interconnect Faults

Early literature on defect analysis is based on observations of faults occurring
in production devices. Galiay [Gali80] examined failure modes occurring in a 4-bit
microprocessor (digital CMOS process). The results are shown in table 2-1; the
majority of faults are shorts or opens in the metal and diffusion layers, with
no shorts between metal and diffusion. This is reinforced by a consideration of the
manufacturing process, which identifies these faults as the most likely.

<table>
<thead>
<tr>
<th>Defect type</th>
<th>Percentage of failures</th>
</tr>
</thead>
<tbody>
<tr>
<td>metalization short</td>
<td>39%</td>
</tr>
<tr>
<td>metalization open</td>
<td>14%</td>
</tr>
<tr>
<td>diffusion short</td>
<td>14%</td>
</tr>
<tr>
<td>diffusion open</td>
<td>6%</td>
</tr>
<tr>
<td>metalization/substrate short</td>
<td>2%</td>
</tr>
<tr>
<td>unobservable</td>
<td>10%</td>
</tr>
<tr>
<td>insignificant</td>
<td>15%</td>
</tr>
</tbody>
</table>

Table 2-1 - Defect Observations from [Gali80]

A more detailed analysis is presented in [Bane82] which again gives an indication of failure modes occurring in a CMOS process based on interconnection defects. The results presented are shown in table 2-2.

<table>
<thead>
<tr>
<th>Class</th>
<th>Device Failures</th>
<th>Interconnect Failures</th>
</tr>
</thead>
<tbody>
<tr>
<td>I. Most likely</td>
<td>Gate to drain short</td>
<td>Short between diffusion lines</td>
</tr>
<tr>
<td></td>
<td>Gate to source short</td>
<td></td>
</tr>
<tr>
<td>II Less likely</td>
<td>Drain contact open</td>
<td>Aluminium poly cross-over broken</td>
</tr>
<tr>
<td></td>
<td>Source contact open</td>
<td></td>
</tr>
<tr>
<td>III Least likely</td>
<td>Gate to substrate short</td>
<td>Short between Aluminium lines</td>
</tr>
<tr>
<td></td>
<td>Floating gate</td>
<td></td>
</tr>
</tbody>
</table>

Table 2-2 - Failure Modes from [Bane82]

Later defect analysis work has used fabricated test structures to obtain data on spot defects. In general, a spot defect can either cause a short-circuit or an open-circuit depending on whether the affected material is insulating or conducting and is extra or missing. In order to detect these types of defects, two structures are often used: long winding structures ("strings") for opens and comb-comb structures for shorts [Stap84]. These have since been combined to form the “comb-string-comb” structure [Brul91] [Rodr92] [Rodr96] [Brul95]. This has been improved in [Hess94] for the localisation of defects for visual geometrical analysis. Multi-layer defect monitoring is also possible in order to study defects in isolation layers.

Bruls et al. used defect analysis to investigate resistive bridging defects in the metal 1 layer. The defect monitoring system which they use is described in [Brul91]. Initial work is presented in [Rodr92], which is later extended in [Rodr96]. They fabricated 14 different test wafers in different batches and production processes. Each wafer comprised 400 monitors, each of which contained 3 modules (comb-string-comb structures) corresponding to three out of four different design rules studied. The resistance of 400 bridging faults was measured, producing the results shown in table 2-3. Due to a large measurement uncertainty, results are specified as upper and lower bounds.

<table>
<thead>
<tr>
<th>Guaranteed Range</th>
<th>Total number of Bridges</th>
<th>Guaranteed Range</th>
<th>Total number of Bridges</th>
</tr>
</thead>
<tbody>
<tr>
<td>$R_b \leq 0.5\,\text{K}\Omega$</td>
<td>258 (64.5%)</td>
<td>$R_b \geq 0.5\,\text{K}\Omega$</td>
<td>14 (3.5%)</td>
</tr>
<tr>
<td>$R_b \leq 1\,\text{K}\Omega$</td>
<td>379 (94.8%)</td>
<td>$R_b \geq 1\,\text{K}\Omega$</td>
<td>12 (3.0%)</td>
</tr>
<tr>
<td>$R_b \leq 5\,\text{K}\Omega$</td>
<td>394 (98.5%)</td>
<td>$R_b \geq 5\,\text{K}\Omega$</td>
<td>4 (1.0%)</td>
</tr>
<tr>
<td>$R_b \leq 10\,\text{K}\Omega$</td>
<td>397 (99.3%)</td>
<td>$R_b \geq 10\,\text{K}\Omega$</td>
<td>2 (0.5%)</td>
</tr>
<tr>
<td>$R_b \leq 20\,\text{K}\Omega$</td>
<td>400 (100%)</td>
<td>$R_b \geq 20\,\text{K}\Omega$</td>
<td>0 (0%)</td>
</tr>
</tbody>
</table>

Table 2-3 - Bridging Resistance Range from [Rodr96]

The results show that the majority of bridging faults have a resistance of less than 500Ω, although there are some faults with resistance definitely greater than this value, up to 20KΩ. The high measurement inaccuracy means that it is impossible to obtain exact information on the distribution of the resistances. In particular, more information is required for those faults with resistance less than 500Ω. It should be noted that this analysis applies only to bridges in the metal 1 layer.
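
Because each bridge is only guaranteed to lie above or below a threshold, some bridges cannot be classified at each threshold. Assuming the two halves of table 2-3 are read as "guaranteed below" and "guaranteed above" counts out of the same 400 bridges, a short calculation makes the size of the unclassified group explicit (the variable names are illustrative):

```python
TOTAL_BRIDGES = 400

# (threshold in kOhm, bridges guaranteed <= threshold, bridges guaranteed >= threshold)
guaranteed = [
    (0.5, 258, 14),
    (1.0, 379, 12),
    (5.0, 394, 4),
    (10.0, 397, 2),
    (20.0, 400, 0),
]

# Bridges whose measurement uncertainty straddles the threshold
undetermined = {
    threshold: TOTAL_BRIDGES - below - above
    for threshold, below, above in guaranteed
}

for threshold, count in undetermined.items():
    share = 100 * count / TOTAL_BRIDGES
    print(f"{threshold:>5} kOhm: {count} bridges ({share:.1f}%) unclassified")
```

Under this reading, 128 of the 400 bridges (32%) cannot be placed on either side of the 0.5 KΩ threshold, which is consistent with the observation that the distribution below 500Ω cannot be resolved from these data.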

In order to reveal the cause of the high resistance defects, the high resistance bridges were analysed spectroscopically and visually [Rodr92] [Rodr96]. The spectroscopic analysis revealed that high and low resistive bridges were of the same material (aluminium) and thus the resistivity of material can be eliminated as a source of the defect resistance. Visual inspection revealed that the most likely explanation was the geometry of the defects (e.g. a poor contact).

3.3.2 Analysis of Defect Analysis Results

Work such as that presented in [Rodr92] [Rodr96] has shown that accurate simulation fault modelling for bridges in analogue circuits is complicated by the occurrence of high impedance defects which would be incorrectly modelled with a low or zero resistance fault model - the intuitive fault model for shorts causing complete bridging. This effect has led to the definition of hard and soft faults for spot defects in [Brul95] [Brul93] [Sach95]. Both definitions given below are valid for planar as well as lateral defects.

**Hard faults** are defects which either cause the connection of two conducting structures with a low resistance contact or cause the structure to be split into separate parts.

**Soft faults** have a more subtle influence on the structure of the circuit. Extra material soft faults do not form a bridge between two conducting structures but cause the separation distance to be narrowed. Missing material soft defects cause the narrowing of a conductor without actually breaking it. Soft faults cause less predictable electrical effects which may only come to light under certain conditions e.g. temperature, supply voltage variations. Thus a functional production test may not detect these soft faults. Soft faults may also degrade with time.

To enable soft defects to be studied in terms of geometric defect models, a definition is presented in [Brul93] [Brul95] which states that for an extra material defect a soft fault narrows the distance between two conductors to at most \( d_{\text{max}} \). In practice, when the gap between two conductors is narrowed sufficiently it will not be possible to etch away all of the conducting material in the gap, which could cause a high resistive contact. An example is given in [Brul93] which suggests this is a cause of the resistive shorts found in [Rodr92] [Rodr96].

In order to characterise the relationship between the layout of ICs and defects occurring within the production process, geometric models have been defined [Brul93] [Brul95] [Stap84]. Using this analysis, expressions are derived in [Brul93] [Brul95] for $N_{\text{HF}}$ and $N_{\text{SF}}$ (the number of hard faults and the number of soft faults respectively) in terms of $d_{\text{max}}$ for the comb-string-comb defect monitors used. Analysis of the derived expressions shows that even for a small $d_{\text{max}}$, soft faults are present in at least the same order of magnitude as hard faults. As the distance $d_{\text{max}}$ increases, $N_{\text{SF}}$ increases past $N_{\text{HF}}$, so that soft faults become the dominant defect. [Brul93] states that the number of devices with a single soft fault can be from 1-4% assuming a 90% yield. This figure increases as the yield decreases, indicating possible reliability problems. Currently the geometric model analysis described in this section is only theoretical and still requires experimental investigation.

3.3.3 A Review of Simulation Fault Models Used for Fault Simulation Based on Interconnect Defects

3.3.3.1 Fault Modelling Based on the Circuit Level

Many implementations of the catastrophic fault model used to model short and open faults in CMOS ICs are described in the literature. Early work by Milor in [Milo89] uses the fault model shown in figure 2-1, based on failure modes reported in [Bane82], for DC testing of an opamp and lowpass filter. Switches $S_{S1}$ and $S_{S2}$ are normally open but close to simulate a defect, and $S_{O1}$ and $S_{O2}$ are normally closed but are opened to represent a defect. Shorts are modelled using a $1\Omega$ resistor; opens, however, are modelled directly using switches $S_{O1}$, $S_{O2}$. If implemented literally, this will produce a floating node and cause conventional circuit simulators such as HSPICE to fail (or to connect the floating node to ground via a large resistor).

![Figure 2-1 - Catastrophic Fault Model from [Milo89]](image)

Other work uses a similar fault model (figure 2-2) to investigate static supply current monitoring (for an opamp in [Bell91] and several analogue cells in [Camp92]). However, the open fault model uses a high impedance $R_o=100\,\text{M}\Omega$, since the HSPICE circuit simulator is used.

In [Ohle91], a similar fault model is used, extended to include open gate and drain-source short faults. Shorts are modelled using wires and opens as breaks in connectivity. Similar to [Ohle91], [Harv93] uses wired shorts for gate-source, gate-drain and source-drain shorts. Drain and source open circuit faults are not modelled directly, since the paper states that these have the same effect as gate-source shorts: all three cause the drain-source current to tend to zero. Although this reduces the fault list for fault simulation, it is not applicable in the general case. The paper also includes passive component shorts, modelled using wires; floating resistances and diodes, modelled by replacing them with high resistances ($10^{14}\Omega$); and floating capacitances, modelled as a high resistance to ground.

Whilst the fault models in figures 2-1 and 2-2 are suitable for DC measurement, they do not include dynamic fault effects required for transient simulation. The physical defect corresponding to an open fault is a break between conductors. However, if the break is small (of the order of the process line widths), capacitive coupling will occur. This is included as part of the open circuit fault model used in [Silv96] for dynamic supply current monitoring. In the model, shown in figure 2-3, $R_s=1\Omega$, $R_o=100\,\text{M}\Omega$ and $C_o=1\,\text{fF}$. Shorts on passive components (using $R_s$ in series) are also considered.
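
As an illustration of how such element-level fault models can be inserted into a simulation netlist, the sketch below emits SPICE-style element lines for a resistive short and for an open with capacitive coupling. The helper names, the element and node names, and the choice of placing $C_o$ in parallel with $R_o$ across the break are assumptions; figure 2-3 defines the actual topology used in [Silv96].

```python
def short_fault(node_a, node_b, r_short="1"):
    """SPICE-style line for a resistive short between two nodes (R_s = 1 ohm)."""
    return [f"RFSHORT {node_a} {node_b} {r_short}"]

def open_fault(node, split_node, r_open="100MEG", c_open="1f"):
    """SPICE-style lines bridging a break with R_o and (assumed parallel) C_o."""
    return [
        f"RFOPEN {node} {split_node} {r_open}",
        f"CFOPEN {node} {split_node} {c_open}",
    ]

def inject_open(netlist, element, old_node, new_node):
    """Move one terminal of `element` onto `new_node`, then bridge the break."""
    faulty = []
    for line in netlist:
        fields = line.split()
        if fields and fields[0] == element:
            # Rewire only the first terminal that is connected to old_node
            idx = fields.index(old_node, 1)
            fields[idx] = new_node
            line = " ".join(fields)
        faulty.append(line)
    return faulty + open_fault(old_node, new_node)

# Hypothetical two-element netlist used only to exercise the helpers
netlist = ["M1 out in vss vss NMOS W=10u L=1u", "R1 vdd out 10k"]
drain_open = inject_open(netlist, "M1", "out", "out_f")
gate_drain_short = netlist + short_fault("in", "out")
```

Using a finite $R_o$ rather than a true break keeps every node with a DC path, which avoids the floating-node problem noted for the switch-based model.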

The fault models discussed so far have been attributed to a component element and could be described as being local to that element. One advantage of using elemental-based fault models is that implementation by modification of a netlist is quite straightforward. However, considering a circuit netlist, not all possible defects (bridges and breaks) will be modelled by these faults. That is, there exists a super-set of global shorts and opens which contains the set of all elemental short and open faults. Non-elemental faults which are in the set of global shorts include shorts between nodes connected to different components and certain cases of split nodes. The main drawback to considering global shorts and opens is that the number of faults increases rapidly with the size of the circuit. Whilst global shorts may represent a realistic set of defects for a small analogue cell, for a large design a complete set of global faults will be an unrealistic model of possible physical failures.
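
The growth of the global short fault list can be seen by counting node pairs: a netlist with $n$ nodes has $n(n-1)/2$ possible two-node bridges. The sketch below enumerates them for a hypothetical node list; the node names are invented for illustration.

```python
from itertools import combinations

def global_shorts(nodes):
    """Every unordered pair of distinct nodes is a candidate bridging fault."""
    return list(combinations(sorted(nodes), 2))

def n_global_shorts(n_nodes):
    """Closed-form count of two-node bridges for a netlist with n_nodes nodes."""
    return n_nodes * (n_nodes - 1) // 2

cell_nodes = ["vdd", "vss", "in_p", "in_n", "out", "bias"]   # hypothetical small cell
shorts = global_shorts(cell_nodes)
```

Six nodes give 15 candidate shorts, whereas 1000 nodes give 499500, which illustrates why a complete global fault set becomes impractical for a large design.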

Local and global catastrophic failures are defined in [Sebe95] and modelled using the source model (0V sources between nodes) or the resistive model with short resistance $R_s=0.01\mu\Omega$ and open resistance $R_o=100\text{M}\Omega$. However, the effect of altering the $R_s$ value (from $0.01\mu\Omega$ to $1\text{K}\Omega$) for one fault considered in a voltage controlled oscillator was investigated and found to have a large effect on the functional output.

A global short fault model is used in [Zwol96] as part of a hierarchical fault modelling scheme for supply current monitoring of an opamp. Bridges are modelled using a $1\Omega$ resistive short fault model; transistor opens, however, are modelled by setting the faulty transistor threshold voltage $V_T=100\text{V}$, thus ensuring that it never conducts. This fault model is used to represent gate, source and drain opens under the hypothesis that the effect will be no source and drain currents. Although collapsing all open faults into one fault class is appealing in terms of simulation time reduction, the accuracy and applicability of this fault model have yet to be verified. In particular, it does not model additional capacitive coupling effects that may exist in open faults.

The previously described fault models use only one set of parameters to represent a defect. However, as shown in sections 3.3.1 and 3.3.2, a defect can be caused by a range of failures of different sizes, shapes and locations and although they may be considered as being the same fault, using the same parameters in the circuit level simulation fault model is not necessarily justified. This has led some authors to use more than one simulation fault model to represent a range of possible defect parameter values. For example a range of resistance values could be used as different simulation fault models to model a short fault. The main disadvantage to this approach is that a new set of fault simulations is required for every additional parameter used, which will increase fault simulation time. Defect analysis is also required to give the correct spectra of parameters that occur in practice, otherwise the fault coverage results may be unrealistic.
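
The cost of using several parameter values per fault can be sketched as a simple product of fault sites and model values. The fault sites and resistance values below are hypothetical, chosen only to show how the simulation count scales.

```python
from itertools import product

short_sites = [("in", "out"), ("out", "vss"), ("bias", "vdd")]   # hypothetical node pairs
short_resistances = [10.0, 100.0, 1e3, 10e3]                     # ohms, illustrative values

# One simulation fault model per (site, resistance) combination
fault_list = [
    {"type": "short", "nodes": nodes, "r": r}
    for nodes, r in product(short_sites, short_resistances)
]

runs_single_model = len(short_sites)   # one resistance value per short
runs_multi_model = len(fault_list)     # every resistance value at every site
```

With four resistance values per short, three fault sites require 12 fault simulations instead of 3, so the simulation count grows linearly with the number of parameter values used.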

Miura uses the catastrophic fault model with ranges of component values in an investigation of supply current testing in [Miur94] and [Miur96]. To take into account different values of bridging resistance, the fault model parameters are chosen relative to the fault-free channel resistance (table 2-4).

<table>
<thead>
<tr>
<th>Fault Model</th>
<th>Paper [Miur94]</th>
<th>Paper [Miur96]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Short</td>
<td>$R_{short} = 10\Omega$</td>
<td>$R_{short} = 10\Omega$</td>
</tr>
<tr>
<td></td>
<td>$R_{short} = 10K\Omega$</td>
<td>$R_{short} = 100\Omega$</td>
</tr>
<tr>
<td></td>
<td>$R_{short} = 10M\Omega$</td>
<td>$R_{short} = 1K\Omega$</td>
</tr>
<tr>
<td></td>
<td>$R_{short} = 10K\Omega$</td>
<td>$R_{short} = 10K\Omega$</td>
</tr>
<tr>
<td>Open</td>
<td>$R_{open} = 10M\Omega$</td>
<td>$R_{open} = 100K\Omega$</td>
</tr>
<tr>
<td></td>
<td>$R_{open} = 10G\Omega$</td>
<td>$R_{open} = 10M\Omega$</td>
</tr>
</tbody>
</table>

Table 2-4 - Fault Models used in [Miur94][Miur96]

3.3.3.2 Gate Open Fault Modelling

The MOS transistor open gate fault is particularly hard to model since the gate of a transistor with a break defect will essentially be disconnected and floating. One approach to modelling "floating gate" faults would be to use a high impedance fault model. However, since the gate of a MOS transistor already has a very high impedance, this does not accurately model the fault effect. The fault effect is further complicated by capacitive coupling between the gate and substrate and other metal lines. Several approaches to floating gate fault modelling are defined in the literature.

Floating gate modelling with capacitive coupling is presented in [Rodr91] using the capacitance of the polysilicon gate to both the bulk and overlapped metal tracks. The values depend on the polysilicon area from the transistor gate to the break and the area of metal overlap, hence the model is layout and defect location dependent. In [Harv93] the gate open fault is modelled by connecting the gate via a high valued resistance path to ground. In [Miur94] [Miur95] a physical break is used and the initial gate voltage is set at VDD, VDD/2 and 0V, producing 3 fault models. The same initial gate voltages are used in [Miur96] but a 100KΩ resistor and 0.001pF capacitor are used to model the break. In [Caun95] a 1GΩ resistor is connected between the gate and either the positive or negative supply rail.

3.3.3.3 Fault Modelling using IFA Results

The previously discussed fault models are derived from netlist schematic considerations and hence a simple fault model is used to represent the defects occurring. Several authors have used an IFA-based test approach (see section 4) to derive a set of realistic defects which gives additional information about the defect type, for example the type of conducting material between which there is a short fault. The IFA process produces a set of realistic defects that must be mapped onto a set of circuit level simulation fault models. Since the exact defect is defined as part of the IFA output, it is possible to represent the different defects using separate simulation fault models.

The IFA approach is used in [Harv94] to investigate a phase locked loop circuit. To account for the range in possible defect values, upper and lower bounds are used on the resistances as part of the simulation fault model. The parameter values used for shorts are shown in table 2-5 below; open faults are ignored since they occurred sparsely compared to shorts.

<table>
<thead>
<tr>
<th>Defect Type</th>
<th>Lower Resistance Bound</th>
<th>Upper Resistance Bound</th>
</tr>
</thead>
<tbody>
<tr>
<td>Additional Metal I</td>
<td>0.2Ω</td>
<td>1KΩ</td>
</tr>
<tr>
<td>Additional Metal II</td>
<td>0.2Ω</td>
<td>1KΩ</td>
</tr>
<tr>
<td>Via Short</td>
<td>5Ω</td>
<td>5Ω</td>
</tr>
<tr>
<td>Junction Leakage</td>
<td>100Ω</td>
<td>10KΩ</td>
</tr>
<tr>
<td>Poly - Metal I short</td>
<td>0.2Ω</td>
<td>1KΩ</td>
</tr>
<tr>
<td>Poly - Metal II short</td>
<td>0.2Ω</td>
<td>1KΩ</td>
</tr>
<tr>
<td>Poly - Poly short</td>
<td>20Ω</td>
<td>1KΩ</td>
</tr>
</tbody>
</table>

Table 2-5 - Fault Model Parameters from [Harv94]

IFA has also been applied to other circuits. In [Sach95] and [Brul94], a class AB opamp is investigated and in [Kuij95] a flash ADC is studied. These three papers use the same simulation fault models for defects, shown in table 2-6 below. In this case, each defect type has only one fault model parameter rather than the range of resistance values used in [Harv94].

<table>
<thead>
<tr>
<th>Defect type</th>
<th>Model parameter value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Metal Short</td>
<td>0.2Ω</td>
</tr>
<tr>
<td>Poly Short</td>
<td>20Ω</td>
</tr>
<tr>
<td>Diffusion Short</td>
<td>60Ω</td>
</tr>
<tr>
<td>Extra Contact</td>
<td>2Ω</td>
</tr>
<tr>
<td>Oxide Pinhole (thick oxide/junction)</td>
<td>2KΩ</td>
</tr>
<tr>
<td>Gate Oxide Pinhole</td>
<td>2KΩ</td>
</tr>
</tbody>
</table>

Table 2-6 - Fault Model Parameters from [Sach95][Brul94][Kuij95]

Non-catastrophic “soft” local faults (e.g. resistive shorts) are derived from catastrophic faults, since these defects are not considered by the IFA simulator used. The locations of the catastrophic short faults are used with a “softer” fault model given by Sachdev in [Sach95] shown in figure 2-4.
The values of $R$ and $C$ are given by:

\[ R = \frac{\rho_{SiO_2} s}{A} \quad C = \frac{\varepsilon_0 \varepsilon_{SiO_2} A}{s} \]

(2-1)

where

- $A$ is the area of interest (cross section)
- $s$ is the spacing between the two conductors
- $\rho_{SiO_2}$ is the resistivity of the insulator
- $\varepsilon_{SiO_2}$ is the relative permittivity of the insulator ($\approx 4$)
- $\varepsilon_0$ is the permittivity of free space ($=8.85 \times 10^{-14}$ F/cm)

The resulting impedance is given by

\[ Z_{\text{short}} = \frac{R}{1 + j2\pi fRC} \]  

(2-2)

The computed value of $C=0.001\text{pF}$ is used, based on a value of $s=0.1\,\mu\text{m}$. However, a resistance of $500\Omega$ is used, based on the practical results from [Rodr92] rather than the theory. In practice, the small RC time constant in equation (2-2) means that the $Z_{\text{short}}$ impedance will be suitably approximated by $R$.
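
The approximation of $Z_{\text{short}}$ by $R$ can be checked numerically. The following is a minimal sketch that evaluates equation (2-2) using the values of $R$ and $C$ quoted above; the test frequencies and the function name `z_short` are illustrative assumptions and are not taken from [Sach95].

```python
import math

R = 500.0        # ohms, resistance value quoted in the text
C = 0.001e-12    # farads, 0.001 pF as quoted in the text


def z_short(f_hz, r=R, c=C):
    """Impedance of R in parallel with C, as in equation (2-2)."""
    return r / (1 + 1j * 2 * math.pi * f_hz * r * c)


# Frequency at which |Z_short| has fallen by 3 dB relative to R.
f_corner = 1 / (2 * math.pi * R * C)

# Illustrative (assumed) test frequencies.
for f in (1e3, 1e6, 1e9):
    print(f"{f:8.0e} Hz  |Z_short| = {abs(z_short(f)):.4f} ohm")
print(f"corner frequency = {f_corner:.3e} Hz")
```

For these values the corner frequency lies far above typical analogue test frequencies, which is consistent with treating the short as purely resistive.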

### 3.4 Shorts Occurring due to Defects in the Gate Oxide Layer

#### 3.4.1 Introduction

Although the previous section has concentrated on spot defects causing bridges and breaks between two conducting layers external to devices, another class of spot defects has also been observed which produce high resistance shorts. Gate Oxide Short (GOS) faults are pinhole defects occurring within a transistor caused either by lithography defects in the gate oxide or excessive voltage producing gate oxide breakdown. Examples of several oxide pinhole defect effects are given in [Syrz87].

Depending on the position of the GOS, the result will be a short between the gate and the source, between the gate and the drain, or between the gate and the channel. These three possibilities are labelled A, C and B respectively in figure 2-5. Since defective transistors will exhibit different electrical properties, different fault models are required depending on the pinhole location and the transistor types.

3.4.2 Gate-Channel GOS Fault Models

Gate-channel GOS faults have been analysed and several models have been developed. [Syrk89] presents a GOS fault model for n- and p-channel transistors obtained by considering a lumped-element MOS model. The fault model can represent the location of the defect in the gate oxide in two dimensions. The parameters of the fault model therefore depend on the x,y position of the defect, its size and various electrical parameters such as diode breakdown voltages and short resistance. A unidimensional fault model is presented in [Rodr91], where the position of the defect in the channel is modelled only in terms of its distance from the drain and source. Several electrical parameters are also required, although the overall number is much smaller than in [Syrk89]. This fault model is used for investigations into structural testing in [Ecke93b] and [Silv96]; both use three channel positions and three values of resistance, requiring nine simulations per transistor. In [Segu95] a further fault model for GOS faults is presented which takes into consideration the doping of the polysilicon gate relative to the substrate, using a diode to model the p-n junction.

3.4.3 Gate-Diffusion GOS Fault Models

A fault model for gate-drain/source GOS faults is presented in [Hao93][Segu95]. Both fault models use resistors external to the transistor to model the short fault along with a diode if the doping of the polysilicon gate and the diffusion are opposite. The value of the short resistance used depends on physical parameters of the transistor such as defect size, diffusion densities etc. No values for the resistance are given in [Hao93][Segu95] although [Sode86] reports measurements on devices with gate to source/drain GOS defects from 0.8-4KΩ. A 2KΩ short resistance is used to model gate to source/drain GOS faults in [Sach95] and [Kuij95] based on IFA of an opamp and flash ADC respectively.

3.5 Hierarchical Fault Modelling

3.5.1 Aim

Circuit level analogue simulation is very computationally expensive and thus in order to reduce the simulation time of large analogue or mixed signal designs, higher level modelling has been successfully applied. Since analogue fault simulation requires repeated analogue circuit simulation, several authors have chosen to look at the feasibility of applying higher level modelling to reduce simulation time [Meix91].

The principle behind hierarchical fault modelling is that the effect of a fault at circuit level is propagated through to a higher abstraction level (e.g. macromodel, behavioural or functional level) where it is represented as a higher level fault model. The higher level fault model is used in subsequent fault simulations resulting in an overall reduction in fault simulation time. Simulation time can also be reduced because the mapping of faults from circuit level to higher level is generally not one-to-one - a large number of circuit level faults can potentially be collapsed into a reduced set of higher level faults.

The basic hierarchical fault modelling procedure consists of several stages. Firstly, the circuit is fault simulated at the lower abstraction level (circuit level). Faults are then grouped into sets with similar characteristics. The fault-free higher level model is then generated and modified to generate faulty models for each fault group. Note that it may not be possible to generate fault models for every fault considered [Nagi92]. The higher level fault-free and fault models are then used as part of a higher level fault simulation. In order to make hierarchical fault simulation efficient, the time taken by this procedure must be less than the time required for fault simulation at the lower abstraction level. Therefore hierarchical fault modelling is particularly effective for designs which use repeated basic structures, such as cell-based analogue design, since the fault simulations required in producing the higher level models need only be performed once.
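
The grouping stage of this procedure can be illustrated with a short sketch. The code below is an assumption-laden illustration rather than a method taken from the cited papers: the fault names, the choice of (gain, DC offset) as the macro-level parameters, the numerical values and the greedy grouping rule are all hypothetical.

```python
def collapse_faults(responses, tol):
    """Greedily group faults whose macro-level parameters agree within tol.

    responses maps a fault name to a tuple of macro-level parameters
    obtained from circuit level fault simulation.
    Returns a list of (representative_parameters, [fault names]).
    """
    groups = []
    for name, params in responses.items():
        for rep, members in groups:
            if all(abs(p - r) <= tol for p, r in zip(params, rep)):
                members.append(name)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append((params, [name]))
    return groups


# Hypothetical circuit level results: (gain, DC offset in volts).
responses = {
    "M1_ds_short": (0.0, 2.5),
    "M2_ds_short": (0.0, 2.5),
    "M3_gs_short": (0.02, 2.48),
    "R1_open": (95.0, 0.01),
    "M4_gd_short": (40.0, 0.3),
}

groups = collapse_faults(responses, tol=0.05)
for rep, members in groups:
    print(rep, members)
```

Each resulting group would then require only one higher level fault model, which is where the reduction in the number of higher level simulations arises.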

### 3.5.2 Literature Review

This hierarchical fault modelling procedure has been used to generate a functional level model for three CMOS opamps in [Meix91], including ac and dc faults. The procedure uses IFA to generate realistic faults (bridging faults only) which are then used for fault model generation. [Nagi92] uses a similar approach but considers a set of realistic faults at the circuit level to generate a similar model. Parametric transistor faults and faults in passive components are also considered. Soma in [Soma91a] examines the effect of circuit level faults in a sample-and-hold circuit. Although a higher level model is not proposed, realistic tests for the circuit are generated based on the faulty behaviours and considering fault equivalence. Similarly in [Kuij95] the hierarchical modelling of a comparator as part of a flash ADC is described. A fault macromodel for a bipolar opamp is presented in [Pan96], capable of modelling dc and ac effects. In [Zwol96] a CMOS opamp is modelled with its feedback components (inverting and non-inverting configurations). This approach also models the supply current of the device, making it suitable for analogue fault simulation as part of an SCM test scheme, which is not considered in the other fault models. The main drawback is that the fault model is not universal (i.e. not a standard analogue design macro) and must be generated for specific applications.

Due to the inherent problems in generating higher level models for faulty circuits, some authors have used circuit level fault models for faulty macro blocks and behavioural level models for all others. Although the simulation time is not reduced as much as with hierarchical fault modelling techniques, this has the advantage that no fault model development time is required and accuracy is increased around the faulty macro. Harvey [Harv94] presents this approach for a phase-locked loop. In [Aren96] a similar approach is used for a finite impulse response digital to analogue converter. However, in this case a mixed-mode simulator was used, allowing more flexibility in the models for the fault-free parts of the circuit, for example digital level models for a shift register.

4. Inductive Fault Analysis

4.1 Introduction

As the number and density of transistors on an IC increases, the number of possible
defects that must be accounted for during testing increases. Considering bridging faults
for example, it is clear that the number of possible faults soon increases to an
impractical level if a short between every combination of nodes is considered.
Furthermore this approach would lead to some faults which are completely unrealistic,
for example shorts between two wires which are a large distance apart. Inductive Fault
Analysis (IFA) is the process whereby a list of realistic (most likely) defects, ranked
according to their probability of occurrence, is obtained from a description of the circuit
layout and process defect information. This not only reduces fault simulation time, but
improves test quality and reduces test time since tests can be aimed at faults which are
likely to occur.

IFA is performed using a simulator which generally uses one of two principles. Initial
work on IFA simulators produced Monte Carlo defect simulators which randomly assign
missing or additional pieces of material to different process layers and generate a
transistor-level fault based on the layout defect [Shen85] [Ferg88]. By repeating this
many thousands of times, an indication of the most probable set of faults is generated
along with a rating of the probability of occurrence. However, the Monte Carlo
approach that these defect simulators use is computationally intensive due to the large
number of simulations required which has led to research into alternative approaches.
An alternative heuristic method has been developed which is based on the sensitivity of
the layout to defects [Sous91].
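
The Monte Carlo principle can be illustrated with a deliberately simplified sketch. The code below is not the algorithm of [Shen85] or [Ferg88]: the one-dimensional layout, the track coordinates, the uniform defect size distribution and all names are assumptions made for illustration only. Spots of additional material are dropped at random positions and a bridging fault is recorded whenever a spot touches more than one net.

```python
import random
from collections import Counter

# Hypothetical one-dimensional layout: each net is a track occupying the
# interval (x_min, x_max), in micrometres, on a single layer.
TRACKS = {"n1": (0.0, 2.0), "n2": (3.0, 5.0), "n3": (9.0, 11.0)}


def simulate_defects(tracks, n_defects, max_diameter, seed=0):
    """Count bridging faults caused by random spots of extra material."""
    rng = random.Random(seed)
    ordered = sorted(tracks.items(), key=lambda item: item[1][0])
    lo = ordered[0][1][0]
    hi = max(x_max for _, (_, x_max) in ordered)
    bridges = Counter()
    for _ in range(n_defects):
        centre = rng.uniform(lo, hi)
        # Assumed uniform size distribution; real processes differ.
        radius = rng.uniform(0.0, max_diameter) / 2.0
        left, right = centre - radius, centre + radius
        touched = tuple(net for net, (x_min, x_max) in ordered
                        if left <= x_max and right >= x_min)
        if len(touched) >= 2:
            bridges[touched] += 1
    return bridges


N_DEFECTS = 20000
bridges = simulate_defects(TRACKS, N_DEFECTS, max_diameter=5.0)

# Rank the faults by their estimated probability of occurrence per defect.
for nets, count in bridges.most_common():
    print(nets, count / N_DEFECTS)
```

Even in this toy layout the closely spaced tracks are bridged far more often than the widely spaced ones, which illustrates why a ranked, layout-derived fault list differs from a list of all node pairs.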

4.2 A Review of IFA Results for Fault Simulation

Of particular interest for research into analogue and mixed-signal testing is the relative
density of occurrence of different defect types. Since it is not always possible to perform
IFA on a circuit, published figures on the density of particular defects can give an
indication as to the relative frequency of defect occurrence in typical processes at the
circuit level. This section provides a summary of published material on IFA for various
circuits and processes.

Ferguson and Shen in [Furg88] give results of IFA for a 1.5µm digital CMOS standard
cell process. Three circuits were considered: two (5,5,4) counters and a 4 x 4 multiplier.
The results are summarised in table 2-7.

Further IFA results are presented in [Jaco93] for 10 ISCAS benchmark digital circuits
fabricated using standard cell design (see table 2-8).

<table>
<thead>
<tr>
<th>Defect</th>
<th>Counter 1</th>
<th>Counter 2</th>
<th>Multiplier</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bridge</td>
<td>43%</td>
<td>39%</td>
<td>39%</td>
</tr>
<tr>
<td>Break</td>
<td>36%</td>
<td>38%</td>
<td>42%</td>
</tr>
<tr>
<td>Oxide Pinhole</td>
<td>7%</td>
<td>9%</td>
<td>8%</td>
</tr>
<tr>
<td>Other*</td>
<td>13%</td>
<td>14%</td>
<td>10%</td>
</tr>
</tbody>
</table>

Table 2-7 - IFA Results from [Furg88]

<table>
<thead>
<tr>
<th>Defect</th>
<th>Percentage</th>
</tr>
</thead>
<tbody>
<tr>
<td>Short</td>
<td>55.6%</td>
</tr>
<tr>
<td>Open</td>
<td>35.8%</td>
</tr>
<tr>
<td>Resistive Shorts (mainly due to gate oxide shorts)</td>
<td>8.9%</td>
</tr>
</tbody>
</table>

Table 2-8 - IFA Results from [Jaco93]

Bruls in [Brul94] presents results of IFA for a CMOS Class AB Opamp with approximately 30 transistors (see table 2-9). Non-catastrophic defects were derived from the catastrophic defects list (shorts and extra contact since the pinhole faults already have a high impedance).

<table>
<thead>
<tr>
<th>Defect</th>
<th>Percentage</th>
</tr>
</thead>
<tbody>
<tr>
<td>Short</td>
<td>37%</td>
</tr>
<tr>
<td>Extra Contact</td>
<td>17%</td>
</tr>
<tr>
<td>Oxide Pinhole</td>
<td>25%</td>
</tr>
<tr>
<td>Gate Oxide Pinhole</td>
<td>11%</td>
</tr>
<tr>
<td>Junction Pinhole</td>
<td>10%</td>
</tr>
<tr>
<td>Open</td>
<td>0%</td>
</tr>
</tbody>
</table>

Table 2-9 - IFA Results from [Brul94]

In [Kuij95], IFA figures are presented for a CMOS comparator used as part of a CMOS 8-bit flash ADC (see table 2-10).

<table>
<thead>
<tr>
<th>Defect</th>
<th>Percentage</th>
</tr>
</thead>
<tbody>
<tr>
<td>Short</td>
<td>95.43%</td>
</tr>
<tr>
<td>Extra Contact</td>
<td>0.18%</td>
</tr>
<tr>
<td>Gate Oxide Pinhole</td>
<td>3.13%</td>
</tr>
<tr>
<td>Junction Pinhole</td>
<td>1.04%</td>
</tr>
<tr>
<td>Thick Oxide Pinhole</td>
<td>0.18%</td>
</tr>
<tr>
<td>Open</td>
<td>0.03%</td>
</tr>
<tr>
<td>New Device</td>
<td>0.01%</td>
</tr>
<tr>
<td>Shorted Device</td>
<td>0.002%</td>
</tr>
</tbody>
</table>

Table 2-10 - IFA Results from [Kuij95]

* In table 2-7, the "Other" category consists of transistors stuck-on, bridge/break combinations, new transistors, transistors stuck-off and exceptions. The vast majority of faults in this category are transistors stuck-on, due to missing poly defects.

In [Sebe95] a table of likely failure modes for a digital CMOS process is presented. The table presents the relative densities of occurrence normalised to a short occurring in the metal 1 layer (see table 2-11). Here, the short is the most common defect, with opens occurring mainly in contacts and vias.

<table>
<thead>
<tr>
<th>Layer(s)</th>
<th>Failure</th>
<th>Relative Density</th>
</tr>
</thead>
<tbody>
<tr>
<td>Diffusion</td>
<td>open</td>
<td>0.01</td>
</tr>
<tr>
<td></td>
<td>short</td>
<td>1.00</td>
</tr>
<tr>
<td>Polysilicon</td>
<td>open</td>
<td>0.25</td>
</tr>
<tr>
<td></td>
<td>short</td>
<td>1.25</td>
</tr>
<tr>
<td>Metal1</td>
<td>open</td>
<td>0.01</td>
</tr>
<tr>
<td></td>
<td>short</td>
<td>1.0</td>
</tr>
<tr>
<td>Metal2</td>
<td>open</td>
<td>0.02</td>
</tr>
<tr>
<td></td>
<td>short</td>
<td>1.50</td>
</tr>
<tr>
<td>Aluminium/Diffusion contact</td>
<td>open</td>
<td>0.66</td>
</tr>
<tr>
<td>Metal1/Polysilicon contact</td>
<td>open</td>
<td>0.67</td>
</tr>
<tr>
<td>Vias</td>
<td>open</td>
<td>0.8</td>
</tr>
</tbody>
</table>

Table 2-11 - Likely Failure Modes from [Sebe95]

Soma uses IFA in [Soma91b] as part of a realistic defect oriented approach using hierarchical fault modelling of a sample-and-hold circuit based on an opamp. The results show that although catastrophic bridge/break faults occurred, they were no more prevalent than other types of fault, such as defects causing incorrect component values.

### 4.3 Alternative Approaches to IFA using Realistic Fault Mapping

Whilst the technique of IFA has been shown to be a useful tool in the application of realistic fault list generation, it has several inherent limitations listed below.

1) IFA is a very computationally intensive and therefore time consuming process. This can limit the size of circuits to which IFA can be applied.

2) The final layout must be available and hence the entire design must be complete. Thus, IFA can only be applied at the end of the design cycle, which precludes its use as part of an integrated test approach since a realistic set of defects (fault list) is not available for fault simulation until the final design phase.

One possible solution to these points is presented in [Ohle96] as “Local Layout Realistic Fault Mapping” (L²RFM). The hypothesis of L²RFM is to use IFA on standard analogue structures such as differential pairs, current mirrors etc., which occur frequently in analogue design. Such structures are readily extracted from a schematic and knowledge of their associated defects is used to generate realistic defects from netlists. In [Ohle96] a fault list for a CMOS opamp is reduced from 45 to 27 faults by considering realistic defects without using IFA on the whole circuit. It is stated that open source faults on transistors connected to the power rails are particularly unlikely since they generally have a set of contacts which makes the structure tolerant if one of them is missing.

[Prie97] presents an initial study into an approach to produce the probability of occurrence of faults in relation to their device structure. The advantages of such an approach would be:

1) A set of realistic defects for structures along with their probability of occurrence would be obtained.
2) The layout of structures could be redesigned to produce improved testability.

As part of the work, the authors analyse different transistor structures using IFA and attempt to produce a relationship between the probability of occurrence of certain types of faults and the width, length and number of gates of the transistor. The models produced were used to predict the probability of faults occurring in a fully differential opamp to a good degree of accuracy (compared to results obtained from IFA of the full layout). It was noted that only 1/4 of the faults occurred in the interconnection area which was not considered by the probability model. Further work is proposed applying this technique to simple analogue structures (as in [Ohle96]) to eventually provide a library.

5. Analogue Fault Simulation

5.1 Introduction

Section 3 described simulation fault models used to enable the simulation of defects. In order to evaluate the quality of a structural test technique in terms of its ability to detect a set of faults, Analogue Fault Simulation (AFS) is required. The basic approach used in many AFS systems consists of three parts: fault injection, repeated simulation and post-processing fault detection. During fault injection, fault models are inserted into a fault-free netlist to generate faulty circuits. These faulty circuits are then simulated along with the fault-free circuit during the repeated simulation stage. Faulty circuit responses are then compared with the fault-free circuit response to determine the level of fault detectability. In order to model the possible fault masking effect of deviations in process parameters, a tolerance is usually applied during the detection decision.
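
The three stages can be sketched in a few lines. In the example below a closed-form resistive divider stands in for the circuit simulator; the component values, the 1Ω short and 100MΩ open fault model parameters, the 5% tolerance and all function names are assumptions chosen for illustration and do not correspond to any particular AFS system reviewed here.

```python
def simulate(netlist):
    """Stand-in for a circuit simulator: DC output of a resistive divider."""
    return netlist["VIN"] * netlist["R2"] / (netlist["R1"] + netlist["R2"])


def parallel(a, b):
    return a * b / (a + b)


NOMINAL = {"VIN": 5.0, "R1": 10e3, "R2": 10e3}

R_SHORT = 1.0     # ohms, assumed short fault model parameter
R_OPEN = 100e6    # ohms, assumed open fault model parameter

# Fault injection: each entry returns a faulty copy of the netlist.
FAULTS = {
    "R1_short": lambda n: {**n, "R1": parallel(n["R1"], R_SHORT)},
    "R1_open": lambda n: {**n, "R1": n["R1"] + R_OPEN},
    "R2_short": lambda n: {**n, "R2": parallel(n["R2"], R_SHORT)},
    "R2_open": lambda n: {**n, "R2": n["R2"] + R_OPEN},
    "R2_plus_2pc": lambda n: {**n, "R2": n["R2"] * 1.02},
}


def fault_simulate(nominal, faults, tolerance):
    """Repeated simulation followed by post-processing fault detection."""
    good = simulate(nominal)
    detected = {}
    for name, inject in faults.items():
        faulty = simulate(inject(nominal))
        # A fault is detected if it moves the response outside the
        # tolerance band placed around the fault-free response.
        detected[name] = abs(faulty - good) > tolerance * abs(good)
    return detected


results = fault_simulate(NOMINAL, FAULTS, tolerance=0.05)
coverage = sum(results.values()) / len(results)
print(results, coverage)
```

In this sketch the small parametric deviation is masked by the tolerance band, while the catastrophic faults are detected.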

This section provides a review of literature describing research into AFS. The first part provides a review of early work on AFS mainly required for fault diagnosis of solid state components. The next section provides a review of AFS systems that have been implemented. The review focuses on the main features and advantages/disadvantages of various systems.

5.2 Early Fault Simulation Work for Fault Diagnosis

Much of the early literature in the area of fault simulation of analogue circuits is on the diagnosis of systems of discrete devices at board level. Most fault diagnosis techniques require the construction of fault dictionaries (lookup tables of faulty responses), which in turn requires AFS; hence research work has concentrated on efficient methods of fault simulation.

In [Band85], two approaches for the reduction of fault simulation time are reviewed: the application of Householder's formula and the application of complementary pivot theory. Both techniques are based on using matrix operations to reduce the fault simulation time, but they are only applicable to DC analysis. Further, the Householder's formula approach is only applicable to linear circuits; hence the usage of both techniques is strictly limited.
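
The idea behind the Householder's formula approach can be sketched for a linear resistive network described by the nodal equations $Gv = i$. A resistive fault of conductance $g$ between two nodes changes the matrix to $G + g\,uu^T$, and the faulty solution follows from the nominal one as $v' = v - \frac{g\,(u^T v)}{1 + g\,(u^T w)}\,w$ with $w = G^{-1}u$. The code below is a minimal illustration under these assumptions; the example network, its values and the function names are hypothetical and are not taken from [Band85].

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def faulty_voltages(g_matrix, v_nominal, node_a, node_b, g_fault):
    """Node voltages after adding conductance g_fault between two nodes.

    node_b=None places the fault between node_a and ground.
    """
    u = [0.0] * len(v_nominal)
    u[node_a] = 1.0
    if node_b is not None:
        u[node_b] = -1.0
    # In practice the stored factorisation of the nominal matrix would be
    # reused here, so each fault costs one substitution, not a new solve.
    w = solve(g_matrix, u)
    scale = g_fault * dot(u, v_nominal) / (1.0 + g_fault * dot(u, w))
    return [v - scale * wk for v, wk in zip(v_nominal, w)]


# Hypothetical network: 1 mA into node 0, 1 kohm from node 0 to ground,
# 1 kohm between nodes 0 and 1, 1 kohm from node 1 to ground.
G = [[2e-3, -1e-3], [-1e-3, 2e-3]]
I = [1e-3, 0.0]
v_nom = solve(G, I)

# Fault: 1 ohm short (g = 1 S) from node 1 to ground.
v_fault = faulty_voltages(G, v_nom, node_a=1, node_b=None, g_fault=1.0)

# Reference result from re-solving the modified network directly.
G_fault = [[2e-3, -1e-3], [-1e-3, 2e-3 + 1.0]]
v_direct = solve(G_fault, I)
print(v_nom, v_fault, v_direct)
```

The update relies on superposition, which is why the approach is restricted to linear circuits.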

A correctly functioning analogue circuit has an associated tolerance range due to random process variations in its components. Further, a circuit under a fault condition will also have an upper and lower range. During fault simulation the possible effect of fault masking by these tolerance ranges must be considered; hence it is necessary to calculate the movement of the high and low worst cases under fault conditions (fault bands). Fault bands can accurately be accounted for by first computing the nominal shift in values caused by a fault and then calculating the worst cases. In [Pahw82], an efficient approximation to this is proposed by computing the worst case tolerance band and then performing fault analysis on the worst-case extremes. Faults are modelled by changing the admittance matrix, with shorts as $R = 0$ and opens as $R \to \infty$. The assumption this technique makes is that the sensitivities to component variations will not change too much under these catastrophic fault conditions, so that an approximation to fault bands will be obtained. The program described performs AC analysis with the assumption that the operating points of devices remain approximately in the linear region under fault conditions. This is unrealistic, although good correspondence with the more accurate fault band approach is obtained for two bipolar amplifier circuits.

[Jago79] presents early work on fault simulation for solid state circuits based on the ISPICE circuit simulation program. The paper describes fault simulation for DC operating point analysis only. Only catastrophic failure modes are considered, with shorts and opens modelled using high impedance paths for opens and low impedance paths for shorts. The program allows user-defined values for the faults, since the authors found that using softer fault models (i.e. a lower resistance for open faults and a higher resistance for shorts) produced fewer convergence problems. The simulation process described starts with the simulation of the nominal circuit. The subsequent fault conditions are then inserted into the circuit and the simulator iterates to a new solution. Although this requires direct control over the circuit simulator, re-netlisting for each fault condition is avoided. The post-processing analysis uses a user defined, bi-directional, minimum-maximum voltage threshold which may be defined as a percentage or an absolute value. The paper stresses the advantages of using the test criterion after simulation to allow the option of testing with different limits without re-simulation. An example of application to a 5 volt regulator is given.

### 5.3 A review of Analogue Fault Simulation Systems for IC Testing

In later work the role of AFS has changed to the detection of defects in analogue and mixed signal ICs (go/no-go testing). [Morr89] describes a mixed-mode test system based on a simulation system utilising analogue and digital hardware accelerators. Faults are modelled at the device level and comprise element and net opens and shorts and component deviations. The applicability of net shorts is determined based on intercapacitance calculations, available from the design phase (assuming a layout is available). Macros without faults are modelled at a higher level, including using behavioural digital simulation on analogue macros, transformed via the Z-transform,
such as that used in DRAFTS [Nagi93]. Post-processing and tolerance effects modelling are not described.

The "Anafault" simulator, described in [Sebe93a] [Sebe93b], is a general purpose analogue fault simulator based on the "Eldo" circuit simulator. Anafault allows up to 8 simulations of one of three analysis types: DC, AC or transient. Two modes of analysis are described in the papers:

1) All simulation types are run and post processing is performed after all simulations are finished. Resimulation is not required if post-processing test limits are to be changed.

2) The simulator cycles through each analysis type in order, performing post-processing detection after each analysis, and stops when a fault is detected. This allows for more efficient fault simulation, since the analyses to be performed can be ordered as real production tests (i.e. simple dc tests first, followed by more complicated tests), and superfluous simulation of faults which are detected early in the simulation cycle is avoided. Changing test limits may however require resimulation.
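
The saving offered by the second mode can be illustrated as follows. This sketch does not reproduce Anafault itself: the faults, their deviations, the detection thresholds and the function names are hypothetical, and each "analysis" is reduced to a simple predicate on precomputed deviations.

```python
def fault_simulate_sequential(fault, analyses):
    """Run analyses in order and stop at the first that detects the fault.

    Returns (name of detecting analysis or None, number of analyses run).
    """
    for count, (name, detects) in enumerate(analyses, start=1):
        if detects(fault):
            return name, count
    return None, len(analyses)


# Analyses ordered as in a production test: simple DC test first.
ANALYSES = [
    ("dc", lambda f: abs(f["dc_shift"]) > 0.1),
    ("ac", lambda f: abs(f["gain_error"]) > 0.05),
    ("transient", lambda f: abs(f["settling_error"]) > 0.1),
]

# Hypothetical deviations of each faulty circuit from the fault-free one.
FAULT_LIST = {
    "short_out_gnd": {"dc_shift": 2.4, "gain_error": 0.9, "settling_error": 0.5},
    "cap_open": {"dc_shift": 0.0, "gain_error": 0.3, "settling_error": 0.4},
    "small_mismatch": {"dc_shift": 0.01, "gain_error": 0.01, "settling_error": 0.02},
}

outcome = {name: fault_simulate_sequential(fault, ANALYSES)
           for name, fault in FAULT_LIST.items()}
runs_sequential = sum(count for _, count in outcome.values())
runs_exhaustive = len(FAULT_LIST) * len(ANALYSES)
print(outcome, runs_sequential, runs_exhaustive)
```

Because detection is decided during the cycle, a change of thresholds could alter which analyses are run, which is why resimulation may be needed in this mode.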

The Anafault fault injection system allows user-definable simulation fault models as netlist subcircuits as well as providing standard simulation fault models (resistor and source-based catastrophic faults). Component matching faults are implemented by the deviation of a component parameter in opposite directions for the two devices. The fault injection stage detects redundant faults, source loops (which is important for the source-based simulation fault model), isolated nodes, and floating MOSFET gates.

The post-processing capabilities are described as a fixed threshold boundary on the measured parameter (e.g. voltage, current) for all types of analysis. In addition, a temporal/frequency threshold may be applied to allow for poor time or frequency resolution in ATE. Quantization in time or amplitude is also available to facilitate the modelling of A/D converters.

A procedure for the reduction of AFS time for transient analysis is presented in [Verm93a] [Verm93b]. The assumption is made that each simulation is aborted after a predetermined deviation is exceeded at an output. The technique described is as follows:

1) Order the faults and cluster them with respect to their sensitivities and temporal effects to the outputs. This is achieved by using a very rough transient simulation for all faults. Faults in the same group are thus expected to be detectable in approximately the same time period.

2) Create a number of hypernetworks (i.e. multiple copies of the networks with faults in the same group) for simulation. An example shows that for a 2nd order band pass filter with 100 faults, the optimum number of hypernetworks is 10.

3) Simulate the hypernetworks. The Saber simulator is used to simulate the networks.

The detection process is contained within the simulation process, so there is no post-processing analysis and the whole simulation must be re-run if detection thresholds are to be changed. No information on detection thresholds, the fault injection process or
fault models supported is given. A decrease from 9.45s to 7.56s was obtained for a 2nd order band pass filter simulated with parametric deviation faults.

A novel approach to AC AFS analysis of linear circuits is presented in [Nagi93] as the "DRAFTS" discretized fault simulator. The method described involves the transformation of the circuit into the discrete time Z-domain. The principle relies on the fact that simulation in the Z-domain is faster than iterative simulation techniques normally used for analogue circuits. The full method is as follows:

1. The state equations for the behavioural level circuit in the complex frequency s-domain are derived from the signal flow graph of the circuit.
2. The bilinear transform is applied to the resulting equations, transforming them into the discrete Z-domain. The solution to a given input will be a discretized output.

This must be performed for the nominal and faulted circuits, since it is shown in the paper that the mapping of faults from the circuit domain to the Z-domain is a one-to-many function. A single fault at circuit level maps to multiple faults in the Z-domain. Consequently, faults cannot be modelled directly in the Z-domain, which would be more efficient. Capacitive short faults (i.e., the introduction of new capacitances) also present a problem since they increase the number of states, forcing the discrete network to be rebuilt. Faults in opamps are modelled using reduced order rational functions which are added to the state equations. For a biquadratic filter circuit, simulation results to within 1.5% of PSPICE are obtained 2 orders of magnitude faster, although in practice the actual speedup depends heavily on the sample rate. Although the fault injection and repeated simulation stages are given, no post-processing algorithms or detection thresholds are discussed in the paper.
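
The discretization step can be illustrated on a single first-order low-pass section $H(s) = 1/(1 + s\tau)$. Substituting the bilinear transform $s = \frac{2}{T}\frac{1 - z^{-1}}{1 + z^{-1}}$ gives the recurrence $y[n] = \frac{(2\tau - T)\,y[n-1] + T\,(x[n] + x[n-1])}{2\tau + T}$. The sketch below applies this to a nominal and a faulty circuit; it is not the DRAFTS implementation, and the component values, the +50% parametric fault and the function name are assumptions.

```python
def bilinear_lowpass(tau, t_step, inputs):
    """Discrete-time response of H(s) = 1/(1 + s*tau) via the bilinear transform."""
    a = (2 * tau - t_step) / (2 * tau + t_step)
    b = t_step / (2 * tau + t_step)
    y_prev = x_prev = 0.0
    out = []
    for x in inputs:
        y = a * y_prev + b * (x + x_prev)
        out.append(y)
        y_prev, x_prev = y, x
    return out


R, C = 10e3, 1e-9          # hypothetical nominal component values
TAU = R * C
T = TAU / 100              # sample period; accuracy depends on this choice
step = [1.0] * 1000        # unit step input

# The recurrence coefficients are rebuilt for the faulty circuit, since the
# fault is injected at circuit level rather than directly in the Z-domain.
nominal = bilinear_lowpass(TAU, T, step)
faulty = bilinear_lowpass(1.5 * TAU, T, step)   # R increased by 50%

max_dev = max(abs(n - f) for n, f in zip(nominal, faulty))
print(nominal[100], faulty[100], max_dev)
```

Each output sample costs a fixed number of multiplications and additions with no iteration, which is the source of the speed advantage, while the dependence of accuracy on `T` reflects the sample rate sensitivity noted above.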

A similar approach is presented in [Vari96] for linear circuits using a state space representation of the circuit and polynomial representations of the output waveform. Since the output is represented as a polynomial, this has advantages over similar simulation methods, such as DRAFTS, in that the output at a given time can be obtained simply by substitution rather than calculating intermediate points.

In the previous two approaches described, the concept of working at a higher level of abstraction in an alternative domain produces efficient fault simulation. However, both cases are limited not only to linear circuits, but to circuits which remain linear even under fault conditions. Whilst these approaches may be applicable to fault simulation of, say, passive component errors in a filter circuit, catastrophic fault conditions causing non-linearities cannot be simulated.

[Caun95] describes a methodology for test program verification using Saber as a mixed-signal simulator. The test development methodology is as follows:

1) A functional block diagram level representation of the circuit is simulated. Worst-case analysis can be used at this stage to determine test limits.
2) Tests are described using a graphical test description language.
3) A schematic tool is used to build the test configuration which is then linked to the device schematic.
4) The "virtual ATE" is used to generate tests and record measurements on
instrument models during AFS. In particular device/ATE interactions and test
program behaviours such as impedance matching, settling time, transient effects,
precision and synchronisation are modelled by the virtual ATE and can be
considered at this stage.

To reduce the AFS time, large circuits are partitioned into functional blocks. The
functional block with the fault under test and the surrounding functional blocks are
simulated at the device level with the remainder of the circuitry simulated at behavioural
level. For this approach to be successful in reducing simulation time, a large circuit must
be assumed. The AFS approach described has the following features:

1) Fault injection. - The procedure assumes all faults in a library for a given
component. The fault list is reduced by ignoring equivalent faults.
2) Repeated simulation. - A more efficient simulation is obtained by utilising the
fact that the Saber simulator does not need to recompile the netlist if the only
change to be made is a component value (i.e. a soft fault). Hard faults result in
a change in netlist and hence require a recompilation.
3) Post-processing - Tolerance bounds are applied around the nominal fault-free
responses. Measurement ranges for faulty circuit responses are obtained
considering test equipment accuracy. Three cases can result: *Always Detected*
where the faulty circuit measurement range lies completely outside the good
circuit tolerance range, *Never Detected* where the faulty circuit range lies
completely within the good circuit range and *Uncertainly Detected* where the
faulty circuit measurement range lies partly inside and partly outside the
fault-free tolerance interval.
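
The three-way detection decision in the post-processing stage can be expressed as a comparison of two intervals. The following is a minimal sketch of that decision only; the function names, the numerical ranges and the way the measurement range is built from an instrument accuracy figure are illustrative assumptions rather than details given in [Caun95].

```python
def measurement_range(value, accuracy):
    """Range of a faulty circuit measurement given test equipment accuracy."""
    return (value - accuracy, value + accuracy)


def classify(faulty_range, good_range):
    """Compare a faulty measurement range with the fault-free tolerance range."""
    f_lo, f_hi = faulty_range
    g_lo, g_hi = good_range
    if f_hi < g_lo or f_lo > g_hi:
        return "Always Detected"        # completely outside the good range
    if g_lo <= f_lo and f_hi <= g_hi:
        return "Never Detected"         # completely inside the good range
    return "Uncertainly Detected"       # partly inside, partly outside


GOOD = (2.4, 2.6)   # hypothetical fault-free tolerance range in volts

for measured in (3.05, 2.50, 2.60):
    rng = measurement_range(measured, accuracy=0.05)
    print(measured, classify(rng, GOOD))
```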

6. Summary, Conclusions and Justification of Work

A wide variety of structural test techniques has been covered in this literature survey,
some of which have shown promising fault detection results. In particular there is much
interest in supply current monitoring-based techniques which have shown high fault
coverage. Structural BIST and DFT techniques have also been proposed which can be
used to reduce the amount of test circuitry required. The techniques proposed have
ranged from simple approaches, such as monitoring DC voltages, to more complicated
methods such as correlation techniques and full BIST schemes. Although fault-driven
test approaches may not totally replace functional testing, they may be used to reject a
large number of faulty devices using relatively simple test techniques and to increase
quality and reliability. It is particularly cost effective to do this early in the production
test cycle.

The reliability of modern ICs is becoming increasingly important since they
frequently form an essential part of safety critical systems. It has been shown that
faults undetectable in a functional test can be detected using supply current monitoring.
The structural testing approach detected devices which appeared to be correctly
functioning but were likely to fail during operation hence causing reliability problems.

Due to the diversity in proposed structural test techniques, it is particularly
advantageous to be able to compare simulation results from different test approaches.
An accurate test quality evaluation technique is therefore required. There are, however, several problems in test evaluation and comparison:

a) There is no standard fault model for analogue ICs. Although the catastrophic fault model is included in most test evaluation work, it has been modelled using different parameters within simulation fault models. Failure mode and defect analysis results have shown that analogue ICs fail due to a variety of failure modes and, in particular, that soft faults occur. This produces problems when trying to derive fault models since the actual parameters of a fault model are continuous, although they are generally either approximated as a single value or require multiple simulations. Complex fault models such as gate oxide shorts and floating gates potentially require a large number of simulations if all combinations of defect location, type and severity are to be simulated. Inductive fault analysis has been proposed as a means of both reducing a fault list and obtaining realistic defects but requires process and layout information and is computationally intensive. Much published work assumes that all faults are equiprobable.

b) Pass/fail detection thresholds used to model process tolerance effects have tended to be arbitrary, particularly in earlier work. Many published results have used fixed thresholds around the fault-free circuit simulation response to define the test limits, and the threshold is assumed to stay constant for a variety of different test inputs. The magnitude of this threshold has been shown to be crucial to the fault coverage results obtained. Some later work has used Monte Carlo simulation to derive more accurate test limits for the fault-free circuit; however, the effect of process deviation on faulty circuits is often either not considered or assumed to be of the same magnitude as in the fault-free case. Even when process parameters are accurately modelled within a fault simulation, results cannot be meaningfully compared unless the distributions are the same. A further problem is that using Monte Carlo simulation greatly increases fault simulation time.

c) Comparison of structural test techniques is hard because a variety of different test circuits have been used, implemented in differing technologies and simulated with different circuit simulators. Only very recently has a set of benchmark circuits for analogue test been published [Kami97].

7. Aim of Work and Structure of Thesis

The literature review has thus highlighted several issues in the analogue IC test domain which should be addressed. Approaches to several of these problems are described and investigated in this thesis. Based on the literature review, there are three main aims to the work here:

1) Firstly, it is apparent that research into structural analogue test techniques requires a flexible fault simulation tool. Therefore, the first objective is the development of such a tool. Particular points are the accurate modelling of the faulty and fault-free tolerance bands using techniques which produce the minimum increase in fault simulation time.
2) The improvement of fault simulation test metrics. The aim is to make the structural test metrics more realistic in terms of fault modelling and probability of fault occurrence since existing simulation results have often made assumptions about these.

3) The investigation of structural test techniques. Firstly to produce an accurate comparison between structural test techniques which has been shown to be difficult (point c) in section 6). Secondly to determine the reasons for undetectable faults and improve fault detectability.

Based on these aims the structure of this thesis is as follows:

In chapter 3 an analogue fault simulator (ANTICS) based on the HSPICE circuit simulator is described. In particular, the simulator is capable of performing Monte Carlo fault simulation to model the effects of process spread on the faulty and fault-free circuits. Although Monte Carlo simulation of faulty circuits has been considered in the literature, no fault simulation tool with this capability has been described.

Chapter 4 extends the standard test evaluation of fault simulation using a statistical approach which is integrated into the fault simulation tool. Monte Carlo fault simulation is used to obtain approximations of output variable distributions which produces a more accurate test quality metric than the standard fault coverage figure, although it can be interpreted in a similar manner. Statistical fault simulation also provides an accurate benchmark against which faster but potentially less accurate fault simulation techniques can be compared.

In chapter 5, two novel approaches for the reduction of Monte Carlo fault simulation time are described, since this is the main limitation with this technique. Both techniques are based on reducing the number of Monte Carlo simulation runs. The first technique derives best and worst case parameter sets from the fault-free simulation and uses them to generate test limits for faulty circuits. The second technique uses a rough initial fault simulation for test selection and to eliminate clearly detectable or undetectable faults. The techniques are evaluated with respect to the statistical approach.

Methods of making analogue fault simulation results more realistic in terms of production test quality are investigated in chapter 6. Firstly a fault weighting approach for statistical fault simulation is described which considers the relative frequency of fault occurrence based on published figures of probability of occurrence without using IFA. Secondly, an investigation which considers the value of a short resistance as a statistical distribution input to the Monte Carlo fault simulation approach is presented. Many previous approaches have assumed all faults to be equiprobable and considered fault model parameters to be single-valued rather than a distribution.

The final chapter presents an accurate investigation into supply current monitoring and a comparison of structural test techniques. Many techniques and concepts used in the analysis derive from investigations in previous chapters. The reasons for low detection of certain faults are investigated since these could be used to indicate circuit structures with potential testability problems. A possible method of increasing fault detection for transient supply current monitoring is also presented.
Chapter 3
The ANTICS Analogue Fault Simulation Software

1. Introduction

From the literature review it is clear that research into structural test methodologies requires an analogue fault simulation tool for test quality evaluation. Several analogue fault simulation approaches have been described in the literature (see chapter 2, section 5), but to maintain flexibility and allow research into analogue fault simulation methodologies, a fault simulator for analogue circuits was developed as part of this work. This chapter describes the ANTICS fault simulation software which was developed to allow research into analogue circuit testing. An overview of the software structure is given in section 2. Similar to other fault simulators described in the literature review, ANTICS uses a commercial circuit simulator, HSPICE [HSPI96], to perform simulations as part of the fault simulation procedure. The advantages and disadvantages of this are discussed along with a description of the main features of the HSPICE simulator required for an appreciation of the operation of ANTICS. The programs comprising ANTICS are described in sections 3-5 in terms of their basic function and inputs and outputs and compared with other simulation schemes described in the literature. Section 6 presents a summary of the chapter.

2. ANTICS - An Overview

The structure of the ANTICS fault simulation package is shown below in figure 3-1.
ANTICS consists of four programs, ANAFINS, ANAFAME, ANACOV and MCRAND, written in C and running on SUN workstations. ANAFINS provides fault injection into an HSPICE netlist using a library of parameterisable fault models. The output from this process is a set of HSPICE input files, each of which corresponds to a fault. These files, along with the fault-free circuit netlist, are simulated using ANAFAME. ANAFAME provides management of distributed circuit simulation over a network cluster of workstations. The final analysis (post-processing) stage is provided by ANACOV, which determines figures of fault coverage and test quality based on the output responses. MCRAND is used as a random number generator for Monte Carlo fault simulation, having advantages over the HSPICE built-in random number generation. Inter-program communication is via a fault database which is created by ANAFINS during fault injection and used and modified by ANAFAME and ANACOV.

The ANTICS software suite was written collaboratively, the author's contribution being the programs ANAFAME, ANACOV and MCRAND. A graphical user interface has also been written for ANTICS by Chee [Chee97] using the Cadence DFWII design framework.

2.1 Why Use a Commercial Simulator for Fault Simulation?

Several fault simulation approaches are described in the literature. Some are based on existing circuit simulators, such as ANAFAULT [Sebe93a,b], which is based on ELDO, and two approaches based on Saber [Verm93a] [Caun95]. Others reduce fault simulation time by manipulating and transforming circuit matrix representations directly, such as those described in [Nagi93] [Vari96] [Pahw82]. Whilst a greatly reduced simulation time is reported for cases such as passive component faults in linear circuits (e.g. filters), such techniques are not applicable to non-linear circuits, nor to linear circuits which become non-linear under fault conditions. They are therefore too limited for use in the general case.

Further to this, the use of an existing commercial "core" simulator, as opposed to the development of circuit simulation software, has several advantages and disadvantages which are discussed below.

Advantages in using a commercial simulator:

1) The main advantage is the avoidance of very high development costs and time required to produce an efficient, accurate circuit simulator. Circuit simulators have been developed and optimised over many years and a suitable simulator could not be produced in a realistic time frame.

2) SPICE-like simulation is widely used and device models are widely available. Although several different implementations and versions are available, the basic principles and device syntax remain very similar and using a SPICE simulator rather than a proprietary version provides some portability.

Disadvantages in using a commercial simulator:

1) It is not possible to manipulate the internal circuit representation directly. Thus, fault injection must be performed by altering the input text file directly rather than by changing the netlist matrix structure within the simulator. This requires a separate input file for every fault all of which must be parsed and converted into an initial connection matrix. These overheads are not generally considered too limiting in terms of simulation time.

2) Fault simulation cannot be stopped at the point when a fault becomes detectable; some unnecessary simulation time will be wasted in simulating faults after fault detection. Although it is possible to stop a simulation on some simulators such as HSPICE when a certain circuit condition is met, these conditions are not general enough to be used as pass-fail thresholds except using the simplest of criteria (e.g. crossing a fixed level). Stopping the simulation of a fault before it has completed has the disadvantage that the fault must be completely resimulated if a new detection criterion or tolerance threshold is to be applied. This is not therefore seen as a drawback if flexibility is to be maintained.

The basic arguments are thus those of simulation control versus development time. It was decided that in this case the advantages greatly outweighed the disadvantages, and HSPICE was chosen as the circuit simulator for the ANTICS software, in the same way that the fault simulators mentioned above are built on other commercial circuit simulators.

2.2 The HSPICE Circuit Simulator [HSPI96]

The main input file to the HSPICE simulator is a text file, often referred to as a deck, with the lines as its cards. The file consists of a circuit netlist, an input stimulus, component models, simulation control cards and input/output control cards. HSPICE can simulate a circuit in several different analysis modes, including DC sweep, transient and AC small signal. Each of these modes may be used as part of a fault simulation within ANTICS.

The output control cards determine which variable(s) (nodal voltages or branch currents) are to be measured during fault simulation and will appear as part of the HSPICE text output file. As only results appearing in this file can be used in the ANTICS post-processing stage, the selection of outputs plays an important part in analogue fault simulation. ANTICS supports two HSPICE output commands, the functions of which are described below.

```
.print
```

The .print statement prints a nodal voltage or branch current for every point on a DC, AC or transient simulation. This statement is used when the entire waveform is of interest.

```
.measure
```

The .measure statement allows the measurement of a current, voltage, time, frequency or DC sweep level when a user-specified circuit condition occurs. Thus measurements such as delay time, bandwidth and offset voltages are possible. The `.measure` command also allows transient measurements to be post-processed, producing for example the RMS of a waveform for a given simulation time.

3. Simulating Process Parameter Deviations

3.1 Introduction

Deviations in IC manufacturing processes cause a fluctuation in the expected output response of an IC. The effect of this is that no two chips with the same circuit and layout will produce the same response. Rather, a set of chips will produce a range of values dependent on the severity of the process deviations. This range of values has implications in the post-processing fault simulation stage described in section 6.3.

Random variations in the manufacturing process of ICs are caused by many different factors, for example process gradients of temperature, oxide thickness and implantation densities. Furthermore, not only does fabrication equipment drift over time, but the position of a wafer in a machine may also produce a variation in parameter values. For circuit simulation purposes, these “global” parameter deviations are generally lumped under the term *interdie* variations since they affect all transistors on a given wafer by equal amounts.

In addition to interdie variations, process deviations also cause mismatch errors between transistors on the same chip. These *intradie* variations are caused by effects such as oxide gradients over a chip. Modelling of this type of deviation requires that every transistor is treated separately since the effect of the deviation is a “local” deviation particular to a device. Local and global process parameter deviations are summarised in table 3-1.

<table>
<thead>
<tr>
<th>Variation</th>
<th>Effect</th>
<th>Example Cause</th>
</tr>
</thead>
<tbody>
<tr>
<td>Interdie</td>
<td>Global</td>
<td>Equipment drift</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Global process gradient concentrations</td>
</tr>
<tr>
<td>Intradie</td>
<td>Local</td>
<td>Oxide thickness</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Oxide trapped charge</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Implantation densities</td>
</tr>
</tbody>
</table>

Table 3-1 - Process Deviation Effects

3.2 Monte Carlo Simulation

The aim of Monte Carlo analysis is to model the effect of process parameter deviations on a circuit response. Device parameters are replaced with statistical models, i.e. their probability density functions (PDFs), and for each simulation run input parameters are assigned a random value based on the corresponding PDF. By repeating a number of simulations, a model of the expected output response distribution of a set of manufactured ICs under similar process parameter deviations is obtained.

Monte Carlo simulation techniques have the inherent problem of requiring a large number of simulation runs to produce a meaningful distribution. However, they use a much simpler approach than analytical techniques such as response surface methodology [Isma94] and do not require any modelling to be performed.
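
The principle can be sketched in a few lines of Python. The example below is purely illustrative: a resistive divider stands in for the circuit simulation, and the component values, Normal PDFs and number of runs are assumptions chosen for the sketch rather than figures from this work.

```python
import random
import statistics


def divider_output(vin, r1, r2):
    """Stand-in for a circuit simulation: output of a resistive divider."""
    return vin * r2 / (r1 + r2)


def monte_carlo(n_runs, seed=0):
    """Repeat the 'simulation' with parameters drawn from their PDFs."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    outputs = []
    for _ in range(n_runs):
        r1 = rng.gauss(10e3, 500.0)    # assumed Normal PDF: mean, std. dev.
        r2 = rng.gauss(10e3, 500.0)
        outputs.append(divider_output(5.0, r1, r2))
    return outputs


outputs = monte_carlo(1000)
mean = statistics.mean(outputs)
sigma = statistics.stdev(outputs)
print("mean = %.4f V, standard deviation = %.4f V" % (mean, sigma))
```

The spread of `outputs` approximates the response distribution; the estimate improves only slowly as the number of runs is increased, which is the cost referred to above.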

3.3 Statistical Modelling Techniques

Work on statistical modelling of MOS transistor mismatch is presented in [Pelg89] based on local parameter deviations. This work has been extended in [Mich92] [Mich93] to form the Statistical MOS (SMOS) model based on intradie and interdie variations. In practice manufacturing tolerance effects will not affect component parameters independently. For example, a variation in oxide thickness will affect more than one transistor parameter. In order to take this into account, the SMOS model preserves the correlations between the model parameters using principal component analysis to generate correlated random numbers. The SMOS model requires two empirically derived process fitting constants per model parameter, a device layout from which transistor coordinates are obtained, and a parameter correlation matrix. Since these data are not generally available and require extensive measurements from the production environment, the SMOS model is not used here.

3.4 Monte Carlo Simulation using HSPICE

HSPICE has a built-in function for Monte Carlo analysis but this has a number of limitations which may present a problem as part of a fault simulation scheme. Firstly, the same random number sets are used in subsequent Monte Carlo simulations. That is, no random number seed is used to generate different sets between simulations. Secondly, it is hard to determine from the HSPICE output file the parameter values used for a given simulation run, which are required, for example, for best/worst case parameter extraction.

A more flexible and simpler alternative is to use data-driven simulation. A set of data is stored in an input file which is read during simulation, and the values in the file are used as parameters in the simulation. By generating this input data with a random number generator, this approach can be used as a Monte Carlo simulation scheme. This principle is utilised for statistical fault simulation in the ANTICS software and is required for several post-processing detection modes described in section 6.

3.5 MCRAND

In order to generate the random number sets for data-driven Monte Carlo simulation, and to overcome the limitations described above, the MCRAND program was developed. The basic inputs and outputs of the program are shown in figure 3-2.

The main input to the program is a command file which lists the HSPICE parameters to be varied, together with the distribution type and distribution parameters of each (e.g. mean and standard deviation for a Normal distribution). Three distribution types are available: Uniform, Normal and Limit (figure 3-3).

a) **Uniform distribution** - The Uniform distribution has a flat PDF with parameters: centre MEAN and maximum variations +/- SPREAD.

b) **Normal distribution** - The Normal distribution has a PDF with parameters centre MEAN and standard deviation of SPREAD.

c) **Limit distribution** - SPREAD is either added or subtracted from MEAN depending on whether the outcome of a 0-1 uniform distribution is greater than or less than 0.5.

Figure 3-3 - MCRAND Distribution types

Normal and Uniform distributions also have a MULTIPLIER value associated with them, which causes MCRAND to repeat the calculation MULTIPLIER times and save the value with the furthest deviation from the MEAN. This produces a bimodal distribution. A MULTIPLIER value of 1 (the default) has no effect.

The MCRAND program generates as its output a file with columns of data suitable for input to an HSPICE data driven analysis.
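
The behaviour described in this section can be sketched as follows. This is not the MCRAND source (which is written in C): the function names, the parameter names `tox_n`, `vto_n` and `dl_n`, their values and the plain column layout of the output are all assumptions made for illustration, and the exact file format expected by an HSPICE data-driven analysis is not reproduced here.

```python
import random


def draw(rng, dist, mean, spread, multiplier=1):
    """Draw one value; with multiplier > 1 keep the draw furthest from mean."""
    def once():
        if dist == "uniform":
            return rng.uniform(mean - spread, mean + spread)
        if dist == "normal":
            return rng.gauss(mean, spread)
        if dist == "limit":
            # SPREAD is added or subtracted according to a 0-1 uniform draw
            return mean + spread if rng.random() > 0.5 else mean - spread
        raise ValueError("unknown distribution: " + dist)

    draws = [once() for _ in range(max(1, multiplier))]
    return max(draws, key=lambda value: abs(value - mean))


def generate_table(spec, n_runs, seed):
    """One row of parameter values per Monte Carlo run, seeded explicitly."""
    rng = random.Random(seed)
    names = [entry[0] for entry in spec]
    rows = [[draw(rng, *entry[1:]) for entry in spec] for _ in range(n_runs)]
    return names, rows


def format_columns(names, rows):
    """Render the table as whitespace-separated columns (assumed layout)."""
    lines = [" ".join(names)]
    lines += [" ".join("%.6e" % value for value in row) for row in rows]
    return "\n".join(lines)


# Hypothetical command-file contents: (name, type, MEAN, SPREAD, MULTIPLIER)
SPEC = [
    ("tox_n", "normal", 2.0e-8, 5.0e-10, 1),
    ("vto_n", "uniform", 0.75, 0.05, 3),
    ("dl_n", "limit", 0.0, 1.0e-8, 1),
]
names, rows = generate_table(SPEC, 5, seed=1998)
print(format_columns(names, rows))
```

Because the seed is an explicit input, a different seed gives a different random number set, and the values used in each run are available directly from the table.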

4. Fault Injection

4.1 Introduction

From the literature review, it is clear that no standard simulation fault model exists for analogue ICs. Moreover, even though some authors have used the same fault model connectivity, they have used different parameters within the model. Unless analogue fault models become standardised, one requirement for fault injection as part of an AFS scheme is that fault models should be user-definable. Similarly, the models themselves should be parameterisable, to allow the same fault model to be injected with different fault model component parameters. Fault injection with user-definition of fault models is described in the literature; for example, ANAFAULT [Sebe93a,b] allows user-defined subcircuit definitions to be used as fault model definitions. In [Caun95], a range of standard fault models, such as shorts and opens, is used, and these are parameterisable.

The list of faults to be injected will either be obtained from IFA (or a realistic fault mapping technique), from the user or derived automatically from a circuit netlist. In particular, since hierarchical circuit netlists are often used to describe circuits, the ability to specify faults according to their complete hierarchical subcircuit path is desirable.

4.2 ANAFINS

ANAFINS provides the fault injection for the ANTICS software. The basic inputs and outputs generated by the program are shown in figure 3-4.

ANAFINS produces a set of HSPICE input files, each containing a fault injected into the fault-free circuit netlist (HSPICE input file). A library of fault models is described in the fault model definition file. ANAFINS allows fault models to be defined with a fault modelling language based on HSPICE commands. For component-based faults, the existing component, instance parameters, device model and nodal connections from the fault-free file are available as parameters which can be used in the fault model descriptions. Therefore fault models can change connectivity (e.g. the insertion of an extra component), existing component parameters (e.g. length or width of a MOS transistor) and device model parameters, such as oxide thickness. Additional components required by a fault model can also be added.

The set of fault models to be injected and their corresponding parameters are defined in the fault injection specification file. Although ANTICS does not specifically use IFA, a pre-process to generate the fault injection specification file from IFA results could be added. One of the main features of ANAFINS is that it allows faults to be injected on components described in terms of their full hierarchical instance, including wildcards. Thus it is possible, say, to define a fault injection to be applied to all n-type MOS transistors within a specific subcircuit.

An example fault model definition for a drain open fault is shown in figure 3-5. The first command line of the fault model definition defines a fault called mdop1 applicable only to HSPICE components beginning with the letter m, i.e. MOS transistors. The fault model belongs to the basic fault class of open. Two parameters, %rfault and %cfault, are defined in the next two lines, with default values 10 Meg and 1.8e-16 respectively. The default values may be overridden by parameters passed from the fault injection definition. The .connected statement is used to specify that the existing component is present in the faulted netlist with its existing model parameters (#m) and instance parameters (#p). In this case the previous node 3 is replaced by an additional onode, since that is the third parameter. The next two lines define the instances of the circuit components in the simulation fault model. Any valid HSPICE syntax is applicable within the fault model definition. The final .endf command ends the fault model definition.

An example fault injection specification file is shown below corresponding to the fault model.

```
** catastrophic drain open fault
.inject mdop1 rfault=1G cfault=1e-15
.endi
```

This file will apply the open fault model to all instances in all subcircuits to which the mdop1 fault model can be applied, i.e. all MOS transistors. The values of parameters rfault and cfault defined here are used in formation of the fault, in place of the default values. Individual components can be excluded or included using fully hierarchical specific and wildcard component matching.
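
The selection of fault sites by component type and hierarchical wildcard can be sketched as below. The sketch is illustrative only: the dot-separated instance paths, the shell-style wildcard syntax and the function names are assumptions, since the ANAFINS matching syntax is not reproduced in this section.

```python
from fnmatch import fnmatchcase


def select_instances(instances, prefix, include=("*",), exclude=()):
    """Return the instance paths to which a fault model applies.

    prefix  - component letter the fault model is restricted to (e.g. "m")
    include - wildcard patterns a path must match (any one of them)
    exclude - wildcard patterns that remove a path from the selection
    """
    selected = []
    for path in instances:
        leaf = path.rsplit(".", 1)[-1]          # component name at the leaf
        if not leaf.lower().startswith(prefix):
            continue
        if not any(fnmatchcase(path, pattern) for pattern in include):
            continue
        if any(fnmatchcase(path, pattern) for pattern in exclude):
            continue
        selected.append(path)
    return selected


def expand_injection(instances, fault_name, prefix, params, **matching):
    """Build one fault record per selected instance."""
    return [
        {"fault": fault_name, "instance": path, "params": dict(params)}
        for path in select_instances(instances, prefix, **matching)
    ]


# Hypothetical flattened instance list of a hierarchical netlist
INSTANCES = [
    "xopamp.m1", "xopamp.m2", "xopamp.xbias.m3",
    "xopamp.c1", "xfilter.r1", "xfilter.m1",
]
faults = expand_injection(
    INSTANCES, "mdop1", "m", {"rfault": "1G", "cfault": "1e-15"},
    include=("xopamp.*",), exclude=("*.xbias.*",),
)
for record in faults:
    print(record["fault"], record["instance"], record["params"])
```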

ANAFINS also generates a fault injection database containing information on all faults which have been injected and their parameters. This file is used by subsequent ANTICS programs, providing them with fault information and allowing inter-program communication.

### 5. Repeated Simulation

#### 5.1 Introduction

The next stage in the fault simulation process is the repeated simulation of the faulted files and simulation of the fault-free file. This is the most time-consuming stage of analogue fault simulation. There has been much research effort in the area of reducing this simulation time, including higher level circuit and fault modelling (described in chapter 2, section 4) and simulation in alternative domains, such as [Nagi93] [Vari96]. The limitations of such approaches have been discussed and, for these reasons, they are not implemented within the ANTICS simulator.

One method of reducing the simulation time is to distribute circuit simulations over a network cluster of workstation processors. It should be noted that this does not reduce problem complexity, but rather it reduces the "user" simulation time. Two of the requirements to make fault simulation distribution efficient are:

1) The overhead required to start a simulation on a remote processor must be less than the time it takes to simulate the fault. In general this will be true for non-trivial circuit simulations.

2) Faults should be allocated dynamically as processors become available since simulation time can vary between faulty circuits due to the different circuit structures, processor capabilities and loading of multi-user systems.

5.2 ANAFAME

Considering the points mentioned in section 5.1, the ANAFAME program for simulation was developed. The inputs and outputs are shown in figure 3-6.

Figure 3-6 - ANAFAME

The hosts database file contains a list of workstation hosts on which the fault simulations are to be performed. Initially, HSPICE simulations of the faulty circuits are allocated to hosts until there are no free processors; the remaining faults are then allocated dynamically as host processors finish their current simulations. The fault-free circuit is also simulated if required.

The fault model database is updated by ANAFAME during the fault simulation so that the status of the simulation can be monitored and the simulation resumed from its previous state if it is stopped.
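
The dynamic allocation scheme can be sketched as follows. This is an illustration of the principle rather than the ANAFAME implementation: one worker thread stands in for each host, `fake_simulation` is a hypothetical placeholder for launching a remote HSPICE run, and a simple set of completed faults stands in for the status held in the database.

```python
import queue
import threading
import time


def distribute(faults, hosts, run_simulation, completed=()):
    """Give each host a new fault as soon as it finishes the previous one."""
    pending = queue.Queue()
    for fault in faults:
        if fault not in completed:       # resume: skip faults already done
            pending.put(fault)

    results = {}
    lock = threading.Lock()

    def worker(host):
        while True:
            try:
                fault = pending.get_nowait()
            except queue.Empty:
                return                   # no faults left for this host
            outcome = run_simulation(host, fault)
            with lock:
                results[fault] = (host, outcome)

    threads = [threading.Thread(target=worker, args=(host,)) for host in hosts]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_simulation(host, fault):
    """Placeholder for a remote simulation; run time varies between faults."""
    time.sleep(0.001 * (len(fault) % 5))
    return "done"


FAULTS = ["fault%02d" % n for n in range(12)]
HOSTS = ["hostA", "hostB", "hostC"]      # hypothetical host names
results = distribute(FAULTS, HOSTS, fake_simulation, completed={"fault00"})
print(len(results), "faults simulated")
```

Because each host takes the next fault only when it becomes free, faster or less heavily loaded hosts automatically simulate more faults.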

6. Post-processing Analysis

6.1 Introduction

The final stage in fault simulation is that of results analysis and test evaluation. This is based on the set of faulty circuit responses and the fault-free response. The review section highlighted post-processing as one area of analogue fault simulation in which there are currently many anomalies. In particular, the setting of detection thresholds during structural test investigations has tended to be arbitrary. This section provides an introduction to the main concepts and features of the ANACOV post-processing software. These are refined and enhanced during investigations presented in later chapters.

The post-processing part of fault simulation can be thought of in two parts: that which models physical effects of test equipment and that which provides the detectability measure.

### 6.2 Test Equipment Modelling

Post-processing analysis is essentially used as a model of the physical test equipment or BIST scheme that is to be used on a given circuit. Whilst the most accurate method of test evaluation would be to include a complete circuit level model of the test equipment as part of the circuit simulation, this would greatly increase simulation time and would be limiting since it would not allow, for example, more complicated evaluation algorithms. Therefore, circuit level test equipment models are better suited to modelling the physical effects of the test equipment, such as loading and bandwidth effects. Other physical effects can be omitted from the netlist and considered in the post-processing algorithm. These include quantization effects from test equipment or a BIST scheme (A to D conversion) and sampling of waveforms. This approach is used in [Sebe93a,b]. Considering these effects in the post-processing stage of AFS increases flexibility, since parameters can be altered and post-processing reapplied. For example, circuits do not have to be resimulated in order to take into account different quantization resolutions.

This test equipment model is similar to the approach described in [Caun95], where ATE is modelled at digital behavioural level but with electrical interface models for tester inputs and outputs, using a mixed-level simulator.
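
As an illustration of applying quantization in post-processing rather than in the netlist, the sketch below passes stored simulation results through an ideal uniform quantiser at two resolutions. The quantiser model, the voltage range and the sample values are assumptions made for the example.

```python
import math


def quantize(value, v_min, v_max, bits):
    """Ideal uniform quantiser: lower edge of the code that 'value' falls in."""
    levels = 2 ** bits
    lsb = (v_max - v_min) / levels
    code = math.floor((value - v_min) / lsb)
    code = min(max(code, 0), levels - 1)     # clip to the converter range
    return v_min + code * lsb


def distinguishable(good, faulty, v_min, v_max, bits):
    """True if the two stored results map to different converter codes."""
    return quantize(good, v_min, v_max, bits) != quantize(faulty, v_min, v_max, bits)


# The same stored results are re-evaluated at two resolutions
# without any further circuit simulation.
good_result, faulty_result = 2.30, 2.45
for bits in (2, 8):
    print(bits, "bits:", distinguishable(good_result, faulty_result, 0.0, 4.0, bits))
```

Comparing codes directly is the simplest possible criterion; a tolerance envelope of the kind discussed in section 6.3 would normally be applied as well.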

### 6.3 Process Tolerance

At a more “numerical” level, the post-processing function is used to determine the level of detectability of a fault under conditions such as noise, tester resolution and process parameter deviations. To account for these, some form of tolerance envelope can be applied to define a region of acceptability for a given measurement. Devices with measurements falling inside the region of acceptability would be classified as fault-free by the test scheme. In its simplest implementation a fixed envelope around the fault-free circuit response may be used to define this region. Similarly, a threshold envelope may also be applied to the faulty circuit to model the range of possible values a circuit under a given fault condition could assume.
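
The simplest case, a fixed envelope around the fault-free response, can be sketched as follows. The waveforms, the tolerance value and the function name are illustrative assumptions, and no envelope is applied to the faulty response in this sketch.

```python
def outside_envelope(fault_free, faulty, tolerance):
    """Indices of the samples at which the faulty response leaves the
    fixed +/- tolerance envelope around the fault-free response."""
    flagged = []
    for index, (good, bad) in enumerate(zip(fault_free, faulty)):
        if abs(bad - good) > tolerance:
            flagged.append(index)
    return flagged


# Assumed sampled responses (volts) and a fixed +/- 0.1 V envelope
fault_free = [0.0, 1.0, 2.0, 1.0, 0.0]
faulty = [0.05, 1.5, 2.05, 1.0, -0.3]
flagged = outside_envelope(fault_free, faulty, 0.1)
print("detected" if flagged else "not detected", "at samples", flagged)
```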

In [Jag079], a tolerance envelope is used, expressed as either fixed or as a percentage. A similar tolerance approach is used in [Caun95] with faults classed as always detectable, never detectable or uncertainly detectable depending on the overlap of the faulty and fault-free thresholds. In [Sebe93a,b] a 2-dimensional tolerance box is used to model time or frequency effects as well as voltage or current tolerance. Studies into structural testing have also used a fixed tolerance approach, such as those presented in [Papa94] [Bell91] [Ecke93b]. In many publications, the tolerance bounds used are not stated, which prevents the quality of structural test techniques from being compared.

The assumption of a fixed envelope around a circuit response is not necessarily valid. Firstly, it is assumed that the effects of process parameter deviations will be fixed throughout a transient waveform, frequency response etc. and secondly, that parameter deviations on a faulty circuit will be the same regardless of the fault. In practice these assumptions do not hold for the general case. It will be shown in this thesis that the magnitude of the effect of process deviations can vary significantly throughout a transient waveform, for example, and that circuits under catastrophic fault conditions have differing sensitivities to process effects and thus different measurement spreads.

Attempts to improve this have focused on using Monte Carlo simulation to obtain test limits under process parameter deviations. A Monte Carlo simulation is used on the fault-free circuit in [Silv96] to obtain test limits. Using a Monte Carlo simulation on the fault-free circuit and the faulty circuit allows process parameters to be considered more accurately than simply using a fixed threshold and in particular allows structural test techniques to be compared. However, although the accuracy is increased, using Monte Carlo simulations greatly increases the fault simulation time.

Further to the analysis described above, the evaluation of some structural test techniques requires even more flexibility, e.g. investigations into IDDQ monitoring or BIST schemes which use a fixed reference level. These considerations along with the other process tolerance methods described here have been implemented in the ANTICS post-processing software and are described in the next section. Chapter 4 describes a probabilistic approach to test quality analysis based on a statistical simulation approach.

6.4 ANACOV

The inputs and outputs of ANACOV are shown in figure 3-7.
The operation of ANACOV is controlled using the analysis specification file. This file specifies which of several detection algorithms is to be used on which analysis type (e.g. transient, AC small-signal, DC sweep), and which HSPICE .print or .measure variable within that analysis. The different detection modes are described in section 6.5. The value of user-defined thresholds is also specified in this file. The strobe input file is used to select specific points on an output waveform at which analysis takes place. This may be any .print output analysis type and is not restricted to transient responses.

The main text output file contains fault coverage, detectability results and additional test statistics. Two other output files may also be generated, depending on the detection mode used. The output strobe file and histogram output files are HSPICE input files which, when simulated, produce a graphic display of the selected strobe points and of the fault coverage graph respectively.

6.5 Detection Algorithms

Several detection algorithms have been incorporated into ANACOV to allow research into various analogue and mixed-signal testing techniques and investigation into AFS approaches. The algorithms compare faulty circuit responses to the regions of acceptability. The region of acceptability may be set by the fault-free circuit response (as in the fixed detection mode), defined in the coverage definition file (e.g. the threshold window used in the threshold and threshdata modes) or defined by Monte Carlo simulation (data and alldata modes). Depending on the detection mode and threshold envelope criteria, a particular sample point within a response may be regarded as detectable or undetectable. The number of detectable points required before a fault is classed as detectable can be set at any value depending on the confidence required in the results.

The detection algorithms implemented in ANACOV are described in the next sections. All detection modes are defined in terms of the upper and lower thresholds for faulty and fault-free circuits. The mathematical detection criteria based on these is given in section 6.7.

6.5.1 Fixed Mode

In the fixed detection algorithm a fixed envelope is applied around the fault-free response to define the region of acceptability, and also around each faulty circuit response. Sample points at which the two envelopes do not overlap are classed as detectable (see figure 3-8). This can be expressed mathematically as follows:

Let \( G[i] \) and \( F[i] \) be a set of samples from the good and faulty circuit responses respectively. The upper and lower bounds are defined using fixed thresholds, \( \delta G_L, \delta G_U, \delta F_L, \delta F_U \) as

\[
\begin{aligned}
G_L[i] &= G[i] - \delta G_L &&\text{Lower bound, good response} &&\text{(3-1)} \\
G_U[i] &= G[i] + \delta G_U &&\text{Upper bound, good response} &&\text{(3-2)} \\
F_L[i] &= F[i] - \delta F_L &&\text{Lower bound, faulty response} &&\text{(3-3)} \\
F_U[i] &= F[i] + \delta F_U &&\text{Upper bound, faulty response} &&\text{(3-4)}
\end{aligned}
\]

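
As an illustration of equations (3-1) to (3-4), the following minimal Python sketch builds both envelopes and flags the sample points at which they do not overlap. It is not the ANACOV implementation; the function name `fixed_mode_detectable` and the numerical values are hypothetical.

```python
def fixed_mode_detectable(good, faulty, d_gl, d_gu, d_fl, d_fu):
    """Return one Boolean per sample point: True where the envelopes do not overlap."""
    flags = []
    for g, f in zip(good, faulty):
        g_lo, g_hi = g - d_gl, g + d_gu   # equations (3-1) and (3-2)
        f_lo, f_hi = f - d_fl, f + d_fu   # equations (3-3) and (3-4)
        # Non-overlap: the faulty envelope lies wholly above or wholly below the good one.
        flags.append(f_lo > g_hi or f_hi < g_lo)
    return flags


# Hypothetical three-point responses with fixed thresholds.
good = [1.00, 1.20, 1.50]
faulty = [1.05, 1.80, 0.90]
flags = fixed_mode_detectable(good, faulty, 0.1, 0.1, 0.05, 0.05)
print(flags)
```

Only the second and third sample points are flagged here, since at the first point the faulty envelope lies inside the region of acceptability.
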
6.5.2 Threshold Mode

The threshold detection mode uses two constant, fixed reference levels ($G_L$, $G_U$) to define the region of acceptability. Points from a faulty response lying outside of these levels are classed as detectable as shown in figure 3-9. A fixed envelope may be applied to faulty responses in the same manner as the fixed mode algorithm (equations (3-3),(3-4)).

Typical applications of this detection mode include evaluating the fault coverage of supply current measurements for $I_{DDQ}$ (quiescent supply current) monitoring, and any BIST scheme that uses fixed reference levels, e.g. an on-line safety-critical system test.
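
A short Python sketch of the threshold mode follows, assuming a single pair of reference levels and an optional fixed envelope on the faulty response. The function name and the current values (nominally in microamps) are hypothetical and are not taken from ANACOV.

```python
def threshold_mode_detectable(faulty, g_l, g_u, d_fl=0.0, d_fu=0.0):
    """Flag sample points whose faulty envelope lies wholly outside [g_l, g_u]."""
    flags = []
    for f in faulty:
        f_lo, f_hi = f - d_fl, f + d_fu   # equations (3-3) and (3-4)
        flags.append(f_lo > g_u or f_hi < g_l)
    return flags


# Hypothetical quiescent supply current samples against a 0 to 10 reference window.
flags = threshold_mode_detectable([2.0, 15.0, 9.5], g_l=0.0, g_u=10.0, d_fl=1.0, d_fu=1.0)
print(flags)
```

The third sample (9.5) is not flagged because its envelope, 8.5 to 10.5, still straddles the upper reference level.
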

6.5.3 Digital Mode

The digital detection mode was included to enable a digital ATE set-up to be modelled. When this detection mode is used, a decision is made as to whether the field under test is logic high (1), logic low (0) or floating (X), at a specific point. This digitisation is applied to both the nominal and faulted netlists. The test is thus purely Boolean. The thresholds for logic 1 and 0 are user definable. Sample points lying between the upper and lower logic thresholds (i.e. floating) for either the faulty or fault-free circuit are not used in the comparison. Realistically, this detection mode is only applicable to a transient analysis.
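
The digitisation step can be sketched in Python as below. This is an illustrative reading of the description rather than the ANACOV code: the function names, the inclusive treatment of the threshold values and the use of `None` to mark sample points excluded from the comparison are all assumptions.

```python
def digitise(value, v_low, v_high):
    """Map an analogue value to '1', '0' or 'X' (floating)."""
    if value >= v_high:
        return "1"
    if value <= v_low:
        return "0"
    return "X"


def digital_mode_detectable(good, faulty, v_low, v_high):
    """Compare digitised responses; None marks points not used in the comparison."""
    flags = []
    for g, f in zip(good, faulty):
        logic_g = digitise(g, v_low, v_high)
        logic_f = digitise(f, v_low, v_high)
        if "X" in (logic_g, logic_f):
            flags.append(None)               # floating on either circuit: skipped
        else:
            flags.append(logic_g != logic_f)  # purely Boolean comparison
    return flags


# Hypothetical transient samples with logic thresholds at 1.0 V and 4.0 V.
flags = digital_mode_detectable([4.8, 0.2, 2.5, 4.9], [4.7, 4.6, 0.1, 2.4], 1.0, 4.0)
print(flags)
```
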

6.5.4 Data Mode

The detection modes discussed in the previous sections all used some form of fixed threshold or fixed reference levels to be applied during the post-processing detection. The assumption that these levels are fixed is not necessarily accurate enough. Therefore algorithms using Monte Carlo analysis have been developed in order to take into account process spread.

The data detection algorithm reads every response from the set of Monte Carlo simulation responses in the fault-free output file. For each sample point the maximum and minimum values are recorded and used to generate the region of acceptability. An additional fixed threshold may also be included to model effects other than process spread. Monte Carlo simulation is not used during fault simulation so that a single nominal response is used from each faulty circuit and simulation overhead is not dramatically increased. A fixed threshold envelope may also be applied to the faulty responses similar to the fixed detection mode.

Defining a set of $N$ fault-free Monte Carlo runs as $M_1, ..., M_N$ and fixed thresholds as $\delta G_L$ and $\delta G_U$, the fault-free upper and lower threshold envelopes are

$$G_L[i] = \min_{j=1}^{N} (M_j[i]) - \delta G_L \tag{3-5}$$

$$G_U[i] = \max_{j=1}^{N} (M_j[i]) + \delta G_U \tag{3-6}$$

Faulty thresholds $F_L[i]$ and $F_U[i]$ are defined as in equations (3-3) and (3-4).
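
The envelope construction can be sketched in Python as follows, assuming each Monte Carlo run is held as a list of samples of equal length. The function name `data_mode_envelope` and the data are hypothetical.

```python
def data_mode_envelope(mc_runs, d_gl=0.0, d_gu=0.0):
    """Per-sample minimum and maximum over all Monte Carlo runs, widened by fixed thresholds."""
    columns = list(zip(*mc_runs))              # one tuple of N values per sample point
    g_l = [min(col) - d_gl for col in columns]
    g_u = [max(col) + d_gu for col in columns]
    return g_l, g_u


# Three hypothetical fault-free Monte Carlo runs, two sample points each.
runs = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]]
g_l, g_u = data_mode_envelope(runs)
print(g_l, g_u)
```

Applying the same function to the Monte Carlo responses of a faulty circuit would give per-fault envelopes of the same form.
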

6.5.5 Alldata Mode

Although the data detection algorithm considers the effect of process spread on the fault-free circuit, the possible spread in responses of the faulty circuits is only considered as fixed thresholds. To improve the fault simulation accuracy, in the alldata detection mode, Monte Carlo analysis is used within the fault simulation procedure to generate an upper and lower envelope for each fault. Again, a fixed offset can also be applied to model other non-idealities.

\[ \text{Figure 3-10 - Alldata Mode} \]

The fault-free upper and lower threshold envelopes \( G_U[i] \) and \( G_L[i] \) are defined by equations (3-5) and (3-6).

Similarly for each faulty circuit, defining a set of \( N \) faulty Monte Carlo runs as \( B_1, \ldots, B_N \) and fixed thresholds as \( \delta F_L \) and \( \delta F_U \), the faulty lower and upper threshold envelopes are given in equations (3-7) and (3-8).

\[
F_L[i]=\min_{j=1}^{N}(B_j[i]) - \delta F_L \tag{3-7}
\]

\[
F_U[i]=\max_{j=1}^{N}(B_j[i]) + \delta F_U \tag{3-8}
\]

6.5.6 Thresholddata Mode

The **thresholddata** detection algorithm is similar to the **threshold** mode algorithm in that the region of acceptability is defined using two fixed reference levels. However, in the **thresholddata** algorithm, data-driven Monte Carlo responses of faulty circuits are used. The region of acceptability is defined as in section 6.5.2 using threshold levels \( G_L, G_U \). Faulty lower and upper threshold envelopes (\( F_L[i] \) and \( F_U[i] \)) are defined as in equations (3-7) and (3-8).
6.6 Measure Analysis

The detection modes presented above have been illustrated with respect to the .print analysis from HSPICE, that is, a series of output measurements at sample points are assumed. The .measure analysis from HSPICE produces a single measurement as its output which can also be analysed using ANACOV in a similar manner to the .print analysis. For the case of .measure analysis, the fixed and data detection modes are not available, however, the threshold mode can be used in place of these since the upper and lower limits $G_U$ and $G_L$ will be constant for the single measurement. The implementation of the alldata and threshdata detection modes for .measure analysis is described in chapter 4.

6.7 Fault Analysis and Detectability Measures

The previous descriptions of the detection algorithms in section 6.5 have defined envelope regions based on different detection criteria and simulation techniques. They all, however, provide a description for a region of acceptability ($G_L[i] < y < G_U[i]$) and a faulty response range ($F_L[i] < y < F_U[i]$). Fault detectability is based on the separation of the two regions. A sample point is defined as detectable if the two regions are non-overlapping at that point. Moreover, for each sample point, a confidence measure $x[i]$ can be defined based on the distance between the faulty and fault-free envelopes since a larger distance implies that at a given sample point the circuit under a particular fault condition is more easily detectable.

$$x[i] = \begin{cases} 
G_L[i] - F_U[i] & \text{for } G_L[i] > F_U[i] \\
F_L[i] - G_U[i] & \text{for } F_L[i] > G_U[i] \\
0 & \text{otherwise}
\end{cases} \quad (3-9)$$

The number of detectable points $NP$ out of a total $d$ can be defined as

$$NP = \sum_{i=0}^{d-1} U(x[i]) \quad (3-10)$$

where $U(x) = \begin{cases} 1 & \text{if } x > 0 \\
0 & \text{otherwise}
\end{cases}$

A fault is classed as detectable if $NP > c$ where $c$ is a user defined cutoff value. Typically this value will be 1, but this can be increased if a higher confidence in the results is required. For each fault, the mean separation distance between the good and faulty thresholds for all detectable sample points can also be used as an additional confidence measure. The average distance confidence measure (ACM) is given by:

$$ACM = \frac{\sum_{i=0}^{d-1} x[i]}{NP} \quad (3-11)$$
These figures are used for `.measure` as well as `.print` analysis; for `.measure` analysis $d=1$.
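
Equations (3-9) to (3-11) can be sketched in Python as below. The function names are hypothetical, the strict comparison $NP > c$ follows the wording above, and returning `None` for ACM when no point is detectable is an assumption made here because equation (3-11) is undefined for $NP = 0$.

```python
def separation(g_l, g_u, f_l, f_u):
    """Confidence measure x[i] of equation (3-9) for every sample point."""
    x = []
    for gl, gu, fl, fu in zip(g_l, g_u, f_l, f_u):
        if gl > fu:
            x.append(gl - fu)      # faulty envelope wholly below the good one
        elif fl > gu:
            x.append(fl - gu)      # faulty envelope wholly above the good one
        else:
            x.append(0.0)          # envelopes overlap
    return x


def detectability(x, cutoff):
    """Return (NP, ACM, detectable) following equations (3-10) and (3-11)."""
    n_p = sum(1 for value in x if value > 0)
    acm = sum(x) / n_p if n_p > 0 else None   # assumption: ACM left undefined when NP = 0
    return n_p, acm, n_p > cutoff             # strict comparison as written in the text


# Hypothetical envelopes at three sample points.
x = separation([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [2.5, 1.5, 0.0], [3.0, 2.5, 0.5])
print(x, detectability(x, cutoff=1))
```
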

7. Conclusions

In this chapter the ANTICS software, designed for the evaluation of structural test methods for analogue and mixed-signal circuits, has been introduced. ANTICS provides fault injection, repeated HSPICE simulation management and post-processing analysis. In particular, a number of detection modes have been developed including those based on Monte Carlo analysis which can be used for accurate test evaluation taking into account process parameter deviations. This provides an improvement over existing test evaluation methods which have tended to use fixed thresholds of arbitrary sizes. The capability of ANTICS to use the `.measure` output from HSPICE allows a wide range of possible measurements from a device to be evaluated.

The basic principles of test quality evaluation presented here are extended and enhanced in later chapters. In particular, the test metrics defined in section 6.7 are used in the evaluation of the accuracy of techniques to reduce AFS time.
1. Introduction

Several fault simulation detection methods for test evaluation have been described in the previous chapter, all of which generate figures of fault coverage. Each method has advantages and disadvantages and may be more applicable to different stages in the test generation and evaluation procedure. Although some of the limitations in defining the limits with the fixed mode fault simulation detection algorithm are resolved using the alldata algorithm, by applying Monte Carlo simulation to the faulty circuits, there are still two problems remaining with fault analysis which have not yet been addressed.

1) The number of fault simulations required to set appropriate test limits and produce the faulty circuit response regions is not defined. The tolerance threshold is set at the current maximum and minimum worst case values from all Monte Carlo responses. Therefore the threshold envelope widens as the number of Monte Carlo simulations increases, although the rate of increase declines as more runs are added. For a faulty circuit Monte Carlo response, it is not clear what the confidence in the results will be, that is, the confidence that an actual fault response will lie within the Monte Carlo simulation response region for a given number of Monte Carlo runs.

2) The case of “partial overlap” of acceptability and faulty response regions is ignored by the analysis described so far. Although the additional confidence measure of the separation of the envelopes is produced (see equation 3-11), cases where there is no envelope separation are all classed as undetectable even if there is only a partial envelope overlap. This results from the Boolean nature of the test evaluation figure since sample points (and hence the faults themselves) are only classed as either detectable or undetectable. In practice, a case of partial overlap may range from almost totally detectable to almost totally undetectable. Three cases to illustrate this are shown in figure 4-1 for a single sample point or measure result. $G_L$ and $G_U$ are the lower and upper bounds on the fault-free circuit response and $F_L$ and $F_U$ are the lower and upper bounds on the faulty circuit response.
Case a) is clearly detectable, that is there is no overlap between the region of acceptability and the fault region. In case b) there is a small area of overlap and using the previous detection algorithms, this case would be categorised as completely undetectable. However, if the response of an actual faulty circuit lay to the right of the overlapping region, in the region marked r, the measurement would correctly be failed since it would be outside the acceptance region. Therefore in this case the technique is overly pessimistic since the majority of times this fault occurs during manufacture it will be detectable with this measurement. Case c) is again a case of partial overlap, but here the fault is almost totally undetectable. Using the previous algorithms, this fault is classed in exactly the same way as case b).

Considering points 1) and 2), the main problem with the algorithms is that the figure of merit given to a fault during test evaluation is Boolean, that is either detectable or undetectable. In order to overcome the consequences and limitations of this approach, we can extend the test categorisation using the continuous domain. On a simplistic level, the level of detection could be defined according to the distance of partial overlap. This however assumes that the distribution of waveforms is equiprobable over all of the overlap region which will, in general, not be the case. Thus, although this may give an indication of the detectability, it is by no means an ideal figure.

A superior method is to consider the Monte Carlo simulation results as probability distributions, since they should represent the statistical distributions of measurements made on a set of ICs. Statistical approaches to analogue IC test are described in the literature. An approach for the optimal setting of test limits for DC testing is presented in [Wang94] based on obtaining faulty and fault-free probability distributions. In [Puen96], a statistical test technique is presented based on analysing the harmonic content of the supply current. In both cases, the quality of tests is presented in terms of type I and II errors (see section 3). However, these are obtained by simulating a number of circuits and recording the percentage of good and faulty devices misclassified. The approach presented in this chapter differs from this in that the test quality is obtained by considering faulty circuit probability distributions.

Example distribution histograms obtained from sampling the dynamic supply current of the multiplier circuit at a single transient point are shown in figure 4-2. The faulty response is from the multiplier circuit with a gate-source short fault on transistor XA61.M9. Two points should be noted from this graph: firstly that the process spread on the faulty circuit is greater than that of the fault-free circuit and secondly that the histograms overlap one another partially. Extrapolating these histograms produces continuous probability density functions and leads to the definition of the probability of detection for test evaluation.

Figure 4-2 - Histogram of Fault-free and Fault Distributions for 1 Sample Point of Dynamic Supply Current

2. Probability of Detection Definition

Using probability theory, we can extend the fault categorisation to the continuous domain and produce a figure for the probability of detection for a given fault based on the distributions obtained from a Monte Carlo fault simulation scheme.

Considering the case where only one measurement ($\phi$) is made, the responses for faulty and fault-free circuits will represent the distribution obtained from a set of ICs under given process parameter deviations. The distributions can be thought of as conditional probability distributions $p(\phi|G)$ and $p(\phi|F_x)$ for the fault free and fault $x$ ($x = 1..N$) cases respectively (for $N$ faults), see figure 4-3.
The discrimination function $g(\phi)$ is the function chosen which represents the pass/fail test limits used by the test equipment i.e. the region of acceptability. If a measurement lies outside this region then the device will be failed, otherwise the test will be passed. For the case in figure 4-3, the function $g(\phi)$ can be defined as

$$g(\phi) = \begin{cases} 1 & \text{(pass)} \quad \text{if } G_L < \phi < G_U \\ 0 & \text{(fail)} \quad \text{otherwise} \end{cases}$$

(4-1)

where $G_U$ and $G_L$ define the test limits. In general, test limits $G_L$ and $G_U$ and thus $g(\phi)$ will be based on the fault-free distribution $p(\phi|G)$ although they could be set to any appropriate value (see section 5). Note that the dark area under the fault free distribution curve corresponds to fault-free devices which will incorrectly be failed by the test and that the extent to which this happens is determined by the test limits $G_L$ and $G_U$.

Based on a single measurement, the probability of detection $P_{D_x}$ for a fault $x$ is equal to the shaded area under the $p(\phi|F_x)$ curve, that is the region in which the faulty response lies outside the region of acceptability. This can be defined in the general case as:

$$P_{D_x} = \int_{-\infty}^{\infty} p(\phi|F_x)(1 - g(\phi))d\phi$$

(4-2)

which for the case of fig. 4-3 is:

$$P_{D_x} = \int_{-\infty}^{G_L} p(\phi|F_x)d\phi + \int_{G_U}^{\infty} p(\phi|F_x)d\phi$$

(4-3)

Thus, we have defined a new test metric $P_{D_x}$ based on the probability that a fault will be detected using a given measurement. This theory is true regardless of the distributions obtained and the discrimination function $g(\phi)$. If more than one measurement is made for fault $x$ then $P_{D_x}$ is taken as the maximum probability of all measurements.
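
As a worked illustration of equation (4-3), the sketch below evaluates the probability of detection under the assumption that the faulty distribution is Normal (the assumption adopted later in section 4). The function names and the numerical values are hypothetical; only the Python standard library is used.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def probability_of_detection(mu_f, sigma_f, g_l, g_u):
    """Area of the faulty PDF lying outside (g_l, g_u), as in equation (4-3)."""
    below = normal_cdf(g_l, mu_f, sigma_f)         # integral from -infinity to G_L
    above = 1.0 - normal_cdf(g_u, mu_f, sigma_f)   # integral from G_U to +infinity
    return below + above


def best_measurement(pd_per_measurement):
    """With several measurements, PD_x is taken as the maximum over all of them."""
    return max(pd_per_measurement)


# Hypothetical test limits at -3 and +3, with three faulty distributions of unit sigma.
pd_centred = probability_of_detection(0.0, 1.0, -3.0, 3.0)   # fault identical to fault-free
pd_edge = probability_of_detection(3.0, 1.0, -3.0, 3.0)      # fault mean sits on G_U
pd_far = probability_of_detection(10.0, 1.0, -3.0, 3.0)      # fault far outside the limits
print(pd_centred, pd_edge, pd_far)
print(best_measurement([pd_centred, pd_edge, pd_far]))
```

A fault whose mean lies exactly on a test limit is detected about half of the time, which is the kind of partial overlap that the Boolean detection modes cannot express.
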

### 3. Hypothesis Testing

The probabilistic approach to test evaluation can be further extended to provide figures of merit in addition to that of probability of detection using standard hypothesis theory. Hypothesis testing is a well known statistical method of testing a hypothesis to a given significance level based on observations.
In this case, the Null Hypothesis, $H_0$, is that the IC is fault-free and the alternative hypothesis, $H_1$, is that the IC is faulty. Based on the measured value $\phi$ and the test limit function $g(\phi)$, we choose to accept or reject the null hypothesis, leading to four possible outcomes, shown in table 4-1.

<table>
<thead>
<tr>
<th></th>
<th>Passed</th>
<th>Failed</th>
</tr>
</thead>
<tbody>
<tr>
<td>Good Circuit</td>
<td>✓</td>
<td>Type I error</td>
</tr>
<tr>
<td>Faulty Circuit</td>
<td>Type II error</td>
<td>✓</td>
</tr>
</tbody>
</table>

**Table 4-1 - Test Decision Outcomes**

**Type I error** : $H_0$ is true but is rejected.
This corresponds to the case where a good chip is failed. The probability of a type I error ($\alpha$) can be calculated as:

$$\alpha = \int_{-\infty}^{\infty} p(\phi|G)(1 - g(\phi))d\phi$$

(4-4)

This probability is shown as the dark shaded area under the fault free distribution in figure 4-3. For this case the type I error probability is:

$$\alpha = \int_{-\infty}^{G_L} p(\phi|G)d\phi + \int_{G_U}^{\infty} p(\phi|G)d\phi$$

(4-5)

**Type II error** : $H_0$ is false but is accepted.
This corresponds to the case when a faulty chip is passed. The probability associated with this ($\beta_x$) is

$$\beta_x = \int_{-\infty}^{\infty} p(\phi|F_x)g(\phi)d\phi$$

(4-6)

This probability is illustrated in figure 4-3 as the unshaded area under the fault distribution. Based on the test limits $G_L$ and $G_U$ in figure 4-3, the type II probability can be calculated as:

$$\beta_x = \int_{G_L}^{G_U} p(\phi|F_x)d\phi$$

(4-7)

From equation (4-4), it is evident that the probability of a type I error ($\alpha$) is independent of the fault distributions and depends only on the probability distribution of the fault-free circuit and the discrimination function $g(\phi)$. Examining equations (4-2) and (4-6) reveals that $\beta_x$ is simply $1-PD_x$ since $PD_x$ is the probability that fault $x$ will be failed and $\beta_x$ is the probability that it will be passed. Therefore, $\beta_x$ is not an additional confidence measure in the results.
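
The relation between $\beta_x$ and $PD_x$ can be checked in one line by adding equations (4-2) and (4-6), using only the fact that $p(\phi|F_x)$ is a probability density and therefore integrates to one:

$$PD_x + \beta_x = \int_{-\infty}^{\infty} p(\phi|F_x)(1 - g(\phi))d\phi + \int_{-\infty}^{\infty} p(\phi|F_x)g(\phi)d\phi = \int_{-\infty}^{\infty} p(\phi|F_x)d\phi = 1$$

The same argument applied to equation (4-4) shows that $1-\alpha$ is the probability that a fault-free device is passed.
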
Chapter 4 - Probabilistic Fault Simulation

4. Goodness of Fit Test

The theory presented in sections 2 and 3 is true, regardless of the probability density functions of the faulty distributions. However, in order to calculate the probability of detection for a given fault, we need an estimate of the probability density functions of both the faulty and fault free circuits using the Monte Carlo simulation responses. It is expected in general that the distribution PDF will follow the Normal distribution. The PDF of the Normal distribution is given by:

\[ f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]  

(4-8)

where \( \mu \) and \( \sigma \) are the mean and standard deviation respectively.

In order to test that the distribution obtained from Monte Carlo simulation fits a hypothesised distribution, it is possible to use a “goodness of fit test”. Several such tests have been developed including the chi-squared test and the Kolmogorov-Smirnov (KS) test. The KS test is generally preferred over the chi-squared test for small sample sizes and continuous data (see [Cono71]) and is the one chosen for use here.

4.1 The Kolmogorov-Smirnov Goodness of Fit Test

The two-sided KS test procedure is as follows (from [Cono71]):

Let \( S(x) \) be the cumulative distribution function based on a set of random samples \( A_1 \ldots A_n \) taken from distribution \( F(x) \). (i.e. the values obtained from the Monte Carlo simulation).

Let \( F^*(x) \) be the hypothesized distribution function (i.e. a Normal distribution with mean and sigma obtained from \( A_1 \ldots A_n \)).

The two sided KS test statistic \( T_1 \) is defined as the greatest distance between \( S(x) \) and \( F^*(x) \). That is:

\[ T_1 = \max_x |F^*(x) - S(x)| \]  

(4-9)

This is shown graphically in figure 4-4. The hypothesis that samples \( A_1 \ldots A_n \) are sampled from a function \( F(x) = F^*(x) \) is rejected if \( T_1 \) is greater than the tabulated KS Test Statistic value at the appropriate significance level. Moreover, the test statistic \( T_1 \) can be used as a confidence level in the distribution function.
The significance level of the test governs the level of "Normality" that must be achieved for the distribution to be classed as Normal. The threshold to which this level is set is a compromise between accepting distributions which are not Normal (type II error) and rejecting distributions which are in fact Normal (type I error). A value commonly used in statistical analysis is to test to the 95% significance level, which is used throughout this thesis. Whilst the acceptance of the hypothesis does not prove that the data follows the hypothesized distribution, it indicates that it is not an unsuitable approximation to use.
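
The test statistic of equation (4-9) can be computed as in the Python sketch below. It assumes the hypothesised Normal distribution is fitted using the sample mean and the sample standard deviation (with the $n-1$ divisor); the function name and the data are hypothetical. The comparison against the tabulated KS value is not reproduced here.

```python
import statistics


def ks_statistic_normal(samples):
    """Greatest distance T1 between the empirical CDF S(x) and a fitted Normal CDF F*(x)."""
    xs = sorted(samples)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    t1 = 0.0
    for k, x in enumerate(xs):
        f_star = fitted.cdf(x)
        # S(x) steps from k/n to (k+1)/n at x, so both sides of the step are checked.
        t1 = max(t1, abs(f_star - k / n), abs((k + 1) / n - f_star))
    return t1


# Hypothetical data: 30 evenly spaced Normal quantiles and 30 strongly skewed values.
ideal = statistics.NormalDist()
normal_like = [ideal.inv_cdf((k + 0.5) / 30) for k in range(30)]
skewed = [2.0 ** k for k in range(30)]
print(ks_statistic_normal(normal_like), ks_statistic_normal(skewed))
```

The skewed data set produces a much larger $T_1$ than the Normal-like set, which is the behaviour the goodness of fit test relies on.
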

5. Setting a Suitable Test Limit Function

As discussed in section 2, the discrimination function $g(\phi)$ chosen to describe the region of acceptability determines the probability of a type I error ($\alpha$), also referred to as the test significance. A value of $\alpha$ which is too large will result in an increased probability of detection figure for faults but will cause more fault-free circuits to fail and hence result in unnecessary yield loss. However, if the limits of $g(\phi)$ are too great then the test will be less effective and produce lower values of probability of detection. A suitable test limit is therefore a trade off between type I and II errors.

An approach for the optimal setting of test limits for DC testing is presented in [Wang94]. Test limits are initially set based on faulty and fault-free probability distributions (obtained using statistical simulation) and the a-priori probabilities of occurrence of faulty and fault-free devices. These are then adaptively updated during testing to obtain optimal limits based on the actual failures encountered. However, this approach requires information such as test yield, and an a-priori estimate of the probability of occurrence of every possible fault. Furthermore, the limits are also refined during actual device testing.

In this thesis, the test limits are set to the 3σ points of the fault-free distribution. Thus the probability of a type I error is low, at $\alpha=0.0026$ and very few fault-free circuits are misclassified. The reasoning behind this is that the structural test techniques which are being considered will generally be used as initial tests to eliminate as many faulty devices as possible before a final functional test and eliminating fault-free devices at this stage is undesirable.
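
The dependence of $\alpha$ on the chosen limits can be illustrated with the short Python sketch below, which assumes a Normal fault-free distribution and symmetric limits at $\mu \pm k\sigma$. The function name is hypothetical. For $k=3$ the expression evaluates to roughly 0.0027; the figure of 0.0026 quoted above is what a four-figure table entry $\Phi(3)=0.9987$ gives.

```python
import math


def type_one_error(k_sigma):
    """Probability that a Normal fault-free measurement falls outside mu +/- k*sigma."""
    return math.erfc(k_sigma / math.sqrt(2.0))


for k in (2.0, 3.0, 4.0):
    print(k, type_one_error(k))
```

Tightening the limits from $3\sigma$ to $2\sigma$ raises the fraction of fault-free devices failed from under 0.3% to about 4.6%, which illustrates the yield loss side of the trade off.
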
6. Incorporation of Probabilistic Test Methods into ANACOV

In order to allow the probabilistic evaluation of circuits, the probability of detection test metric has been incorporated into the ANACOV fault simulation software. The figure of probability of detection is available for any type of HSPICE .measure analysis and standard .print analysis at a set of defined strobe points on a dc, ac or transient waveform. The probabilistic test algorithms are automatically invoked if the alldata detection mode is selected with HSPICE .measure analysis or using the "measpoint" command in the ANACOV analysis definition input file to select a specific strobe point on a .print output. Upper and lower limits for the region of acceptability ($G_U$ and $G_L$) are either defined manually by the user in the analysis definition file or calculated by default as the upper and lower 3σ points of the fault-free circuit response.

The KS test described in section 4 (equation (4-9)) is always performed, assuming a Normal distribution (equation (4-8)), prior to the calculation of the probability of detection. The pass-fail outcome of the KS test and the associated distance confidence measure given in equation (4-9) are available as outputs. For a successful KS test, the probability of detection is then calculated using equation (4-3). If more than one analysis is performed, e.g. more than one strobe point or measure analysis, then the maximum probability from all tests for each fault is also available as an output. The integration of the Normal PDF is obtained using linear interpolation of a lookup table to reduce processing time.
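
The lookup table approach can be sketched in Python as follows. The table spacing, the table range, the use of symmetry for negative arguments and the name `phi_lookup` are assumptions made for illustration; the text does not state how the ANACOV table is organised.

```python
import math

STEP = 0.01    # assumed table spacing in units of sigma
Z_MAX = 6.0    # assumed table range; the CDF is taken as 1 beyond it

# Table of the standard Normal CDF at z = 0, STEP, 2*STEP, ..., Z_MAX.
TABLE = [0.5 * (1.0 + math.erf(k * STEP / math.sqrt(2.0)))
         for k in range(int(round(Z_MAX / STEP)) + 1)]


def phi_lookup(z):
    """Standard Normal CDF by linear interpolation between table entries."""
    a = abs(z)
    if a >= Z_MAX:
        p = 1.0
    else:
        k = min(int(a / STEP), len(TABLE) - 2)   # index of the entry at or below a
        frac = a / STEP - k
        p = TABLE[k] + frac * (TABLE[k + 1] - TABLE[k])
    return p if z >= 0 else 1.0 - p              # symmetry for negative arguments


# Compare the interpolated value with the directly computed one.
z = 1.234
exact = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(phi_lookup(z), exact)
```

A measurement $\phi$ from a distribution with mean $\mu$ and standard deviation $\sigma$ would be looked up at $z = (\phi-\mu)/\sigma$.
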

Since the above procedure assigns a figure of probability of detection to each fault, the traditional test metric of fault coverage is no longer applicable. The equivalent figure of merit for a test is average probability of detection (APD), which is the mean of the probabilities of detection for all faults, defined in equation (4-10), where N is the number of faults.

\[ \text{APD} = \frac{\sum_{x=1}^{N} \text{PD}_x}{N} \tag{4-10} \]

This is still on the same scale as fault coverage (0 to 100%) and the results can be interpreted in the same manner. This arises from the fact that fault coverage can be thought of as the average probability of detection where the probability of detection is Boolean.
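
The equivalence with fault coverage can be seen in a few lines of Python. The function name and the probability values are hypothetical.

```python
def average_probability_of_detection(pd_values):
    """Mean of the per-fault probabilities of detection, equation (4-10)."""
    return sum(pd_values) / len(pd_values)


# Boolean detectability: three of four faults detected gives 75% fault coverage.
print(average_probability_of_detection([1, 0, 1, 1]))

# Continuous probabilities of detection for the same number of faults.
print(average_probability_of_detection([1.0, 0.25, 0.75, 1.0]))
```

Both calls return 0.75, but the second retains the information that two of the faults are only partially detectable.
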

7. How Many Monte Carlo Runs are Sufficient?

The principle of the probabilistic fault simulation approach is to obtain an estimate of the PDF of a circuit response using Monte Carlo simulation. The distribution variables estimated from simulation will differ from those obtained in practice and the question arises as to the number of Monte Carlo simulation runs required to obtain suitable output distribution estimates and hence estimates of the probability of detection figure. As the number of Monte Carlo simulations increases the return in accuracy diminishes and any additional simulations become less profitable and eventually unnecessary. Equally, too few simulation runs result in an unacceptable loss in accuracy and may produce misleading results. Therefore it is desirable to examine the convergence of distribution parameters and the probability of detection. This is presented in this section, along with a comparison of results using a “minimum” of Monte Carlo runs and those which use an order of magnitude more as a benchmark.

7.1 Experimental Work

The circuit used in the investigation was an analogue multiplier cell, part of a 3 micron analogue cell library. The circuit is described in Appendix A, section 2, using process parameters in Appendix B, section 2.

Supply current monitoring was used as the test technique under investigation using a 6.5μs piecewise linear test input on inputs VX and VY shown in figure 4-5. The resulting transient supply current was sampled at 6 points on the waveform. Monte Carlo simulations were conducted by varying the SPICE level 2 MOSFET parameters VTO, TOX, UO, LD and polysilicon resistance according to manufacturing process information (Appendix B, section 2). 300 Monte Carlo simulation runs were used to generate the fault-free region of acceptability at each sample point based on the 3σ points.

![Figure 4-5 - Test Input and Supply Current of Fault-Free Multiplier Circuit](image)

The catastrophic fault model used in this investigation consisted of a gate-drain short, a gate-source short, a drain open and a source open fault on each MOS transistor. Shorts were modelled using a 1Ω resistor and opens with a 100MΩ resistor in parallel with a 1fF capacitor. The fault model is shown in figure 4-6. A total of 199 faults were considered.
The graphs in figures 4-7 and 4-8 show the convergence of the mean and sigma of the supply current sampled at sample point (1) on the fault-free response. As expected, the convergence of the mean is much faster than that of the standard deviation (note the break in the axis of the graph of the mean). Figure 4-9 shows the convergence in the probability of detection figure of a gate-source short fault on transistor M9 in the current reference subcircuit XA61 for 3 sample points on the supply current waveform. 1000 Monte Carlo runs were completed in total to act as a benchmark for reduced numbers of simulations.
Figure 4-8 - The Convergence of the Standard Deviation of Sample Point 1 from the Fault-free Circuit for IDDD Supply Current Monitoring

Figure 4-9 - Convergence of Probability of Detection for a Gate-Source Short Fault on XA61.M9 for IDDD Supply Current Monitoring at 3 Sample Points
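
The convergence behaviour described above can be reproduced in outline with a short numerical sketch. The example is illustrative only: the Normally distributed samples stand in for HSPICE Monte Carlo results, and the nominal value of 1 mA and sigma of 50 μA are assumed figures, not measurements from the multiplier circuit. The function name `running_estimates` is hypothetical.

```python
import random
import statistics


def running_estimates(samples):
    """Return (n, mean, sigma) estimated from the first n samples, for n >= 2."""
    estimates = []
    for n in range(2, len(samples) + 1):
        head = samples[:n]
        estimates.append((n, statistics.fmean(head), statistics.stdev(head)))
    return estimates


# Assumed stand-in for simulator output: supply current at one sample point.
rng = random.Random(1)
samples = [rng.gauss(1.0e-3, 5.0e-5) for _ in range(1000)]

est = running_estimates(samples)
for n in (30, 300, 1000):
    _, mean, sigma = est[n - 2]  # est[0] corresponds to n = 2
    print(f"runs={n:4d}  mean={mean:.6e}  sigma={sigma:.6e}")
```

Since the standard error of the sample mean falls as σ/√N, a tenfold increase in the number of runs improves the estimate by only about a factor of three, which is the diminishing return referred to in the text.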
Comparison results were also obtained for all faults using Monte Carlo fault simulation, this time including a specification test (offset voltage, gain and non-linear distortion), described in appendix A, section 2.2.1.3, and an RMS supply current test. The test input for the RMS supply current test was a 3MHz sinusoid with offset -3.75V and amplitude 0.25V on VX and -3.75V DC on input VY. The RMS of the supply current after 10 cycles was obtained using the HSPICE .measure command.

Initially 30 Monte Carlo simulation runs were used on each faulty circuit, and faults classified as undetectable (PDx<0.5%), partially detectable (0.5%<PDx<99.5%) or detectable (PDx>99.5%). For the specification and RMS supply current tests, faults classed as partially detectable were resimulated using 300 Monte Carlo simulations. For the transient supply current, faults with 1%<PDx<99% were resimulated since there were too many partially detectable faults. In both cases the probability of detection results were compared and the percentage error between 30 runs and 300 runs for each test calculated. Partially detectable faults were chosen under the hypothesis that they were most likely to have the greatest error in probability of detection figure. Faults which failed the KS test were ignored. The results of the comparison are shown in table 4-2.

<table>
<thead>
<tr>
<th></th>
<th>Maximum Error</th>
<th>Average Error</th>
</tr>
</thead>
<tbody>
<tr>
<td>Transient Supply</td>
<td>8.4%</td>
<td>1.6%</td>
</tr>
<tr>
<td>RMS</td>
<td>5.8%</td>
<td>1.8%</td>
</tr>
<tr>
<td>Specification</td>
<td>16.5%</td>
<td>0.6%</td>
</tr>
</tbody>
</table>

Table 4-2
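
As an illustration of how figures such as those compared in table 4-2 arise, the sketch below computes a probability of detection from fitted Normal distributions and applies the classification limits used above. It assumes that the probability of detection is the probability mass of the faulty response lying outside a ±3σ fault-free region of acceptability; the exact form of the thesis equation is not reproduced in this section, and the function names are hypothetical.

```python
from statistics import NormalDist


def probability_of_detection(mu_g, sigma_g, mu_f, sigma_f, k=3.0):
    """Percentage of the faulty Normal distribution outside mu_g +/- k*sigma_g.

    Assumed form of the metric; mu_f and sigma_f are the fitted faulty
    mean and standard deviation from Monte Carlo simulation.
    """
    lower = mu_g - k * sigma_g
    upper = mu_g + k * sigma_g
    faulty = NormalDist(mu_f, sigma_f)
    inside = faulty.cdf(upper) - faulty.cdf(lower)
    return 100.0 * (1.0 - inside)


def classify(pd_percent, low=0.5, high=99.5):
    """Apply the 0.5% / 99.5% limits used for the 30-run screening."""
    if pd_percent < low:
        return "undetectable"
    if pd_percent > high:
        return "detectable"
    return "partially detectable"


# A fault whose response is identical to the fault-free response.
print(classify(probability_of_detection(1.0, 0.1, 1.0, 0.1)))
# A fault whose mean sits on the upper 3-sigma limit.
print(classify(probability_of_detection(1.0, 0.1, 1.3, 0.1)))
```

Under this assumed definition, a faulty response identical to the fault-free response still has about 0.27% of its distribution outside the 3σ limits, which is consistent with placing the undetectable limit at 0.5% rather than at zero.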

7.2 Discussion of Results

The results show a low average error in probability of detection figures using 30 Monte Carlo simulation runs compared with using 300. For this investigation, the average error was within 2% and the maximum was 16.5%. These figures suggest that using 30 Monte Carlo simulations to obtain figures of probability of detection is a suitable number for the accuracy required. Even though a small increase in accuracy may be obtained using more simulations, this is offset by factors such as the precision of the HSPICE simulator, the accuracy to which the process parameters are known and the accuracy of the distribution approximation. No major improvement in simulation accuracy seems to be obtained when using more simulations and the returns in accuracy quickly diminish as shown in figure 4-9.

8. Conclusions

A probabilistic approach to test evaluation and analysis using Monte Carlo fault simulation has been presented in this chapter. This overcomes some of the limitations in approaches where a Boolean pass-fail detection figure is used, such as the case of partially overlapping waveforms - an example of which has been illustrated. The traditional figure of fault coverage is replaced by the average probability of detection.
The main drawback to the probabilistic approach is that a distribution must be fitted to the Monte Carlo simulation output response. Whilst the Normal distribution is usually a good approximation, certain cases of non-linear behaviour under fault conditions cause non-Normal outputs, the KS test is failed and the probability of detection metric cannot be calculated. In these cases, an alternative such as a detection mode described in chapter 3 is required.

For the limited example considered here (since the same circuit, process parameters and device models were used) 30 Monte Carlo simulations seems to be a reasonable number for this work, considering other inaccuracies in the fault simulation process. One possible improvement would be to examine the convergence of the probability of detection during the simulation of a fault, which would then be stopped once a convergence to certain limits had been reached. However, it would not be possible to implement this within the current ANTICS framework since there is insufficient control over the HSPICE simulator.
Chapter 5
Techniques for the Reduction of Fault Simulation Time

1. Introduction

In the previous sections it has been shown that more than one fault simulation technique can be used to generate a variety of test metrics. It is apparent that a Monte Carlo fault simulation approach produces the most accurate test quality metric. However, for a large circuit, this will require a prohibitively large simulation time. A circuit with \( n \) faults and \( m \) Monte Carlo simulations requires \( m(n+1) \) simulations in total for a “brute force” approach. Monte Carlo simulations can be avoided using either fixed or data mode analysis within ANACOV, but these modes are less accurate. A summary of three detection modes described so far is given in table 5-1. Considering this, it is desirable that alternative fault simulation approaches are developed and investigated which combine the accuracy of a Monte Carlo-based fault simulation with a reduced simulation time. The aim here is to trade off a small amount of simulation accuracy for a large decrease in simulation time.

<table>
<thead>
<tr>
<th>Accuracy</th>
<th>Method</th>
<th>Number of Simulations</th>
<th>Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>Highest</td>
<td>Probabilistic (chapter 4)</td>
<td>\( (1+n) \times m \)</td>
<td>More post-processing required than full Monte Carlo method. May not be possible if faults fail KS test.</td>
</tr>
<tr>
<td></td>
<td>Full Monte Carlo (alldata algorithm)</td>
<td>\( (1+n) \times m \)</td>
<td>Partial overlap of faulty and fault-free waveforms is not considered. Requires a similar number of simulations as probabilistic method.</td>
</tr>
<tr>
<td>Lowest</td>
<td>Fixed faulty threshold (data algorithm)</td>
<td>\( m+n \)</td>
<td>Assumes a fixed faulty envelope with the same magnitude as the fault-free circuit.</td>
</tr>
</tbody>
</table>

\( n = \text{number of faults, } m = \text{number of Monte Carlo simulations} \)

Table 5-1 - Summary of Three Fault Simulation Approaches
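
The simulation counts in table 5-1 can be made concrete with a small worked example. The sketch below simply evaluates the two expressions from the table; the function name and dictionary keys are hypothetical labels, and the values of 199 faults and 30 runs are borrowed from the multiplier investigation in chapter 4 purely for illustration.

```python
def simulation_counts(n_faults, m_runs):
    """Evaluate the table 5-1 expressions for n faults and m Monte Carlo runs."""
    return {
        "probabilistic": (1 + n_faults) * m_runs,
        "alldata": (1 + n_faults) * m_runs,
        "data": m_runs + n_faults,
    }


counts = simulation_counts(n_faults=199, m_runs=30)
for method, total in counts.items():
    print(f"{method:14s} {total:6d} simulations")
```

For these values the Monte Carlo-based modes require 6000 simulations against 229 for the fixed threshold mode, which indicates the scale of saving that the techniques in this chapter aim to approach.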

Much of the literature on reducing the fault simulation time has centred on using hierarchical fault modelling to obtain higher level simulation fault models, such as [Meix91] [Nagi92] [Pan96] [Zwol96]. Two novel methods of reducing fault simulation time are investigated in this chapter. They differ from approaches presented in the literature because the aim here is to reduce the number of Monte Carlo simulations required whilst maintaining accuracy, rather than to reduce circuit level simulation time.
The first technique proposed uses Monte Carlo analysis on the fault-free circuit to derive best and worst case parameter sets for fault simulation. This approach is described in section 2. The second, Hybrid fault simulation, uses a single-run initial fault simulation for test selection and to reject clearly detectable or undetectable faults, followed by Monte Carlo analysis to obtain the probability of detection of faults. This work is described in section 3.

2. Best/Worst Case Approximation

2.1 Introduction

The basic principle of the best/worst case fault simulation approach is to obtain the best/worst case parameter sets of the fault-free circuit and use these to generate the best/worst case results for each faulty circuit. One application of this principle to the fault dictionary construction of linear circuits using AC analysis is presented in [Pahw82]. Sensitivity analysis is performed on the nominal circuit and the worst case high and low values are computed. Fault analysis is then performed on the two extreme conditions, which are assumed to also produce the worst cases under fault conditions. Although this approach may be efficient in terms of simulation time, the range of circuits and analyses to which it may be applied is limited.

This section presents a similar approach, except that Monte Carlo analysis is used in place of sensitivity analysis, the simulation is a transient response, and the circuit need not be linear. Monte Carlo simulations on the faulty circuits are avoided by extracting the best/worst case input parameter sets from the fault-free Monte Carlo simulation and using these to obtain the best/worst case values for the faults. The assumption made here is that the best and worst case deviations of the faulty circuits will occur with the same parameter sets as those of the fault-free circuit. More than two sets of input parameters may be required if more than one output parameter is to be measured. For example if both output voltage and supply current are measured then process parameter sets producing the largest deviation in output voltage may not necessarily be those producing the worst cases for supply current.

It should be noted that the upper and lower bounds generated using this technique are not suitable for probabilistic fault simulation, only fault simulations using the maximum and minimum Monte Carlo response values as threshold envelopes.

2.2 Method

The method of deriving the upper and lower limits for the faulty and fault-free circuit responses is as follows:

1) Perform Monte Carlo simulation on the fault-free circuit using process parameter deviations or component deviations. Produce responses for variables of interest e.g. output voltage and supply current.

2) Extract the process parameter sets producing the worst case highest and lowest responses. In general no two parameter sets produce the maximum and minimum deviations for all sample points, so the two parameter sets producing the greatest number of sample points with maximum and minimum deviations are chosen. This may be expressed mathematically as follows:

Define a set of N fault-free Monte Carlo runs with S points as:

\[ M_1, \ldots, M_N \]

The fault-free lower and upper threshold envelopes \( G_L[i] \), \( G_U[i] \) are then defined by:

\[
G_L[i] = \min_{j=1}^{N}(M_j[i]) \tag{5-1}
\]
\[
G_U[i] = \max_{j=1}^{N}(M_j[i]) \tag{5-2}
\]

Let:

\[
\text{minparam}[i] = j \text{ such that } M_j[i] = \min_{j=1}^{N}(M_j[i]) \tag{5-3}
\]
\[
\text{maxparam}[i] = j \text{ such that } M_j[i] = \max_{j=1}^{N}(M_j[i]) \tag{5-4}
\]

Then the algorithm used for the selection of best/worst case parameter sets is:

```plaintext
set all counter[1..N] = 0;
for each i where i = 0 to S-1
{
    increment counter[minparam[i]];
    increment counter[maxparam[i]];
}
select the 2 values of j which give
max 2 values of counter[j] where j=1..N;
```

This selection algorithm has been incorporated into the ANACOV software so that the upper and lower parameter sets are given automatically. Visual inspection of the response of the selected parameter sets may also be required in order to verify that they are close to the worst cases of the full Monte Carlo simulations for the fault-free circuit.

3) Perform fault simulation using the process parameter deviation sets selected from 2). The responses obtained are assumed to be the upper and lower bounds on the faulty responses.

4) Use ANACOV software to produce figures of fault coverage. The alldata detection mode is used with responses obtained in 3) for the faulty circuits and the full Monte Carlo simulation for the fault-free circuit threshold since this is available from stage 1).
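
The selection step in 2) can be sketched in a few lines. The code below is a minimal illustration of equations (5-1) to (5-4) and the counter-based selection, not the ANACOV implementation: the function names are hypothetical, runs are indexed from 0 rather than 1, and ties at a sample point are resolved in favour of the first run, which is an assumption.

```python
from collections import Counter


def envelopes(runs):
    """Fault-free lower and upper envelopes G_L[i], G_U[i] (eqs 5-1, 5-2).

    runs[j][i] is the response of Monte Carlo run j at sample point i.
    """
    g_lower = [min(column) for column in zip(*runs)]
    g_upper = [max(column) for column in zip(*runs)]
    return g_lower, g_upper


def select_parameter_sets(runs, n_select=2):
    """Return the run indices that most often give the min or max response."""
    counter = Counter()
    for column in zip(*runs):
        column = list(column)
        counter[column.index(min(column))] += 1  # minparam[i]
        counter[column.index(max(column))] += 1  # maxparam[i]
    return [j for j, _ in counter.most_common(n_select)]


# Three illustrative runs of four sample points each (made-up values).
runs = [
    [1.0, 1.0, 1.0, 1.0],
    [5.0, 5.0, 5.0, 5.0],
    [3.0, 0.0, 3.0, 6.0],
]
print(envelopes(runs))
print(select_parameter_sets(runs))
```

As in the pseudocode, nothing in the counting forces one selected run to be a maximum and the other a minimum, which is one reason why visual inspection of the selected responses remains advisable.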
2.3 Simulation Results

In order to obtain an indication of the efficiency and accuracy of the fault simulation procedure, the above algorithm was applied to the fault simulation of an opamp in both closed and open loop configurations and the multiplier circuit. The fault coverage obtained using the method presented above was compared with that obtained using full Monte Carlo Simulation with 30 simulation runs on both the faulty and fault-free circuits which was taken as the benchmark.

The circuit analysis performed was a 1200 point transient response on the unbuffered opamp circuit shown in appendix A, section 1. Both closed loop and open loop configurations (shown in appendix A, figures A-2 and A-3) were investigated. A 435 point transient analysis was used to simulate the analogue multiplier circuit described in appendix A, section 2. Process parameter deviation figures are described in appendix B, section 2 for the opamp circuits and appendix B, section 1 for the multiplier. For all circuits, both supply current and output voltage were used as the measured variables. It was found that the best and worst parameter sets were identical for output voltage and supply current for the two opamp circuits. For the multiplier, however, these parameter sets differed and two parameter sets were required for each output variable.

The inputs to the open and closed loop opamps were transient pulses of 5μs and 3μs respectively. Input and output voltages and supply current for the fault-free circuits are shown below in figures 5-1 and 5-2. The dark traces indicate the best/worst case parameter set results. For the multiplier circuit, a waveform which consisted of transients on each input was used whilst keeping the other input at a fixed value (see figure 4-5).

![Figure 5-1 - Open Loop Opamp: Input, Output Voltage and Supply Current](image-url)
The MOSFET catastrophic fault model shown in figure 4-6 was used for fault simulation. A fault was classed as detectable if one or more sample points in the transient response were detectable.

### 2.4 Results

For the opamp circuit in open and closed loop configuration, no faults were classified differently between the full Monte Carlo approach and the upper and lower bound approach. For the multiplier circuit, 6 out of 235 possible faults were misclassified with output voltage as the measured variable, and 1 fault was misclassified using the supply current.

![Figure 5-2 - Closed-Loop Opamp: Input, Output Voltage and Supply Current](image1)

An example of a fault detectable in the multiplier circuit using the supply current is shown in figure 5-3. The upper waveform is the fault-free circuit transient current response. The 4 darker traces on this waveform are the responses using the worst case parameter sets. The lower trace is the circuit response under a short fault condition, with the dark traces being the responses using the 4 worst case parameter sets. In this case they correspond well with the worst cases of the full Monte Carlo simulation, forming an almost identical tolerance envelope region.

The accuracy of the technique may also be obtained by comparing the number of sample points detectable for all faults and the average distance confidence measure for all faults. These are available from ANACOV and are described in chapter 3, section 6.7.

![Figure 5-4 - Percentage Misclassification of Sample Points](image)

Figure 5-4 shows the average number of sample points incorrectly classified (average error in NP from equation 3-10) as a percentage of the total number of sample points. Figure 5-5 shows the average percentage error in the average distance confidence measure (ACM from equation 3-11) for faults which were correctly classified.

![Figure 5-5 - Percentage Error in Average Distance Confidence Measure](image)

Although the figure is low when averaged over all faults, for individual faults the percentage difference was very large in some cases: maxima of 130% for supply current and 240% for output voltage. These figures can be explained by examination of Monte Carlo simulations of faults, where it is found that some circuits under fault conditions exhibit a greatly increased sensitivity to process parameters and thus the percentage distance errors are increased.

2.5 Discussion of Results

The results show that for the limited range of circuits, process deviations and measurements considered, the proposed technique produces similar results to using a full Monte Carlo simulation on the faulty and fault-free circuits. One possible way of improving accuracy would be to increase the number of parameter sets used in the fault simulation from 2. However, the low percentage of fault misclassification errors obtained suggests that the returns in accuracy would quickly diminish, so that the extra simulations would not be worthwhile.

All misclassified faults in this example were undetectable faults incorrectly classified as detectable. The converse (detectable faults misclassified as undetectable) is not possible since using best and worst cases can only serve to reduce the faulty threshold envelope and never increase it. Therefore the results are suitable as an upper bound on the fault coverage and this approach can also be used to quickly eliminate undetectable faults from a fault list. However, this would not be applicable to eliminating clearly undetectable faults from a probabilistic fault simulation because if the faulty circuit spread was actually greater than that given by the best/worst case approach then the probability of detection would increase and the fault could be incorrectly classified as totally undetectable.

3. Hybrid Fault Simulation

3.1 Concept of Hybrid Fault Simulation

In section 1 several different AFS detection methods were summarised with varying accuracy and simulation time requirements. It is evident that an initial “rough” simulation could be used early in the test stage to select tests which are likely to detect faults, whilst more accurate simulations would be required to verify these later in the fault simulation process. Observations of previous simulation results show that many faults often have probabilities of detection either very close to 0% or very close to 100%, i.e. almost totally undetectable or totally detectable. Two approaches to increasing the efficiency of analogue fault simulation based on these premises are discussed below.

A) Use a single response fault simulation figure of fault coverage initially during the test pattern generation and test selection stage (i.e. with data ANACOV mode). This stage may require extensive fault simulation; however, the accuracy of the results is not critical, and it is an example of a case where absolute simulation precision can be traded for speed. Probabilistic test evaluation can be used after tests have been selected. Hence Monte Carlo analysis is only used once per fault, with the test which is most likely to detect it.
B) Use a single response fault simulation initially to "drop" faults which are clearly detectable due to gross errors and only perform Monte Carlo fault simulation on "marginal" faults. It is generally the case that certain catastrophic faults will be clearly detectable regardless of process deviation effects, e.g. output stuck-at faults and shorts between supply lines for supply current monitoring. Similarly some faults are totally undetectable (i.e. for a given test the faulty and fault-free responses are identical) due to circuit redundancy. Hence, these can also be eliminated from the Monte Carlo simulation. In order to correctly drop faults, a threshold envelope must be used which is much larger or much smaller than the fault-free process spread, for clearly detectable and clearly undetectable faults respectively. This ensures that faults are not incorrectly classified, so the thresholds should be pessimistic. However, the exact values remain a trade-off between accuracy and reduced simulation time.

The fundamental question is therefore what threshold envelopes are suitable for B) and what test selection metric should be used for A). At this stage in the simulation, it is assumed that a Monte Carlo simulation has been performed on the fault-free circuit and therefore the mean $\mu_G$ and standard deviation $\sigma_G$ are known for each sample point or measured variable for each test. Therefore, the best estimate of the process spread of a faulty circuit, for the corresponding test, is the same as that of the fault-free circuit, $\sigma_G$.

Both A) and B) above can be considered simultaneously using the DIST test metric defined in equation (5-5). This is based on the separation of the nominal circuit response with respect to the standard deviation of the fault-free circuit.

$$\text{DIST}_{i,j} = \frac{|\mu_{F_j} - \mu_G|}{\sigma_G}$$  (5-5)

where $\mu_G$ and $\sigma_G$ are the mean and standard deviation of the fault-free circuit and $\mu_{F_j}$ is the nominal response of fault $j$. If more than one test measurement is made, for example points on a transient waveform, then the DIST metric is the maximum of these:

$$\text{DIST}_{i,j} = \max_{k=1}^{P} \left( \frac{|\mu_{F_j}[k] - \mu_G[k]|}{\sigma_G[k]} \right)$$  (5-6)

where $P$ is the total number of test measurements made.

The DIST test metric gives an indication of the relative separation of the faulty and fault-free measurements and can thus be used for the test selection criteria. It can also be used to reject clearly detectable or undetectable faults. Assuming a set of possible test inputs have been established the next task is that of test selection and evaluation. This can be achieved using the hybrid fault simulation algorithm described in the next section.
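
Equations (5-5) and (5-6) translate directly into code. The sketch below is illustrative only: the function names and the example values are hypothetical, and the per-measurement fault-free means and standard deviations are assumed to be available from the fault-free Monte Carlo simulation as described above.

```python
def dist_metric(nominal_faulty, mu_g, sigma_g):
    """DIST for one fault and one stimulus over P measurements (eq. 5-6).

    nominal_faulty[k]: nominal (single-run) faulty response at measurement k
    mu_g[k], sigma_g[k]: fault-free mean and standard deviation at k
    """
    return max(
        abs(f - m) / s for f, m, s in zip(nominal_faulty, mu_g, sigma_g)
    )


def best_stimulus(dist_by_stimulus):
    """Return the stimulus with the largest DIST value for one fault."""
    return max(dist_by_stimulus, key=dist_by_stimulus.get)


# Two measurements: the second lies 2 sigma from the fault-free mean.
d = dist_metric([1.0, 2.5], mu_g=[1.0, 2.0], sigma_g=[0.1, 0.25])
print(d)
print(best_stimulus({"stimulus_1": 0.4, "stimulus_2": d}))
```

With a single measurement the function reduces to equation (5-5), so the same routine serves for both scalar tests such as RMS supply current and sampled transient responses.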

3.2 Hybrid Fault Simulation Algorithm

Using a single fault simulation run for each stimulus from the set, the best stimulus for detection of each fault is obtained using the DIST metric. If the DIST metric is an order
of magnitude greater than the fault free spread used for the region of acceptability then
the fault is clearly detectable and is dropped from the fault list. For example if $3\sigma$ is
used to generate the region of acceptability then a DIST value of 30 is used. Totally
undetectable faults are marked as undetectable and dropped. For the remaining faults,
the best test from the single response simulation (the stimulus with the largest DIST
value) is selected and a Monte Carlo analysis is performed on the faulty circuit. The
distribution of the results is compared to the Normal distribution using the KS test. If
the faulty probability distribution is indeed Normal then probability of detection can be
calculated using equation (4-3), if not, the next best test is tried. Throughout the
algorithm, if more than one test is performed for each stimulus then the maximum
detection figure is always used. The whole algorithm is presented formally below:

BEGIN
Obtain initial set of $n$ best input stimuli
DO FOR each stimulus $i = 1..n$ WHILE undropped faults exist
{
Perform Monte Carlo simulation on the fault-free circuit with stimulus $i$ to get upper
and lower limits
DO FOR each fault $j = 1..N$ where $j$ not dropped
{
Do single simulation with stimulus $i$
Classify fault $j$ according to $DIST_{ij}$ metric:
IF totally detectable ** strike from fault list
{
Drop fault $j$ from fault list
Set $PD_j = 100$
Set $STIM_j = i$
Continue
}
ELSE
Record $DIST_{ij}$ distance metric
}
}
DO FOR each fault $j = 1..N$ where $j$ not dropped
{ REPEAT
{
Find $i$ such that $DIST_{ij} = \max(DIST_{ij})$  ** find best test stimulus
IF $DIST_{ij} \neq 0.0$
{
Do Monte Carlo simulation using stimulus $i$ and record $PD_j$
Set $STIM_j = i$
Set $DIST_{ij} = 0.0$
}
}
WHILE KS test failed and $\max_i(DIST_{ij}) \neq 0.0$
}
END
OUTPUT $PD_j, STIM_j$ for fault $j = 1..N$

Where:
$N = $ Number of faults on fault list
$n = $ Number of input stimuli
$PD_j = 0\%$ (Probability of detection for fault $j = 1..N$)
$STIM_j = 0$ (Best stimulus for fault $j = 1..N$)
$DIST_{ij} = 0.0$ (Distance metric for fault $j = 1..N$, stimulus $i = 1..n$)
The algorithm presented is applicable to any test technique where the region of acceptability is defined using the fault-free circuit. It is possible for faults to be misclassified if the standard deviation of the faulty circuits is much different to that of the fault-free circuit. The optimum setting of the DIST metric to indicate clearly detectable or undetectable faults is a trade-off between reduction of simulation time (number of dropped faults) and accuracy (number of dropped faults which were neither detectable nor undetectable). Note that only clearly detectable faults are dropped initially since all stimuli must produce responses which are undetectable for the fault to be classed as undetectable. Undetectable faults are dropped in the second part of the algorithm since Monte Carlo simulation is only performed if the DIST measure is greater than 0.

In this case, test selection occurs due to the fact that only the test stimulus with the highest DIST value is selected for each fault. Therefore this algorithm is optimised so that the highest probability of detection will be obtained since even if a stimulus is the best for only one fault it is still included. It would be possible to reduce the stimuli set after the first stage based on the DIST value. The second part of the algorithm would then be performed on the reduced stimuli set.
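
The two stages of the algorithm can also be expressed as a short executable sketch. This is an interpretation under stated assumptions rather than the ANTICS implementation: the simulator is abstracted behind two hypothetical callables, `single_run_dist(fault, stimulus)` returning the DIST value from one nominal simulation, and `monte_carlo_pd(fault, stimulus)` returning a probability of detection together with a flag indicating whether the KS test passed. The fault-free Monte Carlo simulation per stimulus is assumed to be performed inside `single_run_dist`. Unlike the pseudocode, a probability of detection is only recorded when the KS test passes, which is a choice made here.

```python
def hybrid_fault_simulation(faults, stimuli, single_run_dist, monte_carlo_pd,
                            cutoff=30.0):
    """Return (pd, stim): probability of detection and chosen stimulus per fault."""
    pd = {f: 0.0 for f in faults}
    stim = {f: None for f in faults}
    dist = {f: {} for f in faults}
    dropped = set()

    # Stage 1: one nominal simulation per stimulus and undropped fault.
    for s in stimuli:
        for f in faults:
            if f in dropped:
                continue
            d = single_run_dist(f, s)
            if d > cutoff:  # clearly detectable: drop from the fault list
                dropped.add(f)
                pd[f], stim[f] = 100.0, s
            else:
                dist[f][s] = d

    # Stage 2: Monte Carlo on remaining faults, best stimulus first.
    for f in faults:
        if f in dropped:
            continue
        remaining = dict(dist[f])
        while remaining:
            best = max(remaining, key=remaining.get)
            if remaining[best] == 0.0:
                break  # no separation under any remaining stimulus
            value, ks_passed = monte_carlo_pd(f, best)
            del remaining[best]  # plays the role of setting DIST_ij = 0.0
            if ks_passed:
                pd[f], stim[f] = value, best
                break
    return pd, stim
```

Passing the simulator in as callables keeps the control flow testable without HSPICE; the default cutoff of 30 corresponds to the order-of-magnitude rule for a 3σ region of acceptability.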

### 3.3 Application of Hybrid Fault Simulation

In order to test the efficacy of the hybrid fault simulation algorithm, it was applied to the problem of test selection and evaluation for RMS supply current monitoring of the analogue multiplier described in Appendix A, section 2. The choice of input stimulus was limited to finding the best frequency and offset of a single input sinusoid.

Inputs were applied to VXP (sinusoidal input stimulus with a DC offset) and VYP (the same DC offset only). The AC RMS supply current was obtained using a behavioural voltage source to convert the total AC power supply to a voltage which was measured using the HSPICE .measure command.

Monte Carlo simulations were conducted based on the process information given in Appendix B, section 2. 30 Monte Carlo simulation runs were used to generate the statistical parameters of mean and standard deviation. The region of acceptability was set between the 3σ points of the fault-free response for each case.

The catastrophic fault model used in this investigation was that shown in figure 4-6. A total of 199 faults were considered.

An initial set of input stimuli (frequency and offset pairs) was obtained using weighted AC sensitivity analysis described in [8]. The 6 best stimuli are shown in table 5-2.
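
<table>
<thead>
<tr>
<th></th>
<th>Frequency</th>
<th>Offset Voltage</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>100Hz</td>
<td>-3.75V</td>
</tr>
<tr>
<td>2</td>
<td>10MHz</td>
<td>-3.75V</td>
</tr>
<tr>
<td>3</td>
<td>1.8MHz</td>
<td>-3.75V</td>
</tr>
<tr>
<td>4</td>
<td>3MHz</td>
<td>-3.75V</td>
</tr>
<tr>
<td>5</td>
<td>500kHz</td>
<td>-3.75V</td>
</tr>
<tr>
<td>6</td>
<td>5.6MHz</td>
<td>-3.75V</td>
</tr>
</tbody>
</table>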

Table 5-2 - Initial Stimulus Set from Sensitivity Analysis

The algorithm in section 3.2 was applied to obtain the probability of detection of each fault. After the initial simulation, it was possible to eliminate 11 clearly detectable faults and 17 clearly undetectable faults. The threshold used for totally detectable faults was that the DIST value was greater than 30; i.e. the distance between the fault-free mean value and the nominal faulty value was more than 30σ, which is an order of magnitude greater than the 3σ fault-free threshold limit. 8 faults at this stage did not converge. The remaining faults were then simulated using the Monte Carlo scheme with the best stimulus. 6 additional Monte Carlo simulations were required for faults that failed the KS test for particular stimuli. A summary of the number of simulation runs compared with a “brute force” method is presented in tables 5-3 and 5-4, showing a considerable saving in simulation time.

<table>
<thead>
<tr>
<th>Simulation Stage</th>
<th>Breakdown</th>
<th>Total</th>
</tr>
</thead>
<tbody>
<tr>
<td>Good Monte Carlo simulation</td>
<td>Stimuli x Monte Carlo runs</td>
<td>180</td>
</tr>
<tr>
<td></td>
<td>6 x 30</td>
<td></td>
</tr>
<tr>
<td>Faulty Monte Carlo simulation</td>
<td>Stimuli x Monte Carlo runs x faults</td>
<td>35820</td>
</tr>
<tr>
<td></td>
<td>6 x 30 x 199</td>
<td></td>
</tr>
<tr>
<td>Total</td>
<td></td>
<td>36000</td>
</tr>
</tbody>
</table>

Table 5-3 - “Brute Force” Method

<table>
<thead>
<tr>
<th>Simulation Stage</th>
<th>Breakdown</th>
<th>Total</th>
</tr>
</thead>
<tbody>
<tr>
<td>Good Monte Carlo simulation</td>
<td>Stimuli x Monte Carlo runs</td>
<td>180</td>
</tr>
<tr>
<td></td>
<td>6 x 30</td>
<td></td>
</tr>
<tr>
<td>Initial single simulation</td>
<td>Stimuli x faults - stimuli for faults which are already detectable</td>
<td>1158</td>
</tr>
<tr>
<td></td>
<td>6 x 199 - 36</td>
<td></td>
</tr>
<tr>
<td>Faulty Monte Carlo simulation</td>
<td>(Faults - eliminated faults) x Monte Carlo runs</td>
<td>4890</td>
</tr>
<tr>
<td></td>
<td>(199 - (11+17+8)) x 30</td>
<td></td>
</tr>
<tr>
<td>Extra Monte Carlo simulations</td>
<td>Faults which failed KS test x Monte Carlo runs</td>
<td>180</td>
</tr>
<tr>
<td></td>
<td>6 x 30</td>
<td></td>
</tr>
<tr>
<td>Total</td>
<td></td>
<td>6408</td>
</tr>
</tbody>
</table>

Table 5-4 - Hybrid Test Algorithm
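
The totals in tables 5-3 and 5-4 can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the tables; the variable names are hypothetical.

```python
stimuli, runs, faults = 6, 30, 199
detectable, undetectable, nonconverged = 11, 17, 8
skipped_single_runs = 36  # stimuli not simulated for already dropped faults
ks_retries = 6            # extra Monte Carlo simulations after KS failures

fault_free_mc = stimuli * runs

# Table 5-3: "brute force" method.
brute_force = fault_free_mc + stimuli * runs * faults

# Table 5-4: hybrid test algorithm.
initial_single = stimuli * faults - skipped_single_runs
faulty_mc = (faults - (detectable + undetectable + nonconverged)) * runs
extra_mc = ks_retries * runs
hybrid = fault_free_mc + initial_single + faulty_mc + extra_mc

print(brute_force, hybrid, round(brute_force / hybrid, 2))
```

On these figures the hybrid algorithm needs 6408 simulation runs against 36000 for the brute force method, a reduction by a factor of roughly 5.6.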
Fault detection results are presented in chapter 7, section 2, where this test is used as part of a comparison of 3 structural tests for this circuit.

### 3.3.1 Investigation into the DIST Metric

It is of particular interest to investigate a suitable value for the DIST test metric which can be used to eliminate clearly detectable faults whilst not incorrectly classifying undetectable faults. The problem here is that the process parameter spread effect on each faulty circuit is not known at the stage in the algorithm where clearly detectable faults are eliminated, and the cutoff value must be set at a high value. The graph in figure 5-6 shows the number of incorrectly classified and correctly eliminated faults as a function of the DIST cutoff value. Clearly if the separation distance is reduced then the number of faults dropped increases, which would lead to savings in simulation time, but at the expense of an increased number of incorrectly classified faults. The percentage error in the average probability of detection figure is also plotted because it indicates the effect of the overall error caused by the incorrectly classified faults.
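
The kind of sweep plotted against the DIST cutoff can be sketched as follows. This is an assumed reconstruction of the bookkeeping, not the procedure used to produce figure 5-6: each fault is represented by its single-run DIST value and a reference probability of detection from full Monte Carlo simulation, a dropped fault is counted as incorrectly classified if its reference value does not exceed 99.5%, and the error is reported as a difference in percentage points of the average probability of detection. The function name and the data are hypothetical.

```python
def cutoff_sweep(faults, cutoffs, detectable_pd=99.5):
    """Return rows of (cutoff, correctly_dropped, incorrectly_dropped, avg_pd_error).

    faults: list of (dist, reference_pd_percent) pairs, one per fault.
    """
    reference_avg = sum(pd for _, pd in faults) / len(faults)
    rows = []
    for cutoff in cutoffs:
        dropped = [(d, pd) for d, pd in faults if d > cutoff]
        wrong = sum(1 for _, pd in dropped if pd <= detectable_pd)
        # Dropped faults are assigned a probability of detection of 100%.
        estimated_avg = sum(
            100.0 if d > cutoff else pd for d, pd in faults
        ) / len(faults)
        rows.append(
            (cutoff, len(dropped) - wrong, wrong, estimated_avg - reference_avg)
        )
    return rows


# Made-up example: four faults as (DIST, reference PD in percent).
example = [(50.0, 100.0), (12.0, 90.0), (4.0, 20.0), (0.0, 0.0)]
for row in cutoff_sweep(example, cutoffs=[30.0, 10.0, 3.0]):
    print(row)
```

Lowering the cutoff in this example drops more faults but inflates the average probability of detection, mirroring the trade-off described in the text.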

![Figure 5-6 - Fault Classification and Percentage Probability of Detection Error](image)

### 3.4 Discussion of Results

From an examination of tables 5-3 and 5-4, it is apparent that the number of input stimuli has the greatest effect in the reduction of simulation time between the “brute force” Monte Carlo simulation and the hybrid simulation. This is because the largest number of simulations are saved by the test selection stage in the algorithm and only using Monte Carlo fault simulation on one “most optimum” stimulus. One side-effect of this is that the overall figure of probability is a lower bound since it is possible that other
evaluation demand different fault simulation techniques. Two approaches based on this have been presented. The optimum test approach will ultimately depend on many factors such as accuracy required, circuit size, and computational power available.
Chapter 6
Improved Test Metrics for Fault Simulation

1. Introduction

The aim of simulation before test is to evaluate the power of a test in terms of the quality level, that is "How good is a test at rejecting faulty ICs and passing good ICs?". Initially a "raw" average probability of detection figure may appear to be a useful measure of test quality. However, two assumptions are made which have implications which must be considered when interpreting these figures. The first assumption is that all faults are equiprobable. For example an average probability of detection of, say, 95% may appear to be high if faults are equally likely, but would be undesirable if some of the faults with low probabilities of detection were the most likely faults to occur. The second is that the fault model is assumed to be an accurate circuit level representation of physical defects. The problem here is that fixed parameters in fault models are used, whereas the actual value of a parameter of a defect has been shown to take on a range of values depending on various physical factors such as location, size, material of defect etc.

These two problems are discussed in this chapter together with possible approaches to making the test metric more realistic. The problem of fault occurrence is considered by incorporating a figure of probability of occurrence for classes of faults which is used as a weighting factor in the overall test metric. To improve the accuracy of the fault modelling, fault model parameters are included as input parameters to Monte Carlo fault simulation, using a range of possible values. Hence, a statistical model is used to represent a range of possible defects. Both approaches are applied to a circuit and the effect on the test quality metric is investigated.

2. Probability of Occurrence

2.1 Overview

One approach to the problem of considering probability of occurrence is to consider only realistic faults in the fault list. Realistic faults are defined in [Sous91] as those which can be traced to a manufacturing defect. In general these will be obtained by using IFA or a similar technique prior to fault simulation. For example, in [Sebe95] faults are ranked according to their likelihood of occurrence and only likely faults are considered in the fault list.

An extension to this is to consider each fault to have a weighting factor depending on the relative likelihood of occurrence. Thus, faults which are most likely to occur have significantly more effect on the testability metric than those that are less likely. A weighted fault approach is described in [Sous91] [Sara92] for digital circuits based on calculating the weighted fault coverage for each class of fault (such as short, open etc.).
The overall test metric is based on the weighted incidence of each class on the overall fault list. In [Spei93], fault probability of occurrence is considered in the compaction and optimisation of test sets for digital ICs. Similar to these approaches, Olbrich uses a weighted fault coverage figure in [Olbr97] for analogue circuits based on the relative probability of occurrence of a fault \( n \) (of \( N \) faults) as \( W_n \) defined in equation (6-1) as:

\[
FC = \sum_{n=1}^{N} \frac{F_n W_n}{\sum_{n=1}^{N} W_n} \times 100\% \tag{6-1}
\]

where \( F_n \) is a fault detection figure - 1 if the fault is detectable and 0 otherwise. In this section we present a similar approach, but the Boolean fault detection variable is replaced with the probability of detection figure and faults are considered in classes.

### 2.2 Weighted Probability of Detection Theory

Let the probability of occurrence of fault \( i \) from a set of \( n \) faults be \( PO_i \), and the probability of detection of fault \( i \) be \( PD_i \). The weighted probability metric is therefore

\[
WP = \sum_{i=1}^{n} PO_i PD_i \tag{6-2}
\]

which is the probability that a fault occurs and is successfully detected. The type II error associated with this is given by

\[
\beta = \sum_{i=1}^{n} PO_i \beta_i = \sum_{i=1}^{n} PO_i (1 - PD_i) \tag{6-3}
\]

where \( \beta_i \) is the probability of a type II error for fault \( i \). Equation (6-3) is the probability that a faulty device will be shipped (i.e. a fault occurs and is undetected). This is shown diagrammatically in figure 6-1. Note that \( PO_i \) is only dependent on the layout and fabrication process, whereas \( PD_i \) is dependent solely on the test, as shown.

Figure 6-1 - Possible Test Outcomes
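
As an illustration of equations (6-2) and (6-3), the following minimal sketch computes the weighted probability metric and the associated type II error for a small fault list. The function name `weighted_detection` and the three example faults with their `po` and `pd` values are hypothetical and are not results from this work.

```python
def weighted_detection(po, pd):
    """Return (WP, beta) following equations (6-2) and (6-3).

    po[i] is the probability of occurrence of fault i and pd[i] is its
    probability of detection, both expressed as fractions in [0, 1].
    """
    if len(po) != len(pd):
        raise ValueError("po and pd must have the same length")
    wp = sum(p_occ * p_det for p_occ, p_det in zip(po, pd))
    beta = sum(p_occ * (1.0 - p_det) for p_occ, p_det in zip(po, pd))
    return wp, beta


# Hypothetical fault list: one fully detectable, one partially
# detectable and one undetectable fault.
po = [0.010, 0.002, 0.001]
pd = [1.00, 0.60, 0.00]
wp, beta = weighted_detection(po, pd)
print(f"WP = {wp:.4f}, beta = {beta:.4f}")
```

Note that WP and β always sum to the total probability that any fault in the list occurs, since every occurring fault is either detected or missed.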

The theory above represents the ideal case, but is only valid if all of the possible failure modes are accurately and completely represented by the simulation fault models used. This presents a large problem when the continuous nature of analogue circuits and faults is considered. Moreover, an accurate IFA is required to obtain the exact probability $PO_i$ for every fault and if it is not performed then these metrics cannot be used. This requires a device layout and detailed process statistics. However, it is possible to obtain the relative probability of occurrence for fault classes (such as opens, shorts etc.) from layout based rules, previous defect analysis information from the production environment or an estimate from published figures. Since only relative probabilities of occurrence are considered, there is no indication of the actual defect level and a normalised relative probability of occurrence ($RPO_i$) for each fault, $i$ (where $i = 1..n$), is used to generate the final weighted relative probability metric (WRP) as

$$\text{WRP} = \sum_{i=1}^{n} PD_i \, RPO_i \tag{6-4}$$

where $RPO_i$ is normalised according to the total number of faults as

$$RPO_i = \frac{RF_{d_i}}{\sum_{j=1}^{c} (RF_j \, NF_j)} \tag{6-5}$$

and

- $RF_j$ = relative occurrence of fault class $j$
- $NF_j$ = number of faults in fault class $j$
- $c$ = number of fault classes
- $d_i$ = class to which fault $i$ belongs

This analysis applies both to fault classes containing more than one fault and to individual faults, in which case $NF_j$ is 1 for every class. The WRP metric is a weighted average probability of detection, with a maximum of 100% if every fault is 100% detectable and a minimum of 0% if every fault is undetectable. It may be interpreted in the same manner as fault coverage and average probability of detection, with the added advantage of including a fault occurrence weighting. This approach differs slightly from that presented in [Sous91] [Sara92] [Olbr97] in that a probability of occurrence is not assigned to each individual fault; rather, the relative occurrence probability is associated with the class of fault.
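
The normalisation in equation (6-5) and the metric in equation (6-4) can be sketched as follows. This is only an illustrative sketch: the function names, the class labels, the class weights and the probabilities of detection are all assumed values chosen for the example, not simulation results.

```python
def relative_probabilities(fault_classes, class_weights):
    """Return RPO_i for each fault, following equation (6-5).

    fault_classes[i] is the class d_i of fault i; class_weights[j] is
    the relative occurrence RF_j of class j.
    """
    counts = {}  # NF_j: number of faults in each class
    for cls in fault_classes:
        counts[cls] = counts.get(cls, 0) + 1
    denom = sum(class_weights[cls] * n for cls, n in counts.items())
    return [class_weights[cls] / denom for cls in fault_classes]


def weighted_relative_probability(pd, rpo):
    """Return WRP, following equation (6-4)."""
    return sum(p * r for p, r in zip(pd, rpo))


# Illustrative values only.
class_weights = {"short": 100.0, "open": 10.0, "rail_open": 1.0}
fault_classes = ["short", "short", "open", "open", "rail_open"]
pd = [1.0, 0.9, 0.5, 0.0, 0.0]

rpo = relative_probabilities(fault_classes, class_weights)
wrp = weighted_relative_probability(pd, rpo)
unweighted = sum(pd) / len(pd)
print(f"WRP = {100 * wrp:.1f}%, unweighted average = {100 * unweighted:.1f}%")
```

Because the $RPO_i$ values sum to one, WRP reduces to the plain average probability of detection when every class has the same weight.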

### 2.3 Relative Probabilities of Occurrence based on the Literature

Several studies indicating the probabilities of occurrence of different failure modes are described in the literature, based on IFA and manufacturing defect analysis. In common with these, in the analysis that follows only short and open faults (based on bridging and break defects) are considered.

In [Brul94] IFA results for a 30 transistor opamp show 33% of faults to be shorts and no open faults. Similarly, [Kuij95] presents IFA results for an 8-bit ADC with shorts as 95% of the faults and opens as 0.03%. In [Furg88] however, which presents results for IFA of three digital circuits, shorts and opens are reported as approximately equal at around 40%. In [Jaco93] 56% of faults are shorts and 36% are opens for 10 digital circuits. [Sebe95] presents likely failure modes for digital CMOS processes, with shorts 100 times more likely to occur than opens for defects occurring in the diffusion, polysilicon and metal. In [Ohle96], it is stated that open source faults in transistors connected to the supply rails are unlikely, since these structures generally use multiple contacts.

The exact proportions of failures occurring will depend on many factors such as layout, manufacturing processes and design. However, based on the data above, the assumption can be made that shorts are more likely than opens, and that opens in sources or drains connected to the supply are the least likely of the opens. Considering this, an estimate of the relative probabilities of occurrence is shown in table 6-1.

<table>
<thead>
<tr>
<th>Fault type</th>
<th>Relative probability of occurrence</th>
</tr>
</thead>
<tbody>
<tr>
<td>shorts</td>
<td>100</td>
</tr>
<tr>
<td>opens</td>
<td>10</td>
</tr>
<tr>
<td>supply rail transistor opens</td>
<td>1</td>
</tr>
</tbody>
</table>

**Table 6-1 - Relative Probabilities of Fault Occurrence**

The work that follows uses these estimates in an example, but it should be noted that the table is an approximation based on several different processes. More meaningful results may be obtained using IFA results or prior knowledge of a production process. However, an approximation such as this can be used to improve the test metric whilst still at the circuit level design stage.
2.4 Multiplier experiment and results

In order to assess the effect of the probabilities of fault occurrence on test quality metrics, the multiplier circuit described in Appendix A was used in a probabilistic Monte Carlo fault simulation monitoring the dynamic supply current. Again the level 2 MOS parameters of VTO, TOX, UO, LD and polysilicon resistance were varied according to manufacturing process information (see appendix B, section 2). The catastrophic fault model shown in figure 4-6 was used. The transient simulation input shown in figure 4-5 was used as a test input, with the 6 points on the supply current waveform used as the sample points.

Initially, Monte Carlo fault simulation was run using 30 runs. Probabilities of detection were then calculated at each sample point for each fault using ANTICS with ±3σ limits from the fault-free simulation. The overall probability of detection for a given fault was taken as the maximum probability of detection for all 6 sample points. Faults were classified according to table 6-2. In order to improve accuracy, faults which were partially detectable after the first Monte Carlo fault simulation were resimulated using 300 simulation runs.

<table>
<thead>
<tr>
<th>Classification</th>
<th>Probability of detection range for fault x</th>
</tr>
</thead>
<tbody>
<tr>
<td>Detectable faults</td>
<td>PDₓ &gt; 99.5 %</td>
</tr>
<tr>
<td>Undetectable faults</td>
<td>PDₓ &lt; 0.5 %</td>
</tr>
<tr>
<td>Partially Detectable faults</td>
<td>0.5 % &lt; PDₓ &lt; 99.5 %</td>
</tr>
</tbody>
</table>

Table 6-2 - Probability of Detection Classifications
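
The calculation described above can be sketched as follows, under the assumption that the faulty response at each sample point is Normally distributed and that a fault is detected when the measurement falls outside the fault-free ±3σ limits. This is a simplified sketch rather than the ANTICS implementation; the function names and the numerical values are hypothetical.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a Normal(mean, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def prob_detection(ff_mean, ff_sigma, f_mean, f_sigma, k=3.0):
    """Probability that a faulty measurement lies outside the
    fault-free limits ff_mean +/- k * ff_sigma."""
    lower = ff_mean - k * ff_sigma
    upper = ff_mean + k * ff_sigma
    return normal_cdf(lower, f_mean, f_sigma) + 1.0 - normal_cdf(upper, f_mean, f_sigma)


def overall_pd(sample_points):
    """Maximum probability of detection over all sample points."""
    return max(prob_detection(*point) for point in sample_points)


def classify(pd):
    """Classify a fault according to table 6-2 (pd as a fraction)."""
    if pd > 0.995:
        return "detectable"
    if pd < 0.005:
        return "undetectable"
    return "partially detectable"


# Hypothetical fault observed at two sample points, given as
# (fault-free mean, fault-free sigma, faulty mean, faulty sigma).
points = [(1.00, 0.05, 1.02, 0.05), (2.00, 0.10, 2.30, 0.10)]
pd = overall_pd(points)
print(f"PD = {100 * pd:.1f}% -> {classify(pd)}")
```

A fault whose distribution coincides with the fault-free distribution yields a probability of detection equal to the type I error of the ±3σ limits (approximately 0.27%), and is therefore classed as undetectable.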

Two sets of results were calculated: firstly a set with probabilities of fault occurrence equal and secondly with probabilities of occurrence set to those given in table 6-1 using equations (6-4) and (6-5). The initial results are shown in figure 6-2 using the fault classifications in table 6-2. Several faults were non-convergent and one fault failed the KS test; these are included in the graph.

Figure 6-3 shows the initial unweighted average probability of detection results for the three fault classes and the unweighted probability of detection. In order to consider the error introduced by the lack of results for non-convergent simulations and the fault which failed the KS test, minimum and maximum results were obtained. These were calculated by assuming probabilities of detection of 0 and 100% respectively for the "resultless" faults.
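
The minimum and maximum bounds described above can be sketched as follows. The function name and the example list are hypothetical; `None` is used here to mark a fault with no result (non-convergent or failing the KS test).

```python
def average_pd_bounds(pd_values):
    """Return (minimum, maximum) average probability of detection.

    Faults without a result (None) are assumed to be 0% detectable for
    the minimum and 100% detectable for the maximum.
    """
    total = len(pd_values)
    known = [p for p in pd_values if p is not None]
    missing = total - len(known)
    minimum = sum(known) / total
    maximum = (sum(known) + missing) / total
    return minimum, maximum


# Hypothetical fault list with two "resultless" faults.
pd_values = [1.0, 1.0, 0.5, 0.0, None, None]
pd_min, pd_max = average_pd_bounds(pd_values)
print(f"average PD lies between {100 * pd_min:.1f}% and {100 * pd_max:.1f}%")
```

The width of the interval equals the fraction of faults without a result, so it shrinks as more faults simulate successfully.
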
Figure 6-2 - Fault Classification Results

Figure 6-3 - Unweighted Average Probability of Detection Results

Figure 6-4 shows the contributions to the weighted average results from each fault class and the overall weighted average probability figure WRP.

2.5 Discussion of Results and Conclusions

It is clear that in this case shorts have the most influence on the overall weighted average probability of detection and that the contribution of the supply rail opens is negligible. This is expected since shorts show a higher degree of detectability and are weighted more highly. In this case, supply rail opens could have been dropped from the fault list without loss of accuracy. The main figure of interest is a comparison between the average value of probability of detection before and after weighting. The graphs show an increase of just over 10% when class weighting is considered, mainly due to the high detection of short faults.

It is also apparent that the discrepancy caused by non-convergent faults and non-Normal fault distributions is reduced after fault weighting is applied since only one of these is in the short fault class. Conversely, if the short fault class had a high discrepancy or error then this would be amplified during the weighting process. This has implications for the accuracy of techniques for reducing simulation time such as those described in chapter 5. For example, considering hybrid fault simulation, the cutoff value of the DIST measurement could be varied so that it was smaller for faults with a lower probability of occurrence. More of these faults would be eliminated initially, but the effect of the error would be minimal. Similarly, faults which are less likely to occur would be appropriate candidates for best/worst case analysis.

Although the results presented in this section are limited in that they have been obtained for one circuit and test only, they show that a consideration of the probability of occurrence using a weighted metric can affect test quality results. In this case a simplified weighting scheme has been used. The theory presented could however be extended, utilising a more accurate technique such as IFA (if a layout and process statistics were available) or realistic fault mapping. These techniques are preferable, but if neither of these is possible, then a table of relative probabilities such as table 6-1 could be constructed based on prior knowledge of the relative defect levels occurring within a specific production process.

3. A Statistical Approach to Fault Modelling

3.1 Introduction

From the literature review it is clear that no standard fault model exists for analogue integrated circuits. Fault models that have been proposed range from catastrophic opens and shorts to more subtle parametric faults and device-specific faults such as gate-oxide shorts. Accurate simulation fault models are not available from the production environment and have not been developed for use within simulators. Hence, standard practice has been to derive fault models using existing fault-free simulation models. One example is resistive-based catastrophic fault modelling, which is widely used.

However, even when the same simulation fault model is used, there is still little agreement on suitable fault model parameters, e.g. short resistance. In practice, the actual resistance of bridging defects obtained will be dependent on factors such as defect size, shape, position, material etc. and the resistance parameter used in the simulation fault model used will be a compromise. It has been shown that for digital circuits the resistance chosen to model a bridging fault has an effect on the faulty circuit operation and hence the fault coverage [Rodr91] [Cham91] [Hao91]. In this section, we will concentrate on the effect of the resistance of bridging defects on CMOS analogue circuits since these defects have been shown to be the dominant failure mode for these circuits and studies are available in the literature.

3.2 A Review of Resistive Short Fault Modelling and Defect Analysis

Section 3.3 of chapter 2 describes the analysis of metal bridging defects undertaken by Bruls. The results are summarised in table 6-3; note that as a consequence of large uncertainty in the measurements, upper and lower bounds are given.

<table>
<thead>
<tr>
<th>Guaranteed Range</th>
<th>Total number of bridges</th>
<th>Guaranteed Range</th>
<th>Total number of bridges</th>
</tr>
</thead>
<tbody>
<tr>
<td>$R_b \leq 0.5\,\text{k}\Omega$</td>
<td>258 (64.5%)</td>
<td>$R_b \geq 0.5\,\text{k}\Omega$</td>
<td>14 (3.5%)</td>
</tr>
<tr>
<td>$R_b \leq 1\,\text{k}\Omega$</td>
<td>379 (94.8%)</td>
<td>$R_b \geq 1\,\text{k}\Omega$</td>
<td>12 (3.0%)</td>
</tr>
<tr>
<td>$R_b \leq 5\,\text{k}\Omega$</td>
<td>394 (98.5%)</td>
<td>$R_b \geq 5\,\text{k}\Omega$</td>
<td>4 (1.0%)</td>
</tr>
<tr>
<td>$R_b \leq 10\,\text{k}\Omega$</td>
<td>397 (99.3%)</td>
<td>$R_b \geq 10\,\text{k}\Omega$</td>
<td>2 (0.5%)</td>
</tr>
<tr>
<td>$R_b \leq 20\,\text{k}\Omega$</td>
<td>400 (100%)</td>
<td>$R_b \geq 20\,\text{k}\Omega$</td>
<td>0 (0%)</td>
</tr>
</tbody>
</table>

Table 6-3 - Bridging Fault Resistance Range from [Rodr96]
Although the results give an indication of the possible range of defect resistances, they suffer from a large inaccuracy. In particular, although the upper part of the defect resistance range is described, there is no information on the lower end of the distribution and so a lower limit to a short fault model parameter cannot be derived.

In terms of a test quality metric, ideally more than one fault model resistance should be evaluated since this will provide an indication of test performance during production on faults which are likely to occur. In order to do this, several authors have chosen to use more than one fault model resistance. The work by Bruls has prompted some authors to use 500Ω to model non-catastrophic defects [Brul94] [Kuij95] obtained from IFA. Other work has used a range of values from 10Ω to 10MΩ [Miur94] [Miur96], and upper and lower values of 0.2Ω and 1kΩ [Harv94].

The disadvantage in using more than one fault model resistance is obviously that each additional fault model requires another set of fault simulations. In addition, simply using extra fault model parameters does not take into account the likelihood of a defect corresponding to that parameter value occurring. For example considering table 6-3 it is clear that the majority of metal 1 short defects in this case have resistances less than 500Ω. Finally, performing additional fault simulations produces a set of results based on a discrete distribution rather than the continuous distribution which will actually be obtained.

Considering these points, one alternative approach is to include the simulation fault model parameter (in this case short resistance value) as an input variable to a Monte Carlo simulation. If the distribution of the fault model parameter represents that found in production then the output from the statistical Monte Carlo simulation will provide an improved test quality measure based on the likely defect distribution.

### 3.3 Multiplier Experiment and Results

In order to investigate the possibility of using a simulation fault model parameter as part of a Monte Carlo fault simulation, gate-source and gate-drain short faults (figure 4-6) were injected into the analogue multiplier described in appendix A, section 2.2.1. The fault model parameter (short resistance value \( R_f \)) was generated randomly based on the distribution shown in figure 6-5. The distribution was derived empirically based on table 6-3, with the assumption that the vast majority of faults would have a low resistance value, but with a small percentage of faults occurring with resistances up to 20kΩ.

The distribution shown is the cumulative distribution of a Weibull distribution with parameters \( \delta=0.5, \beta=100 \). The Weibull distribution is commonly used in engineering problems and the probability density function is given in equation (6-6). The distribution type was chosen due to its exponential nature and its ability to model a wide range of distribution shapes.

\[
f(x) = \frac{\delta}{\beta^\delta} x^{\delta-1} e^{-(x/\beta)^\delta} \tag{6-6}
\]

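
A minimal sketch of how short resistance values could be drawn from the distribution in equation (6-6) is given below, using inverse transform sampling of the Weibull cumulative distribution $F(x) = 1 - e^{-(x/\beta)^\delta}$. The function names, the fixed random seed and the interpretation of $\beta$ as a value in ohms are assumptions made for the example; the simulator's own random number generation is not reproduced here.

```python
import math
import random

DELTA = 0.5   # shape parameter (delta in equation 6-6)
BETA = 100.0  # scale parameter (beta in equation 6-6), assumed to be in ohms


def weibull_pdf(x, delta=DELTA, beta=BETA):
    """Probability density function of equation (6-6), valid for x > 0."""
    return (delta / beta**delta) * x**(delta - 1.0) * math.exp(-((x / beta) ** delta))


def weibull_cdf(x, delta=DELTA, beta=BETA):
    """Cumulative distribution function corresponding to equation (6-6)."""
    return 1.0 - math.exp(-((x / beta) ** delta))


def sample_short_resistance(rng, delta=DELTA, beta=BETA):
    """Draw one short resistance value by inverting the CDF."""
    u = rng.random()  # uniform on [0, 1)
    return beta * (-math.log(1.0 - u)) ** (1.0 / delta)


rng = random.Random(1)  # fixed seed so that the example is repeatable
samples = [sample_short_resistance(rng) for _ in range(30)]
print(f"median of 30 samples: {sorted(samples)[15]:.1f} ohms")
print(f"fraction below 20 kohm (from CDF): {weibull_cdf(20e3):.7f}")
```

With these parameters almost the entire distribution lies below 20kΩ, while the long upper tail still allows occasional high resistance values.
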
Monte Carlo fault simulation with 30 runs was performed using process parameter deviations described in appendix B, section 1 and including the $R_f$ short value as a Monte Carlo random input. The measured variable of the multiplier was again the transient supply current, sampled at 6 points on the waveform (see figure 4-5). A comparison between the results obtained and those where only the process parameters were varied is presented in table 6-4.

<table>
<thead>
<tr>
<th></th>
<th>Probability of detection</th>
<th>Relative sigma</th>
</tr>
</thead>
<tbody>
<tr>
<td>Maximum difference</td>
<td>9.7%</td>
<td>$2 \times 10^6$</td>
</tr>
<tr>
<td>Average difference</td>
<td>0.1%</td>
<td>$1.8 \times 10^5$</td>
</tr>
</tbody>
</table>

Table 6-4

The relative sigma value was calculated as

$$\text{relative sigma} = \frac{\sigma_1 - \sigma_2}{\sigma_1} \tag{6-7}$$

where $\sigma_1$ is the standard deviation of a sample point with only process parameters varied and $\sigma_2$ is the standard deviation with the resistance value included.

The results show that the probability of detection remained largely unchanged, with a maximum difference of approximately 10%; however, the standard deviation at sample points of certain faults was found to vary greatly. It was found that although these faults had large changes in $\sigma$, the mean values were already so high as to make the fault totally detectable. Examination of the simulation results revealed that for some faults the $R_f$ parameter was the dominant cause of the spread in supply current, whereas for other faults global process parameter deviations dominate. Examples are illustrated in figures 6-6 and 6-7 for two faults. In both cases the top dark response, a), is the process spread where the fault model resistance is not varied and the lighter response, b), is the process spread when the fault model resistance is included as a Monte Carlo input. The fault-free case is also shown. Note that in the case of figure 6-7, the relative spread of b) varies throughout the waveform.

3.4 Discussion of Results and Conclusions

The low change in average probability of detection when using the short fault model resistance as a Monte Carlo input variable can be explained as follows. The dominant cause of spread in the supply current will depend on the proportion of the total supply current which flows through the resistive fault path $R_f$. If the current through this path dominates the overall supply current then the resistive short value will be the dominant cause of supply current spread. Moreover, this can be used to explain the relatively high changes in sigma but low changes in probability of detection, since those faults where the short fault path dominates the overall supply current are generally of high detectability anyway due to an increased mean value. The spread and standard deviations of such faults will increase greatly, but the probabilities of detection will remain high.

In this work it has been shown that the effect of short resistance on test evaluation should not be disregarded, even though in this example it is minimal. In general, the severity of the effect will depend on many factors such as circuit type, fault position, fault model parameter distribution and the severity of process parameter spread. For a well established process which has a low process spread, the fault model distribution may well dominate. Conversely, in the example given, the fault model distribution has a limited effect on the overall figure of test merit.

The example shown is based on a crude model of fault model parameters and as such has inherent inaccuracies. In particular, only one fault model was considered and the distribution was based on results presented for shorts in the metal 1 layer only. More accuracy would be obtained if defect analysis results from the actual production process which would be used to fabricate devices were available. More detailed defect analysis studies in the future may also present an improved study of fault model parameters. However, even if the exact distribution is unknown, considering an approximate distribution will improve the meaningfulness of the simulation results to some extent.
Chapter 7
An Evaluation of Structural Test Techniques

1. Introduction

In this chapter several investigations into structural test techniques are presented based on techniques described in previous chapters. A comparison between transient supply current monitoring, RMS supply current monitoring and specification testing, all evaluated structurally using probabilistic fault simulation techniques, is presented in section 2. The use of supply current monitoring as a test technique for analogue circuits has been proposed by several authors and an evaluation of this test technique for three circuits is presented in section 3. A technique for improving fault detection is presented and evaluated in section 4 based on removing the DC component of the supply current. The results of an investigation into the reason why certain faults show low detectability are also presented. This is of interest since it can be used to indicate circuit structures which cause testability problems and can be used as a basis of increasing fault detection. The limitations of this approach are presented in section 5 and the conclusions in section 6.

2. A Comparison of Test Techniques

2.1 Requirements for test comparison

It is of particular interest to be able to compare the efficiency of several test methodologies. In order to perform comparisons, several aspects of fault simulation must be considered which have so far precluded the comparison of published fault simulation results.

- The same circuit must be used in the comparison. This includes netlist, device parameters, and fault-free device models. The capabilities of test equipment should be modelled as accurately as possible, e.g. output loading effects.

- The fault list and simulation fault models must be the same for each test technique considered. The accuracy to which the simulation fault models represent the actual defects will also have an effect on the evaluation.

- Process parameters must be varied using the same input distributions.

- Test limits should be set equally based on the fault-free distribution simulation results.

2.2 Experimental Procedure

This section presents an accurate comparison between three test techniques based on the points above, using probabilistic fault simulation. The circuit used for the comparison is the analogue multiplier circuit described in appendix A. The catastrophic MOS transistor fault model shown in figure 4-6 was used with shorts modelled using a 1Ω resistor and opens with a 100MΩ resistor in parallel with a 1fF capacitor. Faults were considered in subcircuits XA61 and XA59 but not in the voltage reference cell XA60. Global process parameters were varied based on appendix B, section 1. 300 Monte Carlo simulation runs were used to generate the output distributions for the fault-free circuit and 30 Monte Carlo runs were used for each fault, based on results from chapter 4. The three test approaches used in the comparison are described below:

2.2.1 RMS Testing

RMS supply current testing has been proposed and evaluated by several authors as a means of detecting faults in analogue circuits (see chapter 2, section 2.2.1.2). The method used here for test selection and fault simulation is the Hybrid Fault simulation algorithm described in section 3 of chapter 5. The test inputs are described in section 3.3 of chapter 5 based on varying the input frequency and offset.

2.2.2 Transient Supply Current Monitoring

Test techniques based on transient supply current monitoring have also been proposed (chapter 2, section 2.2.1.2). The test input evaluated here uses a 6.5 µs input waveform on inputs VP and VN, shown in figure 4-5. This stimulus was used because every combination of the two inputs is included, hence the device is forced into all of its possible operating states. The resulting supply current waveform is measured at 6 strobe points on the waveform.

2.2.3 Specification Tests

It is of particular interest to evaluate structural tests along with specification tests. This can highlight, for example, faults which are undetectable using a functional test, but which are detectable using a structural technique. The functional test used is based on the specification of the multiplier cell, consisting of offset voltage, gain, and non-linear distortion. The tests are described in appendix A, section 2.2.1.3.

2.3 Results

Initial fault simulation results are presented in figure 7-1 with faults classed as detectable, partially detectable and undetectable depending on the probability of detection figure for fault x according to table 7-1. The KS test was used at the 95% level to ensure that distributions were Normal. Those faults for which the simulation failed to converge or whose outputs were not Normal for a given test are represented in the graph as “Did not simulate”.

<table>
<thead>
<tr>
<th>Classification</th>
<th>Probability of detection Range of Fault x</th>
</tr>
</thead>
<tbody>
<tr>
<td>Detectable</td>
<td>PD_x &gt; 99.5%</td>
</tr>
<tr>
<td>Partially Detectable</td>
<td>0.5% &lt; PD_x &lt; 99.5%</td>
</tr>
<tr>
<td>Undetectable</td>
<td>PD_x &lt; 0.5%</td>
</tr>
</tbody>
</table>

Table 7-1 - Fault Classifications

The weighted average probability of detection results for the three tests are presented in figure 7-2. Faults are weighted according to their probability of occurrence presented in table 6-1. In order to account for faults which either did not simulate or failed the KS test, maximum and minimum values of weighted fault coverage were calculated by assuming the unsimulatable faults were 100% and 0% detectable respectively. The average value was calculated without considering unsimulatable faults.

Figure 7-2 - Weighted Average Probability of Detection Results

2.4 Discussion of Results

The results show the three test techniques detected faults with varying degrees of success, with the specification test producing the highest fault coverage followed by the RMS test and then the transient supply current test. One point which is evident from the results is that the specification test did not detect all faults with a probability of 100%. Examination of the results also revealed that faults exist which were classed as undetectable using all techniques. An investigation into the reason for the lack of detection of certain faults was carried out and conclusions are presented in table 7-2. All faults included have an unweighted probability of detection less than 50% for all tests.

Table 7-2 shows that there are two main reasons for the low probabilities of detection obtained for certain faults. Many transistors are exclusively part of the power down circuitry which is not part of the functional test specification and are thus undetectable because they are effectively redundant to the specification. These faults are also undetectable using supply current testing techniques because the NEN input is fixed. The other main reason is due to the start-up components in the current reference circuit which are insensitive to many defects.

In the literature, some authors have chosen to remove faults from a fault list which do not cause functional failure such as in [Soma91b] where an initial functional fault simulation is used before structural test evaluation to obtain the fault list. Doing so however ignores the reliability issues which arise such as fault degradation effects. If the above faults were removed from the fault list then fault detection properties of the structural test techniques would greatly increase. Alternatively, more faults would be detected by including the power-down input (NEN) as part of the test input stimulus. Since this is an internal signal, this would require propagation through other circuitry.

Several open faults in transistors forming inverter structures are undetectable using 100MΩ as the resistance parameter in the open circuit fault model. However, using a value of 1e20Ω for one of these faults produced an output stuck-at fault, indicating that fault detection depends on the severity of the fault model. Further work should address this issue.

One point that should be noted is that although the type I error was the same for each individual measurement within the tests, since the ±3σ limits were used each time, the dynamic supply current monitoring and specification tests comprised more than one measurement. Therefore the overall type I error for these two tests may be greater than that of the RMS test, which only used one measurement. If all measurements (1..n) comprising a test technique have the same probability of type I error (α) then the overall type I error will lie between α and nα. In practice, the upper limit of nα is unlikely since it would require all sets of process parameter deviations to cause type I errors independently. This is particularly unlikely for the transient supply current technique presented since the process spread effect tends to cause a fixed offset.
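
The bounds above can be illustrated numerically. The sketch below assumes Normally distributed fault-free measurements with ±3σ limits and uses n = 6 measurements as an example; the function names are hypothetical. The value for statistically independent measurements, $1 - (1 - \alpha)^n$, is included only as a reference point and is not taken from the analysis above.

```python
import math


def alpha_for_limits(k=3.0):
    """Type I error of a single Normal measurement with +/- k sigma limits."""
    return math.erfc(k / math.sqrt(2.0))


def type1_bounds(alpha, n):
    """Lower and upper bounds on the overall type I error of n measurements."""
    return alpha, min(1.0, n * alpha)


def type1_independent(alpha, n):
    """Overall type I error if the n measurements were independent."""
    return 1.0 - (1.0 - alpha) ** n


alpha = alpha_for_limits(3.0)
lower, upper = type1_bounds(alpha, 6)
independent = type1_independent(alpha, 6)
print(f"alpha = {100 * alpha:.3f}%")
print(f"overall type I error between {100 * lower:.3f}% and {100 * upper:.3f}%")
print(f"independent measurements: {100 * independent:.3f}%")
```

For strongly correlated measurements, such as strobe points that shift together under process spread, the overall type I error is expected to lie close to the lower bound.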

<table>
<thead>
<tr>
<th>Faults</th>
<th>Reason for low probability of detection</th>
</tr>
</thead>
<tbody>
<tr>
<td>A) <strong>Power-Down Circuitry</strong></td>
<td></td>
</tr>
<tr>
<td>XA57.XOP1.M4:DOP,SOP</td>
<td>The XOP1 operational amplifier circuit and IREF1 current reference circuit have a power-down facility controlled using the NEN signal which is internal to the chip and not externally controllable. In this case the NEN input was set to VSS which forces many transistors in the power-down circuitry to be permanently off. Therefore, open drain and source faults cannot be detected in transistors M4, M5, M6, M26, M31 of XOP1 and transistor M4 in IREF1.</td>
</tr>
<tr>
<td>XA57.XOP1.M5:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA57.XOP1.M6:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA57.XOP1.M26:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA57.XOP1.M31:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA61.M4:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA57.XOP1.M21:DOP,GSS,SOP</td>
<td>Transistors M4 and M8 form an inverter structure forcing the gate of M21 to VDD, hence a gate-source short cannot be detected on this component because the gate and source are at the same potential. Since this transistor is effectively off, source and drain open faults are also undetectable. Similarly, transistors M5 and M7 form an inverter and the NEN input forces the gate of M12 to the same potential as the drain so that the gate-drain short is undetectable.</td>
</tr>
<tr>
<td>XA57.XOP1.M12:GDS</td>
<td></td>
</tr>
<tr>
<td>XA61.M4:GSS</td>
<td>Transistor pairs (M7,M5), (M8,M4) in XOP1 and (M5,M4) in IREF1 form inverters with NEN as an input. The open source and drain fault resistance (100MΩ) is much less than the off resistance of the NMOS transistor (≈10^12 Ω) hence the output voltage is unaffected. The supply current in the fault-free case is low thus the open fault is not detectable in the supply current. Resimulation using an open fault model resistance value of 1e20Ω produces an output stuck-at VSS fault for fault XA57.XOP1.M7:DOP.</td>
</tr>
<tr>
<td>XA61.M5:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>B) <strong>Current Reference Start-Up Circuitry</strong></td>
<td></td>
</tr>
<tr>
<td>XA61.M10:DOP,SOP</td>
<td>The current reference cell IREF1 is a bootstrapped reference cell which uses “start-up” circuitry (transistors M8, M9, M10, M11) to set the VBP output to one of two possible equilibrium points. Many faults in the start-up circuitry do not affect the output bias voltage VBP and hence have no functional effect. Faults in this section which cause a change in supply current are masked by variations in the supply current of higher current devices.</td>
</tr>
<tr>
<td>XA61.M11:DOP,GSS,SOP</td>
<td></td>
</tr>
<tr>
<td>XA61.M8:GSS</td>
<td></td>
</tr>
<tr>
<td>XA61.M9:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>C) <strong>Other Effects</strong></td>
<td></td>
</tr>
<tr>
<td>XA57.XM14.M4:GSS</td>
<td>The gate and source of transistor XM14.M4 are at fixed potentials since the VXN input to the multiplier is grounded. A gate to source short therefore has no effect on the output or the supply current.</td>
</tr>
<tr>
<td>XA61.M4:GSS</td>
<td>The short fault affects the voltage on the gate of transistor XA61.M14, but the transistor remains on hence there is no functional effect.</td>
</tr>
</tbody>
</table>

Table 7-2 - Reason for Low Probability of Detection of Faults in the Multiplier Circuit

For this reason, it is hard to determine the overall type I error from the type I errors of the constituent measurements. One technique would be to run an additional Monte Carlo simulation on the fault-free circuit and determine the number of runs incorrectly failed. However, to obtain a suitable degree of accuracy many simulations would be required. Furthermore, if the type I error were to be set at a specific value, this would have to be done by moving the measurement test limits iteratively, requiring the Monte Carlo simulation to be repeated each time.
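
The bounds on the overall type I error can be explored numerically. The following is a minimal sketch, not part of ANTICS or ANACOV: it assumes that every measurement is normally distributed and that any two measurements share a single correlation `rho`, where `rho = 1` represents a pure common offset and `rho = 0` represents fully independent measurements. The function names, the choice of 6 measurements and the number of runs are illustrative assumptions.

```python
import random
from statistics import NormalDist


def single_type1(k=3.0):
    """Type I error of one normally distributed measurement with +/-k sigma limits."""
    return 2.0 * (1.0 - NormalDist().cdf(k))


def overall_type1(n_meas, rho, runs=20000, k=3.0, seed=1):
    """Monte Carlo estimate of the probability that a fault-free circuit
    fails at least one of n_meas measurements.

    Assumed model: each measurement is standard normal and any two
    measurements have correlation rho.
    """
    rng = random.Random(seed)
    common, local = rho ** 0.5, (1.0 - rho) ** 0.5
    failed = 0
    for _ in range(runs):
        g = rng.gauss(0.0, 1.0)  # global deviation shared by all measurements
        values = [common * g + local * rng.gauss(0.0, 1.0) for _ in range(n_meas)]
        if any(abs(v) > k for v in values):
            failed += 1
    return failed / runs


alpha = single_type1()
print(f"alpha = {alpha:.4f}, n*alpha = {6 * alpha:.4f}")
print("common offset :", overall_type1(6, rho=1.0))
print("independent   :", overall_type1(6, rho=0.0))
```

With a pure common offset the estimate stays close to α, while independent measurements approach nα. The sketch also shows why many runs are needed: with α of about 0.27%, a 30 run simulation would be expected to contain well under one incorrectly failed run.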

3. Evaluation of Supply Current Monitoring as a Structural Test Technique

3.1 Introduction

Examination of the literature review shows that several authors have presented results for supply current monitoring of analogue circuits. Many of the results presented use fixed, arbitrary thresholds as a post-fault simulation process to generate figures of fault coverage which may not provide accurate results. This section presents an evaluation of supply current monitoring for 3 analogue circuits using the probabilistic fault simulation approach described in this thesis.

3.2 Circuits and Inputs

In order to investigate the supply current monitoring technique, a range of circuits was used which perform a variety of functions. Examination of the literature review reveals that there is no suitable ATPG tool currently available for structural analogue circuit testing. The inputs used for this evaluation were therefore based on reasoning and results from published work.

3.2.1 Multiplier Circuit

The multiplier circuit was that used in section 1 with the same input stimulus. The supply current is sampled at the 6 strobe points shown in figure 4-5. Faults were considered in the multiplier and current reference modules (XA57 and XA61 respectively).

Under normal operation, the majority of the supply current is drawn by transistor M30 in the output opamp (subcircuit XA57.XOP1) and the attenuator transistors (XA57.XM1.M1, XA57.XM5.M1, XA57.XM9.M1, XA57.XM13.M1). The current reference cell draws a smaller amount of current than other parts of the circuit.

3.2.2 Absolute Value Circuit

The absolute value circuit is described in Appendix B, section 2.2.4. The circuit has one functional input and the test stimulus chosen was one cycle of a 1kHz sine wave. The input was chosen so that its amplitude was close to the maximum input range; it also has the advantage of being relatively simple to generate. The input and output of the absolute value circuit from a 30 run Monte Carlo simulation are shown graphically in figure 7-3a), along with the resulting supply current in figure 7-3b). Examining the supply current waveform, it is evident that it contains 3 areas of sharp transient spikes caused by switching transistor pair XABS3.XM8.M4, XABS3.XM9.M4. Taking accurate supply current measurements in the region of these spikes with test equipment would be particularly difficult due to the fast rising edges. In terms of circuit simulation, a greatly increased number of sample points would be required to obtain an accurate sample of the spike waveform. For this reason the strobe function of ANACOV was used to define a subset of the sample points on the waveform at which measurement would take place, avoiding these sections. The strobe points selected are shown in figure 7-3c).

The output opamp transistor M30 of subcircuit XABS3.XOPA1 draws the most supply current, followed by the earlier stages of this opamp. High current is also drawn by the 4 transistors in XABS3.XM7 and the output transistor of the input buffer (XABSBUF.M30).

Figure 7-3 - Absolute value circuit: a) input and output, b) supply current and c) strobe points

3.2.3 Sample and Hold Circuit

The sample and hold circuit (described in Appendix B, section 2.2.2) has two inputs. The SH (sample and hold clock) input was based on the functional input, a 20kHz digital pulse train. The IN signal input was a 1kHz sine wave with peak-to-peak values close to the maximum input range. The input, output and supply current waveforms are shown below in figures 7-4a) and 7-4b). Again, large current spikes are present due to the digital switching effect. In order to avoid sampling in the current spike regions, strobe points were defined after each spike (see figure 7-4c)).

CMOS inverter transistors XA7.XS1.M7/M8, XA7.XS2.M7/M8, XA7.XS3.M7/M8, XA7.XS4.M7/M8 draw very low quiescent current but generate the current spikes during switching. The majority of the current is drawn by output transistor M30 of the buffer amplifier (XA7.XOPA1). The remainder of the supply current is drawn by earlier stages in this opamp.

3.3 Results

A probabilistic fault simulation was performed on all circuits using the catastrophic fault model shown in figure 4-6. At each strobe point, test limits were set at the +/-3σ limits of the fault-free distribution. 30 Monte Carlo simulation runs were used on each faulty circuit to generate figures of probability of detection. Initial fault classification results are presented in figure 7-5, based on the classification of the probability of detection given in table 7-1. Average probability of detection results are presented unweighted in figure 7-6 and weighted according to table 6-1 in figure 7-7. Maximum and minimum values were used to account for undetectable faults or those which failed the KS test, as in section 2.3.
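
The procedure for a single strobe point can be sketched as follows. This is an illustrative reading of the method rather than the ANTICS implementation: it assumes that the faulty response at the strobe point is approximately normal, so that a normal distribution fitted to the faulty Monte Carlo runs can be integrated outside the fault-free +/-3σ limits. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def detection_probability(fault_free, faulty, k=3.0):
    """Probability that a faulty measurement falls outside the fault-free limits.

    fault_free, faulty: Monte Carlo samples of one measurement at one strobe point.
    Assumption: the faulty samples are approximately normal. Both sample sets
    need a non-zero spread, otherwise the fitted distribution is degenerate.
    """
    mu, sigma = mean(fault_free), stdev(fault_free)
    lower, upper = mu - k * sigma, mu + k * sigma
    fitted = NormalDist(mean(faulty), stdev(faulty))
    return fitted.cdf(lower) + (1.0 - fitted.cdf(upper))


# Hypothetical supply current samples (arbitrary units).
fault_free = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00]
large_shift = [x + 0.5 for x in fault_free]   # fault well outside the limits
no_shift = list(fault_free)                   # fault with no visible effect

print(detection_probability(fault_free, large_shift))
print(detection_probability(fault_free, no_shift))
```

A fault whose distribution coincides with the fault-free distribution is left with a probability of detection equal to the type I error of the limits, which is why such faults are classed as undetectable.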

Figure 7-4 - Sample and Hold Circuit: a) Input and output, b) Supply current and c) Strobe points

Figure 7-5 - Initial Fault Classification Results

Figure 7-6 - Unweighted Average Probability of Detection Results

Figure 7-7 - Weighted Average Probability of Detection Results

3.4 Discussion of Results

The initial results in figures 7-5 and 7-6 show probabilities of detection for the three circuits. When the fault weightings are considered (figure 7-7), the test quality figure is greatly increased, mainly because the majority of the undetectable faults are source or drain opens which have a low probability of occurrence.

The low fault detection was due to several effects. Firstly, the opamps and current reference cells used contained power-down circuitry which was not activated during the tests. Other faults were undetectable due to the masking effect of the process deviations. A method of reducing the effect of process parameter deviations is presented in the next section.

4. Increasing Fault Coverage for Supply Current Monitoring

4.1 Introduction

The study in section 2 showed that dynamic supply current monitoring produced a significantly lower fault coverage than the other two test techniques. This was mainly due to the fault masking effect of the supply current which showed a particularly high sensitivity to process parameters. Similarly, test results presented in section 3 are also low. Examination of figures 7-3, 7-4 and 5-3 reveals that the main effect of the process parameters is to generate a DC offset. To avoid the fault masking effect which this causes, one possible approach is to ignore the DC component of the waveform. Thus, the test limits based on the fault-free circuit response would be closer. This can be achieved by calculating the mean value of the waveform and subtracting it from every sample point in the waveform. A practical implementation of this could either use this as a post-processing algorithm on ATE or use a capacitor to block the DC component of the waveform. To enable this technique to be investigated using fault simulation, an algorithm to subtract the DC component of the waveform has been included as part of the ANACOV post-processing software.
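
The mean subtraction described above can be sketched in a few lines. This is not the ANACOV code. The waveform shape and the per-run offsets are hypothetical values chosen to mimic process spread that appears purely as a DC offset, and `spread_at` is an illustrative helper.

```python
from statistics import mean


def remove_dc(samples):
    """Subtract the mean value of the waveform from every sample point."""
    dc = mean(samples)
    return [s - dc for s in samples]


def spread_at(runs, index):
    """Width of the band covered by all runs at one sample point."""
    values = [run[index] for run in runs]
    return max(values) - min(values)


# Hypothetical fault-free runs: one waveform shape plus a per-run DC offset.
shape = [0.0, 0.2, 0.5, 0.2, 0.0, -0.1]
offsets = [-0.3, -0.1, 0.0, 0.2, 0.4]
runs = [[offset + s for s in shape] for offset in offsets]
shifted = [remove_dc(run) for run in runs]

print("spread before DC removal:", spread_at(runs, 2))
print("spread after DC removal :", spread_at(shifted, 2))
```

When the process spread is a pure offset, the fault-free band collapses after subtraction, so the test limits move closer together. The same subtraction also removes any fault effect that is itself a pure DC offset.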

4.2 Experimental Procedure and Results

The investigation used the same test inputs and circuits as those used in section 2; however, the ANACOV alldata detection mode was used rather than the probabilistic method used previously. 30 Monte Carlo simulation runs were used to generate the upper and lower bounds on the faulty and fault-free circuits. Results are presented in table 7-3 using the previously defined strobe points for the current measurements. A fault was considered detectable if one sample point was detectable based on the alldata detection algorithm.
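
A bound-based detection decision of this kind can be sketched as below. The exact alldata algorithm is defined elsewhere in the thesis, so this sketch makes an explicit assumption: a sample point counts as detectable when the band spanned by the faulty runs lies wholly outside the band spanned by the fault-free runs. All names and data values are hypothetical.

```python
def envelope(runs):
    """Lower and upper bounds over all Monte Carlo runs at each sample point."""
    lower = [min(column) for column in zip(*runs)]
    upper = [max(column) for column in zip(*runs)]
    return lower, upper


def detectable_points(fault_free_runs, faulty_runs):
    """Indices of sample points where the two envelopes do not overlap (assumed rule)."""
    ff_lower, ff_upper = envelope(fault_free_runs)
    f_lower, f_upper = envelope(faulty_runs)
    return [
        i
        for i in range(len(ff_lower))
        if f_lower[i] > ff_upper[i] or f_upper[i] < ff_lower[i]
    ]


def is_detected(fault_free_runs, faulty_runs):
    """A fault is detected if at least one sample point is detectable."""
    return len(detectable_points(fault_free_runs, faulty_runs)) >= 1


# Two hypothetical runs per circuit, three strobe points each.
fault_free = [[1.0, 1.1, 1.2], [1.1, 1.2, 1.3]]
faulty = [[1.05, 1.5, 1.25], [1.0, 1.6, 1.2]]

print(detectable_points(fault_free, faulty))
print(is_detected(fault_free, faulty))
```

Counting the entries returned by `detectable_points` gives a per-fault number of detectable points, which is the kind of quantity that is averaged over all detectable faults in the confidence measures reported below.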

<table>
<thead>
<tr>
<th></th>
<th>Percentage Fault Coverage</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Normal</td>
</tr>
<tr>
<td>Multiplier</td>
<td>40%</td>
</tr>
<tr>
<td>Absolute Value Circuit</td>
<td>42%</td>
</tr>
<tr>
<td>Sample and Hold</td>
<td>53%</td>
</tr>
</tbody>
</table>

Table 7-3 - Increased Fault Coverage for Test Circuits with DC Shift Algorithm

<table>
<thead>
<tr>
<th></th>
<th>Normal: Average NP</th>
<th>Normal: Average ACM</th>
<th>DC Shifted: Average NP</th>
<th>DC Shifted: Average ACM</th>
</tr>
</thead>
<tbody>
<tr>
<td>Multiplier</td>
<td>5.2</td>
<td>0.067</td>
<td>3</td>
<td>6.53e-5</td>
</tr>
<tr>
<td>Absolute Value Circuit</td>
<td>16.4</td>
<td>0.114</td>
<td>14.9</td>
<td>1.75e-4</td>
</tr>
<tr>
<td>Sample and Hold</td>
<td>30</td>
<td>0.15</td>
<td>32</td>
<td>1.04e-4</td>
</tr>
</tbody>
</table>

Table 7-4 - Confidence Measures for Test Circuits with DC Shift Algorithm

The results in table 7-3 show a clear increase in fault coverage using the proposed technique for this circuit. Table 7-4 presents the average number of detectable points and the average distance confidence measure for all detectable faults (see chapter 3, section 6.7). Although the DC shifted technique has a higher fault coverage, the confidence measures associated with it are much lower. This is due to the reduction in the separation distance between the faulty and fault-free waveform after the shifting of the waveform since many faults cause a DC offset effect which is disregarded. The average distance confidence measure for the DC shifted technique is lower than the normal results by several orders of magnitude in some cases. This may however be distorted since some short faults cause a very large increase in supply current which affects the overall average. In practice a combination of both techniques will provide the best fault detectability.

4.3 Investigation into the Reason for Undetectable Faults

Although the DC component subtraction has increased the fault coverage substantially, there still remain undetectable faults. A further investigation was carried out to determine the nature of these undetectable faults. Results of the investigation are presented in tables 7-5, 7-6 and 7-7. Note that the faults listed as undetectable in the multiplier in table 7-5 are in addition to those already described in table 7-2.

MULTIPLIER

<table>
<thead>
<tr>
<th>FAULTS</th>
<th>Reason for low probability of detection</th>
</tr>
</thead>
<tbody>
<tr>
<td>A) FAULTS IN THE GILBERT GAIN CELL</td>
<td></td>
</tr>
<tr>
<td>XA57_XM17_M6:DOP,SOP</td>
<td>These faults all cause a functional failure - the fault effect is propagated to opamp output transistor M30 which draws a high current. However, the fault effects are masked by current deviations in the level shifter/attenuator stages. Many faults produce a DC current offset which prevents detection using the shifted DC technique.</td>
</tr>
<tr>
<td>XA57_XM23_M4:GDS</td>
<td></td>
</tr>
<tr>
<td>XA57_XM23_M5:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA57_XM20_M4:GDS</td>
<td></td>
</tr>
<tr>
<td>XA57_XM21_M4:GDS</td>
<td></td>
</tr>
<tr>
<td>XA57_XM21_M5:GDS,DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA57_XM22_M4:GDS</td>
<td></td>
</tr>
<tr>
<td>B) FAULTS IN THE LEVELSHIFTERS/ATTENUATOR STAGE</td>
<td></td>
</tr>
<tr>
<td>XA57_XM7_M8:DOP</td>
<td>The fault effect propagates through VYB as a fixed offset to opamp output transistor M30 causing a small offset. However, this effect and the decrease in current through the faulty transistor M8 is masked by process effects.</td>
</tr>
<tr>
<td>C) FAULTS IN THE OPAMP</td>
<td></td>
</tr>
<tr>
<td>XA57_XOP1_M16:GSS</td>
<td>These faults in the opamp either cause a complete functional failure or a small DC offset. However, in all cases although the fault propagates to the opamp output transistor, the effect is not severe enough to avoid fault masking.</td>
</tr>
<tr>
<td>XA57_XOP1_M17:GSS</td>
<td></td>
</tr>
<tr>
<td>XA57_XOP1_M19:DOP</td>
<td></td>
</tr>
<tr>
<td>XA57_XOP1_M19:GDS</td>
<td></td>
</tr>
<tr>
<td>XA57_XOP1_M25:DOP,SOP</td>
<td>Transistor M25 forms the opamp compensation resistance - opens in this transistor are hard to detect since they affect the frequency characteristics of the opamp.</td>
</tr>
<tr>
<td>D) FAULTS IN THE CURRENT REFERENCE CELL</td>
<td></td>
</tr>
<tr>
<td>XA61_M10:GSS</td>
<td>These faults in the current reference cell produce a small decrease in the current reference output voltage, VBP. Although the effect propagates through to the opamp output transistor the effect is not great enough to allow fault detection.</td>
</tr>
<tr>
<td>XA61_M13:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA61_M8:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA61_M9:GSS</td>
<td></td>
</tr>
</tbody>
</table>

Table 7-5 - Undetectable Faults in the Multiplier Circuit

SAMPLE AND HOLD

<table>
<thead>
<tr>
<th>Faults</th>
<th>Reason for low probability of detection</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>A) OPAMP POWER-DOWN CIRCUITRY</strong></td>
<td></td>
</tr>
<tr>
<td>XA7.XOPA1.M4:DOP,SOP</td>
<td>Transistors M4, M5, M6, M21, M26 and M31 form the circuitry used to provide the opamp power-down function. See table 7-2 section A).</td>
</tr>
<tr>
<td>XA7.XOPA1.M5:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XOPA1.M6:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XOPA1.M26:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XOPA1.M31:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XOPA1.M12:GDS</td>
<td></td>
</tr>
<tr>
<td>XA7.XOPA1.M8:DOP,SOP</td>
<td>Transistor pairs (M7,M5), (M8,M4) form inverters with NEN as an input. See table 7-2 section A).</td>
</tr>
<tr>
<td>XA7.XOPA1.M7:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td><strong>B) INVERTER AND PASS TRANSISTOR REDUNDANCY</strong></td>
<td></td>
</tr>
<tr>
<td>XA7.XS1.M6:DOP,SOP</td>
<td>The sample and hold circuit contains 4 CMOS pass transistor structures. Source and drain open faults on these transistors have low detectability since the pass transistor operation is maintained by the fault-free transistor for the majority of input levels. The fault effect results in a small loss of dynamic input range but this is undetectable in the supply current.</td>
</tr>
<tr>
<td>XA7.XS2.M6:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XS2.M5:DOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XS3.M6:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XS4.M6:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XS4.M5:DOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XS2.M7:DOP,SOP</td>
<td>Opens have low detectability on several CMOS inverter structures. The faults cause a reduced output drive to the pass transistor input but circuit operation is unaffected. Furthermore, the open circuit does not alter the supply current directly since the inverter in the fault-free case has a high impedance path between the supplies when not switching.</td>
</tr>
<tr>
<td>XA7.XS2.M8:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XS4.M8:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XA7.XS4.M7:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td><strong>C) OTHER EFFECTS</strong></td>
<td></td>
</tr>
<tr>
<td>XA7.XOPA1.M25:DOP,SOP</td>
<td>Opamp compensation resistance (see table 7-5, section C).</td>
</tr>
<tr>
<td>XA7.XOPA1.M31:GSS</td>
<td>The gate and source of this transistor are at fixed potentials. Therefore a gate to source short has no effect.</td>
</tr>
</tbody>
</table>

Table 7-6 - Undetectable Faults in the Sample and Hold Circuit

<table>
<thead>
<tr>
<th>ABSOLUTE VALUE CIRCUIT</th>
<th>Reason for low probability of detection</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>A) POWER-DOWN CIRCUITRY</strong></td>
<td></td>
</tr>
<tr>
<td>XREF1.M4:DOP,SOP</td>
<td>The current reference source (XREF1) and the two opamps (XABS3.XOP1, XABSBUF) contain power-down circuitry which is permanently off since the NEN signal is at VSS. See table 7-2 section A).</td>
</tr>
<tr>
<td>XABS3.XOP1.M31:GSS,DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M6:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M5:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M4:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M4:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M5:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M6:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M26:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M31:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M12:GDS</td>
<td>Undetectable inverter structures. See table 7-2 section A).</td>
</tr>
<tr>
<td>XABS3.XOP1.M12:GDS</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M21:GSS,DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M21:GSS,DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XREF1.M5:DOP,SOP</td>
<td>See table 7-2 section A).</td>
</tr>
<tr>
<td>XABS3.XOP1.M8:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M7:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M8:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABSBUF.M7:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td><strong>B) CURRENT REFERENCE START-UP CIRCUITRY</strong></td>
<td></td>
</tr>
<tr>
<td>XREF1.M11:GSS,DOP,SOP</td>
<td>Several faults are undetectable in the start-up circuitry of the current reference cell. See table 7-2, section B).</td>
</tr>
<tr>
<td>XREF1.M10:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XREF1.M9:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XREF1.M8:GSS</td>
<td></td>
</tr>
<tr>
<td><strong>C) COMPENSATION RESISTOR</strong></td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M25:DOP,SOP</td>
<td>Opamp transistor M25 forms part of the opamp compensation circuitry - open circuits affect the frequency characteristics.</td>
</tr>
<tr>
<td>XABSBUF.M25:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td><strong>D) OTHER</strong></td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M26:DOP,SOP</td>
<td>These faults in the output opamp (XOP1) cause severe functional failure. However, they are not detectable in the supply current due to fault masking by process deviations of earlier stages which draw more current.</td>
</tr>
<tr>
<td>XABS3.XOP1.M18:DOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M4:GDS</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M30:GDS</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M19:GSS,DOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M17:DOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XOP1.M16:GSS</td>
<td></td>
</tr>
<tr>
<td>XABS3.XM14.M4:DOP,SOP</td>
<td>Many of the faults in the earlier stage of the absolute value produce functional effects. However, the main area that is affected is the “crossover” area which is not sampled. Faults in other areas of the supply current are masked by process deviations.</td>
</tr>
<tr>
<td>XABS3.XM6.M8:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XM5.M4:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XM1.M4:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XM2.M5:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XM8.M4:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XM15.M4:DOP,SOP</td>
<td></td>
</tr>
<tr>
<td>XABS3.XM9.M4:DOP,SOP</td>
<td></td>
</tr>
</tbody>
</table>

Table 7-7 - Undetectable Faults in the Absolute Value Circuit

4.4 Discussion

The investigation has highlighted several common areas and circuit structures which show poor fault detectability. The digital power-down switching circuitry accounts for 41% of the undetectable faults in the multiplier, 46% in the sample and hold circuit and 51% in the absolute value circuit. This was due to the fact that the internal power-down enable signal is not part of the input stimulus. Fault detectability would be improved if this node was either directly available as an external input or an input signal could be propagated to it. The start-up circuitry of the current reference cell presents a testability problem since it plays little part in the overall function and many faults have little or no effect. One possible way to increase fault coverage in this cell using supply current monitoring would be to ramp the supply voltage, which has been shown to aid fault detection [A'ain94]. Certain CMOS structures such as inverters and pass transistors present test problems since correct function is maintained for certain input signal levels for open circuit faults. Faults in opamp compensation components also present test problems using supply current monitoring. Two cases exist where a gate to source short fault is undetectable on a transistor with the gate as an input and the source connected to the negative supply. These faults will cause a large increase in input current which could be used to detect them.

Several faults produce functional output failures but are undetectable in the supply current due to fault masking by process parameter deviations in sections which draw higher supply currents. This was noted in the absolute value circuit and the multiplier circuit where faults in the output opamp and faults whose effect propagated to the output opamp were masked by the higher supply current in previous stages. One possible approach based on [Binn94] would be to connect a low impedance load to the output opamp stage. The current through this stage would then dominate the overall supply current and a fault producing a functional output failure would be detected. Another approach would be to monitor more than one current, for example, current monitors on each macrocell to reduce fault masking. However, there is a trade-off between the increased fault detection and the area overhead required.

Examination of undetectable faults shows that the majority are open faults which have a low probability of occurrence. Therefore if a weighted metric was used the overall test quality would be higher than the fault coverage figure presented.
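
The effect of weighting can be shown with a small numerical sketch. The weighted metric used in this thesis is defined in chapter 6; the form assumed here, a mean of the probabilities of detection weighted by relative probability of occurrence, is only one plausible reading, and the fault list and weights are hypothetical.

```python
def average_detection(p_detect, weights=None):
    """Average probability of detection, optionally weighted by fault occurrence.

    p_detect: probability of detection for each fault.
    weights:  relative probability of occurrence for each fault (assumed form).
    """
    if weights is None:
        weights = [1.0] * len(p_detect)
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, p_detect)) / total


# Hypothetical fault list: two detected shorts and two undetected opens.
p_detect = [1.0, 1.0, 0.0, 0.0]
occurrence = [0.4, 0.4, 0.1, 0.1]   # opens assumed to be less likely than shorts

print("unweighted:", average_detection(p_detect))
print("weighted  :", average_detection(p_detect, occurrence))
```

Because the undetected faults in this example are the unlikely ones, the weighted figure is higher than the unweighted one, which is the behaviour described above.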

5. Inaccuracies and Limitations

The results presented in this chapter use process parameter deviations in order to set test limits for the faulty and fault-free cases which is an improvement over other published results. However, there still remain certain limitations and inaccuracies with the results which should be considered. Most of these limitations and inaccuracies apply to other published results in addition to those results presented here.

Firstly, only a limited number of circuits has been studied and in only one technology. Further work should investigate alternative circuits, if possible using different process technologies. Another aspect of the work which is non-ideal is that a simplistic fault model was used. Since the layouts of the devices were not available, a component-based transistor fault model was used and probabilities of occurrence were derived from manufacturing data rather than IFA. Only one value of fault resistance was used rather than the distribution-based technique presented in chapter 6, although for the multiplier circuit this was shown to have limited effect on the overall test quality results.

The investigations accounted for global process parameter deviations using Monte Carlo simulation; however, neither local (intradie) parameter variations nor correlation between the process parameters were considered since they require a circuit layout and additional process information which were unavailable. Given this information, the techniques of probabilistic fault simulation could have been applied and more accurate results presented. Another point of inaccuracy is that only 30 Monte Carlo simulations were used to generate the distributions and determine test limits, although the study for the multiplier circuit in chapter 4 showed this to generate average probability of detection figures to within 2% error. The final assumption, which is common to all simulated fault analysis, is that the circuit simulator produces accurate results.

Several aspects of test equipment modelling were not considered in the case study; for example, measurement noise, loading effects and power supply effects were not modelled. These will ultimately depend on the capability of the test equipment used during device testing. However, no such equipment was available and these effects were neglected, although it was assumed that the dynamic supply current waveform could not be sampled on areas with sharp spikes. If the specification of a tester is known then the non-idealities should be modelled using either additional circuitry or post-processing functions such as those available in ANACOV.

6. Conclusions

In this chapter structural based test techniques have been examined using three circuits as a case study. Initial test results showed that dynamic supply current monitoring shows a lower detectability than other test techniques mainly due to the fault masking effect of process deviations in parts of the circuit which draw the most supply current. A method of increasing the fault coverage has been presented based on removing the DC component of the supply current waveform (obtained by subtracting the average current level). However, the average distance confidence measures are lower using this technique and a combination of the standard supply current monitoring and the shifted DC technique presents the best approach to fault detection.

The reasons for the low detectability of faults have been established, and various suggestions for increasing fault detection have been described. Based on the observations of the studies presented, one general approach to detecting faults using supply current monitoring is to try to propagate the fault effect to the circuit sections with the highest supply currents to avoid fault masking effects.

Although only a small number of circuits has been investigated, the work indicates certain structures which may produce testability problems. Further work providing an evaluation of a greater number of common analogue structures could yield a knowledge-based system indicating circuit areas with testability problems prior to fault simulation.
Chapter 8
Conclusions and Further Work

1. Introduction

In recent years there has been a large increase in mixed-signal ICs, with higher levels of integration. Although research in the digital test domain has provided well established fault models, DFT methodologies and test automation, the same is not true of the analogue test domain. Many of the problems in testing analogue ICs and the analogue portions of mixed-signal ICs are described in the literature review. Those that have been investigated in this thesis are based on fault simulation for structural test techniques and include fault modelling, the setting of pass-fail tolerance bands and the comparison of test techniques.

2. Fault Simulation and Test Quality Metrics

Practically any investigation into structural test methodologies requires an analogue fault simulator. The ANTICS analogue fault simulator, based on HSPICE, has been developed and is described in chapter 3. ANTICS has many features in common with other fault simulators that have been described in the literature and some which are different. One feature is that several post-processing detection modes are available which can be used to model detection decisions. The HSPICE .measure output variables can be used to define which quantities are measured, giving the flexibility to use a variety of different measurements within a fault simulation.

One problem highlighted in the literature review was that the pass-fail tolerance for the faulty and fault-free cases has in many cases been either set at an arbitrary value or assumed to be fixed throughout the entirety of a waveform. Examples given in this thesis show this to be a poor approximation in certain cases. In order to model the effect of process parameter deviations, ANTICS has the ability to use a Monte Carlo simulation approach to obtain the faulty and fault-free tolerance bounds. Whilst several papers have described Monte Carlo analysis as part of an investigation into structural testing, a fault simulation scheme using this has not been described in the literature.
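The idea of a tolerance band that varies along the waveform can be sketched as follows. This is a minimal Python illustration which assumes the band is taken as the per-sample minimum and maximum over the Monte Carlo runs; ANTICS may derive its bounds differently, and the function names and waveform values are hypothetical.

```python
def tolerance_band(runs):
    """Per-sample (lower, upper) bounds over a set of Monte Carlo waveforms.

    `runs` is a list of equal-length waveforms, one per Monte Carlo run.
    """
    lower = [min(column) for column in zip(*runs)]
    upper = [max(column) for column in zip(*runs)]
    return lower, upper


def bands_overlap(band_a, band_b):
    """For each sample, report whether the two tolerance bands overlap."""
    (lo_a, hi_a), (lo_b, hi_b) = band_a, band_b
    return [la <= hb and lb <= ha
            for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b)]


# Hypothetical waveforms: three Monte Carlo runs, three sample points each.
fault_free_runs = [[1.0, 2.0, 3.0], [1.2, 2.1, 2.8], [0.9, 1.9, 3.1]]
faulty_runs = [[1.1, 2.6, 4.0], [1.3, 2.8, 4.2], [1.0, 2.5, 3.9]]

fault_free_band = tolerance_band(fault_free_runs)
faulty_band = tolerance_band(faulty_runs)
overlap = bands_overlap(fault_free_band, faulty_band)
```

Here the bands overlap at the first sample point only, so the fault would be masked there but separable at the later points, which a single fixed tolerance could not express.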

Using Monte Carlo simulation to generate faulty and fault-free circuit tolerance bands also generates the additional problem of partially detectable faults, that is, faults for which the tolerance bands partially overlap one another. Analysis presented in chapter 7 shows this to be the case for many faults in the example circuits. The partial detectability of faults is ignored by many authors since the traditional test quality figure of fault coverage is based on a Boolean fault classification. To incorporate partially detectable faults into the test quality metric, a statistical fault simulation approach has been developed. The output of a Monte Carlo simulation is considered as a statistical distribution and the probability of fault detection is determined. The average probability of detection for all faults can be interpreted in the same manner as a figure of fault coverage. The theory has been developed so that it is general and does not depend on distribution type. However, to obtain the probability of detection figure the distribution must be estimated. The work presented here assumes the Normal distribution and uses the KS test to check this assumption. The results of the investigation into the number of Monte Carlo simulations required to adequately describe the distribution showed that, for the circuit studied, 30 Monte Carlo simulation runs were adequate, providing an average probability of detection error of less than 2%.
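As an illustration of how a probability of detection might be obtained from Normal distribution estimates, the sketch below makes two assumptions that are not taken from the thesis: the fault-free pass limits are placed at the fault-free mean ± 3σ, and a fault counts as detected when the faulty measurement falls outside those limits. The definition used by ANTICS is given in chapter 4 and may differ; all function names are hypothetical.

```python
from math import erf, sqrt
from statistics import fmean, stdev


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Normal(mean, sd) variable."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def probability_of_detection(fault_free, faulty, n_sigma=3.0):
    """Probability that a faulty measurement lies outside the pass limits.

    Assumption: limits are fault-free mean +/- n_sigma standard deviations,
    and both sample sets are modelled as Normal distributions.
    """
    ff_mean, ff_sd = fmean(fault_free), stdev(fault_free)
    lower = ff_mean - n_sigma * ff_sd
    upper = ff_mean + n_sigma * ff_sd
    f_mean, f_sd = fmean(faulty), stdev(faulty)
    inside = normal_cdf(upper, f_mean, f_sd) - normal_cdf(lower, f_mean, f_sd)
    return 1.0 - inside


def average_probability_of_detection(fault_free, faulty_sets):
    """Mean probability of detection over all simulated faults."""
    return fmean([probability_of_detection(fault_free, f) for f in faulty_sets])


# Hypothetical Monte Carlo results for one measurement (arbitrary units).
fault_free = [1.00, 1.02, 0.98, 1.01, 0.99]
fault_far = [2.00, 2.02, 1.98, 2.01, 1.99]    # clearly separated from fault-free
fault_same = [1.00, 1.02, 0.98, 1.01, 0.99]   # indistinguishable from fault-free

p_far = probability_of_detection(fault_free, fault_far)
p_same = probability_of_detection(fault_free, fault_same)
p_avg = average_probability_of_detection(fault_free, [fault_far, fault_same])
```

Under these assumptions a fault whose distribution coincides with the fault-free one is still "detected" about 0.27% of the time, which is simply the fraction of good circuits that fall outside ± 3σ limits.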

3. Reducing Fault Simulation Time and Improving Test Quality Metrics

The main drawback of a Monte Carlo-based fault simulation approach is the increased simulation time required, and many authors have therefore avoided this technique. Two novel methods of reducing Monte Carlo fault simulation time have been presented and evaluated (chapter 5), with the aim of trading some simulation accuracy for a large decrease in simulation time. The first technique, best/worst case analysis, makes the assumption that the parameter deviation sets producing the minimum and maximum deviations for the fault-free circuit also produce the worst case deviations for every faulty circuit. The second technique, hybrid fault simulation, is based on using a single run fault simulation for test selection and to eliminate clearly detectable and undetectable faults. Investigations into both techniques show a large decrease in simulation time whilst maintaining good accuracy.
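The best/worst case assumption can be illustrated with a short Python sketch. The `simulate` callable stands in for a circuit simulator run and is purely hypothetical, as are the toy circuit model, the fault names and the parameter values; the sketch only shows the selection logic, not the ANTICS implementation.

```python
def best_worst_case(simulate, parameter_sets, faults):
    """Estimate faulty tolerance bounds from two parameter sets per fault.

    simulate(params, fault) returns one measurement; fault=None means the
    fault-free circuit. The full Monte Carlo set is run only fault-free.
    """
    fault_free = [simulate(p, None) for p in parameter_sets]
    # Parameter sets giving the extreme fault-free measurements.
    p_min = parameter_sets[fault_free.index(min(fault_free))]
    p_max = parameter_sets[fault_free.index(max(fault_free))]

    faulty_bounds = {}
    for fault in faults:
        a, b = simulate(p_min, fault), simulate(p_max, fault)
        faulty_bounds[fault] = (min(a, b), max(a, b))
    return (min(fault_free), max(fault_free)), faulty_bounds


def toy_simulate(params, fault):
    """Toy stand-in for a simulator: each fault adds a fixed output offset."""
    offsets = {None: 0.0, "short_M1": 0.5, "open_M2": -0.02}
    return params["gain"] + offsets[fault]


parameter_sets = [{"gain": g} for g in (0.95, 1.00, 1.05, 0.98)]
ff_bounds, faulty_bounds = best_worst_case(
    toy_simulate, parameter_sets, ["short_M1", "open_M2"])
```

For N parameter sets and F faults this sketch needs N + 2F simulator runs instead of the N(F + 1) runs of a full Monte Carlo fault simulation, which is where the time saving comes from.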

In chapter 6, two problems with the current fault coverage measure are highlighted: firstly, that all faults are assumed to be equiprobable and, secondly, that the fault model parameters are fixed rather than distributed. A weighted fault coverage test metric based on the probability of fault occurrence has been applied to the statistical fault simulation defined in chapter 4. The technique has been developed and investigated using published fault occurrence data, and the results show that weighting the probability of detection figures has an effect on the test quality metric. A more accurate result would be obtained using IFA to generate the probability of occurrence. A further investigation has been carried out into the effect of varying the fault model parameter of a short fault (the short resistance) as a Monte Carlo simulation parameter. The results showed little effect on the average probability of detection test metric for supply current measurements for the circuit studied.
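A weighted metric of this kind can be sketched as a weighted average of the per-fault probabilities of detection. The normalisation by the sum of the weights, the function name and the numbers below are assumptions made for illustration; the exact formulation used in chapter 6 may differ.

```python
def weighted_detection_metric(p_detect, p_occur):
    """Average probability of detection weighted by probability of occurrence.

    The weights are normalised by their sum, so they need not add up to 1.
    """
    total = sum(p_occur)
    return sum(pd * po for pd, po in zip(p_detect, p_occur)) / total


# Hypothetical figures for three faults.
p_detect = [1.0, 0.5, 0.0]   # probability of detection per fault
p_occur = [0.6, 0.3, 0.1]    # relative probability of occurrence per fault

unweighted = sum(p_detect) / len(p_detect)
weighted = weighted_detection_metric(p_detect, p_occur)
```

With these numbers the unweighted average is 0.5 while the weighted figure is 0.75, because the fault that is always detected is also the most likely to occur.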

The conclusions described for chapters 5 and 6 can also be considered jointly as a method for the reduction of fault simulation time. Firstly, it has been concluded in chapter 5 that a multi-level approach to fault simulation can reduce simulation time. Further, it is apparent from chapter 6 that weighting faults according to the probability of occurrence can have an effect on the error in the average probability of detection figure. For example, a large error can be tolerated on a fault with a low probability of occurrence, but conversely a small error on a fault with a high probability of occurrence could have a large effect on the test metric. A consideration of the probability of fault occurrence within a multi-level fault simulation scheme would allow, for example, a higher degree of accuracy to be used on faults which are more likely to occur and a lower accuracy to be used on faults with a lower probability of occurrence. Using this approach would reduce simulation time with little loss in accuracy.
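The trade-off described above can be written down explicitly. The notation is an assumption made for illustration and is not taken from chapters 5 or 6: $w_i$ is the normalised probability of occurrence of fault $i$, $P_i$ its probability of detection and $\Delta P_i$ the error in $P_i$ introduced by a faster simulation method.

\[
\bar{P}_w = \sum_{i=1}^{N} w_i P_i, \qquad \sum_{i=1}^{N} w_i = 1
\]

\[
\Delta \bar{P}_w = \sum_{i=1}^{N} w_i \, \Delta P_i
\]

As a worked example with hypothetical figures, a fault with $w_i = 0.01$ and an error of $\Delta P_i = 0.20$ contributes only $0.002$ to the error in the metric, whereas a fault with $w_i = 0.50$ and an error of $\Delta P_i = 0.02$ contributes $0.010$, five times as much despite the tenfold smaller per-fault error.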

4. Evaluation of Structural Test Techniques

In chapter 7 an investigation into structural test methods using the fault simulation techniques developed in previous chapters has been presented. Due to the points described in the literature review, a comparison between published simulation results has been difficult. The first section presents an accurate comparison between three test techniques: specification-based tests, RMS supply current monitoring and dynamic supply current monitoring. The highest fault detection was obtained using specification testing, followed by RMS supply current testing and then dynamic supply current monitoring. The low detection using dynamic supply current monitoring was found to be due mainly to the larger process parameter deviation effects masking the fault effects. An approach to increase fault coverage for dynamic supply current monitoring has been investigated by removing the DC component of the waveform. An increase in fault coverage was obtained for all circuits studied using this method, although the associated confidence measures were lower. It is concluded that this technique has the potential to increase fault coverage for a supply current monitoring test scheme.

An investigation into the reasons for the low detectability of specific faults has highlighted several circuit structures containing hard-to-test faults, together with possible solutions to improve test quality. This could be used as a basis for a knowledge-based system to indicate testability problems, and ultimately for an automatic test pattern generation system.

5. Further Work

Based on the findings and conclusions of this work, further work is justified in several areas.

Fault Simulation

- Several conclusions have led to the concept of an integrated fault simulation management process to provide automated control of fault simulation using the ANTICS fault simulator. Several fault simulation control algorithms are possible based on the work presented here and should be investigated. In particular, using the probability of occurrence information with a multi-level fault simulation approach to minimise fault simulation time, as described in section 3, should be considered. Combining this approach with hierarchical fault modelling, so that higher level fault models are automatically inserted where appropriate, should also be examined in future work.

- An interface to an IFA program should be written so that probability of occurrence information can be used in fault simulation with ANTICS.

- A statistical device modelling approach such as the SMOS model [Mich93] should be incorporated into ANTICS in the Monte Carlo random parameter generation stage, modifying HSPICE netlists automatically. This would allow local as well as global process parameter distributions to be considered during fault simulation.

- ANTICS should provide weighted fault coverage (the results presented here were obtained using a spreadsheet).

- More distributions should be considered in addition to the Normal distribution as output variable distributions when calculating probability of detection, since several faults failed the KS test.

- For several of the investigations in this thesis, future work should examine more circuits, with different process parameter deviations and measurements. These include hybrid and best/worst case fault simulation, the effect of changing the fault model parameters and the number of Monte Carlo simulations required for suitable accuracy.

Structural Test Evaluation

- Further work should investigate structural test techniques on an additional range of circuits implemented with different processes, using the techniques presented in this thesis for an accurate test comparison. Investigations should use IFA to generate probability of occurrence information and realistic defects for the circuits studied.

6. Summary of Achievements

Chapter 2 described the three main aims of this work, all of which have been fulfilled.

1) The ANTICS fault simulation tool has been successfully developed and used for structural test evaluations. Accurate fault simulation considering process tolerance on the faulty and fault-free circuits is possible. Further to this, a statistical fault simulation approach has been developed which provides an improved test quality metric. Two methods have been described for the reduction of fault simulation time whilst maintaining good simulation accuracy. These techniques make Monte Carlo-based fault simulation more attractive to the design environment.

2) The statistical fault simulation test metric has been improved by considering probability of fault occurrence. An accurate investigation into the effect of altering fault model resistance has also been performed.

3) Several structural test techniques have been successfully evaluated and compared using the accurate fault simulation techniques developed. A method of improving fault detection using dynamic supply current monitoring has been presented.

The work presented now allows an accurate structural test evaluation and a comparison of structural test methods for analogue circuits.
Appendix A

Circuit Schematics and Descriptions

1. Circuits using Second Device Parameter Set

1.1 Unbuffered Opamp Circuit

The circuit is a 2-stage unbuffered opamp with n-channel inputs V+, V- and output Vout. VDD=+5V and VSS=-5V. The design uses the model parameters from section 2 in appendix B.

![Unbuffered CMOS Opamp Diagram](image)

<table>
<thead>
<tr>
<th>Device</th>
<th>Parameter</th>
</tr>
</thead>
<tbody>
<tr>
<td>M1</td>
<td>10u/63u</td>
</tr>
<tr>
<td>M2</td>
<td>10u/63u</td>
</tr>
<tr>
<td>M3</td>
<td>10u/10u</td>
</tr>
<tr>
<td>M4</td>
<td>10u/10u</td>
</tr>
<tr>
<td>M5</td>
<td>10u/10u</td>
</tr>
<tr>
<td>M6</td>
<td>10u/207u</td>
</tr>
<tr>
<td>M7</td>
<td>10u/319u</td>
</tr>
<tr>
<td>M8</td>
<td>10u/10u</td>
</tr>
<tr>
<td>M19</td>
<td>42u/10u</td>
</tr>
<tr>
<td>M20</td>
<td>42u/10u</td>
</tr>
<tr>
<td>M21</td>
<td>42u/10u</td>
</tr>
<tr>
<td>Cc</td>
<td>2.2pF</td>
</tr>
</tbody>
</table>

Figure A-1 - Unbuffered CMOS Opamp
1.2 Open Loop Configuration

![Figure A-2 - Open Loop Configuration](image)

1.3 Closed-Loop Configuration

![Figure A-3 - Closed Loop Configuration](image)
2. 3µ Cell Array Circuits

This section consists of circuits made from cells from a 3µ analogue cell array. VDD=5V and VSS=-5V for all circuits. All designs use the model parameters from section 1, appendix B.

2.1 Common Component Cells

2.1.1 IREF1 - Current Reference Cell

IREF1 is used to generate the bias voltage VBP, which is used within other cells to generate bias currents. The NEN input is used to power down the cell; it is set at VSS for normal operation.

![Diagram of IREF1 Current Reference Cell]

- M4 5u/7u
- M5 5u/7u
- M8 60u/7u
- M9 40u/7u
- M10 40u/7u
- M11 5u/10u
- M12 5u/50u
- M13 10u/20u
- M14 5u/50u
- M17 5u/50u
- M18 10u/20u
- R19 55K

Figure A-4 - IREF1 Current Reference Cell
2.1.2 VREF1 - Voltage Reference Cell

VREF1 is a voltage reference cell included automatically with IREF1 for the multiplier circuit. The D1 and D2 outputs are unused. The VBP input comes from the current reference cell IREF1, and NEN is the power down input, which is at VSS for normal operation.

![Figure A-5 - VREF1 Voltage Reference Cell](image-url)
2.1.3 OPA1/OPA2 - Opamp Cells

OPA1 and OPA2 are CMOS opamps with a power-down facility. Cells OPA1 and OPA2 share the same netlist but use different component values. VP and VN are the positive and negative inputs respectively, Vout is the output and NEN is the power down input which is at VSS for normal operation.

![Circuit Diagram]

**OPA1 Component Values:**

- M31: 5u/7u
- M29: 5u/600u
- M26: 5u/7u
- M24: 5u/7u
- M22: 10u/40u
- M18: 10u/50u
- M11: 10u/25u
- M21: 5u/7u
- M23: 60u/4u

**OPA2 Component Values:**

- M31: 5u/7u
- M29: 5u/300u
- M26: 5u/7u
- M24: 5u/7u
- M22: 10u/40u
- M18: 10u/50u
- M11: 10u/25u
- M21: 5u/7u
- M23: 60u/4u

*Figure A-6 - OPA1/OPA2 Opamp Subcircuits*
2.1.4 IOPAD - Input/Output Cell

IOPAD is an input/output cell with resistor/diode protection.

![Diagram of IOPAD Input/Output Cell]

Figure A-7 - IOPAD Input/Output Cell
2.2 Circuits Used for Evaluation

2.2.1 Analogue Multiplier

2.2.1.1 Top Level Multiplier Circuit
The multiplier circuit consists primarily of the MULTI cell. However, current and voltage reference cells (IREF1 and VREF1) and I/O pads (IOPAD) are included to improve simulation accuracy. The MULTI, IREF1 and VREF1 cells each have a NEN power down input which remains at VSS (circuit enabled).

![Top level Diagram of Multiplier](image)

2.2.1.2 MULTI Cell
The MULTI cell consists of 4 attenuators/level shifters, a Gilbert transconductance multiplier and an opamp (OPA1) configured as a current to voltage converter. Inputs are applied to VXP and VYP, VOUT is the output. The VXN and VYN inputs to the multiplier are grounded.
Figure A-9 - MULT1 Multiplier Cell
2.2.1.3 Multiplier Specification Test

The specification test for the multiplier circuit consists of six parameters: the offset voltage (vos), the gain (k) and the non-linear distortion evaluated for each of the four input combinations (nlpp, nlpn, nlnn, nlnp).

**offset voltage (vos)**

The offset voltage is the output voltage with both inputs set to 0V.

\[
vos = v_{out} \text{ with inputs } vx = 0V, \, vy = 0V
\]

**gain (k)**

The gain is the average of 4 output voltages evaluated at 4 input combinations of ±1V.

\[
\begin{align*}
k1 &= v_{out} - vos \text{ with inputs } vx = 1V, \, vy = 1V \\
k2 &= v_{out} - vos \text{ with inputs } vx = -1V, \, vy = 1V \\
k3 &= v_{out} - vos \text{ with inputs } vx = 1V, \, vy = -1V \\
k4 &= v_{out} - vos \text{ with inputs } vx = -1V, \, vy = -1V
\end{align*}
\]

\[
k = \frac{(k1+(-k2)+(-k3)+k4)}{4}
\]

**non-linear distortion (nlpp, nlpn, nlnn, nlnp)**

The percentage non-linear distortion is evaluated for the 4 possible input combinations of ±4V. The output voltage is subtracted from the ideal linear case which is calculated based on the gain k.

\[
\begin{align*}
nlpp &= \frac{(v_{out}-vos) - k \cdot 4 \cdot 4}{k \cdot 4 \cdot 4} \times 100 \text{ with inputs } vx = 4V, \, vy = 4V \\
nlpn &= \frac{(v_{out}-vos) - k \cdot 4 \cdot (-4)}{k \cdot 4 \cdot (-4)} \times 100 \text{ with inputs } vx = 4V, \, vy = -4V \\
nlnn &= \frac{(v_{out}-vos) - k \cdot (-4) \cdot (-4)}{k \cdot (-4) \cdot (-4)} \times 100 \text{ with inputs } vx = -4V, \, vy = -4V \\
nlnp &= \frac{(v_{out}-vos) - k \cdot (-4) \cdot 4}{k \cdot (-4) \cdot 4} \times 100 \text{ with inputs } vx = -4V, \, vy = 4V
\end{align*}
\]
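The specification parameters defined above can be evaluated with a short Python sketch. The callable `vout(vx, vy)` stands in for a simulated or measured output voltage and is hypothetical, as is the toy multiplier model used to exercise it; the arithmetic follows the definitions of vos, k and the four non-linearity figures given in this section.

```python
def multiplier_spec(vout):
    """Evaluate vos, k and the four non-linearity figures (in percent).

    vout(vx, vy) returns the multiplier output voltage for the given inputs.
    """
    vos = vout(0.0, 0.0)

    # Gain: average of the four +/-1 V results, sign-corrected per quadrant.
    k1 = vout(1.0, 1.0) - vos
    k2 = vout(-1.0, 1.0) - vos
    k3 = vout(1.0, -1.0) - vos
    k4 = vout(-1.0, -1.0) - vos
    k = (k1 - k2 - k3 + k4) / 4.0

    def nonlinearity(vx, vy):
        ideal = k * vx * vy   # ideal linear output; zero gain is not handled
        return ((vout(vx, vy) - vos) - ideal) / ideal * 100.0

    return {
        "vos": vos,
        "k": k,
        "nlpp": nonlinearity(4.0, 4.0),
        "nlpn": nonlinearity(4.0, -4.0),
        "nlnn": nonlinearity(-4.0, -4.0),
        "nlnp": nonlinearity(-4.0, 4.0),
    }


def toy_multiplier(vx, vy):
    """Toy model: 20 mV offset, gain near 0.1, slight compression in vx."""
    return 0.02 + 0.1 * vx * vy * (1.0 - 0.001 * vx * vx)


spec = multiplier_spec(toy_multiplier)
```

For this toy model the gain evaluates to 0.0999 and each non-linearity figure to roughly -1.5%, reflecting the compression term at ±4V inputs.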
2.2.2 Sample and Hold Circuit

The sample and hold circuit consists of 4 switches and 2 capacitors. OPA2 acts as an output buffer amplifier. SH is the sample clock, IN is the input and OUT the sampled output.

![Sample and Hold Circuit Diagram]

**Figure A-10 - Sample and Hold Circuit**

- XA7.XS1.M5, M6, M7, M8: 5u/7u
- XA7.XS2.M5, M6, M7, M8: 5u/7u
- XA7.XS3.M5, M6, M7, M8: 5u/7u
- XA7.XS4.M5, M6, M7, M8: 5u/7u
- C1, C2: 5.1pF
2.2.3 Absolute Value Circuit

2.2.3.1 Top Level Absolute Value Circuit

The absolute value circuit has an input $V_{IN}$ buffered through the OPA2 opamp and output $V_{OUT}$.

![Diagram of the Top Level Absolute Value Circuit](image)

Figure A-11 - Top Level Absolute Value Circuit
2.2.3.2 ABSVAL - Absolute Value Cell

![Circuit Diagram]

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>5u/50u</td>
<td>5u/50u</td>
<td>10u/10u</td>
<td>10u/10u</td>
<td>10u/10u</td>
<td>10u/10u</td>
<td>10u/10u</td>
<td>5u/10u</td>
</tr>
<tr>
<td>5u/10u</td>
<td>10u/10u</td>
<td>10u/10u</td>
<td>10u/10u</td>
<td>5u/40u</td>
<td>5u/50u</td>
<td>13.2K</td>
<td></td>
</tr>
</tbody>
</table>

**Figure A-12 - ABSVAL Absolute Value Cell**
Appendix B

HSPICE Model Statements and Process Deviation Parameters

1. Model and Process Deviation Parameters for 3µ Standard Cell Process

1.1 Global and Process Deviation Parameters

So that process parameter deviations can be preserved between different model parameter definitions (e.g. oxide thickness), global HSPICE parameters are defined. These are altered using data-driven analysis, with a suitable input file generated using the MCRAND random number generation program, to provide Monte Carlo simulation. For a nominal, single simulation run, the nominal value of the parameters is used. For Monte Carlo analysis the nominal value is used as the mean of the distribution. The distribution parameter is the standard deviation (σ) for Normal distributions and the maximum/minimum variation from the mean for uniform distributions.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Nominal Value</th>
<th>Distribution Type</th>
<th>Distribution Parameter (absolute variation)</th>
</tr>
</thead>
<tbody>
<tr>
<td>ox</td>
<td>47n</td>
<td>Uniform</td>
<td>3.055n</td>
</tr>
<tr>
<td>latdiff</td>
<td>1</td>
<td>Uniform</td>
<td>0.5</td>
</tr>
<tr>
<td>uop</td>
<td>220</td>
<td>Uniform</td>
<td>19.8</td>
</tr>
<tr>
<td>uon</td>
<td>600</td>
<td>Uniform</td>
<td>54</td>
</tr>
<tr>
<td>vtp</td>
<td>-0.8</td>
<td>Normal</td>
<td>-0.0664</td>
</tr>
<tr>
<td>vtn</td>
<td>0.8</td>
<td>Normal</td>
<td>0.0664</td>
</tr>
<tr>
<td>resval1</td>
<td>1500</td>
<td>Uniform</td>
<td>30</td>
</tr>
<tr>
<td>modres1</td>
<td>1</td>
<td>Uniform</td>
<td>0.25</td>
</tr>
<tr>
<td>modres2</td>
<td>1</td>
<td>Uniform</td>
<td>0.99</td>
</tr>
</tbody>
</table>

Table B-1 - Process Deviation Parameters
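The way Table B-1 translates into Monte Carlo parameter sets can be sketched in Python. This is an illustration only and not the MCRAND program: the function names are hypothetical, the parameters are sampled independently (no correlation is modelled), and the magnitude of the vtp entry is used as its standard deviation since the table lists it with a negative sign.

```python
import random

# (nominal value, distribution type, distribution parameter) from Table B-1.
PROCESS_PARAMETERS = {
    "ox": (47e-9, "uniform", 3.055e-9),
    "latdiff": (1.0, "uniform", 0.5),
    "uop": (220.0, "uniform", 19.8),
    "uon": (600.0, "uniform", 54.0),
    "vtp": (-0.8, "normal", 0.0664),   # table lists -0.0664; magnitude assumed
    "vtn": (0.8, "normal", 0.0664),
    "resval1": (1500.0, "uniform", 30.0),
    "modres1": (1.0, "uniform", 0.25),
    "modres2": (1.0, "uniform", 0.99),
}


def sample_parameters(rng):
    """Draw one set of global process parameters."""
    sample = {}
    for name, (nominal, kind, spread) in PROCESS_PARAMETERS.items():
        if kind == "normal":
            # spread is the standard deviation about the nominal value
            sample[name] = rng.gauss(nominal, spread)
        else:
            # spread is the maximum deviation either side of the nominal value
            sample[name] = rng.uniform(nominal - spread, nominal + spread)
    return sample


def monte_carlo_sets(n_runs, seed=0):
    """Generate n_runs independent parameter sets with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_parameters(rng) for _ in range(n_runs)]


parameter_sets = monte_carlo_sets(30)
```

Each resulting dictionary corresponds to one row of the data-driven analysis input, with one column per global HSPICE parameter.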

1.2 P-Type MOS Transistor

```
.MODEL MOSP PMOS LEVEL = 2
+ VTO=vtp TOX=ox UO=uop UCRIT=4.2K LD= '0.4u*latdiff' PB=0.84
+ NSUB=20E15 TPG=-1 NEFF=2.34 NFS=6.35G UEXP=0.175 XJ=0.55U
+ VMAX=5E4 CJ=1.2E-4 MJ=0.5 CJSW=190P MJSW=0.33 CGSO=301P
+ CGDO=301P CGBO=20P DELTA=0.907 FC=0.5 KF=7.5E-27 AF=1.1 JS=1.8E-5
+ GAMMA=0.4 PHI=0.5
```

1.3 N-Type MOS Transistor

1.4 Diode

.MODEL D1 D(IS=0.47E-18 N=1.073 XTI=3.2 CJO=0.7E-16)

1.5 NPN Bipolar transistor

.MODEL NPN NPN(IS=5E-18 BF=800 NE=1.2 IKF=5.3E-6 ISE=5.76E-15 VA=9)

1.6 Resistor

.MODEL RPOLY2 R RES=modres1
.MODEL RCON2 R RES=modres2

2. Model and Process Deviation Parameters for Opamps

2.1 Global and Process Deviation Parameters

Parameters mc1, mc2, mc3, mc4 and mc5 are all taken as Normally distributed with mean 1 and standard deviation 0.05. Parameters are set to 1 during nominal simulations.

2.2 N-Type MOS Transistor

.MODEL NMOD NMOS LEVEL=2
  + VTO='0.5*mc1' KP='71E-6*mc2' PB=0.53 CGSO=110E-12 CGDO=110E-12
  + RSH='20*mc3' CJ=280E-6 MJ=0.48 CJSW=190E-12 MJSW=0.16
  + TOX='315E-10*mc4' NSUB=2.1E+15 XJ='0.43E-6*mc5' LD=0.34E-6 UO=715
  + UCRIT=2.24E+4 UEXP=0.007 PHI=0.61 GAMMA=0.4 VMAX=8.5E+4

2.3 P-Type MOS Transistor

.MODEL PMOD PMOS LEVEL=2
  + VTO='0.6*mc1' KP='28E-6*mc2' PB=0.51 CGSO=180E-12 CGDO=180E-12
  + RSH='25*mc3' CJ=280E-6 MJ=0.48 CJSW=290E-12 MJSW=0.28
  + TOX='315E-10*mc4' NSUB=3.9E+15 XJ='0.43E-6*mc5' LD=0.34E-6 UO=274
  + UCRIT=2.16E+4 UEXP=0.011 PHI=0.61 GAMMA=0.4 VMAX=9.3E+4
References

Books


Papers and Theses


List of Publications


