The Use of a Convolutional Neural Network in Detecting Soldering Faults from a Printed Circuit Board Assembly

Automatic Optical Inspection (AOI) is any method of detecting defects during a Printed Circuit Board (PCB) manufacturing process. Early AOI methods were based on classic image processing algorithms using a reference PCB. These traditional methods require very complex and inflexible preprocessing stages. Recent advances in deep learning, especially Convolutional Neural Networks (CNN), have made it possible to automate various computer vision tasks. Limited research has been carried out on using CNN for AOI. The present systems are inflexible and require many preprocessing steps or a complex illumination system to improve accuracy. This paper studies the effectiveness of using CNN to detect soldering bridge faults in a PCB assembly (PCBA) and presents a method for designing an optimized CNN architecture for this task. The proposed CNN architecture is compared with the state-of-the-art object detection architecture, namely YOLO, with respect to detection accuracy, processing time, and memory requirement. The results of our experiments show that the proposed CNN architecture has a 3.0% better average precision, has 50% fewer parameters, and infers in half the time of YOLO. The experimental results demonstrate the effectiveness of using CNN in AOI on images of a PCB assembly without any reference image, complex preprocessing stage, or complex illumination system.


Introduction
A Printed Circuit Board (PCB) is a mechanical structure that holds and connects electronic components. A PCB without electronic components installed is also called a bare PCB. Soldering fixes the electronic components permanently in place on the PCB by applying molten solder to a joint. Once the electronic components have been placed on the bare PCB, it becomes a printed circuit board assembly (PCBA). With the development of technology, demand has emerged for electronic products that contain more features and are smaller in size. This demand has in turn made PCBAs smaller, more complex, and denser. Greater complexity brings a greater need for accuracy, and PCBA problems are often very costly to correct [1]. That is why, in a PCBA mass production process, the inspection of the PCBA is considered an important task. For years, Manual Visual Inspection (MVI) has acted as the de facto test process for PCBA. This, coupled with an electrical test, such as an in-circuit or functional test, was deemed enough to detect major placement and soldering errors [2]. However, manual modes of inspection had a low reliability rate and were often affected by visual fatigue [3,4].
The PCBA production process consists of three main steps: 1) solder paste layering on the board's surface; 2) component positioning; and 3) solder joint shaping by reflowing the solder paste. At each step of the production process, different defects can occur that can be detected by stage-specific AOI. Inspection after the first stage is called solder paste inspection (SPI). The inspection techniques applied after the second stage are known as automatic placement inspection (API) techniques, while the inspection carried out after the third stage is known as post soldering inspection (PSI). It is observed that, across all the PCBA processes, 90% of the faults are only detectable during PSI [5]. Possible faults occurring at this stage include solder bridge (a form of short in which solder creates a short circuit between two pins not meant to be connected), cold solder (a form of open where solder has not melted to create an electrical connection between the pin and the board), and dry-joint (where solder has not been applied to a pin, and bare copper is visible).
Detecting structural defects at an early stage of the PCBA manufacturing process is necessary to reduce the PCBA production cost. Many complex and high-cost techniques have been proposed in the industry, such as X-ray, optical, ultrasonic, and thermal imaging [6]. Automated Optical Inspection (AOI), also known as Automated Visual Inspection (AVI), was proposed as a technique based on classical image processing algorithms that improved diagnostic capabilities in terms of speed and tasks. Moganti et al. (1996) proposed a categorization of AOI algorithms based on the way information is treated, i.e., a referential approach and a non-referential approach [5]. The referential method compares the image to be inspected with a defect-free template, requires high alignment accuracy, and is sensitive to illumination. The non-referential approach checks whether the image under inspection satisfies general design rules, so it can miss irregular defects that the design rules do not cover. These image processing and classification algorithms require extensive configuration and are usually defect specific, so they do not transfer well across different PCBAs. Because of its ability to self-learn and its promising potential for generalizability on object classification and detection tasks, CNN has been successful in replacing traditional computer vision algorithms. The deep network architecture of CNN [7] can learn discriminative features from the input images on its own, so image features need not be defined by hand. With improved computing machines, especially GPUs [8], the detection process has become so fast that on-line PCBA fault detection is possible using CNN. This paper outlines a method to design an optimal CNN architecture for soldering fault detection in a PCBA.
It presents a novel CNN architecture that performs well in detecting soldering bridge faults on PCBAs from a single image without requiring any pre-processing step or a referential PCBA image. The dataset contains images of different PCBAs with soldering bridge faults. The dataset is small and imbalanced, so various data augmentation techniques were used.
The rest of the paper is structured as follows. Section 2 outlines the limitations of the previous research on using CNN for AOI. Section 3 describes the methodology used to design the optimized CNN architecture. Section 4 compares the results of the optimized CNN architecture with those of the YOLO architecture. Section 5 concludes the paper and outlines future work.

Literature Review
In the early days of the PCBA manufacturing industry, inspection tasks were performed by humans, who became fatigued by the repetitive work. A comprehensive summary of the advancements in AOI systems over time has been given in Huang and Pan (2015) [1], Moganti et al. (1996) [5], Taha et al. (2014) [9], Harlow (1982) [10], and Chin (1988) [11]. According to a report cited in Loh and Lu (1999) [12], solder joint defects correspond to 55% of the total faults in a PCBA. AOI can be broadly classified into three main categories, namely referential, non-referential, and hybrid methods [5]. Referential AOI systems compare the image of the PCB under test to a template image of a PCB that is free of any defects [13][14][15]. Referential methods include image subtraction, introduced by Lee (1978) [16], feature matching or template matching as used by Hara et al. (1983) [17], and comparison of compression codes [18]. Referential methods are susceptible to degraded performance due to image misalignment and variations in environmental conditions when capturing images. Non-referential methods remove the misalignment issues from the inspection process and are based on general design rule verification [19,20], but they require complete knowledge of the PCBA design. Hybrid methods combine the advantages of both referential and non-referential methods. Various forms of AI have been widely used in hybrid approaches, with different referential methods used as a preprocessing step for localizing the fault area in the image [21]. To control the variations in illumination conditions while capturing the image of the PCB, most AOI systems provide complex user-controlled illumination [22,23], e.g., three ring-shaped LEDs as shown in Figure 1.
The biggest potential barrier to AOI is its inflexibility and reliance on system configuration. CNN is a self-learning approach with potential for generalizability. However, most of the work on AOI using CNN is still preliminary. Previous applications of CNN have mainly focused on bare PCBs using a reference image, a computation-intensive preprocessing step, a complex illumination system, or a combination of these [24][25][26][27][28]. Acciani et al. (2006) [29] proposed a general architecture for applying very shallow neural networks in AOI based on hand-crafted features. Fanni et al. (2000) [30] used the energy components from the Fast Fourier Transform (FFT) and Haar Transform (HT) as the input feature set. Classic machine learning algorithms have also been applied with input from selected feature sets [31][32][33]. CNN [34,35] has achieved outstanding results in image classification and detection tasks [36,37]. A CNN takes in the whole image and learns the features necessary for classification and detection, whereas previous classifiers take a set of manually selected features. A huge amount of data is needed to train a CNN that generalizes well, and researchers developing CNN based AOI face a lack of large, diverse, publicly accessible datasets. Tang et al. (2019) [26] and Huang and Wei (2019) [25] present publicly available datasets that contain examples of defects on a bare PCB only. These datasets cannot be used to design a CNN for post-soldering AOI systems.

Figure 1. Three Ring LEDs structure taken from Wu and Zhang (2014) [27]
Tang et al. (2019) [26] propose a template image based object detection CNN that treats the faults in a PCB as objects. Various CNN algorithms currently exist for object detection that try to balance accuracy with architectural efficiency. These object detection CNNs are broadly divided into two stage detectors and single stage detectors. R-CNN (Regions with CNN features) [38] is a famous two stage detector that uses selective search [39] in stage one to generate region proposals, also called regions of interest (ROI). In a later version, Fast R-CNN [40], the whole image passes through a CNN once instead of a CNN being applied to each ROI individually. In Faster R-CNN [41] the region proposal algorithm is integrated into the CNN. Even after these advances, R-CNN performs very well in terms of prediction accuracy but remains too slow for real-time inference. Overfeat combines the classification and localization tasks into a single object detection CNN [42] that is faster but less accurate than R-CNN. YOLO (You Only Look Once) [43] is a simple single stage object detection CNN that divides the image into a grid of fixed size and, for each grid cell, predicts the bounding box coordinates through regression and the class probabilities for a fixed number of anchor boxes. YOLO generalizes well, as shown by its ability to detect objects in hand-painted images. It is one of the fastest object detection algorithms, although this speed comes at some cost in accuracy. In an improved version, YOLOv2 [44], the authors used batch normalization for faster convergence during training; a custom feature extraction network that makes it faster; convolutional anchor box predictions instead of a fully connected layer, which were shown to increase the prediction accuracy at the expense of increased false detection, i.e., detecting an object in a grid cell where none was present in reality; and a pass-through layer that uses fine-grained features from an earlier layer, leading to higher accuracy than the earlier version of YOLO. YOLOv3 [45] was further improved by incorporating a feature pyramid representation for multiscale detection and by increasing the number of feature extraction layers with residual connections, which improved its accuracy significantly. Figure 2 represents the working principle of YOLO. [48] used YOLOv3 to detect missing components from a PCBA using the dataset from [49], which labels each IC component on a PCBA image.
To the best of the author's knowledge at the time of writing, this is the first work on the effectiveness of using CNN for AOI of a PCBA from a two-dimensional color image of the PCBA without requiring a referential image, any pre-processing step, or a complex illumination system. In our experiment we base the CNN design on the grid cell division principle used in YOLO. Each input PCBA image is divided into a 14 × 14 grid. The output is a binary value for each grid cell. An output value of 1 indicates the presence of a soldering bridge fault in that grid cell.

Research Methodology
There is no single formula for designing an optimum CNN model; therefore, we relied on experiments to find an optimum CNN model for detecting soldering bridge faults.

Dataset
Because no open source dataset of soldering faults on a PCBA was available, the dataset was collected manually. It includes 2D RGB images of 64 different PCBAs with soldering bridge faults manually introduced at different places. The total number of soldering bridge faults in the dataset is 359. The images in the dataset are resized to 1024 × 1024. A corresponding annotation file was generated that contains bounding box information for all the defects in the image. Figure 3 shows a hypothetical image of a PCBA and its corresponding annotation file. The dimensions of the RGB input image to the CNN are chosen to be 448 × 448, inspired by YOLO. Resizing the image of the whole PCBA to this size would lose crucial information, as the soldering faults are very small compared to the whole image. The available dataset was also not large enough to train a CNN that generalizes well, so data augmentation was used to create a larger training dataset. To avoid information loss while supporting generalization, we randomly cropped the images of the complete PCBA panels, with crop dimensions ranging from 448 × 448 to 512 × 512. The augmented dataset contains 2000 mutually exclusive images created by cropping images of the PCBA panels and randomly applying rotation and flipping to each cropped image. A grid of size 14 × 14 was used.
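The cropping and grid labelling described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the bounding box format (xmin, ymin, xmax, ymax in pixels), and the rule that a cell is labelled 1 when the centre of a fault's bounding box falls inside it are all assumptions, and the rotation and flipping steps are omitted.

```python
import random

GRID = 14  # grid size stated in the paper


def random_crop_window(img_w, img_h, rng, min_side=448, max_side=512):
    """Pick a random square crop window (x0, y0, side) inside the image."""
    side = rng.randint(min_side, max_side)
    x0 = rng.randint(0, img_w - side)
    y0 = rng.randint(0, img_h - side)
    return x0, y0, side


def grid_labels(boxes, window, grid=GRID):
    """Return a grid x grid list of 0/1 labels for one crop.

    Assumption: a cell is labelled 1 when the centre of a fault's
    bounding box falls inside that cell (YOLO-style assignment).
    """
    x0, y0, side = window
    cell = side / grid
    labels = [[0] * grid for _ in range(grid)]
    for xmin, ymin, xmax, ymax in boxes:
        cx = (xmin + xmax) / 2 - x0  # box centre relative to the crop
        cy = (ymin + ymax) / 2 - y0
        if 0 <= cx < side and 0 <= cy < side:
            row = min(grid - 1, int(cy // cell))
            col = min(grid - 1, int(cx // cell))
            labels[row][col] = 1
    return labels
```

Faults whose centre lies outside the crop window are simply ignored for that crop, which is one possible way to handle partially visible faults.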

CNN Design
The performance of a CNN improves as the network gets deeper [50], at the expense of increased resource utilization and a larger number of learnable parameters. The choice of hyperparameters also plays a significant role in improving the performance of a CNN. We designed the CNN for soldering bridge detection in a PCBA from scratch using the design optimization principles of the inception module, the bottleneck layer, and the residual block. In the inception module [51], filters of different sizes are used in each layer and the results are stacked. This allows the model to choose the optimal filter size for itself. Conventionally the number of channels increases as we go deeper in a CNN model. This gives the deeper layers a larger receptive field. In Lin et al. (2013) [52], a network-in-network layer is introduced as a 1 × 1 convolutional layer called a bottleneck layer. The bottleneck layer has the same summarising effect as a pooling layer, except that the pooling layer shrinks the width and height while the bottleneck layer shrinks the number of channels, which in turn reduces the overall number of parameters. A ground-breaking CNN architecture optimization principle was the introduction of the residual block in He et al. (2016) [53]. The residual block effectively diminishes the vanishing gradient problem as the depth of the network increases, without adding to the computational complexity of the architecture. Figure 4 shows the basic structure used for designing the CNN. The number of hidden layers is taken from [25], which describes a CNN architecture for detecting faults in a bare PCB. The five max pooling layers downsample the input image to an output of size 14 × 14. The output is a binary number for each grid cell. It is 1 if there is a soldering bridge fault in that grid cell and 0 otherwise. The YOLO architecture described in [45] is used as the benchmark for comparing the performance of the optimally designed CNN architecture.
The metrics used for comparison are detection accuracy, inference time, and the number of learnable parameters in the CNN. To determine the detection accuracy of an object detection algorithm, Average Precision (AP) is a popular metric that ranges between 0 and 1. AP is determined for an individual class. Mean average precision (mAP) is the mean value of average precision over all the classes in a dataset. As our dataset contains only soldering bridge faults, we use AP to determine the detection accuracy of the models. A higher value of AP signifies higher detection accuracy of a model.
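As a rough illustration of the AP metric, the sketch below computes a non-interpolated average precision from per-cell confidence scores and ground-truth labels. The paper does not state which AP variant was used, so this particular definition (the mean of the precision values at each true positive in the score-ranked list) and the function name are assumptions.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for a single class.

    scores: predicted confidence per grid cell (higher = more likely a fault)
    labels: ground truth per grid cell (1 = fault, 0 = no fault)
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0  # AP is undefined without positives; 0.0 is a convention here
    # Rank all cells from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos
```

With a single class, as in this dataset, mAP and AP coincide.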

Figure 4. Basic structure used for designing CNN to detect soldering faults
We refer to conventional convolutional layers that do not contain an inception module, a residual block, or a bottleneck layer as plain convolutional layers. We start the experiment with a basic CNN architecture employing plain convolutional layers and then try various combinations of convolutional layers added to the basic structure, i.e., going deeper, bottleneck layers, inception modules, and residual blocks. The models were trained using the augmented dataset. The models were designed based on the following methodology: Figure 4.
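To make the parameter savings of a bottleneck layer concrete, the following sketch counts the learnable parameters of a plain 3 × 3 convolution and of the same convolution preceded by a 1 × 1 bottleneck, and checks the output size after five 2 × 2 max pooling layers. The channel counts (256 and 64) are hypothetical values chosen for illustration and are not taken from the models in this paper; the size calculation assumes that the convolutions preserve width and height.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out


def output_side(input_side, num_pools, pool_stride=2):
    """Side length after repeated pooling, assuming size-preserving convolutions."""
    side = input_side
    for _ in range(num_pools):
        side //= pool_stride
    return side


# Plain 3x3 convolution, 256 -> 256 channels (hypothetical sizes).
plain = conv_params(3, 256, 256)  # 590080

# 1x1 bottleneck down to 64 channels, then 3x3 back up to 256 channels.
bottleneck = conv_params(1, 256, 64) + conv_params(3, 64, 256)  # 164160

# Five max pooling layers with stride 2 reduce 448 to 14.
grid_side = output_side(448, 5)  # 14
```

In this hypothetical case the bottleneck variant needs roughly 28% of the parameters of the plain layer.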

Hyperparameters
We use filters of size 3 × 3 and a stride value of 1 in all the convolutional layers except for the inception layer, which is a combination of filters of different sizes. We used max pooling layers with a window size of 2 and a stride value of 2. Following YOLO, we used Leaky ReLU, with a slope of 0.1 for negative input values, as the activation function in all the hidden layers, and used the sigmoid activation function in the last layer to return prediction values between 0 and 1. The Adam optimizer is used for training. As our output values are binary, we used the binary cross entropy loss function. A loss function determines how far the predicted output of the model is from the ground truth during the training process.
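The activation and loss functions named above have standard definitions, which can be written out in a few lines. This is a plain-Python sketch for illustration only; the function names and the clipping constant `eps` are assumptions, and a real training setup would use the equivalents provided by a deep learning framework.

```python
import math


def leaky_relu(x, slope=0.1):
    """Leaky ReLU with the negative-input slope of 0.1 used in the paper."""
    return x if x > 0 else slope * x


def sigmoid(x):
    """Numerically stable sigmoid, mapping any real value into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy over all grid cells.

    eps clips predictions away from 0 and 1 to avoid log(0).
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)
```

A prediction of 0.5 for every cell gives a loss of ln 2, and the loss approaches 0 as predictions approach the ground truth.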
The learning rate was initially set to a small value of 1 × 10⁻⁵ for the first 25 epochs to stabilize the training process. For the next 50 epochs its value was raised to 1 × 10⁻³ for faster convergence of the model. For the remaining epochs we used a learning rate of 5 × 10⁻⁶. The model was evaluated after every 25 epochs using the average precision (AP) of soldering bridge faults as the metric, with a threshold value of 0.5. Training was stopped when the AP started to drop. Beyond this point the model starts to overfit the training data, a phenomenon where the model learns detail and noise in the training data to the extent that it negatively impacts the performance of the model on new and previously unseen data; in other words, it starts to lose generalization. Regularization is any supplementary technique that makes the model generalize well and prevents it from overfitting. A simple technique to choose a model that generalizes well is to terminate training when the loss on the validation dataset starts to increase. This technique is called early stopping. Batch normalization has been shown to improve convergence and generalization in training neural networks in Luo et al. (2018) [54]. We added batch normalization followed by the Leaky ReLU activation function for regularization. Another very common, simple, and effective regularization technique is dropout [55]. When dropout is applied to a layer in a CNN, each neuron is ignored during a training step with a probability p, where p is a hyperparameter called the dropout rate. We used a dropout layer before the prediction layer with a dropout rate of 0.5.
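The learning rate schedule and AP-based stopping rule described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the training code used in the study: epochs are counted from 0, `evaluate_ap` is a hypothetical callback that returns the AP on held-out data, and training is stopped at the first evaluation whose AP is lower than the previous one.

```python
def learning_rate(epoch):
    """Piecewise-constant schedule from the paper (epoch is 0-based)."""
    if epoch < 25:
        return 1e-5  # first 25 epochs: small rate for stability
    if epoch < 75:
        return 1e-3  # next 50 epochs: larger rate for faster convergence
    return 5e-6      # remaining epochs


def should_stop(ap_history):
    """Stop once the latest AP is lower than the previous evaluation."""
    return len(ap_history) >= 2 and ap_history[-1] < ap_history[-2]


def train(max_epochs, evaluate_ap, eval_every=25):
    """Training loop skeleton; returns (epochs run, AP history)."""
    ap_history = []
    for epoch in range(max_epochs):
        lr = learning_rate(epoch)
        # One epoch of training with learning rate `lr` would run here.
        if (epoch + 1) % eval_every == 0:
            ap_history.append(evaluate_ap(epoch))
            if should_stop(ap_history):
                return epoch + 1, ap_history
    return max_epochs, ap_history
```

In practice one would also keep the weights from the best evaluation so that the model saved is the one before the AP dropped.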

Results
This section compares the performance of the 6 CNN architectures described in Appendix I with the YOLO architecture based on three important metrics: the accuracy of the model measured through the AP for soldering bridge faults, the inference time (all time measurements are taken on the same machine), and the number of learnable parameters in the model. Model4 and Model5 are based on Model1, while Model6 alters the architecture of Model4. The comparison tables mark each result as an increase or a decrease in performance compared to YOLO. Table 1 describes the number of learnable parameters for the models used in the experiment. Table 2 shows the soldering bridge fault detection AP for the models used (the higher the better). The recall value gives the number of soldering bridge faults detected in the test images out of the total number of true soldering bridge faults in the test images. Table 3 shows the time taken by the models to infer a single panel image. The inference time experiment was repeated 5 times for each model, and Table 3 shows the average value. From the results, it can be inferred that Model1 performs nearly as well as the YOLO model in detecting soldering bridge faults, with a slight decrease (≈2%) in the AP of soldering bridge fault detection but significant savings in memory (≈88%) and inference time. These results corroborate the claim that an optimal CNN architecture exists that can perform better than the state-of-the-art YOLO architecture in detecting soldering faults.
The results for Model2 and Model3 show the importance of where CONV layers are added when improving the accuracy of a model. As a rule of thumb, increasing the number of CONV layers increases the number of features learned, which in turn improves the accuracy of the CNN architecture, but only up to a certain number of layers [52]. Model3, which adds CONV layers to the high-level feature extraction part of Model1, shows a ≈4% increase in AP at the cost of a significant increase in memory requirements and a higher inference time when compared to Model1. Model2, which adds CONV layers to the low-level feature extraction parts of Model1, showed anomalous behavior with a significant decrease in accuracy (≈16%), an increase in the inference time (equivalent to the increased inference time of Model3), and a slight increase in memory requirement. For verification, we repeated the training of Model2, which again resulted in decreased accuracy. These results also suggest that adding CONV layers to the high-level feature extraction part of a CNN has a lower chance of overfitting. Comparing Model3 to the YOLO architecture suggests that YOLO also has better accuracy than Model1 because it uses more CONV layers in the mid-level and high-level feature extraction parts. Another implication of these results is that inference time does not depend only on the number of learnable parameters in a model: the inference times of Model2 and Model3 are almost the same, whereas the number of learnable parameters in Model3 is 10 times higher than in Model2. Inference time can depend on many factors, including the depth, filter size, stride value, and type of operations.
To understand the phenomenon of overfitting due to an increase in the number of CONV layers in a model, we can use the example of a model that classifies an image as a cow or not a cow. After a certain number of layers, adding more layers to the model will let it learn non-important features, leading to poor generalization of the model, e.g., learning to extract a bell from images of cows with bells around their necks in the training dataset, or a green background if images labeled as cows are captured in meadows.
For further experiments, we therefore chose Model1, which gives the best compromise between detection accuracy, memory requirement, and inference time. Model4 onwards is based on improvements to the architecture of Model1. Model4, which only adds skip connections to Model1, displays a significant improvement over Model1: the recall value improved from 135 to 141, and the AP value improved by 6% and 3.5% compared to Model1 and YOLO, respectively. The inventors of the residual block attribute the improvement in accuracy to the ability of the model to learn identity mappings that bypass the nonlinearities of a CONV layer. The performance improvement in Model4 suggests that the residual block is effective in increasing the accuracy without increasing the number of learnable parameters; Model4 has almost 84% fewer learnable parameters than YOLO. Model4 and Model1 have almost the same inference time.
Model5 indicates the impact of applying inception modules in the low-level and mid-level feature extraction layers; it improves the accuracy of Model4 slightly at the cost of a significant increase in the number of learnable parameters and inference time. In Model4 and Model5 we also used the bottleneck layer. The results of Model5 and Model6 show a slight improvement in accuracy at the expense of a significant increase in the number of learnable parameters and significantly slower inference. The average accuracy of manually detecting soldering faults with the aid of a magnifying glass is almost 90% [5], and the fault detection accuracy given by the recall value for the optimal CNN is 84%. This suggests that CNN can achieve human level accuracy provided it is trained on a larger dataset with an equal number of diverse examples. Figure 5 shows a sample prediction result, with grid lines drawn to show that the image is divided into a 14 × 14 grid. The prediction returns a binary value for each grid cell. In the future, with more data, we can work on drawing bounding boxes around the fault only.
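The per-cell prediction can be mapped back to image regions for visualization, in the spirit of the grid overlay in Figure 5. The sketch below is illustrative only: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the pixel mapping assumes a 448 × 448 input divided evenly into 14 × 14 cells.

```python
def faulty_cells(scores, threshold=0.5):
    """Return (row, col) for every grid cell whose score meets the threshold."""
    return [
        (r, c)
        for r, row in enumerate(scores)
        for c, score in enumerate(row)
        if score >= threshold
    ]


def cell_to_pixels(row, col, image_side=448, grid=14):
    """Pixel rectangle (xmin, ymin, xmax, ymax) covered by one grid cell."""
    cell = image_side // grid  # 32 pixels per cell for a 448 x 448 input
    return (col * cell, row * cell, (col + 1) * cell, (row + 1) * cell)
```

Each rectangle returned by `cell_to_pixels` can then be highlighted on the input image to show where a soldering bridge fault was predicted.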

Conclusion
It has been shown that CNN based AOI can be used to replace manual inspection to detect soldering faults on PCBAs. The problem was treated as an object detection task. Fast inference plays a vital role in AOI, and for this reason we based our custom CNN on YOLO, which is a state-of-the-art fast object detection CNN. The experiments show that state-of-the-art object detection CNNs can achieve good detection accuracy in AOI but are not resource efficient. Hence, transfer learning does not always provide an efficient solution for carrying out a CNN based AOI task. It was also shown that the accuracy of a custom CNN can be improved using optimization blocks without compromising its resource efficiency. The use of a bottleneck layer was effective in constraining memory utilization while achieving high accuracy. The use of residual blocks had the most significant impact on accuracy improvement without any increase in resource utilization. It was seen that YOLO provides a simple technique for designing a fast object detection CNN that generalizes well. This technique of dividing the image into a grid can be the basis of a custom CNN design for other types of fault detection in a PCBA.
The author believes that the performance and generalizability of the CNN model can be improved by collecting more data with a diversity of examples and classes. This research paper lays a foundation showing that CNNs can provide a simple, highly flexible, and fast AOI system for fault detection in PCBAs. In future work, the optimized architectural design principles described in this study can be used to detect multiple types of faults in PCBAs. This requires an open source dataset covering different types of faults in PCBAs.

Data Availability Statement
The data presented in this study are available in the article.

Funding and Acknowledgements
The author would like to thank Sony Mobile Communications AB, Lund, Sweden for their contribution to this study.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.