Fang, Y., Guo, X., Chen, K., Zhou, Z., and Ye, Q. (2021). "Accurate and automated detection of surface knots on sawn timbers using YOLO-V5 model," BioResources 16(3), 5390-5406.





Accurate and Automated Detection of Surface Knots on Sawn Timbers Using YOLO-V5 Model

Yiming Fang,a,b,c Xianxin Guo,a Kun Chen,a,* Zhu Zhou,b and Qing Ye c

Knot detection is a challenging problem for the wood industry. Traditional methodologies depend heavily on manually selected features and therefore are not always accurate, given the variety of knot appearances. This paper proposes an automated framework that addresses this problem using the state-of-the-art YOLO-v5 (the fifth version of You Only Look Once) detector. The features of surface knots were learned and extracted adaptively, and the knot defects were then identified accurately even though the knots varied in color and texture. The proposed method was compared with YOLO-v3 SPP and Faster R-CNN on two datasets. Experimental results demonstrated that the YOLO-v5 model achieved the best performance for detecting surface knot defects: the F-Score was 91.7% on Dataset 1 and up to 97.7% on Dataset 2. Moreover, YOLO-v5 has clear advantages in training speed and in the size of the weight file. These advantages make YOLO-v5 more suitable for the detection of surface knots on sawn timbers and a promising tool for timber grading.

Keywords: Defect detection; Surface knots; Sawn timber; YOLO-v5

Contact information: a: School of Mechanical & Electrical Engineering, Shaoxing University, Shaoxing 312000, P. R. China; b: School of Information Engineering, Zhejiang A & F University, Hangzhou 311300, P. R. China; c: Suncha Bamboo & Wood Technology Co. Ltd., Lishui 323899, P. R. China

* Corresponding author:


Knots are remnants of branches found in sawn timber and have widely been considered defects for timber grading (Qu et al. 2019). First, knots cause deviations in the fiber direction and significantly reduce mechanical properties such as Young's modulus and shear modulus (Sarnaghi and Kuilen 2019). Second, knots may bend the wood grain, making the wooden product aesthetically unattractive. Finally, knots are susceptible to splitting during manufacturing, causing an uneven break in finishing (Wells et al. 2018). Therefore, knot detection and classification are critically important, whether for sorting timber and optimizing it for further processing or for predicting its mechanical properties (Longuetaud et al. 2012; Hittawe et al. 2015).

Computer vision techniques are a promising tool for knot detection, and there are numerous studies on the surface inspection of sawn timbers. Kamal et al. (2017) and Urbonas et al. (2019) reviewed the related work from a technological standpoint. The existing methods vary greatly with the type of input image, but two common steps can be highlighted. The first is feature extraction: knots typically show a slightly darker color and torn grain than the surrounding wood.

Various techniques, such as the gray level co-occurrence matrix (GLCM) (Hu et al. 2011; Hashim et al. 2016), Gabor filters (Pölzleitner 2003; Hittawe et al. 2015), and local binary pattern (LBP) analysis (Silvén et al. 2003; Zhang et al. 2008), have been employed to extract the color or texture features. Next, classifiers have been utilized to analyze the extracted characteristics and distinguish the knots from normal wood tissue. Most research has employed artificial neural network (ANN) techniques for this problem (Xie and Wang 2015; Hashim et al. 2016; Kamal et al. 2017; Yu et al. 2019). Another common classifier is the support vector machine (SVM). As an example, a tree-structured SVM was proposed to classify four types of wood knots; the authors claimed an average accuracy of 96.5% and a false alarm rate of only 2.25% (Gu et al. 2009). Many other methods have also been investigated, such as clustering (Silvén et al. 2003), compressed sensing (Zhang et al. 2016), and convex optimization (Chang et al. 2018).

The main drawback of these methods is that their performance suffers from variations in the material. The color and texture vary significantly with the tree species, the surface condition, and the lighting environment (He et al. 2019). Efficiently extracting color or texture features from timber surface images with such large variety is formidable. Therefore, automated detection of surface knots on sawn timbers remains a critical need for the wood industry, and further exploration is still required.

The convolutional neural network (CNN) is the leading technique for object detection, and most recent papers on surface defect inspection rely on it (Dhillon and Verma 2020). As a universal function approximator, it has a strong ability to extract almost arbitrary high-level features from input images. In particular, it has been reported to achieve good performance when applied to defect identification in the wood industry (Rudakov et al. 2019; He et al. 2019; Urbonas et al. 2019; Ding et al. 2020; He et al. 2020; Tu et al. 2021). Rudakov et al. (2019) were, to the best of the authors' knowledge, pioneers in using CNN-based approaches to detect timber surface defects. AlexNet, GoogLeNet, VGG-16, and ResNet-50 were compared in terms of classification accuracy on mechanical damage of sawn timber, and the experimental results showed that VGG-16 was the best approach, achieving over 92% accuracy (Rudakov et al. 2019). In Hu's work, a pre-trained ResNet18 network combined with transfer learning strategies was applied to wood defect, wood texture, and wood species classification (Hu et al. 2019). Some recently proposed CNN-based models, such as Mix-FCN (He et al. 2019; He et al. 2020), Faster R-CNN (Urbonas et al. 2019), and Mask R-CNN (Hu et al. 2020; Shi et al. 2020), have also found very successful application in this field.

This paper proposes applying the current state-of-the-art YOLO (You Only Look Once) detector to the automated detection of surface knots on sawn timbers. YOLO, originally designed by Joseph Redmon (Redmon et al. 2016), is an attractive CNN-based algorithm for object detection, classification, and localization in images and videos (Desai et al. 2020). Over the past years, YOLO has kept improving, with new algorithms optimizing computing speed and achieving better performance. The fifth version of YOLO (YOLO-v5) was introduced by Glenn Jocher in June 2020 (Jocher et al. 2021). This version significantly reduced the model size (YOLO-v4 on Darknet is 244 MB, whereas the smallest YOLO-v5 model is 27 MB). YOLO-v5 also claims higher accuracy and more frames per second than all previous versions.



Dataset 1

Sawn timbers with sanded surfaces were purchased from a local sawmill. Some were Metasequoia glyptostroboides and the others were Pinus koraiensis; the number and the ratio were random. A total of 305 images were collected from these sawn timbers and used as Dataset 1. The raw images were first processed with Adobe Photoshop software (Adobe, San Jose, CA, USA) to ensure that each image contained at least one knot. The image size ranged from 70 kB to 3354 kB. Figure 1 shows some typical images in the dataset. All images were labeled by experienced workers using the open-source software labelImg. The red rectangular boxes in Fig. 1 illustrate the annotation results. Annotations were saved as XML files in PASCAL VOC format and then converted to YOLO format with a Python program.
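The VOC-to-YOLO conversion mentioned above can be sketched as follows (a minimal illustration, not the authors' actual script; function names are hypothetical). YOLO-format labels store a class index plus a box center and size normalized by the image dimensions:

```python
import xml.etree.ElementTree as ET

def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a PASCAL VOC corner box to a YOLO (cx, cy, w, h) box
    normalized by the image width and height."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h

def convert_annotation(xml_text, class_id=0):
    """Parse one labelImg XML document and emit YOLO-format lines,
    one 'class cx cy w h' line per annotated object."""
    root = ET.fromstring(xml_text)
    img_w = int(root.find("size/width").text)
    img_h = int(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")]
        cx, cy, w, h = voc_box_to_yolo(*coords, img_w, img_h)
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Since all knot types are later grouped into one category, a single `class_id` of 0 suffices here.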

Fig. 1. Some typical images in Dataset 1

Dataset 2

Dataset 2 was downloaded from the website of the University of Oulu, Finland (Silvén et al. 2003). It contains 839 images of spruce wood. They are RGB color images with 8 bits per channel and a size of 488 by 512 pixels. Figure 2(a) shows one typical image from the dataset.

Originally, each image was divided into rectangular regions. One rectangle corresponds to about a 2.5 × 2.5 cm² area of wood surface, which was manually labeled, as illustrated in Fig. 2(b). In total, there were 5,952 defect rectangles in the dataset. Table 1 lists the different types and quantities of defects.

Figure 2(b) shows that one whole knot was often split across two or more rectangles because the rectangles had fixed positions and a fixed size. The geometric features of the knot were thus destroyed in each patch, which was not beneficial for knot detection. Another drawback of this annotation method was that the features of a small knot were easily submerged by the large surrounding area of normal wood tissue, as with the small knot shown in Fig. 2(b). Therefore, all images were relabeled manually in the same way as Dataset 1. Figure 2(c) shows the annotation results in this work: the dry knot and the small knot were each labeled with one rectangle, and the size and position of each rectangle were no longer fixed but depended entirely on the knot.

Fig. 2. A typical image of Dataset 2 and its annotations: (a) The original image; (b) The original annotations; (c) The annotations in this work

Table 1. Summary of Dataset 2

This work focused on the detection of knot defects, and all knot types were grouped into one category; the detection performance would not have been adequate if the samples had been split by knot class. It is worth noting that some of the samples in Dataset 2 had a sanded surface, whereas the other samples were unsurfaced, as shown in Fig. 3.

Fig. 3. Illustration of the surface condition of some samples in Dataset 2

In the authors' view, this mixture is more in line with industrial practice: the common need for knot identification is in unsurfaced or "rough" boards, even though most existing research has been conducted on sanded samples.

Architecture of YOLO-v5 Model

Architecturally, the YOLO-v5 model is similar to YOLO-v4. As shown in Fig. 4, it consists of three main parts: Backbone, Neck, and Head (Zhu et al. 2020; Xu et al. 2021).

The first part, Backbone, extracts crucial features from the given input image. In YOLO-v5, CSPNets (Cross Stage Partial Networks) are incorporated into Darknet, creating CSPDarknet as its backbone. Compared to the Darknet53 used by YOLO-v3, CSPDarknet achieves a considerable improvement in processing speed with equivalent or even superior detection accuracy (Wang et al. 2020).

The second part, Neck, is primarily employed to generate feature pyramids, which help YOLO-v5 generalize over object scaling and identify the same object at different sizes and scales. In YOLO-v5, the Neck employs the Path Aggregation Network (PANet) as a parametric aggregation mechanism across different backbone and detector levels. The feature grid is connected to all the feature layers by the adaptive feature pooling provided by PANet. Consequently, the useful information from each feature layer can be transmitted directly to the following subnetwork (Liu et al. 2018; Cheng and Zhang 2020).

The final detection is performed in the Head, which is the same as in the previous YOLO-v3 and v4 versions. The Head generates anchor boxes over the feature maps and outputs the final vectors with class probabilities and bounding boxes of the detected knots.
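As a rough illustration of the Head's output, the standard YOLO design (assumed here; the paper does not state these values) predicts on three feature maps at strides 8, 16, and 32, with three anchors per grid cell, each carrying 5 + C values: four box coordinates, an objectness score, and C class probabilities.

```python
def head_output_shapes(img_size=640, num_classes=1, anchors_per_level=3,
                       strides=(8, 16, 32)):
    """Per-level prediction tensor shapes of a YOLO-style head.

    Each detection level is an (img_size/stride) x (img_size/stride) grid of
    cells, each holding `anchors_per_level` anchors with 5 + num_classes
    values (cx, cy, w, h, objectness, class scores)."""
    shapes = []
    for s in strides:
        g = img_size // s
        shapes.append((g, g, anchors_per_level, 5 + num_classes))
    return shapes

# Single-class knot detection at a 640x640 input would give grids of
# 80x80, 40x40, and 20x20 cells, each anchor predicting 6 values.
```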

Fig. 4. The architecture of YOLO-v5 model

Experimental Configuration

The experiments were conducted in the environment described in Table 2. The implementation of YOLO-v5 was downloaded from the GitHub repository ultralytics/yolov5. The release of YOLO-v5 includes four models of different sizes: YOLO-v5s (smallest), YOLO-v5m, YOLO-v5l, and YOLO-v5x (largest). The YOLO-v5m model was selected in this work as a compromise between modest size and outstanding performance. The network was pre-trained on the COCO dataset and fine-tuned using Dataset 1 or Dataset 2, as described above.

Table 2. Configuration of Experimental Environment

Evaluation Metrics

The output of the detectors was compared with the manual annotations to evaluate the performance quantitatively. Precision, Recall Rate, and the overall accuracy measure F-Score were calculated according to Eqs. 1 through 3 (Goutte and Gaussier 2005),

Precision = TP / (TP + FP) (1)

Recall Rate = TP / (TP + FN) (2)

F-Score = 2 × Precision × Recall Rate / (Precision + Recall Rate) (3)
where TP (true positive) is the number of correctly detected knots, FP (false positive) is the number of extra detections that did not correspond to knots on the timber surface (commission errors), and FN (false negative) is the number of knots that were not detected (omission errors). Precision is thus the ratio of correctly detected knots to all detected knots, and Recall Rate indicates the detector's sensitivity. F-Score combines Precision and Recall Rate into a single measure that captures both properties; a higher F-Score indicates a more accurate model.
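The three metrics follow directly from the counts. As a check, the training-set result reported below (all 305 knots found, one spurious detection, no misses) reproduces the stated 99.7% Precision and 99.8% F-Score:

```python
def detection_metrics(tp, fp, fn):
    """Precision, Recall Rate, and F-Score from detection counts (Eqs. 1-3)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```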


Experiments on Dataset 1

Data augmentation

The data loader of YOLO-v5 applies three kinds of augmentation automatically: scaling, color space adjustments, and mosaic augmentation. Mosaic is a novel and effective data augmentation technique that combines four training images into one in certain ratios, as demonstrated in Fig. 5. By enriching the training dataset, it helps optimize the performance of the detector and avoid overfitting.
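The core of the mosaic idea can be sketched in a simplified form (a hypothetical fixed 2×2 tiling of equally sized images; YOLO-v5's actual implementation uses random split ratios and cropping). Besides tiling the pixels, the box annotations of each tile must be shifted into the mosaic's coordinate frame:

```python
def mosaic_boxes(boxes_per_image, tile_w, tile_h):
    """Shift the (x, y, w, h) boxes of four tile images into the coordinate
    frame of a 2x2 mosaic canvas.

    boxes_per_image: four lists of (x, y, w, h) boxes, one list per tile,
    ordered top-left, top-right, bottom-left, bottom-right."""
    offsets = [(0, 0), (tile_w, 0), (0, tile_h), (tile_w, tile_h)]
    merged = []
    for boxes, (dx, dy) in zip(boxes_per_image, offsets):
        for (x, y, w, h) in boxes:
            # Width and height are unchanged; only the origin moves.
            merged.append((x + dx, y + dy, w, h))
    return merged
```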


The dataset was randomly split as follows: 80% for training, 10% for validation, and 10% for testing. The total numbers of knots contained in the three subsets were 305, 37, and 36, respectively. GIoU loss was used as the bounding box loss (Jiang et al. 2021), and the intersection-over-union threshold for non-maximum suppression (NMS) of the bounding boxes was 0.45. Table 3 lists the main training parameters.
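Non-maximum suppression with the 0.45 threshold can be illustrated as follows (a plain-Python sketch of the standard greedy algorithm, not the YOLO-v5 implementation itself):

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: visit boxes in descending score order, keeping a box
    only if its IoU with every already-kept box is at most iou_thresh.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

With the 0.45 threshold, two detections covering mostly the same knot collapse into the single higher-scoring one, while well-separated knots are all retained.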

Fig. 5. Illustration of the mosaic data augmentation

Table 3. Main Training Parameters

In the training process, each iteration can be divided into two steps. First, the data in the training set were applied to the model, and the model automatically adjusted its weights according to the loss value. Then, the data in the validation set were applied to the model, and the loss value was calculated using the just-updated weights. The loss obtained on the validation set was used as an important index to evaluate the performance of the model.
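The two-step iteration described above can be written as a generic skeleton (illustrative only; `update` and `loss_fn` stand in for YOLO-v5's optimizer step and GIoU-based loss):

```python
def run_training(model, train_data, val_data, iterations, update, loss_fn):
    """Skeleton of the per-iteration procedure: fit on the training set,
    then score the freshly updated weights on the validation set. The
    returned validation losses are the monitored performance index."""
    val_losses = []
    for _ in range(iterations):
        # Step 1: adjust the weights from the training loss.
        for batch in train_data:
            update(model, loss_fn(model, batch))
        # Step 2: evaluate the updated weights on the validation set.
        val_loss = sum(loss_fn(model, b) for b in val_data) / len(val_data)
        val_losses.append(val_loss)
    return val_losses
```

Tracking the validation loss rather than the training loss is what makes the curve in Fig. 6 a meaningful indicator of generalization.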

Fig. 6. The loss for the YOLO-v5 in training

The training finished after 1000 iterations, yielding a weight file with a size of 43.3 MB. The training took 7.23 h. The loss on the validation dataset during training is shown in Fig. 6.

Over the first 20 iterations, the loss dropped rapidly from 0.144 to 0.052. A steady decline can then be observed in the loss curve, despite some oscillation. After 800 iterations, the loss stabilized and gradually approached 0.009. The loss curve shows that YOLO-v5 has a strong learning ability and can converge quickly.

Detection performance

The trained YOLO-v5 model was used to detect knots in the images of the testing set. The object confidence threshold was 0.25. Figure 7 shows the results for six typical samples. Rectangles mark the detected knots, and the confidence coefficients are shown on top of the rectangles.

Fig. 7. The detection results of some typical samples: (a) dark knot; (b) decayed knot; (c) sound knot; (d) pin knot; (e) edge knot; (f) irregular knot

The trained YOLO-v5 accurately detected the knot defects. The knots in Fig. 7(a) and (b) feature a different grain pattern and a distinctly different color. Figure 7(c) shows a sound knot whose color is very close to that of the surrounding tissue. The image in Fig. 7(d) contains several pin knots, which often have an actual diameter of less than one-fourth of an inch. Note that some researchers have argued that YOLO-v5, like other existing methods, is still deficient in detecting small targets, especially those near large targets (Wang et al. 2021). The results in this work show that YOLO-v5 is sufficient for pin knot detection. An edge knot is shown in Fig. 7(e); the whole knot was split into two pieces, so its geometric shape and grain changed significantly. The knot in Fig. 7(f) has an irregular shape. It is generally challenging for traditional methods to identify all these types of knots, yet the detection results demonstrate that the YOLO-v5 model can detect them with high confidence coefficients (most larger than 0.90).

Table 4 lists the detailed results on the training set, the validating set, and the testing set. YOLO-v5 achieved the best performance on the training set: all 305 knots were identified accurately, and only one extra knot that did not exist was marked. The Recall Rate was 100% and the Precision was 99.7%; the F-Score was 99.8%, very close to 100%, which is a high performance in the field of target detection. On the validating set, YOLO-v5 found 36 out of 37 knots and made two commission errors; Precision, Recall Rate, and F-Score were 94.7%, 97.3%, and 96.0%, respectively. The detection performance remained very high on the testing set, though slightly lower than on the training and validating sets: 33 out of 36 knots were identified, and the F-Score was 91.7%. All the results demonstrate that YOLO-v5 can effectively learn enough information from the training set and then correctly identify the knot defects against the background.

Table 4. Detailed Results on the Training Set, Validating Set, and Testing Set

Comparison of YOLO-v5 with Other Methods

To further evaluate the performance, the YOLO-v5 model was compared with YOLO-v3 SPP (Liu et al. 2020) and Faster R-CNN (Ren et al. 2017). The implementations of these two models were also downloaded online. The backbone of Faster R-CNN was ResNet50. Due to GPU limitations, the training batch size for these two models was set to 4, and the number of training iterations was 1000. Table 5 lists the training time and the size of the weight file for these two models.

Table 5. Training Time and the Size of the Weight File of YOLO-v3 SPP Model and Faster R-CNN

The training time of these two models was longer than that of the YOLO-v5 model; in particular, the training time of Faster R-CNN was almost 155% of that of YOLO-v5. The weight files of both models were also significantly larger than that of YOLO-v5: the weight file of YOLO-v3 SPP was almost 6 times the size of YOLO-v5's, and that of Faster R-CNN was almost 3 times as large.

Figure 8 shows the output of these two detectors on the image from Fig. 7(d). The performance of both models was worse than that of the YOLO-v5 model: many small knots were missed, and some of the confidence coefficients obtained by YOLO-v3 SPP were quite low. Table 6 lists the quantitative results of the two models. Comparing the values in Tables 4 and 6, it can easily be observed that the YOLO-v5 model had the best detection results in terms of Precision, Recall Rate, and F-Score. YOLO-v3 SPP achieved almost the same results, and Faster R-CNN was the worst.

Fig. 8. The detection results for the image shown in Fig. 7(d), obtained from: (a) YOLO-v3 SPP; (b) Faster R-CNN

Table 6. Quantitative Results of YOLO-v3 SPP and Faster R-CNN

An inspection of all the output images of Faster R-CNN showed that it could accurately detect most of the knot defects. However, it sometimes marked an area of torn grain as a knot; one typical example is shown in Fig. 9(a). Another reason for the decrease in Precision was that the detector labeled an actual knot two or more times, as shown in Figs. 9(b) and 9(c). For comparison, Figs. 9(d)-(f) show the detection results of the same samples using YOLO-v5, which clearly avoids these problems.

Fig. 9. The comparison of detection results of two methods using some typical samples. (a)-(c) The output of Faster R-CNN. (d)-(f) The output of YOLO-v5

Experimental Results on Dataset 2

The same tests were conducted on Dataset 2. At the first stage, 80% of the images were used for training; half of the remaining images were employed to validate the models, and the other half were used to evaluate the knot detection. Figures 10 to 12 depict the results for three typical images.

According to the original annotation, there were two small knots and two dry knots in the image of sample st1184, shown in Fig. 10(a). Figure 10(b) illustrates the detection results of the YOLO-v5 model, which identified all the knot defects correctly. Figure 10(c-d) shows that the four knots could also be recognized by YOLO-v3 SPP and Faster R-CNN. However, the confidence coefficients obtained by YOLO-v3 SPP were quite low, whereas Faster R-CNN labeled one actual knot two or three times. Both problems were also observed in the previous experiments on Dataset 1.

Fig. 10. Detection results of the sample st1184: (a) Original annotation; (b) Result of YOLO-v5; (c) Result of YOLO-v3 SPP; (d) Result of Faster R-CNN

Similar results are shown in Fig. 11. When inspecting the image of sample st1188, the YOLO-v5 model identified the dry knot with a confidence coefficient of 0.96, whereas the coefficient obtained by YOLO-v3 SPP was only 0.68. For the same input image, Faster R-CNN made a commission error in addition to marking the dry knots repeatedly.

The image of sample st1407, depicted in Fig. 12(a), was quite complicated. It included an encased knot, three small knots, and an edge knot; the encased knot had an irregular shape, and the edge knot was incomplete and very small. All the detectors except the YOLO-v5 model missed the encased knot. Similar to the previous results, YOLO-v3 SPP detected the small knots with low confidence coefficients, and overlapping rectangles could be found in the result of Faster R-CNN.

At the next stage, the impact of the size of the training set on the detection accuracy was investigated. The detectors were trained with training sets comprising 80%, 70%, 60%, and 50% of the images of Dataset 2, respectively. As before, the remaining images were divided into two halves, one used as the validating set and the other for testing.

Fig. 11. Detection results of the sample st1188: (a) Original annotation; (b) Result of YOLO-v5 model; (c) Result of YOLO-v3 SPP; (d) Result of Faster R-CNN

Fig. 12. Detection results of sample st1407: (a) Original annotation; (b) Result of YOLO-v5; (c) Result of YOLO-v3 SPP; (d) Result of Faster R-CNN

Figure 13 shows the detection performance of the three detectors. Overall, YOLO-v5 always achieved the best identification performance among the three methods, and Faster R-CNN was the worst. When 80% of the images were used for training, the F-Score values obtained by the three detectors were 97.7%, 93.4%, and 77.2%, respectively. The same tendency was observed when the detectors were trained with 60% of the images: the three F-Scores were 93.6%, 91.9%, and 72.3%.