Wood Species Classification in Open Set Using an Improved NNO Classifier
Ke-Xin Zhang and Peng Zhao *
A wood species classification scheme was developed for the open set scenario using an improved Nearest Non-Outlier (NNO) classifier. Near infrared (NIR) spectral curves were collected in the 950 to 1650 nm band by a micro spectrometer. Spectral dimension reduction was performed with a Metric Learning (ML) algorithm. Two improvements were proposed for the subsequent NNO classifier. First, a cluster analysis was performed within each wood class using a Density Peak Clustering (DPC) algorithm to obtain 1 to 3 clusters per class. Second, the fixed threshold shared by all wood classes was replaced with a variable threshold for each cluster. This threshold defines an internal boundary for one wood species, which is further used to compute a class membership score for all wood species. The classification accuracy based on the clusters of each wood class was better than that based on whole classes. Experimental results in different open set scenarios demonstrate that the improved NNO classifier outperformed the original NNO classifier and several other state-of-the-art open set recognition (OSR) algorithms.
DOI: 10.15376/biores.20.1.944-955
Keywords: Wood species classification; Open set recognition; Spectral analysis; NNO algorithm
Contact information: School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou 545006, China; * Corresponding author: bit_zhao@aliyun.com
INTRODUCTION
According to statistical data, there are approximately 60,000 tree species around the world. It is very hard to develop a wood species classification system that covers all of them. In most cases, only the tree species in a specific region (e.g., in Heilongjiang Province of China) or a specific category (e.g., mahogany) need classification. These specific tree species are included in the training set of such a classification system. However, tree species not in the training set should be rejected correctly by the system. In practice, therefore, wood species classification should be studied in an open set scenario. However, most wood species classification investigations are performed in a closed set scenario, with emphasis on classification methodologies such as spectral analysis and anatomical analysis (Zhan et al. 2023; Ma et al. 2021; Park et al. 2021; Tuncer et al. 2021). In a closed set scenario, the wood species in the training set (i.e., the known species) can be classified correctly, whereas those not in the training set (i.e., the unknown species) will be misclassified as one of the known species. In fact, to the best of our knowledge, the number of wood species in the training set of almost all wood species classification systems is under 100. It is therefore appropriate to study wood species classification in an open set scenario, since such systems are very likely to encounter wood samples from unknown species.
Open set recognition (OSR) has been investigated for more than 10 years. It can not only classify the known classes in the training set correctly, but also effectively reject unknown classes not included in the training set (Geng et al. 2021). Scheirer et al. (2013) proposed a 1-vs-Set Machine based on the one-class Support Vector Machine (SVM) for OSR applications in image processing. A Compact Abating Probability (CAP) model was later proposed (Scheirer et al. 2014) in combination with statistical Extreme Value Theory (EVT), and a Weibull-calibrated SVM (W-SVM) was introduced to further improve OSR classification accuracy. Jain et al. (2014) proposed the PI-SVM, which also uses EVT to model the positive training samples near the decision boundary. Besides these SVM-based OSR schemes, other machine learning based OSR schemes have also been proposed. For instance, Zhang and Patel (2017) proposed a sparse representation based OSR scheme, which uses EVT to model the tail distribution of the reconstruction loss. Junior et al. (2017) proposed an Open Set Version of the Nearest Neighbor Classifier (OSNN). In this scheme, the distances between a detected sample s and its two nearest neighbor samples t and u from two different classes are calculated, and the ratio is computed as Ratio = d(s, t)/d(s, u). If Ratio ≤ TR, then s is classified into the class that includes sample t; otherwise, s is rejected as an unknown class. Moreover, some deep learning based OSR schemes have been proposed, such as the OpenMax neural network (Bendale and Boult 2016), where the SoftMax layer is replaced by an OpenMax layer to modify the membership probabilities of known and unknown classes.
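For illustration, the OSNN decision rule described above can be sketched in a few lines of Python (a minimal sketch; the default threshold value `t_ratio` and the rejection label -1 are placeholder choices, not values from the cited work):

```python
import numpy as np

def osnn_predict(s, X_train, y_train, t_ratio=0.7):
    """Classify sample s by the OSNN distance-ratio rule,
    or return -1 to reject it as an unknown class."""
    d = np.linalg.norm(X_train - s, axis=1)  # distances to all training samples
    order = np.argsort(d)
    t = order[0]                             # nearest neighbor overall
    # nearest neighbor belonging to a different class than t
    u = next(j for j in order[1:] if y_train[j] != y_train[t])
    ratio = d[t] / d[u]
    return y_train[t] if ratio <= t_ratio else -1
```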
Bendale and Boult (2015) extended the Nearest Class Mean (NCM) classifier and proposed the Nearest Non-Outlier (NNO) classifier for OSR use. The distances between a detected sample and each known class mean are computed to classify the known classes and reject the unknown classes. Moreover, a Metric Learning (ML) algorithm (Mensink et al. 2013) is used for feature dimension reduction. When a new class is added to the training set of the classification system, the previously learned projection matrix W can still be used with near-zero errors, as pointed out by Mensink et al. (2013). However, this NNO classifier can achieve satisfactory classification accuracy in an open set scenario only when every known class follows a similar spherical distribution (i.e., with approximately the same sphere radius). In practice, this strict constraint is hard to satisfy.
In this article, the NNO classifier was improved so that it can handle known classes with different distribution shapes (e.g., sphere structure or manifold structure) in OSR use. Specifically, every known class is further processed by an automatic cluster analysis to obtain 1 to 3 clusters, each with an approximately spherical structure. Within a known class, the clusters usually differ in size. The distances between a detected sample and each cluster of a known class are then computed to obtain the membership probabilities of the known classes. Moreover, each cluster's size threshold is computed by an optimal grid search. In this way, the proposed improved NNO classifier achieves more accurate classification results in practical open set scenarios for known classes with different distribution shapes. As for the classification feature, the NIR spectral curve is used here, since it has the advantages of fast speed, high accuracy, and non-destructive testing (Ma et al. 2021; Park et al. 2021; Tuncer et al. 2021).
EXPERIMENTAL
In total, 35 wood species were used in the experimental wood dataset, including both broadleaved and coniferous tree species. The dataset contained some similar wood species with similar colors and textures or within the same genus. The specific tree species information is given in Table 1. The cross sections of these wood species were used for spectral acquisition and are illustrated in Fig. 1. An Ocean Optics Flame-NIR micro spectrometer was used to collect the NIR spectral curves. The effective wavelength band was 950 to 1650 nm, with a wavelength resolution of 5.4 nm. Each wood species consisted of 50 samples (i.e., NIR spectral curves), for a total of 1750 samples. Each spectral curve was represented by a 128-dimensional (128D) vector.
Table 1. Information on Experimental Samples
Before the wood spectral collection, wood sample pre-processing was performed. First, 25 wood blocks from different trees were selected for every tree species. These 25 wood blocks were then cut into small wood samples with a size of 2 × 2 × 3 cm. The 2 × 2 cm surface was the cross section, while the 2 × 3 cm surfaces were the radial or tangential sections. Second, 2 wood samples from every wood block were randomly selected to obtain a total of 50 wood samples with a size of 2 × 2 × 3 cm for each wood species. To remove the uneven burrs left by the cutting procedure, sandpaper of 800 to 1200 mesh was used to polish the cross sections of the wood samples. Finally, because wood NIR spectral curves can be sensitive to external environmental factors such as temperature and humidity, the spectral acquisition was performed in a room at a temperature of 24 °C and a humidity of 35%. It should be noted that the physical properties of wood samples are influenced by variables such as tree age, geographic origin, growth ring position, and the proportion of latewood versus earlywood. These variables were controlled effectively during wood spectral acquisition so that the within-class difference of the spectral curves for each wood species was adequately small. In practice, this control was implemented by ensuring that trace(Sw) was small (i.e., below a threshold) for every species, where Sw denotes the within-class scatter matrix.
Fig. 1. Cross sections of the 35 wood species (the serial numbers are the same as those in Table 1)
Figure 2 shows the experimental spectral collection setup. It mainly consists of a computer, a spectrometer, an optical fiber, and a light source (i.e., a halogen lamp). The spectral acquisition proceeds in the following steps. First, a spectral calibration is performed using a standard whiteboard. Then, one wood sample is placed on the holder, and the distance between the wood sample and the fiber probe is adjusted. Finally, the spectral reflectance curves are collected and saved on the computer.
Fig. 2. Experimental spectral acquisition setup
Spectral Dimension Reduction
Before the wood spectral dimension reduction, a wood spectral pre-processing procedure is required. Figure 3 illustrates the spectral reflectance curves of the 35 wood species. A standard normal variate (SNV) correction and a smoothing correction with a 5 × 5 moving window were applied to the NIR spectral curves to ensure good classification accuracy.
Fig. 3. Spectral reflectance curves of cross sections of 35 wood species
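For reference, these two pre-processing steps can be sketched as follows (a minimal sketch; the function names are ours, and the smoothing is shown as a one-dimensional moving average of window length 5 along the wavelength axis, which is an illustrative simplification):

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum
    (row) by its own mean and standard deviation."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, keepdims=True)
    return (spectra - mu) / sd

def smooth(spectra, window=5):
    """Moving-average smoothing along the wavelength axis."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, spectra)
```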
The NIR spectral curve is a 128D vector, so spectral dimension reduction is usually performed to reduce redundant information and increase computational efficiency. Several feature dimension reduction algorithms can be used, such as principal component analysis (PCA) (Reddy et al. 2020), multidimensional scaling (MDS) (Mignotte 2011), locally linear embedding (LLE) (Yu et al. 2020), Laplacian Eigenmaps (Belkin and Niyogi 2003), and Kernel PCA (Alhayani and Ilhan 2017).
In this work, the Metric Learning (ML) algorithm proposed by Mensink et al. (2013) was used for spectral dimension reduction. When a new class is added to the training set of the open set classifier, the previously learned projection matrix W can still be used with near-zero errors, as pointed out by Mensink et al. (2013). Therefore, this projection matrix W can be used in an incremental learning classifier, in which the number of known classes increases gradually. Such an incremental learning classifier is well suited to an open set scenario, since one often wishes to gradually enlarge the set of known classes so that more classes are classified correctly as known rather than rejected as unknown. Because of these advantages, this ML algorithm is also used in the NNO classifier (Bendale and Boult 2015). If an original 128D spectral vector is denoted as v, then the new spectral vector after dimension reduction is Wv. The detailed computation procedure is given by Mensink et al. (2013) and is omitted here.
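Once W has been learned offline, the reduction step itself is a single linear projection, as in the following minimal sketch (the array shapes are assumptions; learning W follows Mensink et al. (2013) and is not shown):

```python
import numpy as np

def reduce_spectra(W, X):
    """Project raw 128D spectral vectors to k-D vectors Wv.
    W has shape (k, 128); X has shape (n_samples, 128)."""
    return X @ W.T
```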
Original NNO Classifier
The original NNO classifier was proposed by Bendale and Boult (2015). In summary, the NNO classifier is an extension of the NCM classifier for OSR use. In the NNO classifier, a confidence score for a known class y is defined as follows,
$s_y(x) = Z_\tau \left(1 - \frac{1}{\tau}\,\lVert W(x - \mu_y) \rVert\right)$ (1)

where x is a detected sample vector, and the parameter $\tau$ is a fixed threshold that defines a sphere radius around each known class mean $\mu_y$. This $\tau$ is determined in advance by an experienced expert, and it is a constant value for all known classes in the original NNO classifier. The distance $\lVert W(x - \mu_y) \rVert$ is computed in the space projected by the matrix W. The normalization factor is defined as $Z_\tau = \Gamma(m/2 + 1) / (\tau^m \pi^{m/2})$ so that $s_y$ integrates to 1 over the domain $s_y(\cdot) > 0$ ($\Gamma$ represents the standard gamma function). In fact, $Z_\tau$ is the reciprocal of the volume of an m-dimensional sphere with radius $\tau$.
As for open set classification, a detected sample x is rejected by a known class y when $s_y(x) \le 0$, and x is rejected as an unknown class only when it is rejected by all known classes. Otherwise, x is classified into the known class with the largest positive $s_y(x)$. The projection matrix W is learned offline on an initial training set of known classes. This matrix can still be used with near-zero errors in an incremental learning classifier, where new classes are gradually added to the training set, as pointed out by Mensink et al. (2013).
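The scoring and rejection rule of Eq. 1 can be sketched as follows (a minimal sketch assuming the class means are stored in the original spectral space and projected by W; variable names are ours):

```python
import numpy as np
from math import gamma, pi

def nno_scores(x, W, class_means, tau):
    """Eq. 1 confidence scores of sample x for every known class,
    with one radius tau shared by all classes."""
    m = W.shape[0]                                  # reduced dimension
    z_tau = gamma(m / 2 + 1) / (tau ** m * pi ** (m / 2))
    d = np.array([np.linalg.norm(W @ (x - mu)) for mu in class_means])
    return z_tau * (1.0 - d / tau)

def nno_predict(x, W, class_means, tau):
    """Return the index of the best-scoring known class,
    or -1 when all scores are non-positive (unknown class)."""
    s = nno_scores(x, W, class_means, tau)
    return int(np.argmax(s)) if np.any(s > 0) else -1
```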
Proposed Improved NNO Classifier
The original NNO classifier (Bendale and Boult 2015) has two disadvantages. First, a constant threshold $\tau$ is used for all known classes. Even if all known classes had spherical distribution structures, their sphere radii would usually differ, so different threshold values should be used. Second, in practice, the known classes may have different topological distribution structures, such as a sphere structure or a manifold structure. A manifold structure can usually be divided into several clusters, and this division can be accomplished by a cluster analysis.
To overcome these two disadvantages, an improved NNO classifier was proposed here. For every known class, a Density Peak Clustering (DPC) algorithm (Rodriguez and Laio 2014) is used to perform a cluster analysis. In this clustering algorithm, the number of clusters does not need to be specified in advance; it is determined automatically during the clustering process. The algorithm is also hardly influenced by outliers. The detailed clustering procedure is as follows.
The clustering centers have a relatively high local density, and they are relatively far from any point with a higher local density. For each sample $x_i$, a local density $\rho_i$ and a distance $\delta_i$ are calculated. The local density $\rho_i$ is computed by either a cut-off kernel (Eq. 2) or a Gaussian kernel (Eq. 3),

$\rho_i = \sum_{j \ne i} \chi(d_{ij} - d_c)$ (2)

$\rho_i = \sum_{j \ne i} \exp\left(-\frac{d_{ij}^2}{d_c^2}\right)$ (3)

$\chi(t) = \begin{cases} 1, & t < 0 \\ 0, & t \ge 0 \end{cases}$ (4)

where $d_{ij}$ is the distance between sample $x_i$ and sample $x_j$; $d_c$ is the cut-off distance that is determined in advance by an expert; and $\chi(\cdot)$ is the 0-1 function defined in Eq. 4. Therefore, the $\rho_i$ computed by Eq. 2 is a discrete value, whereas that by Eq. 3 is a continuous value. The $\delta_i$ is the distance between sample $x_i$ and its nearest sample among those samples with higher local density, as illustrated by Eq. 5 (for the sample with the highest local density, $\delta_i$ is set to $\max_j d_{ij}$). Therefore, a sample is a likely clustering center when it has both a relatively large $\rho_i$ and a relatively large $\delta_i$, so the probability of a sample being a clustering center can be computed by Eq. 6. Once the clustering centers are determined, each remaining sample is assigned to the cluster of its nearest neighbor with higher local density.

$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$ (5)

$\gamma_i = \rho_i \, \delta_i$ (6)
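The quantities of Eqs. 2 to 6 can be computed as in the following sketch (a minimal, unoptimized implementation; the cut-off distance d_c and the choice of kernel are left to the user, as in the cited DPC paper):

```python
import numpy as np

def dpc_scores(X, d_c, gaussian=True):
    """Return rho (Eq. 2 or 3), delta (Eq. 5), and the center
    score gamma = rho * delta (Eq. 6) for every sample in X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    if gaussian:
        rho = np.exp(-(d / d_c) ** 2).sum(axis=1) - 1.0        # exclude self term
    else:
        rho = (d < d_c).sum(axis=1) - 1                        # cut-off kernel
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]   # samples with higher local density
        delta[i] = d[i].max() if higher.size == 0 else d[i, higher].min()
    return rho, delta, rho * delta
```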
The DPC algorithm is applied to the spectral vectors after spectral dimension reduction by the above ML algorithm (Mensink et al. 2013); such a reduced spectral vector is denoted as Wv. In the following wood spectral classification experiments, 1 to 3 clusters were obtained for each known wood species. All these clusters can be approximated by spheres with different radii $\tau$ (i.e., $\tau$ is a threshold for a cluster sphere). Then the original NNO classification is performed using these clusters of different sizes. Please note that the distance in Eq. 1 is computed by a Mahalanobis distance instead of a Euclidean distance, as illustrated by Eq. 7,

$d(Wv, \mu_c) = \sqrt{(Wv - \mu_c)^{T} C^{-1} (Wv - \mu_c)}$ (7)

where $\mu_c$ is a cluster center after spectral dimension reduction, and C is the covariance matrix of the cluster whose center is $\mu_c$. The different thresholds $\tau$ of the different clusters can be obtained by an optimal grid search. More accurate classification is achieved because the NNO classifier operates on the extracted clusters with different radius thresholds. In contrast, the original NNO classifier operates on the original known classes with the same radius $\tau$, even though some classes may have manifold distribution structures, which may produce large classification errors.
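A minimal sketch of the per-cluster Mahalanobis distance of Eq. 7 follows (the small ridge term added before inversion is our numerical-stability assumption, not part of the paper):

```python
import numpy as np

def mahalanobis(wx, mu_c, cov_c, ridge=1e-6):
    """Eq. 7: Mahalanobis distance between a reduced spectral
    vector wx = Wv and a cluster center mu_c with cluster
    covariance cov_c."""
    c_inv = np.linalg.inv(cov_c + ridge * np.eye(cov_c.shape[0]))
    diff = wx - mu_c
    return float(np.sqrt(diff @ c_inv @ diff))
```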
RESULTS AND DISCUSSION
Classification Performance Evaluations
Performance evaluation measures play an important role in judging the classification performance of an OSR classifier. To comprehensively assess the classification performance in OSR, three measures are used: F-Score, Kappa coefficient, and overall recognition accuracy (ORA). The F-Score computation is based on Precision and Recall, as illustrated in Eqs. 8 to 10. Here, TP represents the number of correctly classified samples from known classes, whereas FN and FP represent the numbers of misclassified samples from known and unknown classes, respectively.

$Precision = \frac{TP}{TP + FP}$ (8)

$Recall = \frac{TP}{TP + FN}$ (9)

$F\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (10)
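These measures follow directly from the three counts, as in this minimal sketch (the symbol names TP, FN, and FP follow the definitions above):

```python
def osr_metrics(tp, fn, fp):
    """Eqs. 8 to 10: Precision, Recall, and F-Score.
    tp: known-class samples classified correctly;
    fn: known-class samples misclassified;
    fp: unknown-class samples wrongly accepted as known."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```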
Dataset Partition
There are 35 wood species in total in the wood dataset, as illustrated in Table 1, and each wood species consists of 50 spectral samples. The wood dataset was divided into 3 groups to test the proposed improved NNO classifier in open set scenarios. These 3 groups are explained as follows.
Group 1: The initial 5 wood species are selected randomly as the known species to form the training set of the NNO classifier, and these 5 wood species are used in the ML algorithm (Mensink et al. 2013) to obtain the projection matrix W. Then, in the incremental learning process, another 5 wood species are selected randomly and added to the training set of the improved NNO classifier. This incremental learning step is repeated 4 times. Finally, the training dataset consists of 25 known wood species, and the unknown wood dataset consists of 5 wood species. This dataset partition is illustrated in Table 2.

Group 2: The initial 5 wood species are selected randomly as the known species to form the training set, and these 5 wood species are used in the ML algorithm (Mensink et al. 2013) to obtain the projection matrix W. Then, in the incremental learning process, another 5 wood species are added to the training set. This incremental learning step is repeated 3 times. Finally, the training dataset consists of 20 known wood species, and the unknown wood dataset consists of 10 wood species. This dataset partition is illustrated in Table 3.

Group 3: The initial 10 wood species are selected randomly as the known species to form the training set, and these 10 wood species are used in the ML algorithm (Mensink et al. 2013) to obtain the projection matrix W. Then, in the incremental learning process, another 5 wood species are added to the training set. This incremental learning step is repeated 3 times. Finally, the training dataset consists of 25 known wood species, and the unknown wood dataset consists of 5 wood species. This dataset partition is illustrated in Table 4.
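For concreteness, the Group 1 partition could be generated as in the following sketch (the random seed and the particular draw of the 5 unknown species from the 10 species left over after the increments are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)           # seed chosen arbitrarily
species = rng.permutation(35)            # shuffle the 35 species indices
known = list(species[:5])                # initial 5 known species
for step in range(4):                    # 4 incremental learning rounds
    known += list(species[5 + 5 * step : 10 + 5 * step])
unknown = list(species[25:30])           # 5 of the remaining species as unknowns
```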
Table 2. The Known and Unknown Wood Species Number Partition in Group 1
Table 3. The Known and Unknown Wood Species Number Partition in Group 2
Table 4. The Known and Unknown Wood Species Number Partition in Group 3
Wood Species Classification Comparisons
The proposed improved NNO classifier was compared in open set scenarios with 5 other representative OSR classifiers. The original NNO classifier (Bendale and Boult 2015) was used as a baseline. Another conventional OSR classifier, consisting of two parts, was also used.
Table 5. The OSR Classification Performance Comparisons in Group 1
Table 6. The OSR Classification Performance Comparisons in Group 2
Table 7. The OSR Classification Performance Comparisons in Group 3