Abstract
Prior work on computer-vision wood identification (CVWID) for North American hardwoods yielded two independent deep learning models – a 22-class model for diffuse-porous woods and a 17-class model for ring-porous woods – but did not address semi-ring-porous woods nor provide a CVWID solution for an unknown specimen without a human first determining which model to deploy. As untrained human operators would lack the anatomical proficiency to differentiate among porosity domains, it is necessary to develop a consolidated model that can identify diffuse-, ring-, and semi-ring-porous woods. Previous research suggests that prediction accuracy might decrease as class number grows. A potential strategy to reduce the number of classes a CVWID system must consider at a time is to hierarchically deploy a cascade of models. In pursuit of a unified model that can cover North American hardwoods of all porosity types, this study compared the accuracies of a consolidated 39-class (ring- + diffuse-porous) model and a consolidated 42-class (ring- + diffuse- + semi-ring-porous) model with a two-tiered, cascading model scheme whereby images are first differentiated into three porosity domain classes and then again into only those taxonomic classes with that porosity. The results showed that the cascading model scheme can mitigate the accuracy reductions incurred by the 42-class model and nearly eliminate the occurrence of cross-domain misidentifications.
Predicting Hardwood Porosity Domains: Toward Cascading Computer-Vision Wood Identification Models
Frank C. Owens,a,* Prabu Ravindran,b,c Adriana Costa,a Rubin Shmulsky,a and Alex C. Wiedenhoeft a,b,c,d,e
DOI: 10.15376/biores.19.4.9741-9772
Keywords: Wood identification; XyloTron; Computer vision; Machine learning; Deep learning; Porosity domain; Cascading models
Contact information: a: Department of Sustainable Bioproducts, Mississippi State University, Starkville, MS, USA; b: Department of Botany, University of Wisconsin, Madison, WI, USA; c: Center for Wood Anatomy Research, USDA Forest Service, Forest Products Laboratory, Madison, WI, USA; d: Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA; e: Departamento de Ciências Biológicas (Botânica), Universidade Estadual Paulista – Botucatu, São Paulo, Brasil.
* Corresponding author: fco7@msstate.edu
INTRODUCTION
Wood identification is an important tool for combatting the global and pervasive problem of illegal logging and timber trade (Johnson and Laestadius 2011; Dormontt et al. 2015; Koch et al. 2015; Lowe et al. 2016; UNODC 2016), investigating supply chain integrity (Wiedenhoeft et al. 2019), and verifying compliance with import/export declaration requirements. Conventional wood identification relies on humans trained in wood anatomy to differentiate among taxa by observing anatomical features under a hand lens and/or microscope (Wheeler and Baas 1998; Gasson 2011). Learning to identify wood anatomically requires extensive training, and there is a severe shortage of reliable conventional wood identification capacity in North America (Wiedenhoeft et al. 2019) and worldwide. For this reason, scientists are utilizing advances in technology to develop non-conventional wood identification methods based on computer-vision, DNA, mass spectroscopy/spectrometry, and others to reduce dependence on human expertise in wood anatomy (Johnson and Laestadius 2011; Dormontt et al. 2015; Lowe et al. 2016). Proliferation of these new, non-conventional tools promises to mitigate the shortage of wood identification capacity and better empower industrial compliance with laws and regulations while also enabling law enforcement around the world to detect and prosecute illicit activity.
In conventional human-based identification, a wood anatomist classifies a specimen based on observations of anatomical features (Wheeler and Baas 1998; Gasson 2011). Until anatomists can recognize a taxon on sight, they typically rely on a wood identification key to arrive at a potential identification and then compare the unknown specimen to a known reference specimen to confirm the identification. A key is a decision tree comprising a series of (often dichotomous) choices arranged in a hierarchy. At each level of the hierarchy, the anatomist must decide which of two (or more) descriptions best characterizes the anatomical features observed in the specimen. The option selected determines the next set of choices to consider further down in the hierarchy, and so on, until the anatomist arrives at a terminal description and the name of the taxon (usually a genus, species, or subgeneric category). A well-designed wood identification key works because it focuses attention on only a limited set of anatomical features at a time and prevents wildly deviant conclusions – presuming each user decision is correct – by limiting the remaining options based on previous choices. It also reduces the number of possible solutions as the anatomist descends from one level to the next. In computer science terms, using a wood identification key is the process of choosing a path from the root node (the first level in the key) to a leaf node (the terminal identification) by means of a cascade of solitary decisions at each node on this path.
In many wood identification keys that include temperate hardwood species, a set of alternatives regarding the size and distribution of pores across a growth ring (often referred to as porosity) appears early in the decision tree. These typically include diffuse-porous, ring-porous, and semi-ring-porous. Diffuse-porous woods are characterized by growth rings in which the earlywood pores are not conspicuously larger than the latewood pores (Fig. 1a, Panshin and de Zeeuw 1980). Conversely, ring-porous woods typically exhibit growth rings with a zone of larger earlywood pores abruptly transitioning in size to latewood with conspicuously smaller pores (Fig. 1b, Panshin and de Zeeuw 1980).
Fig. 1. Examples, from left to right, of the three porosity domains: diffuse-porous (a: Acer saccharum), ring-porous (b: Quercus falcata), and semi-ring-porous (c: Diospyros virginiana) woods
Semi-ring-porous (or sometimes semi-diffuse-porous) is a category intermediate between the former and the latter in which growth rings exhibit a comparatively steady decrease in pore diameter from the earlywood to the latewood (Fig. 1c, Panshin and de Zeeuw 1980).
The concept of porosity domain is well established in the wood anatomy and identification literature. The terms ring-porous and diffuse-porous have been used since at least the 19th century (Roth 1895). The addition of the category semi-ring-porous in later years (Lodewick 1928; Brown et al. 1949) suggests that porosity exists along a spectrum. Woods belonging to the genera Carya and Populus, for example, exhibit pore sizes and distributions that commonly intergrade between ring- and semi-ring-porous, and semi-ring-porous and diffuse-porous, respectively (Hoadley 1990). Despite the potential for confusion at the border of discrete categories assigned to a more-or-less continuous characteristic, these porosity categories, or domains, have proven useful for separating woods anatomically for more than a century.
A non-conventional wood identification method that has shown great promise is computer-vision (Khalid et al. 2008; Esteban et al. 2009; Nasirzadeh et al. 2010; Wang et al. 2010; Hermanson and Wiedenhoeft 2011; Ravindran et al. 2018, 2022a; Hwang and Sugiyama 2021). Computer-vision wood identification (CVWID) involves the capture and analysis of digital images of wood specimens by trained classification models leading to an identification (as reviewed in Hwang and Sugiyama 2021). Recently, deep learning, a powerful and flexible approach to training classification models, has been employed to sort images into their appropriate, pre-defined classes (typically corresponding to species, genera and/or anatomically distinct subgeneric categories, e.g., Ravindran et al. 2018, 2022a; Liu et al. 2024). In addition to high predictive accuracy, the relatively low cost and portability of CVWID systems such as the XyloTron (Ravindran et al. 2020) and XyloPhone (Wiedenhoeft 2020) make them readily deployable in the field.
In previous work on North American hardwoods, Ravindran et al. (2022a,b) designed separate XyloTron models to differentiate 22 classes of diffuse-porous and 17 classes of ring-porous woods. Deploying each model separately would require the operator to make a porosity domain classification (e.g., diffuse- or ring-porous) prior to selecting the appropriate model. As differentiating among porosity domains requires at least some understanding of wood anatomy, maintaining separate models falls short of the promise of full automation. To realize a unified CVWID model for North American hardwoods, it is necessary to combine ring-porous and diffuse-porous woods into the same model and add semi-ring-porous woods to cover the entire porosity domain spectrum.
Consolidating models from disparate porosity domains has the potential to impact model performance. Previous research suggests that predictive accuracy could decrease as class number grows (Bilal et al. 2018; Shigei 2019; Ravindran et al. 2022a, b), so it is not unreasonable to expect an increase in misclassifications when combining a 22-class label space with a 17-class label space to yield a 39-class model. In addition, even highly accurate CVWID models have been shown to occasionally produce anatomically unexplainable predictions of the kind that no human would likely make (Ravindran et al. 2021, 2022a,b), defined as a Type 3 misclassification in Ravindran et al. (2022a). In a consolidated model of North American hardwoods, credibility-reducing cross-domain misclassifications between woods of disparate porosity domains become a possibility.
While they both arrive at a common end – namely, the assignment of a taxonomic label to a specimen of wood – the process of CVWID is different from conventional human-based wood identification. The image-based CVWID used in Ravindran et al. (2022a,b) differs from human-based identification in two important ways. First, it is not clear how the features the model detects in the digital image correspond to anatomical features that would be used in a key. Second, computer-vision image classification is based on a single decision step as opposed to the explicit, cascading sequence of decisions found in a wood identification key. In short, though the features employed by the model are implicitly hierarchical, it is difficult to rigorously explain why a decision was made by a trained classification model.
These differences contribute to a few limitations. CVWID, though rapid and highly accurate, still has the potential to make Type 3 misclassifications, those that trained humans would almost never make such as cross-domain misidentifications (e.g., confusing a hardwood for a softwood or a diffuse-porous wood for a ring-porous wood). Also, as the number of classes increases, the model has no explicit mechanism to break down the task of identification into smaller steps to limit the number of classes it must discriminate. Breaking down a CVWID model with many classes into a cascade or tree of models with fewer classes at each level might help reduce the occurrence of Type 3 misclassifications and better enable a CVWID system to handle larger numbers of classes without a drop in accuracy, especially in the medium-sized dataset regime typical in CVWID.
Until a CVWID model can employ explicit and enumerable semantic information allowing it to identify the pixels in an image that correspond to particular anatomical features (such as vessels, rays, fibers, etc.), it is not possible to duplicate the decision-making process of a wood identification key. It is possible to approximate that process by developing new classifiers based on wood anatomical character-based label spaces rather than taxonomic label spaces – for example, woods with marginal parenchyma vs. woods without marginal parenchyma.
In pursuit of a unified model that can cover all commercial North American hardwoods, this study had two objectives. The first was to determine how accurately a convolutional neural network (CNN)-based model, trained using the same pipeline from Ravindran et al. (2022a, 2022b), can predict the 39 classes of (22) diffuse- and (17) ring-porous woods from those publications plus three additional classes of semi-ring-porous woods. The second was to determine if gains in accuracy might be achieved (or losses in accuracy might be mitigated) by creating a two-level decision tree wherein images are first classified by a root classifier into one of three porosity domain classes (diffuse-, ring- or semi-ring-porous) and then again by a first level model covering only those woods with that porosity. The results of this investigation should help determine viable options for structuring future CVWID models aimed at covering large numbers of classes using a richer set of anatomical characters akin to a traditional wood identification key.
EXPERIMENTAL
Materials
Specimens and images
The datasets for this study comprised 1) the images of diffuse-porous specimens used in Ravindran et al. (2022a), 2) the images of ring-porous specimens used in Ravindran et al. (2022b), and 3) a new set of images from specimens of three semi-ring-porous species (Diospyros virginiana, Juglans cinerea, and Juglans nigra). The images for the training datasets were captured from specimens, chosen to represent characteristic wood anatomical variability for the classes, sourced exclusively from a) the MADw and SJRw collections at the USDA Forest Products Laboratory and b) the Tw collection at the Royal Museum for Central Africa. The images for the testing datasets were captured from specimens sourced exclusively from the David A. Kribs (PACw) and teaching collections at Mississippi State University. Specimens in all xylaria were at moisture contents consistent with ambient, conditioned indoor environments (typically assumed to range from ~5% to 9% moisture content depending on season and location).
To prepare each wood specimen for imaging, the transverse surface was polished in coarse-to-fine progression on a benchtop disc sander at grits of 80, 180, 240, 400, 600, 800, and 1500. Note, however, that testing-dataset images produced at much coarser grit levels or with a knife as in the field (Ravindran et al. 2023) are still largely identifiable, as are digitally perturbed testing-dataset images (Owens et al. 2024) for a Peruvian woods CVWID model (Ravindran et al. 2021). The polished surfaces were imaged using the XyloTron platform at a linear resolution of approximately 3.1 microns per pixel. Multiple, non-overlapping 2048 × 2048-pixel images (each representing 6.35 mm × 6.35 mm of tissue) were acquired from each specimen. Most images are of heartwood, though some sapwood images are doubtless present for each class. Images with a sapwood-heartwood transition were culled.
Label spaces
Descriptions of the six label spaces used in this study are shown in Table 1. The first three were used for the “domain-based” models: the 22-class label space for the diffuse-porous woods from Ravindran et al. (2022a), the 17-class label space for the ring-porous woods from Ravindran et al. (2022b), and a new 3-class label space for semi-ring-porous woods.
Table 1. Descriptions of the Six Label Spaces Used in This Study
The classes comprising these three label spaces are detailed in Table 2. The next two label spaces were used for the “consolidated” models: a 39-class label space (39DP-RP) comprising all the classes from 22DP and 17RP, and a 42-class label space (42DP-RP-SRP) comprising all the classes from 22DP, 17RP, and 3SRP. The final 3-class label space (3POR) models the three porosity domains: an aggregate diffuse-porous (DP) class including all the woods with a diffuse-porous structure (entire left column of Table 2); an aggregate ring-porous (RP) class including all the woods with a ring-porous structure (entire center column of Table 2); and an aggregate semi-ring-porous (SRP) class including all the woods with a semi-ring-porous structure (entire right column of Table 2). Details of label, specimen, image, and taxa counts are shown in Tables 3 and 4 by label space. When referring to a CVWID class, class names are written without italics, while italicization is used when referring to the same woods as botanical entities (e.g., the class Diospyros vs. the genus Diospyros). Finer details of the taxa included in the training and testing datasets for 39DP-RP and 42DP-RP-SRP are provided in Tables S1 and S2 in the Appendix.
Table 2. Porosity Domain Membership of Woods Used in Training and Testing
Table 3. Training Dataset Details by Label Space
Table 4. Test Dataset Details by Label Space
Methods
Machine learning models
CNNs (LeCun et al. 1989) with ImageNet (Russakovsky et al. 2015) pretrained ResNet34 (He et al. 2015) backbones and custom classification heads were trained for each of the six label spaces. A two-stage transfer learning strategy (freezing the backbone and training only the randomly initialized custom head, followed by full network fine-tuning) was employed to train the models, along with a data augmentation strategy that included horizontal/vertical flips, small rotations, and cutout (DeVries and Taylor 2017). For both training stages, random patches of 2048 × 768 pixels were downsampled to 512 × 192 pixels and fed into the network in mini-batches of size 16. The Adam optimizer (Kingma and Ba 2015) with cosine annealing (Smith 2018) of the learning rate and momentum was used for updating the model weights during the training process. Further details about the architecture and the two-stage (Howard and Gugger 2020) transfer learning (Pan and Yang 2010) training methodology can be found in Ravindran et al. (2019) and Arévalo et al. (2021). Scientific Python tools (Pedregosa et al. 2011) and the PyTorch deep learning framework (Paszke et al. 2019) were used for model definition, training, and evaluation.
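The patch geometry described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the authors' pipeline: the function names are hypothetical, and reading 2048 × 768 as width × height is an assumption, since the text does not state the orientation.

```python
import random

IMAGE_SIZE = (2048, 2048)   # width, height of a full image (from the text)
PATCH_SIZE = (2048, 768)    # width, height of a training patch (orientation assumed)
NETWORK_INPUT = (512, 192)  # width, height after downsampling (from the text)


def sample_patch_box(rng, image_size=IMAGE_SIZE, patch_size=PATCH_SIZE):
    """Return (left, top, right, bottom) of a random patch inside the image."""
    img_w, img_h = image_size
    patch_w, patch_h = patch_size
    # randint is inclusive at both ends, so the patch always fits in the image.
    left = rng.randint(0, img_w - patch_w)
    top = rng.randint(0, img_h - patch_h)
    return left, top, left + patch_w, top + patch_h


def downsample_factor(patch_size=PATCH_SIZE, network_input=NETWORK_INPUT):
    """Return the common scale factor between a patch and the network input."""
    factor_x = patch_size[0] / network_input[0]
    factor_y = patch_size[1] / network_input[1]
    if factor_x != factor_y:
        raise ValueError("aspect ratio changes between patch and network input")
    return factor_x


rng = random.Random(0)      # seeded only to make the sketch reproducible
box = sample_patch_box(rng)
factor = downsample_factor()
```

Under the assumed orientation, each patch spans the full image width and only its vertical offset varies, and the downsampling reduces both dimensions by the same factor of 4.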
The predictive performance of the trained field models (the trained models obtained by using the entire training data) was evaluated using the top-1 and top-2 specimen level accuracies on the mutually exclusive (from completely different collections) testing dataset. The majority of the class predictions for the (up to 5) images contributed by a specimen was taken as the top-1 specimen level prediction. For top-2 accuracy analysis, if the true class of a specimen was one of the top-2 predicted classes in an equally weighted voting of (up to 5) image-level top-2 predictions, then the specimen was considered to be correctly classified. The field model performance was evaluated using the proxy field testing approach (in which specimens for training and testing were obtained from different xylaria) introduced in Ravindran et al. (2021). The importance of mutually exclusive training and evaluation datasets is elaborated in Ravindran and Wiedenhoeft (2022).
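The specimen-level voting described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the example predictions are invented for demonstration, and ties are resolved by first-seen order because the text does not say how ties were handled.

```python
from collections import Counter


def top1_specimen_prediction(image_predictions):
    """Majority vote over the top-1 class predicted for each image."""
    votes = Counter(image_predictions)
    return votes.most_common(1)[0][0]


def top2_specimen_prediction(image_top2_predictions):
    """Equally weighted vote over every class in each image's top-2 list."""
    votes = Counter()
    for top2 in image_top2_predictions:
        votes.update(top2)
    return [label for label, _ in votes.most_common(2)]


def specimen_is_correct_top2(true_label, image_top2_predictions):
    """A specimen counts as correct if its true class is among the two
    most-voted classes."""
    return true_label in top2_specimen_prediction(image_top2_predictions)


# Invented predictions for one specimen that contributed three images.
top1_votes = ["Tilia", "AcerH", "Tilia"]
top2_votes = [["Tilia", "AcerH"], ["AcerH", "Tilia"], ["Tilia", "Asimina"]]

specimen_top1 = top1_specimen_prediction(top1_votes)
specimen_top2 = top2_specimen_prediction(top2_votes)
```

In this example, a specimen whose true class is AcerH would be wrong at top-1 but correct at top-2.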
Models with a ResNet50 backbone were also trained and evaluated, and these results are presented in Table S3 and Figs. S1 and S2 in the Appendix. Additionally, results of five-fold cross-validation analyses (i.e., internal validation) for both ResNet34- and ResNet50-based model architectures are presented in Table S4 and Figs. S3 to S6 in the Appendix. The field models were evaluated under the three schemes described below, each with its own assumptions.
Domain-based model scheme
To test the individual accuracies of the domain-based diffuse- (22DP), ring- (17RP) and semi-ring-porous (3SRP) models, each model was run on the test dataset that corresponded to its porosity domain. In the case of actual field deployment, a specimen would first be examined by a human operator who would make a porosity domain determination and then select the corresponding domain-based XyloTron model: 22DP, 17RP, or 3SRP (Fig. 2). These accuracies were evaluated under a best-case scenario assuming that the human operator made the initial porosity domain classification without error.
Fig. 2. In the domain-based model scheme, a human would first make a (correct) porosity domain determination (diffuse-porous, DP; ring-porous, RP; or semi-ring-porous, SRP) for the specimen and then select the corresponding domain-based XyloTron model.
Consolidated model schemes
To test the accuracies of the consolidated models, the 39DP-RP model was run on the test dataset that included all diffuse- and ring-porous woods, and the 42DP-RP-SRP model was run on the test dataset that included all diffuse-, ring-, and semi-ring-porous woods. In the case of actual field deployment, the human operator would deploy one model without first making a porosity domain determination (Fig. 3). In the case of the 39DP-RP model, we assume that there are no semi-ring-porous woods in this particular region of deployment.
Fig. 3. In the consolidated model schemes, a human does not need to first make a (correct) porosity domain determination. One XyloTron model would be deployed. In the case of the 39DP-RP model (left), we assume that there are no semi-ring-porous woods in the region of deployment. (Abbreviations: diffuse-porous, DP; ring-porous, RP; semi-ring-porous, SRP)
Cascading model scheme
To test the accuracy of a cascading model scheme, the 3POR XyloTron model was deployed first to classify the porosity domain of the image. Based on the output, the image was submitted to the XyloTron model that corresponded to the identified porosity domain. In the case of actual field deployment, the XyloTron software would implement the two models sequentially without need for the human operator to make a porosity domain determination (Fig. 4) or need to manually submit the image to the corresponding model as we have done here.
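The two-tier routing described above can be sketched as follows. This is a minimal illustration, not the XyloTron software: the models are hypothetical stand-in functions that return fixed answers, whereas real models would be trained classifiers operating on image data.

```python
def cascade_predict(image, porosity_model, domain_models):
    """Classify the porosity domain first, then classify the taxon using
    only the model trained for that domain."""
    domain = porosity_model(image)          # tier 1: the 3POR decision
    taxon = domain_models[domain](image)    # tier 2: the domain-specific model
    return domain, taxon


# Hypothetical stand-ins for the trained models.
def stub_3por(image):
    return image["domain"]


stub_domain_models = {
    "DP": lambda image: "AcerH",       # stand-in for the 22DP model
    "RP": lambda image: "UlmusS",      # stand-in for the 17RP model
    "SRP": lambda image: "Diospyros",  # stand-in for the 3SRP model
}

result = cascade_predict({"domain": "SRP"}, stub_3por, stub_domain_models)
```

One consequence of this structure is that the second tier can only return classes from the domain chosen by the first tier, so an error at the root cannot be corrected downstream.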
Fig. 4. In the cascading model scheme, a human does not need to first make a porosity domain determination. Instead, the 3POR XyloTron model is deployed first to classify the porosity of the specimen (diffuse-porous, DP; ring-porous, RP; or semi-ring-porous, SRP). Based on the output, the XyloTron model that corresponded to that porosity domain type would then be deployed.
Model top-1 accuracies are reported below, and top-2 accuracies can be found in the Appendix.
RESULTS AND DISCUSSION
Domain-based Model Scheme Accuracy
Confusion matrices for the domain-based diffuse- (22DP) and ring-porous (17RP) models can be found in Ravindran et al. (2022a) and Ravindran et al. (2022b), respectively, along with misidentification analyses. Their respective top-1 predictive accuracies were 80.6% and 91.4%.
The top-1 predictive accuracy for the domain-based semi-ring-porous model (3SRP), tested on 3 specimens of Diospyros virginiana, 12 specimens of Juglans cinerea, and 13 specimens of Juglans nigra, was 100.0%.
Assuming the XyloTron operator would be able to separate all the specimens into their correct porosity domains and apply the appropriate domain-based model to each specimen, the overall accuracy for the domain-based model scheme would be 85.9%.
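An overall figure of this kind is presumably pooled over specimens rather than averaged over the three models, so that larger test sets carry more weight. The sketch below illustrates that pooling under this assumption. The function name is hypothetical and the counts are placeholders, not the study's data, because the per-domain specimen counts for the diffuse- and ring-porous test sets are not stated in this section.

```python
def pooled_accuracy(per_domain):
    """Pool accuracy over specimens.

    per_domain maps a domain name to (n_correct, n_specimens).
    """
    total_correct = sum(correct for correct, _ in per_domain.values())
    total_specimens = sum(n for _, n in per_domain.values())
    if total_specimens == 0:
        raise ValueError("no specimens to pool")
    return total_correct / total_specimens


# Placeholder counts for illustration only.
placeholder_counts = {
    "DP": (8, 10),
    "RP": (9, 10),
    "SRP": (5, 5),
}
overall = pooled_accuracy(placeholder_counts)
```

With these placeholder counts, the pooled value (22 of 25 specimens) differs from the unweighted mean of the three per-domain accuracies, which is why the choice of pooling matters when test sets differ in size.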
Consolidated Model Scheme Accuracies
39-class consolidated model
The top-1 predictive accuracy for the consolidated 39-class model (39DP-RP) was 84.2%. Broken down by porosity domain, the predictive accuracy for diffuse-porous woods was 80.1% and for ring-porous woods was 90.0% (Fig. 5, Tables 5, 6).
Fig. 5. Confusion matrix of ResNet34 field model on test specimens for the consolidated 39DP-RP model. Counts in the diagonal cells indicate correct predictions. Off-diagonal counts indicate misclassifications. The blue dotted lines delineate porosity domains. From AcerH to Tilia are diffuse-porous classes and from Asimina to UlmusS are ring-porous classes. Counts appearing in the top right region are cross-domain misclassifications for porosity (diffuse-porous woods mistaken for ring-porous woods). N = 482
The confusion matrix for the consolidated 39DP-RP model appears in Fig. 5 and is separated into four regions by blue dotted lines, which delineate the boundaries between the two porosity domains on the vertical and horizontal axes. The diffuse-porous classes run from AcerH to Tilia while the ring-porous classes run from Asimina to UlmusS. Counts in the diagonal represent correct predictions. Off-diagonal counts indicate misidentifications. The upper right region of the matrix is populated by three cross-domain prediction errors representing three different diffuse-porous specimens misclassified as three different ring-porous woods. In contrast, the lower left region contains no cross-domain errors showing that none of the ring-porous woods were misclassified as diffuse-porous woods. Off-diagonal counts in the upper left and lower right regions indicate erroneous predictions within a porosity domain.
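The four regions of such a confusion matrix can be tallied directly from pairs of true and predicted labels. The sketch below is illustrative only: the function name is hypothetical, the class-to-domain map covers just the four boundary classes named above, and the example pairs are invented rather than taken from Fig. 5.

```python
from collections import Counter

# Illustrative subset of classes; the full assignment is given in Table 2.
CLASS_TO_DOMAIN = {
    "AcerH": "DP", "Tilia": "DP",
    "Asimina": "RP", "UlmusS": "RP",
}


def tally_regions(pairs, class_to_domain=CLASS_TO_DOMAIN):
    """Count correct, within-domain, and cross-domain predictions."""
    tally = Counter()
    for true_label, predicted_label in pairs:
        true_domain = class_to_domain[true_label]
        predicted_domain = class_to_domain[predicted_label]
        if true_label == predicted_label:
            tally["correct"] += 1                      # diagonal cell
        elif true_domain == predicted_domain:
            tally["within " + true_domain] += 1        # same-domain error
        else:
            tally[true_domain + " as " + predicted_domain] += 1  # cross-domain
    return tally


# Invented (true, predicted) pairs for demonstration.
example_pairs = [
    ("AcerH", "AcerH"),
    ("Tilia", "AcerH"),
    ("Tilia", "UlmusS"),
    ("Asimina", "UlmusS"),
    ("UlmusS", "UlmusS"),
]
tally = tally_regions(example_pairs)
```

Here "DP as RP" corresponds to the upper right region of the matrix and "RP as DP" to the lower left region.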
Table 5. Prediction Errors and Accuracies by Input Class for 39DP-RP (Diffuse-porous portion of the dataset)