Bardak, T. (2023). “Predicting prices of case furniture products using web mining techniques,” BioResources 18(4), 7412-7427.

Abstract

This article presents a methodology based on web mining techniques for estimating furniture prices using e-commerce data. Data on different public e-commerce sites in the United States were collected and analyzed using web mining methods. Deep learning and random forest algorithms were used to predict the prices of different types of furniture. Bookcase and dresser type furniture, which are widely used in price estimation, were selected. The inquiry identified a collection of eight distinctive attributes linked to furniture items, spanning measurements such as width, depth, and height, alongside features encompassing frame material, partition count, drawer count, color, and price. In preparation for constructing predictive models, a dataset comprising 300 instances was compiled for comprehensive analysis. Models developed based on web mining to predict furniture prices gave satisfactory results. During the testing phase, the random forest algorithm outperformed deep learning, achieving high goodness of fit values of 0.89 and 0.94 for bookcase and dresser furniture, respectively. The results indicate that price estimation for dresser furniture was more accurate than for bookcases in all models. The findings demonstrate that web mining techniques can be used effectively in competitive furniture pricing, with potential to save time and cost in pricing for furniture purchasing.


Predicting Prices of Case Furniture Products Using Web Mining Techniques

Timucin Bardak *

DOI: 10.15376/biores.18.4.7412-7427

Keywords: Furniture industry; Price; Data mining; Prediction modeling

Contact information: Bartin University, Bartin Vocational School, Furniture and Decoration Program, 74200, Bartin, Turkey; *Corresponding author: timucinb@bartin.edu.tr

INTRODUCTION

The furniture industry is a critical sector in terms of creating both added value and employment in the economic development of many countries. When determining prices, furniture manufacturers consider many factors, such as profitability, competition, and market. For consumers, a product must be offered at a price that matches its value, which is an important determining factor when deciding to buy a product (Raposo et al. 2018). In a study by Lihra et al. (2012), factors influencing consumers’ preferences in furniture shopping were examined. According to the results, approximately 50% of consumers consider price to be the most significant factor in product selection, 20% prioritize product customization, 20% consider delivery time, and 10% take into account the time required for product customization (Lihra et al. 2012). In the realm of online shopping, due to the inability of consumers to physically see and touch products, it is widely believed that elements such as pricing and product information play a more pronounced role in influencing consumer purchasing behavior. A study of online consumer behavior shows that pricing plays an important role in influencing purchasing decisions (Zhao et al. 2021). Correct pricing for companies is a complex problem in competitive market conditions.

Data mining (DM) is used in many industries to solve challenging problems (He et al. 2022). DM can be defined as a method that allows the most valuable information to be obtained by analyzing data sets. This method includes data processing, model creation, feature extraction, and discovery studies (Pérez-Campuzano et al. 2022). Using important data sources and data mining algorithms helps reduce costs effectively by making it easier to determine the optimal price. Numerous investigations have demonstrated that data-driven models predict the prices of various products more effectively than conventional techniques (Chen et al. 2016; Li et al. 2019; Duan and Liu 2021). For example, artificial neural network (ANN) cost models have been developed using data from 300 building projects to evaluate the total construction cost for customers, and the analysis determined that the ANN models had a high prediction performance (Emsley et al. 2002; Hassim et al. 2018). In a separate investigation, Putri et al. (2019) devised predictive models utilizing monthly retail pricing data to forecast domestic and global beef prices in the Indonesian market, and these models were effective in their predictive capacity.

Web mining (WM) is a field of research that seeks to discover valuable information, especially from text, through the analysis of web content (Brauner et al. 2022). The WM technique has become extremely popular, as it offers the opportunity to discover new information in many fields. However, WM poses a considerable challenge due to the limited interpretability of its results, which hinders the ability to draw meaningful inferences. Various machine learning (ML) techniques can be employed in web mining to extract valuable insights from web documents. With these techniques, the interpretability of web mining results can be increased (Lee and Lee 2011). Random forest (RF) and deep learning (DL) algorithms are widely used among ML methods and provide effective results for different data sets. The DL algorithm generates high-precision results and appropriate outputs using training datasets. A DL model consists of many layers, and each layer manages a different stage of data processing. These layers are called the input layer, the hidden layers, and the output layer (Kim and Falcone 2017; Ghorbanzadeh et al. 2019; Saha et al. 2023). The RF is a commonly used method for developing predictive models. This method works by constructing many conditionally independent decision trees to mitigate the overfitting problem. These trees are created using different features and data points and are independent of each other (Breiman 2001; Orte et al. 2023). Both the DL and RF algorithms have their respective strengths within their domains. The RF algorithm offers significant advantages for medium to large-scale datasets, excelling in capturing non-linear relationships present in the data. It also simplifies tasks involving structured data and scenarios where interpretability holds importance. On the other hand, the DL algorithms prove particularly effective when dealing with extensive, unstructured data such as images, audio, and text. They also stand out for their ability to process diverse types of data. Thus, in the present study, preference was given to these two distinct algorithm types. The fundamental distinction between the DL and RF algorithms lies in the former’s design to handle large, unstructured datasets, while the latter is more suitable for structured data and is often favored in cases emphasizing interpretability (Schonlau and Zou 2020; Sarker 2021).

E-commerce platforms (EP) are an ideal data source for web mining. An EP holds a range of information about furniture products, including type, style, color variations, design attributes, and price, to name only a few examples. These websites also make it possible to acquire data about very different furniture pieces, such as beds, sofas, tables, and chairs. Nevertheless, there are constraints on the information accessible through e-commerce platforms. For instance, e-commerce websites do not disclose personal or sensitive data about customers, such as credit card numbers or social security numbers. Moreover, detailed financial particulars concerning the company, such as revenue figures or profit margins, are generally not available through e-commerce platforms. Instead, the central focus of e-commerce websites is to provide customers with a secure and satisfying shopping experience. E-commerce sites are becoming increasingly common and are remarkably transforming the shopping experience for customers. Customer shopping behavior can be tracked and analyzed on these platforms (Massimino 2016; Zhou et al. 2021). In one study, online purchasing data of consumers from public e-commerce sites were collected for customer segmentation. These data were processed with web mining analysis to create profiles of customer purchasing behaviors (Zhou et al. 2021). A new approach using web mining techniques has shown effectiveness in accurately estimating unemployment rates. It was also emphasized that the proposed framework could help in understanding the factors underlying unemployment rates and could provide people with understandable qualitative clues (Li et al. 2014). In another study, a method was proposed based on the idea of developing a scenario for a specific topic using web mining. This method was highlighted as having the potential to improve time-consuming desk research in scenario projects and proved to be very useful (Kayser and Shala 2020).

This study aims to investigate the applicability of different ML algorithms to predict the prices of furniture products using a data set obtained from e-commerce sites. In addition to the furniture products’ features, the prepared data set includes each product’s price information. The price information utilized in our study is directly sourced from various e-commerce platforms themselves. The present analysis is founded upon the authentic price listings furnished by these platforms. The pricing data was gathered between the dates of January 21, 2023, and March 21, 2023. The investigation revealed a set of eight unique characteristics associated with furniture products. These attributes encompass a range of measurements, including width, depth, and height, as well as features such as frame material, partition count, drawer count, color, and price. To formulate predictive models, a dataset consisting of 300 instances was systematically compiled, facilitating a thorough analysis. The relationships between furniture products’ features and price information in the data set were examined using data mining techniques. The results revealed that data mining techniques can be used successfully to predict the prices of furniture products. In addition, the study can provide an essential resource for the furniture industry to determine the correct pricing strategies and offer products that meet customer demands.

EXPERIMENTAL

Data Collection

In this study, price estimates were made for two types of furniture: bookcases and dressers. The data were obtained from three different public e-commerce sites and analyzed using ML models. The collected data were homogenized according to criteria such as product categories and price ranges to eliminate inconsistencies arising from the different structures of the web pages. The analysis determined eight different features for the furniture: width, depth, height, frame material, number of partitions, number of drawers, color, and price. A total of 300 samples were gathered across the two furniture categories for predictive purposes. Of these samples, 150 pertained to bookcases, while the remaining 150 pertained to dressers. Tables 1 and 2 summarize the attributes in the training dataset used for the price prediction of each furniture type.

Table 1. Summary of Training Dataset Used for Price Prediction of Bookcase Furniture

Note: The term “inch” used in this table is approximately equivalent to 2.54 centimeters (cm).

Table 2. Summary of Training Dataset Used for Price Prediction of Dresser Furniture

Data Preprocessing

In this study, the normalization process was applied to numerical values. This process was carried out to ensure the data were at the same scale and distribution.
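The text does not state which normalization method was applied. As a minimal sketch, assuming min-max scaling to the range [0, 1] (one common choice), the step could look like the following. The function name `min_max_normalize` and the sample widths are hypothetical and serve only as illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical widths in inches, for illustration only.
widths = [24.0, 30.0, 36.0, 48.0]
normalized_widths = min_max_normalize(widths)
```

After scaling, the smallest width maps to 0 and the largest to 1, so width, depth, and height contribute on a comparable scale regardless of their original ranges.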

Models

Deep learning (DL) and random forest (RF) algorithms were used for price estimates of different types of furniture. In the analysis, the dataset was divided into training and test datasets in line with the literature: the training dataset covered 70% of the total dataset, and the remaining 30% was reserved as the test dataset (Velten et al. 2000; Furtney et al. 2022; Saha et al. 2023).
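A 70/30 split of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the workflow actually built in the study: the records are assumed to be shuffled before splitting, and the helper name `train_test_split`, the seed, and the placeholder record ids are all hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 150 placeholder record ids, matching the per-category sample count in the text.
records = list(range(150))
train, test = train_test_split(records)
```

With 150 samples per furniture type, this yields 105 training records and 45 test records.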

All models were developed with RapidMiner Studio Version 9.3 software (Boston, MA, USA), which has been widely used in many studies (Keet et al. 2015; Bardak et al. 2021; Mariano et al. 2022; Naser 2023). Figure 1 illustrates the workflow of the process used for model comparison.

Fig. 1. The workflow of the process used for model comparison in bookcase furniture

The weights of attributes such as width, depth, height, frame material, number of partitions, number of drawers, and color were determined using the RF algorithm, which achieved the highest level of success.

Model Evaluation

To evaluate the effectiveness of the predictions made by the different models, performance criteria previously accepted in the literature were used. The goodness of fit (R², Eq. 1) and root mean square error (RMSE, Eq. 2) were used to measure the success of the predictive models objectively (Razali and Al-Wakeel 2013; Li et al. 2019; Agwu et al. 2020),

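$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{2}$$

where $y_i$ and $\hat{y}_i$ are the measured and predicted values, respectively, $\bar{y}$ is the mean of the measured values, and $n$ represents the total number of samples (Pervez et al. 2023).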
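Both criteria can be computed directly from paired lists of measured and predicted prices. The sketch below assumes the common textbook definitions: R² as one minus the ratio of the residual sum of squares to the total sum of squares, and RMSE as the square root of the mean squared residual. The function names and the four sample prices are hypothetical.

```python
import math


def r_squared(measured, predicted):
    """Goodness of fit: 1 - (residual sum of squares / total sum of squares)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_measured) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error over n paired samples."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


# Hypothetical prices, for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
```

For these sample values every prediction is off by 10, so the RMSE is 10 and R² is 1 - 400/50000 = 0.992.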

Optimization of the Parameters of the Models

In pursuing optimal prediction performance, it is essential to adjust the model parameters with precision. Within the RapidMiner software platform, the Optimize Parameters (Grid) operator is a useful tool for this purpose. It enables the optimal parameters for each model to be identified and selected, thereby contributing to enhanced efficiency and accuracy in machine learning and other algorithmic applications. This operator was employed to determine the optimal parameters in the present study. The workflow prepared to determine the optimum parameters is shown in Fig. 2.

Fig. 2. The workflow prepared to determine the optimum parameters
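The idea behind grid-based parameter optimization is an exhaustive search: every combination of candidate values is scored, and the best-scoring combination is kept. The following generic sketch illustrates that idea and is not a reproduction of the RapidMiner operator. The candidate values and the scoring function `toy_score` are hypothetical stand-ins, since the grid that was searched is not listed in the text.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return the best-scoring parameter combination and its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values; not taken from the study.
grid = {"number_of_trees": [10, 20, 50], "maximal_depth": [3, 7, 10]}


def toy_score(params):
    # Stand-in for a validation metric such as R^2; peaks at (20, 7) by construction.
    return -abs(params["number_of_trees"] - 20) - abs(params["maximal_depth"] - 7)


best_params, best_score = grid_search(grid, toy_score)
```

In practice, the scoring function would train a model with the given parameters and return its validation performance. The cost grows with the product of the candidate counts, which is why grids are usually kept small.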

To optimize model performance, significant parameters in the DL and RF algorithms were adjusted. In the DL algorithm, the parameters selected were hidden layer sizes and train samples per iteration. Hidden layer sizes determines the number of hidden layers and the number of neurons in each. For example, specifying “100,200,100” would create a model with three hidden layers, in which the middle hidden layer has 200 neurons. Train samples per iteration determines the amount of training data that is processed in each iteration. This parameter also determines the rate at which scoring and model cancellation can occur. In the RF algorithm, the parameters chosen were the number of trees and the maximum depth. The number of trees specifies the count of random trees to be generated. For each tree, a subset of the example set is chosen through bootstrapping. Maximum depth is used to limit the depth of each random tree. The values of these parameters can vary based on the size and characteristics of the dataset (Rapidminer 2023).
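Two of these ideas can be illustrated briefly: how a hidden layer specification such as “100,200,100” maps to a list of layer widths, and how a bootstrap sample is drawn for a single tree. This is a sketch under assumptions, not the software's internals. The helper names are hypothetical, and the bootstrap is assumed to draw as many examples as the original set, with replacement.

```python
import random


def parse_hidden_layer_sizes(spec):
    """Turn a string such as "100,200,100" into a list of neuron counts."""
    return [int(part) for part in spec.split(",")]


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement (assumed sampling scheme)."""
    return [rng.choice(rows) for _ in rows]


layers = parse_hidden_layer_sizes("100,200,100")
sample = bootstrap_sample(list(range(10)), random.Random(0))
```

Here `layers` holds three entries, one per hidden layer, with 200 neurons in the middle layer. Because the bootstrap draws with replacement, some examples typically appear more than once in `sample` while others are left out, which is what makes the individual trees differ from one another.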

The optimal parameters for each model and furniture type were determined separately and are presented in Tables 3 and 4. For the DL model, the optimal parameters for achieving the highest level of performance for bookcase furniture were a hidden layer size of 90 and -2 train samples per iteration. The most suitable parameters for dresser furniture were likewise a hidden layer size of 90 and -2 train samples per iteration. As for the RF model, the best performance for bookcase furniture was attained with the number of trees set to 20 and the maximal depth set to 7. For dresser furniture, the number of trees was also set to 20 and the maximal depth to 7 to obtain optimal outcomes.

Table 3. Optimal Model Parameters for DL Algorithm

Table 4. Optimal Model Parameters for RF Algorithm

RESULTS AND DISCUSSION

Two different ML models were developed using the data obtained from web pages, and each model was trained and tested on the same data sets. Tables 5 and 6 present the prediction performances for furniture. A total of 300 data samples from three different e-commerce websites were collected for this study: 114 samples were provided by the first website, 97 samples were provided by the second website, and 89 samples were provided by the third website.

Table 5. The Prediction Performances for Bookcase Furniture

Table 6. The Prediction Performances for Dresser Furniture

In terms of performance measurements, the RF algorithm demonstrated the highest R² (0.939) for price estimations of dresser furniture, while the DL algorithm exhibited the lowest R² (0.797) for bookcase furniture. The RF algorithm outperformed the DL algorithm in all tests. All models were found to be suitable for predicting furniture prices. In the literature, an R² value greater than 0.70 is accepted as a satisfactory result (Wadie et al. 2006; Heng and Suetsugi 2013). These results show that the models can be used for price estimation in the furniture industry. Figure 3 illustrates the correlation between the actual and predicted prices of bookcase furniture during the testing phase, utilizing the DL and RF models.

Figure 4 illustrates the correlation between the actual and predicted prices of dresser furniture during the testing phase, utilizing the DL and RF models.

The predicted results of the two data mining models for bookcase and dresser furniture during the testing phase are presented in Tables 7 and 8. These data, which assess the predictive power of the models, show that the predictions are aligned with actual prices and generally exhibit a low error rate. These findings indicate that the models could be an effective tool in forecasting future price trends.

For dresser furniture, the percentages of correct predictions obtained by the DL and RF models were 79.62% and 83.06%, respectively. For bookcase furniture, the correct prediction percentages of the DL and RF models were 74.75% and 82.60%, respectively. The prediction success for dresser furniture was thus higher than that for bookcase furniture. This is attributed to the fact that bookcase furniture shows more variety in the end product.

Fig. 3. The DL and RF models’ actual and predicted price results for bookcase furniture

In the literature, model performance based on prediction percentage values is classified as follows: a prediction percentage exceeding 90% indicates a highly accurate model, a value in the range of 80% to 89% indicates a good prediction, and a value in the range of 50% to 79% is considered a reasonable prediction (Lewis 1982).
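These bands can be applied mechanically to the prediction percentages reported above. The sketch below is illustrative. The function name is hypothetical, and because the stated ranges leave small gaps (for example between 79% and 80%), the boundaries are assumed to be contiguous, so that any value from 50% up to but not including 80% counts as reasonable.

```python
def classify_prediction_percentage(pct):
    """Map a prediction percentage to the bands attributed to Lewis (1982).

    Boundary handling is an assumption: the bands are treated as contiguous.
    """
    if pct > 90:
        return "highly accurate"
    if pct >= 80:
        return "good"
    if pct >= 50:
        return "reasonable"
    return "below the listed ranges"


# Prediction percentages reported in the text.
reported = {
    ("dresser", "DL"): 79.62,
    ("dresser", "RF"): 83.06,
    ("bookcase", "DL"): 74.75,
    ("bookcase", "RF"): 82.60,
}
labels = {key: classify_prediction_percentage(pct) for key, pct in reported.items()}
```

Under this reading, both RF models fall in the good band and both DL models in the reasonable band.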

Fig. 4. The DL and RF models’ actual and predicted price results for dresser furniture

Table 7. Predicted Results of the Two Machine Learning Models Tested for Bookcase Furniture