Abstract
Anaerobic digestion (AD) is a widely recognized method for converting organic waste into biogas, offering a sustainable solution for both waste management and renewable energy generation. This review critically examines recent advancements in mathematical modeling and machine learning (ML) approaches applied to biogas production from AD processes. The study categorizes the models into daily and cumulative biogas production models, kinetic models, and hybrid AI-based predictive techniques. Special attention is given to the comparative evaluation of first-order kinetics, modified Gompertz, and Chen-Hashimoto models, highlighting their applicability and limitations. Furthermore, the integration of artificial neural networks (ANNs) and other ML algorithms is discussed in the context of optimizing biogas yield, understanding system dynamics, and reducing operational uncertainties. Research gaps are identified, including the need for more robust hybrid models, real-time monitoring systems, and studies under diverse feedstock and environmental conditions. The review emphasizes that combining traditional modeling with intelligent systems offers a powerful approach to enhancing AD performance and scaling sustainable energy solutions.
Mathematical Modeling and Machine Learning Approaches for Biogas Production from Anaerobic Digestion: A Review
Osama H. Galal,a Mahmoud M. Abdel-Daiem,b,c,* Hani S. Alharbi,c and Noha Said,b
DOI: 10.15376/biores.20.4.Galal
Keywords: Mathematical modeling; Anaerobic digestion; Multi-dimensional models; Machine learning; Parameters uncertainty; Renewable energy
Contact information: a: Engineering Mathematics and Physics Department, College of Engineering, Fayoum University, 63514, Fayoum, Egypt; b: Environmental Engineering Department, Faculty of Engineering, Zagazig University, 44519, Zagazig, Egypt; c: Civil Engineering Department, College of Engineering, Shaqra University, 11911, Duwadmi, Riyadh, Saudi Arabia;
* Corresponding author: mmabdeldaiem@eng.zu.edu.eg
INTRODUCTION
Anaerobic digestion (AD) converts organic waste into biogas (primarily CH₄ and CO₂), delivering simultaneous sanitation and energy recovery and aligning with circular economy goals (Jameel et al. 2024; Alengebawy et al. 2024). Across common feedstocks, including sewage sludge, agricultural residues, food waste, and manure co-digestion, yields and stability can be enhanced by tuning process conditions (temperature, pH, organic loading rate (OLR), and hydraulic retention time (HRT)) when the system is properly managed (Adnane et al. 2024; Liu et al. 2025). Mathematical modeling has emerged as a critical tool for understanding, simulating, and scaling up AD processes across various substrates, including sewage sludge, agricultural residues, and municipal solid waste (Abdel Daiem et al. 2021). Recent advances in kinetic and mechanistic modeling have significantly improved the predictive accuracy and control of AD systems (see Table 1).
Unlike mechanistic mathematical models, machine learning (ML) learns patterns directly from data, enabling flexible prediction and optimization of biogas production. In recent years, the application of ML in renewable energy has gained significant traction, particularly for modelling complex biological processes such as AD for biogas production (Najafi and Ardabili 2018; Beltramo et al. 2019; Abdel Daiem et al. 2021; Cruz et al. 2023; Komarysta et al. 2023; Shindell et al. 2024; Zhu et al. 2025). The nonlinear and dynamic nature of biogas production makes conventional modelling approaches less effective. In contrast, artificial neural networks (ANNs) offer high adaptability, pattern recognition, and learning capabilities, making them well suited to predicting biogas yields from various organic feedstocks (Abdel Daiem et al. 2021). This is especially relevant for sewage sludge and biomass residues, which vary in composition and behaviour during digestion. The integration of ANNs into biogas research is a promising direction for optimizing system performance and enhancing energy recovery, in line with global sustainability and waste-to-energy initiatives.
ML techniques have become promising alternatives and complements to the traditional mathematical models discussed in this paper, especially for handling the non-linear, dynamic, and uncertain characteristics of AD processes. Unlike deterministic models such as the modified Gompertz or logistic equations, which depend on specific kinetic assumptions and can struggle with variable feedstocks or operating conditions (Roberts et al. 2023; Ling et al. 2024), ML methods are data-driven and can capture complex patterns from high-dimensional inputs without predefined mechanisms (Ling et al. 2024). This makes them suitable for predicting biogas yields, optimizing co-digestion ratios, estimating uncertain parameters, and supporting real-time monitoring (i.e., models that continuously update predictions and provide actionable outputs during AD plant operation using live SCADA data streams) in multi-dimensional AD systems (Asadi and McPhedran 2021). Recent studies (2019 to 2025) have applied ML algorithms such as ANNs (Cruz et al. 2023; Komarysta et al. 2023), random forests (RF), support vector machines (SVM), and deep learning models (LSTM) to AD, often reporting better performance than traditional models (Yildirim and Ozkaya 2023). These approaches address research gaps such as incorporating parameter uncertainty through probabilistic predictions and extending to multi-dimensional inputs via feature engineering and hybrid models (Sappl et al. 2023).
This review article presents a novel, integrative synthesis of recent advancements in the modelling and optimization of AD processes for biogas production, focusing on the convergence of mathematical modelling and ML techniques. While prior reviews have addressed modelling frameworks in isolation, this work bridges deterministic kinetic models with data-driven approaches, offering a comparative assessment of their capabilities, limitations, and future trajectories. Thus, the purpose of this study is to evaluate the predictive performance of widely used mathematical models, such as first-order kinetics, modified Gompertz, and Chen–Hashimoto models, alongside ANN and hybrid ML models, including random forests, SVMs, and deep learning architectures. The review highlights how ML algorithms increasingly address the nonlinearities and uncertainties inherent in AD systems, particularly for complex substrates such as sewage sludge, food waste, and co-digested residues. Moreover, it outlines gaps in current modelling practice, including limited real-time adaptability and insufficient attention to feature selection and parameter sensitivity analysis, and it proposes future extensions involving hybrid modelling frameworks and smart digesters. By integrating insights across computational and engineering domains, this review advances a comprehensive understanding of biogas system optimization, promoting scalable and intelligent waste-to-energy solutions aligned with sustainability goals.
Table 1. Summary of Key Studies on Anaerobic Co-Digestion, Highlighting Substrates, Operating Conditions, Biogas/Methane Yields, and Kinetic/Statistical Model Performance
Novelty and Distinctiveness
This review differs from others in the following respects:
1. Classical vs. ML Modeling: The review compares classical kinetic models (first-order, Gompertz, Chen-Hashimoto) with ML approaches across daily-rate and cumulative-yield frameworks. Findings highlight where traditional kinetics remain useful and where ML achieves better predictive accuracy.
2. Multidimensional Kinetic Framework: A multidimensional framework is introduced, treating kinetic parameters as functions of operational variables such as temperature and mixing ratio. This enables response surfaces that support scenario mapping and process optimization, which are rarely discussed in prior AD reviews.
3. Stochastic Parameter Uncertainty: Kinetic parameters are modeled as random variables using stochastic methods, including Karhunen–Loève expansions. This generates probabilistic biogas trajectories with means, quantiles, and variances, offering a risk-aware alternative to point estimates.
4. ML Applications: Advanced ML methods (LSTM, TFT) and explainability tools (SHAP) are synthesized for forecasting, optimization, and stability control in AD systems. Their performance is benchmarked against kinetic baselines, emphasizing practical deployment guidance.
5. Hybrid Mechanistic–ML Framework: A hybrid framework integrates mechanistic kinetics with ML residual learning, enabling IoT-based smart digesters. Recommendations for dataset standardization and cross-validation strengthen pathways toward real-world implementation.
6. Up-to-Date Coverage: The article emphasizes the most recent advances (2023–2025), including emerging algorithms (LSTM, hybrid ML models) and updated kinetic formulations, which have not been synthesized elsewhere.
MATHEMATICAL MODELS
Daily Biogas Production Models
Table 2 lists the fitted parameters and goodness of fit of daily biogas production models (linear, exponential, and Gaussian). Among the case studies summarized in Table 2, exponential daily-rate functions consistently achieved the highest goodness of fit on both rising and falling limbs (R² ≈ 0.960–0.999), followed by Gaussian profiles when a single, roughly symmetric peak was present (R² ≈ 0.95). Linear fits were acceptable mainly for descending limbs or simple substrates but tended to underfit peak regions and onset dynamics. Practically, daily-rate forecasting should default to exponential models unless there is strong peak asymmetry or multi-modal behavior; linear fits are best used for quick, conservative screening.
Exponential daily-rate models are the most reliable across substrates and digestion stages, with Gaussian profiles competitive when production exhibits a single, symmetric peak; linear fits chiefly succeed on descending limbs and under simple matrices. Lo et al. (2010) and Latinwo and Agarry (2015) illustrate this pattern: exponential fits track both rise and fall with the highest R², Gaussian captures unimodal curves, and linear underestimates peak curvature. Practically, investigators often default to using exponential approaches for short-horizon forecasting and reserve Gaussian approaches for pronounced single-peak shapes; they use linear fitting only for conservative trend screening.
Table 2. Daily Biogas Production Models (Linear, Exponential, and Gaussian) Applied to Diverse Feedstock, with Key Parameters and R² values. All models Show Strong Predictive Accuracy (R² > 0.90), with Exponential Models Excelling in Dynamic Phases and Gaussian Models Performing Well for Heterogeneous Wastes
Linear model
The linear model has been used to simulate and predict daily biogas production from AD (Rossi et al. 2022). It assumes that biogas production starts at an initial time t0 with a value P0, increases linearly to a maximum value Pmax at time tm, and then decreases linearly to a final value Pf at time tf. The plot therefore has two limbs: an ascending limb for t0 ≤ t ≤ tm and a descending limb for tm ≤ t ≤ tf. Assuming the plot is similar on both sides of the maximum, the model equation can be written as
(1)
where a and b are two dimensionless constants determined by best fitting of the experimental data. They may be expressed as some other constants multiplied by P0 and , respectively. This model is generally considered the simplest, but its goodness-of-fit statistics are usually weaker than those of other models. However, Lo et al. (2010) showed that this model, along with the exponential one, gave a better fit to the descending limb for the BA/MSW 100 g L-1 bioreactor in biogas production from the organic fraction of MSW co-digested with MSWI ashes. This model was also employed to simulate biogas production from cow dung alone and from cow dung with plantain peels (Latinwo and Agarry 2015). It gave R² values of 0.885 and 0.995 for the ascending and descending limbs, respectively, in the first case, and 0.879 and 0.997 in the second case. These values are less satisfactory than those of the other models used in the same study. Nevertheless, linear models can still be valuable for first-cut assessments or when computational simplicity is paramount.
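To make the two-limb description concrete, the minimal Python sketch below evaluates a daily-rate profile that rises linearly from (t0, P0) to (tm, Pmax) and falls linearly to (tf, Pf), and fits a straight line P = a + b·t separately to each limb with its own R², mirroring the per-limb R² values reported above. It does not reproduce Eq. 1 itself; the function names and synthetic data are hypothetical, and fitting each limb independently by ordinary least squares is an assumption about how such per-limb statistics are typically obtained.

```python
from typing import Sequence, Tuple


def two_limb_linear_rate(t: float, t0: float, p0: float, tm: float,
                         pmax: float, tf: float, pf: float) -> float:
    """Daily biogas rate (L/kg/d) on a hypothetical two-limb linear profile."""
    if t <= tm:
        # Ascending limb: straight line from (t0, P0) to (tm, Pmax).
        return p0 + (pmax - p0) * (t - t0) / (tm - t0)
    # Descending limb: straight line from (tm, Pmax) to (tf, Pf).
    return pmax + (pf - pmax) * (t - tm) / (tf - tm)


def fit_limb(ts: Sequence[float], ps: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit of P = a + b*t on one limb; returns (a, b, R^2)."""
    n = len(ts)
    t_mean = sum(ts) / n
    p_mean = sum(ps) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (p - p_mean) for t, p in zip(ts, ps))
    b = sxy / sxx                      # slope: positive on the rising limb
    a = p_mean - b * t_mean            # intercept
    ss_res = sum((p - (a + b * t)) ** 2 for t, p in zip(ts, ps))
    ss_tot = sum((p - p_mean) ** 2 for p in ps)
    return a, b, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical curve: starts at 0, peaks at 20 L/kg/d on day 10, ends at 0 on day 30.
    days = list(range(0, 31))
    rates = [two_limb_linear_rate(d, 0, 0.0, 10, 20.0, 30, 0.0) for d in days]
    print("rising  a=%.3f b=%.3f R2=%.4f" % fit_limb(days[:11], rates[:11]))
    print("falling a=%.3f b=%.3f R2=%.4f" % fit_limb(days[10:], rates[10:]))
```

On real data the two limbs would share the peak observation or be split at the observed maximum; the split point is itself a modelling choice.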
Exponential model
This model proposes an exponential increase in daily biogas production with time up to a peak, followed by an exponential decrease toward zero (De Gioannis et al. 2009; Lo et al. 2010; Latinwo and Agarry 2015). The model equation is given by Eq. 2:
$$P(t) = a + b\,e^{c t} \tag{2}$$
where a and b are constants (L kg-1 d-1) and c is a constant (d-1) that is positive for the rising limb and negative for the falling limb. De Gioannis et al. (2009) used this model in its differential form to simulate landfill gas generation from municipal solid waste (MSW) after mechanical biological treatment, estimating the model constants after 8 and 15 weeks. The model gave R² values of 0.84 and 0.90 for the rising and falling limbs, respectively, in the 8-week case, and 0.81 and 0.95 in the 15-week case. Lo et al. (2010) also used the exponential model in the study mentioned above; the best R² values were 0.9579 and 0.9288 for the rising and falling limbs, respectively, both obtained for FA/MSW 10 g L-1. Furthermore, Latinwo and Agarry (2015) employed this model to simulate biogas production from cow dung alone and from cow dung activated with plantain peels, obtaining an excellent representation in both cases: the R² values for the ascending and descending limbs were 0.9988 and 0.9969 in the first case and 0.9951 and 0.9969 in the second.
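A common way to estimate the rise or fall constant of a single limb, sketched below in Python, assumes the offset a is negligible so that ln P = ln b + c·t, and fits this line by least squares; the sign of the fitted c then separates the rising (c > 0) and falling (c < 0) limbs. The three-constant form P(t) = a + b·e^(ct) is inferred here from the constant definitions above, and the zero-offset simplification, function names, and example values are assumptions; when a is not negligible, a nonlinear fit of all three constants would be needed instead.

```python
import math
from typing import Sequence, Tuple


def exponential_rate(t: float, a: float, b: float, c: float) -> float:
    """Daily rate P(t) = a + b*exp(c*t) (form assumed from the constant definitions)."""
    return a + b * math.exp(c * t)


def fit_exponential_limb(ts: Sequence[float], ps: Sequence[float]) -> Tuple[float, float]:
    """Estimate (b, c) for one limb assuming a = 0, via least squares on ln P.

    All rates must be strictly positive. The log transform weights relative
    rather than absolute errors, so results can differ from a direct
    nonlinear fit on noisy data.
    """
    ys = [math.log(p) for p in ps]
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    c = sxy / sxx                       # > 0 on the rising limb, < 0 on the falling limb
    b = math.exp(y_mean - c * t_mean)   # back-transform the intercept
    return b, c


if __name__ == "__main__":
    rise_t = [0, 1, 2, 3, 4, 5]
    rise_p = [exponential_rate(t, 0.0, 2.0, 0.3) for t in rise_t]
    fall_t = [6, 8, 10, 12, 14]
    fall_p = [exponential_rate(t, 0.0, 50.0, -0.2) for t in fall_t]
    print(fit_exponential_limb(rise_t, rise_p))
    print(fit_exponential_limb(fall_t, fall_p))
```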
Gaussian model
The Gaussian distribution is commonly used to describe numerous natural phenomena (Simon 2002; Lo et al. 2010). It has also been used to describe bacterial growth and the resulting biogas production during AD. Therefore, this model, like other growth-and-decay models, can be used to simulate the daily production process. The Gaussian model is given by Eq. 3:
$$P(t) = a\,\exp\!\left[-\frac{(t - t_m)^2}{2b^2}\right] \tag{3}$$
where a is a constant (L kg-1 d-1), and tm and b are the mean and standard deviation (d), respectively. Tonner et al. (2017) used this model to simulate the differential effects of media, genetics, and stress on microbial population growth. It was also used to simulate and predict the biogas production evaluated by Lo et al. (2010), where the best R² was 0.9486 for FA/MSW 20 g L-1. In addition, Nielfa et al. (2015) used this model to simulate methane production from mixtures of heterogeneous organic and inorganic wastes with OFMSW. The highest R², 0.95, was achieved for the garden waste mixture with OFMSW.
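For a single, roughly symmetric peak, the Gaussian model P(t) = a·exp[−(t − tm)²/(2b²)] can be initialised (or, on clean data, estimated) without an iterative optimiser: taking logarithms gives a quadratic in t, ln P = α + β·t + γ·t², so ordinary least squares on (1, t, t²) followed by tm = −β/(2γ), b = √(−1/(2γ)) and a = exp(α − β²/(4γ)) recovers the three constants. This log-parabola approach (often attributed to Caruana) is a swapped-in technique, not the procedure of the cited studies; the minimal Python sketch below assumes strictly positive rates and a concave fit (γ < 0), and its function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_rate(t: float, a: float, tm: float, b: float) -> float:
    """Gaussian daily rate: a * exp(-(t - tm)^2 / (2 b^2))."""
    return a * math.exp(-((t - tm) ** 2) / (2.0 * b * b))


def _solve_3x3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (aug[r][3] - sum(aug[r][k] * x[k] for k in range(r + 1, 3))) / aug[r][r]
    return x


def fit_gaussian(ts: Sequence[float], ps: Sequence[float]) -> Tuple[float, float, float]:
    """Log-parabola fit: regress ln P on (1, u, u^2) with u = t - mean(t); return (a, tm, b)."""
    ys = [math.log(p) for p in ps]                      # requires P > 0
    shift = sum(ts) / len(ts)                           # centre t to improve conditioning
    us = [t - shift for t in ts]
    # Normal equations: sums of u^(i+j) on the left, sums of y*u^i on the right.
    m = [[sum(u ** (i + j) for u in us) for j in range(3)] for i in range(3)]
    v = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(3)]
    alpha, beta, gamma = _solve_3x3(m, v)
    if gamma >= 0:
        raise ValueError("log-rates are not concave; no single Gaussian peak")
    tm = shift - beta / (2.0 * gamma)                   # peak time back in original units
    b = math.sqrt(-1.0 / (2.0 * gamma))
    a = math.exp(alpha - beta * beta / (4.0 * gamma))   # peak height
    return a, tm, b
```

Because the log transform inflates the influence of small rates in the tails, the result is best treated as a starting point for a nonlinear least-squares refinement on noisy plant data.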
Daily biogas output models are central to AD operational monitoring and management. The data in Table 2, together with Eqs. 1 to 3, indicate that although simple models such as the Gaussian, exponential, and linear forms can fit the ascending and descending limbs of daily production, their accuracy depends strongly on the substrate and process conditions. For example, the exponential model can achieve excellent fits (R² up to 0.9988) for certain organic fractions and waste combinations, whereas the linear model performs reasonably well (R² up to 0.96) but is often outperformed. The Gaussian model, with good fits (R² = 0.95) for heterogeneous organic wastes, is also robust and useful for simulating the symmetric rise and fall of daily production rates in specific systems.
For operations, daily-rate models are most useful for short-term scheduling, diagnosing inhibition or overload patterns, and checking whether a feeding change alters rise or fall constants as expected. Exponential forms are a sensible default for forecasting both sides; Gaussian is informative when production shows a single, symmetric peak, while linear fits act as conservative trend indicators rather than control-relevant predictors. These choices help operators prioritize sampling frequency and decide if a perturbation requires adjusting the OLR or mixing strategy in the next cycle.
Linear, exponential, and Gaussian daily-rate forms implicitly assume a unimodal production curve under a stable operating regime over the day, with negligible gas-holding/back-pressure effects. In continuous or semi-batch operation, feed pulses, temperature swings, transient inhibition (e.g., ammonia, sulfide, long-chain fatty acids), foaming, or mixing disruptions can create asymmetric or multi-peak profiles that a single exponential or Gaussian cannot reproduce, biasing rise/fall constants and peak timing (Lo et al. 2010; Altaş 2009). In such cases, segmented fits or multi-population kinetics are preferable; at minimum, re-fit pre-/post-perturbation windows and avoid extrapolating across regime shifts (Ling et al. 2024).
Cumulative Biogas Production Models
Table 3 summarizes cumulative-yield models and reveals a clear pattern: the modified Gompertz consistently achieves near-perfect fits across various substrates and operating conditions (R² ≈ 0.98 to 1.00), often outperforming the logistic and modified-logistic models. The exponential rise-to-maximum model performs exceptionally well in landfill BMP contexts (R² ≈ 0.99 to 0.996), while simple logistic models are mainly competitive for more homogeneous feedstocks (e.g., manure). In practice, A (ultimate potential) and λ (lag) are the most influential parameters in modified-Gompertz fits, emphasizing the importance of accurate estimation or uncertainty ranges.
Engineering interpretation of cumulative-yield parameters directly supports design and start-up. The ultimate potential A informs gasholder/CHP sizing and energy contracts; the lag λ frames warm-up and acclimation windows; and the maximal rate Dm or kinetic constant k links to target HRT and expected time to plateau. Sensitivity analyses around A and λ are therefore recommended before committing to co-digestion ratios or pre-treatment choices, especially where substrate supply is seasonal or heterogeneous.
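As a worked illustration of how fitted parameters translate into design quantities, the sketch below computes the time for a batch to reach a chosen fraction f of its ultimate potential A. For the exponential rise-to-maximum form P = A(1 − e^(−kt)) this is t_f = −ln(1 − f)/k; inverting the modified Gompertz form P = A·exp{−exp[(Dm·e/A)(λ − t) + 1]} gives t_f = λ + A[1 − ln(−ln f)]/(Dm·e), so the time to plateau shifts one-for-one with λ and scales with A/Dm. The 95% target, parameter values, and function names are illustrative assumptions, not recommendations from the cited studies.

```python
import math


def t_fraction_first_order(k: float, f: float = 0.95) -> float:
    """Days for A*(1 - exp(-k t)) to reach f*A (k in 1/d)."""
    return -math.log(1.0 - f) / k


def modified_gompertz(t: float, A: float, Dm: float, lam: float) -> float:
    """Cumulative yield A*exp(-exp(Dm*e/A*(lam - t) + 1))."""
    return A * math.exp(-math.exp(Dm * math.e / A * (lam - t) + 1.0))


def t_fraction_mod_gompertz(A: float, Dm: float, lam: float, f: float = 0.95) -> float:
    """Days for the modified Gompertz curve to reach f*A (closed-form inverse)."""
    return lam + A * (1.0 - math.log(-math.log(f))) / (Dm * math.e)


if __name__ == "__main__":
    base = dict(A=300.0, Dm=25.0, lam=2.0)          # hypothetical L/kg, L/kg/d, d
    print("first-order t95:", round(t_fraction_first_order(0.1), 2), "d")
    print("Gompertz t95:", round(t_fraction_mod_gompertz(**base), 2), "d")
    # One-at-a-time +/-10% sensitivity of t95 to A and lambda.
    for name in ("A", "lam"):
        for scale in (0.9, 1.1):
            p = dict(base, **{name: base[name] * scale})
            print(name, scale, round(t_fraction_mod_gompertz(**p), 2), "d")
```

A one-at-a-time perturbation like this is only a first screen; joint uncertainty in A, Dm, and λ calls for sampling the parameters together.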
Logistic kinetic model
The logistic model assumes an exponential increase up to a maximum value, after which production remains constant (Latinwo and Agarry 2015). It has three parameters: A, the biogas production potential (L kg-1); b, a dimensionless constant; and k, a rate constant (d-1). Equation 4 expresses this model:
$$P(t) = \frac{A}{1 + b\,e^{-k t}} \tag{4}$$
The modified Gompertz model most consistently attains near-perfect cumulative fits across feedstocks and operating regimes, with A (ultimate potential) and λ (lag) dominating sensitivity; the exponential rise-to-maximum model excels in landfill/BMP contexts, while logistic and modified-logistic forms are competitive for homogeneous manures. Lo et al. (2010), Nielfa et al. (2015), and Deepanraj et al. (2017) embody these trends: the modified Gompertz model captures lag and plateau robustly, the exponential rise-to-maximum model performs in the mid-range, and the simple logistic model is adequate when variability is low. For design, A informs gasholder/CHP sizing, λ informs start-up windows, and Dm or k informs HRT and time to plateau.
Modified logistic model
This model is based on the bacterial population growth that drives biogas production during AD and is given by Eq. 5 (Amleh and Al-Freihat 2025):
$$P(t) = \frac{A}{1 + \exp\left[\frac{4D_m}{A}(\lambda - t) + 2\right]} \tag{5}$$
where A is as defined before, Dm is the maximum rate of cumulative biogas production, and λ is the lag (delay) time before biogas production starts. Jafari-Sejahrood et al. (2019) used this model to fit and predict biogas production from cow manure, obtaining an R² of 0.993. Altaş (2009) used the same model to study the inhibitory effect of four heavy metals (zinc, nickel, cadmium, and chromium) on methane-producing anaerobic granular sludge; R² exceeded 0.99 for all metals except chromium. In addition, Mu et al. (2007) applied this model to the kinetics of hydrogen production from sucrose by mixed anaerobic cultures, obtaining an R² of 0.9916. For food waste, Deepanraj et al. (2017) studied the biogas production of food waste co-digested with poultry manure under four digestate pre-treatment conditions: autoclave (AC), microwave (MW), ultrasonication (US), and no pre-treatment (NT). The best fit of this model was obtained for US (R² = 0.9991).
Table 3. Biogas Production Kinetic Models (Exponential, Logistic, Modified Gompertz, Modified Richards) Showing High Predictive Accuracy across Substrates, with Modified Gompertz Achieving R² > 0.99 in Most Cases
Exponential rise-to-maximum model
The exponential rise-to-maximum model describes many physical phenomena in fields including biology, physics, economics, and finance. It has two parameters: A, the biogas production potential (L kg-1), and k, a rate constant (d-1), and is given by Eq. 6 (Bilgili et al. 2009):
$$P(t) = A\left(1 - e^{-k t}\right) \tag{6}$$
Bilgili et al. (2009) investigated the exponential rise-to-maximum model for predicting the biochemical methane potential of landfilled solid waste. They operated two landfill reactors: R1 with leachate recirculation and R2 without. After 400 days of operation, the best R² values were 0.9961 for R1 and 0.9942 for R2. The model was also applied to the problem of Lo et al. (2010) described above, where the best R² was 0.9907 for the control bioreactor without ash addition. Moreover, Latinwo and Agarry (2015) applied it to cow dung alone and to cow dung with plantain peels, where it gave lower R² values of 0.9907 in the first case and 0.8543 in the second.
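Because the exponential rise-to-maximum model P = A(1 − e^(−kt)) is linear in A once k is fixed, a simple and robust fitting strategy is to scan k and, for each trial value, compute the least-squares A in closed form, A = Σ P·g / Σ g² with g = 1 − e^(−kt), keeping the pair with the smallest residual sum of squares. The Python sketch below uses this profiled grid search, a swapped-in technique that is not necessarily what Bilgili et al. (2009) used; the grid range, resolution, and names are assumptions that would need adjusting for real data.

```python
import math
from typing import Sequence, Tuple


def rise_to_max(t: float, A: float, k: float) -> float:
    """Cumulative yield A * (1 - exp(-k t))."""
    return A * (1.0 - math.exp(-k * t))


def fit_rise_to_max(ts: Sequence[float], ps: Sequence[float],
                    k_min: float = 0.001, k_max: float = 1.0,
                    steps: int = 1000) -> Tuple[float, float, float]:
    """Profiled grid search over k; returns (A, k, R^2)."""
    best = None
    for i in range(steps + 1):
        k = k_min + (k_max - k_min) * i / steps
        g = [1.0 - math.exp(-k * t) for t in ts]
        denom = sum(gi * gi for gi in g)
        if denom == 0.0:
            continue
        A = sum(p * gi for p, gi in zip(ps, g)) / denom   # least-squares A for this k
        sse = sum((p - A * gi) ** 2 for p, gi in zip(ps, g))
        if best is None or sse < best[2]:
            best = (A, k, sse)
    A, k, sse = best
    p_mean = sum(ps) / len(ps)
    ss_tot = sum((p - p_mean) ** 2 for p in ps)
    return A, k, 1.0 - sse / ss_tot
```

The grid resolution bounds the precision of k; a local refinement (for example, a finer grid around the best value) can follow once the basin is located.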
Gompertz model
The Gompertz model contains three constants, A, b, and k: A is the biogas production potential (L kg-1), b is a dimensionless constant, and k is a rate constant (d-1) (Zwietering et al. 1990; Mueller et al. 1995; Lo et al. 2010; Peleg and Corradini 2011):
$$P(t) = A\,\exp\left[-b\,e^{-k t}\right] \tag{7}$$
Modified Gompertz model
The modified Gompertz model is one of the most widely used models and is given by Eq. 8 (Zwietering et al. 1990; Li and Fang 2007; Budiyono et al. 2010; Lo et al. 2010):
$$P(t) = A\,\exp\left\{-\exp\left[\frac{D_m e}{A}(\lambda - t) + 1\right]\right\} \tag{8}$$
In this equation, A is as defined before, Dm is the maximal daily biogas production rate (L kg-1 d-1), λ is the lag phase (d), and e is Euler's number. The model has been applied extensively to AD problems because of its consistently high correlation with experimental data. Li and Fang (2007) used it to simulate the inhibition of the H2 production potential of granular sludge by six heavy metals, calculating the model constants for different metal concentrations with high R² values in all cases. Lin and Shei (2008) studied the effects of ionic Cr, Cu, and Zn on fermentative hydrogen production from sewage sludge; they used different dosages of each metal and estimated the model constants and correlation in all cases. The model fitted the experimental data almost perfectly, with best R² values of 0.9981, 0.9998, and 0.9923 for the Cr, Cu, and Zn experiments, respectively. In addition, Altaş (2009) studied the inhibitory effect of four of these metals, as mentioned above, and found R² greater than 0.99 for all metals except Cr. Tian et al. (2020) carried out a kinetic evaluation of the biogas potential of a heavy-metal-stressed anaerobic fermentation process; the model correlated well for most of the studied metals and concentrations, with a best R² of 0.9989. Li et al. (2008) investigated the enhancement of bio-hydrogen production from food waste and sewage sludge in the presence of aged refuse excavated from a landfill; the modified Gompertz model fitted the gas production with a relatively high R² of 0.9820. For food waste, Deepanraj et al. (2017) used this model for the four pre-treatment cases mentioned above, with a best R² of 0.9995 for NT. Mu et al. (2007) also applied it to the sucrose-to-hydrogen study mentioned above, obtaining an R² of 0.9940. Budiyono et al. (2010) predicted the biogas production rate from cattle manure, using this model for two substrates to investigate the effect of liquid rumen on cumulative biogas production. The first substrate consisted of 100 g manure and 100 mL rumen (MR 11), and the second consisted of manure and water in an equal weight ratio (MW 11).
Biogas production from both substrates was studied and the model parameters were estimated, giving R² values of 0.9983 for MR 11 and 0.9987 for MW 11. The authors also performed experiments at room temperature and at 38.5 °C to investigate the effect of temperature on biogas production from both substrates. This model has also been used to fit biogas production from the co-digestion of horse and cow dung (Yusuf et al. 2011), with five mixtures of these dungs prepared on a weight basis. The maximum biogas production potential and the best R² (0.998) were achieved for the mixture of 75% horse dung and 25% cow dung. The model was also used to simulate and predict the biogas production evaluated by Lo et al. (2010), where the best R² was 0.9977 for FA/MSW 10 g L-1. Finally, for MSW, Nielfa et al. (2015) used this model to simulate methane production, as mentioned before; the best R², 1.00, was obtained for the meat/fish mixture with OFMSW.
Cumulative biogas production models are critical for estimating total biogas yield, an essential parameter for system design and economic viability. The data in Table 3, together with Eqs. 4 to 8, indicate that the logistic, modified Gompertz, and exponential rise-to-maximum models consistently achieve high prediction accuracy (R² typically >0.98–0.99) across a variety of substrates, including cow manure and complex industrial wastes. The modified Gompertz model stands out for its broad applicability and reliability, fitting data successfully even under inhibitory conditions such as heavy-metal exposure. This consistently high performance underscores its prominence as the preferred kinetic model for understanding the digestion process and predicting ultimate gas potential. Combined with emerging data analytics, these kinetic models promise to bridge the gap between theoretical modelling and practical implementation in diverse operational contexts.
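The physical meaning of Dm and λ in the modified Gompertz and modified logistic forms can be checked numerically: for both curves the maximum slope equals Dm, and the tangent at the inflection point crosses zero yield exactly at t = λ. The Python sketch below assumes the Zwietering-type forms P = A·exp{−exp[(Dm·e/A)(λ − t) + 1]} and P = A/{1 + exp[(4Dm/A)(λ − t) + 2]}; the inflection times λ + A/(Dm·e) and λ + A/(2Dm) follow from differentiating these expressions, and the parameter values are hypothetical.

```python
import math
from typing import Callable, Tuple


def modified_gompertz(t: float, A: float, Dm: float, lam: float) -> float:
    """Cumulative yield A*exp(-exp(Dm*e/A*(lam - t) + 1))."""
    return A * math.exp(-math.exp(Dm * math.e / A * (lam - t) + 1.0))


def modified_logistic(t: float, A: float, Dm: float, lam: float) -> float:
    """Cumulative yield A/(1 + exp(4*Dm/A*(lam - t) + 2))."""
    return A / (1.0 + math.exp(4.0 * Dm / A * (lam - t) + 2.0))


def tangent_check(curve: Callable[..., float], t_infl: float, A: float, Dm: float,
                  lam: float, h: float = 1e-4) -> Tuple[float, float]:
    """Return (slope at the inflection point, time where that tangent reaches zero yield)."""
    # Central difference; accurate here because the second derivative vanishes at t_infl.
    slope = (curve(t_infl + h, A, Dm, lam) - curve(t_infl - h, A, Dm, lam)) / (2.0 * h)
    lag = t_infl - curve(t_infl, A, Dm, lam) / slope
    return slope, lag


if __name__ == "__main__":
    A, Dm, lam = 250.0, 20.0, 3.0                      # hypothetical L/kg, L/kg/d, d
    g = tangent_check(modified_gompertz, lam + A / (Dm * math.e), A, Dm, lam)
    lg = tangent_check(modified_logistic, lam + A / (2.0 * Dm), A, Dm, lam)
    print("Gompertz: slope %.4f, tangent lag %.4f" % g)
    print("Logistic: slope %.4f, tangent lag %.4f" % lg)
```

This geometric reading is why λ and Dm can be compared across the two models, whereas b and k in Eqs. 4 and 7 have no such direct interpretation.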
Less-used models
Some models are rarely used to fit biogas production from the AD process, possibly because their more complicated formulas contain additional constants that make them harder to apply. This group includes the Richards, Stannard, and Schnute models and their modified versions. The Richards model is represented by Eq. 9 (Hsieh 2009):
(9)
where the three parameters are the biogas production potential, the delay time, and a constant, respectively, and an additional constant provides more flexibility for the biogas production simulation, as shown by Eq. 10:
(10)
Depending on the value of m, Eq. 10 reduces to the Gompertz equation (for a limiting value of m), the monomolecular equation if m = 1, the logistic equation if m = 2, or the von Bertalanffy equation if m = 2/3 (Fan et al. 2004).
This model was used by Mu et al. (2007) to investigate the kinetics of hydrogen production from sucrose by mixed cultures, where it showed a good correlation to the experimental data, as R² was 0.994. In addition, it was utilized to investigate the inhibitory effect of four heavy metals on the methane-producing anaerobic granular sludge by Altaş (2009), and R² was greater than 0.99.
The Stannard model is represented by Eq. 11 (Zwietering et al. 1990):
(11)
where the parameters are fitted constants. The modified version of the Stannard equation is the same as the modified Richards equation, which is given by Eq. 12:
(12)
Another model in this group is the Schnute model, represented by Eq. 13 (Zwietering et al. 1990),
(13)
and its modified version is given by Eq. 14 (Zwietering et al. 1990):
(14)
However, no key studies applying the Stannard or Schnute models, or their modified versions, were found in the literature.
Sigmoidal equations (logistic/modified Gompertz, Richards/Schnute) presume a single dominant population and constant biodegradability; co-digestion, pre-treatment, or staged hydrolysis–acidogenesis–methanogenesis often produce shoulders or long tails (multiple inflections) that a single sigmoid cannot capture (Nielfa et al. 2015; Deepanraj et al. 2017). Parameter equifinality is common: λ often trades off with Dm or k when sampling is sparse (e.g., less often than daily), and A can absorb gas losses, leakage, or incomplete degassing, inflating uncertainty (Bilgili et al. 2009; Lo et al. 2010). Inhibition episodes flatten the mid-slope and shift the apparent lag (Altaş 2009; Tian et al. 2020). Mitigations include higher early-phase sampling, consistent methane normalization (STP, dry gas, per gVS), mass-balance checks, and reporting parameter CIs or Bayesian posteriors rather than single best fits.
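One simple way to report uncertainty instead of a single best fit is to sample the kinetic parameters from assumed distributions, propagate them through the cumulative model, and read off pointwise quantiles of the resulting trajectories. The Python sketch below does this for the modified Gompertz form with independent normal parameters clipped to physically admissible values; the means, standard deviations, and independence assumption are hypothetical, and in practice the distributions would come from bootstrap resampling, the fit covariance, or a Bayesian posterior. Plain Monte Carlo sampling of random parameters is used here as a simpler stand-in for the Karhunen–Loève-based treatment mentioned in this review.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence


def modified_gompertz(t: float, A: float, Dm: float, lam: float) -> float:
    """Cumulative yield A*exp(-exp(Dm*e/A*(lam - t) + 1))."""
    return A * math.exp(-math.exp(Dm * math.e / A * (lam - t) + 1.0))


def gompertz_bands(times: Sequence[float], n_draws: int = 2000,
                   seed: int = 42) -> Dict[str, List[float]]:
    """Pointwise 5th/50th/95th percentiles of sampled modified Gompertz curves."""
    rng = random.Random(seed)
    # Hypothetical parameter distributions (mean, standard deviation).
    A_mu, A_sd = 300.0, 30.0      # L/kg
    Dm_mu, Dm_sd = 25.0, 3.0      # L/kg/d
    lam_mu, lam_sd = 3.0, 1.0     # d
    curves = []
    for _ in range(n_draws):
        A = max(rng.gauss(A_mu, A_sd), 1e-6)       # clip to positive values
        Dm = max(rng.gauss(Dm_mu, Dm_sd), 1e-6)
        lam = max(rng.gauss(lam_mu, lam_sd), 0.0)  # lag cannot be negative
        curves.append([modified_gompertz(t, A, Dm, lam) for t in times])
    bands: Dict[str, List[float]] = {"p05": [], "p50": [], "p95": []}
    for j in range(len(times)):
        column = [c[j] for c in curves]
        cuts = statistics.quantiles(column, n=20)  # 19 cut points: 5%, 10%, ..., 95%
        bands["p05"].append(cuts[0])
        bands["p50"].append(cuts[9])
        bands["p95"].append(cuts[18])
    return bands


if __name__ == "__main__":
    days = [0, 5, 10, 20, 40]
    b = gompertz_bands(days)
    for d, lo, mid, hi in zip(days, b["p05"], b["p50"], b["p95"]):
        print(f"day {d:>2}: {lo:7.1f} {mid:7.1f} {hi:7.1f} L/kg")
```

Ignoring correlation between λ and Dm (the equifinality noted above) typically widens or distorts the bands, so correlated sampling is preferable when a covariance estimate is available.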
Machine Learning Approaches
Table 4 summarizes key peer-reviewed studies on ML applications for biogas production in AD, detailing the algorithms used, data sources, performance metrics (e.g., coefficient of determination (R²) and root mean square error (RMSE)), comparisons with traditional models where available, and specific AD contexts. The studies reviewed show a shift from mechanistic to data-driven modelling, with ML consistently achieving higher accuracy (R² often above 0.90) than traditional kinetic models such as the Gompertz or logistic, especially in co-digestion scenarios involving sewage sludge, agricultural waste, or food waste (Asadi and McPhedran 2021; Ling et al. 2024). For example, tree-based models (RF, XGBoost) perform well in full-scale systems because they handle non-linearity and, combined with SHAP, reveal feature importance, highlighting key variables such as OLR, pH, and biomass input (Zou et al. 2024). Deep learning methods, such as LSTM with attention or TFT, provide probabilistic forecasts and capture long-term dependencies, addressing parameter uncertainty through quantile regression and data augmentation (Jeong et al. 2021). Regression-based models can be updated with new data but usually require explicit recalibration or retraining, whereas ML models (especially those using online or adaptive learning) can update incrementally as new data arrive. Hybrid techniques incorporating genetic algorithms (GA) or particle swarm optimization (PSO) further improve biogas yield and stability management (Salamattalab et al. 2024).
Feature engineering and data quality are crucial, as high-frequency SCADA data or derived indices (e.g., VFA/ALK) improve predictions without needing extensive lab measurements (Zou et al. 2024). Incorporating genomics or pre-treatment data expands the input space, connecting microbial communities to performance (Adeleke et al. 2025). Explainable AI tools address the “black-box” issue, building trust and enabling integration with biokinetic equations for physics-informed hybrids (Gupta et al. 2023).
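As a hedged sketch of the derived-index idea, the snippet below computes a VFA/ALK ratio from paired lab or soft-sensor readings, maps it to a coarse stability band, and builds a trailing rolling mean of OLR from a daily SCADA series. The function names and the 0.3/0.4 warning bands are illustrative assumptions (VFA/ALK rules of thumb vary by plant), not values taken from the cited studies.

```python
from statistics import mean


def vfa_alk_ratio(vfa_mg_l, alk_mg_l):
    """VFA/ALK ratio; both in mg/L, alkalinity assumed as CaCO3 equivalent."""
    if alk_mg_l <= 0:
        raise ValueError("alkalinity must be positive")
    return vfa_mg_l / alk_mg_l


def stability_band(ratio, warn=0.3, alarm=0.4):
    """Map a VFA/ALK ratio to a coarse band; thresholds are hypothetical defaults."""
    if ratio < warn:
        return "stable"
    if ratio < alarm:
        return "watch"
    return "acidification risk"


def trailing_mean(series, window):
    """Trailing rolling mean; the first window-1 values use the shorter history available."""
    out = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        out.append(mean(series[start:i + 1]))
    return out


olr = [2.0, 2.2, 2.4, 3.1, 3.5]               # hypothetical daily OLR, kg VS/(m3 d)
print(trailing_mean(olr, 3))                  # smoothed OLR feature
print(stability_band(vfa_alk_ratio(1200.0, 3000.0)))
```

Such derived columns would simply be appended to the SCADA feature table before model training.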
These ML extensions fill gaps in traditional models by enabling multi-dimensional simulations (e.g., with variable selection networks) and by handling stochastic parameters (e.g., through ensembles). Future research should focus on creating standardized datasets, facilitating real-time IoT integration, and developing hybrid ML–mechanistic frameworks to deploy robust AD systems at large scale.
In summary, Table 4 shows that ML methods consistently outperform traditional kinetic models in predicting biogas production, particularly for the co-digestion of diverse wastes. Tree-based models (RF, XGBoost) and deep learning approaches (LSTM, TFT) effectively handle non-linearity, probabilistic forecasting, and feature importance. Hybrid optimization techniques (GA, PSO) further improve biogas yield and process stability. High-frequency SCADA data, feature engineering, and genomics enhance prediction accuracy, while explainable AI tools (e.g., SHAP) increase operational trust and allow integration with biokinetic models. These advances fill gaps in traditional approaches and enable multi-dimensional simulations.
Table 4. Applications of ML in AD (ANN, LSTM, TFT, RF, etc.) Showing High Predictive Performance across Substrates and Processes, often Surpassing Classical Kinetic Models and Enabling Real-time Optimization and Decision Support
Across pilot and full-scale settings, ML methods generally outperform classical kinetic baselines for short-term forecasting and stability proxies, with many studies reporting usable accuracy (often R² ≥ 0.80) for operational decision-making. Tree-based ensembles (RF, XGBoost/CatBoost) are the most reliable with tabular SCADA inputs, while sequence models (LSTM/TFT) capture temporal dependencies and enable probabilistic (quantile) forecasts.
Explainability tools (SHAP/attention) consistently identify OLR, pH, temperature, and feed configuration as the primary levers, and soft-sensor surrogates (e.g., VFA/ALK) enhance early warning. In practice, plants can retain modified-Gompertz-type fits for design and batch contexts and layer ML on top for online supervision, provided basic data hygiene (outlier handling, rolling/external validation) is in place to limit overfitting and improve transferability. ANN models in particular may overfit small datasets and fail to generalize to new substrates or variable operating conditions. Industrial deployment is further constrained by the high cost of sensors, limited data availability, and the complexity of integrating ML models into real-time control systems.
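The rolling validation mentioned above can be implemented with expanding-window, time-ordered splits. The generator below is a minimal sketch (function name and arguments are hypothetical): each test block lies strictly after its training window, which avoids the look-ahead leakage that random K-fold splits introduce on time series.

```python
def rolling_origin_splits(n_samples, initial_train, horizon, step=1):
    """Yield (train_idx, test_idx) pairs for expanding-window, time-ordered validation.

    initial_train : number of earliest samples in the first training window
    horizon       : length of each test block (e.g., days ahead)
    step          : how far the forecast origin advances between splits
    """
    if initial_train < 1 or horizon < 1 or step < 1:
        raise ValueError("initial_train, horizon and step must be >= 1")
    end = initial_train
    while end + horizon <= n_samples:
        # train on everything before the origin, test on the next `horizon` samples
        yield list(range(0, end)), list(range(end, end + horizon))
        end += step


for train_idx, test_idx in rolling_origin_splits(10, initial_train=6, horizon=2, step=2):
    print(len(train_idx), test_idx)
```

Each (train, test) pair would be used to refit the model and score it (e.g., with RMSE); averaging across splits gives a more realistic estimate of operational accuracy than a single random split.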
Comparative Performance of Models
To evaluate the relative strengths of different modelling approaches, mathematical models were compared with ANN-based ML models applied to biogas production from co-digestion systems. Predictive accuracy was assessed using statistical indicators such as R² and RMSE. The results provide insight into the trade-offs between classical kinetic formulations and advanced data-driven methods.
Table 5 compares the performance metrics of classical and ML models for predicting biogas production from co-digestion systems on the same dataset (Abdel Daiem et al. 2021), covering both traditional TDMMs and ANN approaches.
Among the mathematical models, the logistic kinetic formulation was the most accurate (R² = 0.9879), although all mathematical models achieved strong correlations (R² > 0.97). Nevertheless, their relatively large RMSE values (> 1000) indicate limited predictive precision for dynamic and nonlinear digestion processes, underscoring their inability to fully capture the complexity of anaerobic digestion. In contrast, the ANN-based approaches showed considerably lower errors (RMSE < 10), highlighting their superior capacity to model process variability and nonlinear relationships.
Conventional ANN training methods such as back-propagation, Levenberg–Marquardt, and ant colony optimization yielded moderate-to-high predictive accuracy (R² between 0.89 and 0.92); however, integrating metaheuristic optimization techniques substantially improved performance. Specifically, the MFFNN-MFO model achieved near-perfect predictive accuracy (R² = 0.9994; RMSE = 3.86), clearly outperforming both the conventional ANN structures and the mathematical models. These findings illustrate the value of ANN models, particularly when coupled with advanced optimization algorithms, for handling the complexity of anaerobic digestion systems, and they emphasize the potential of hybrid ANN–optimization frameworks as robust and reliable predictive tools for biogas production modelling.
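The MFFNN-MFO model reported in Table 5 is not reproduced here. As a stand-in illustration of metaheuristic calibration, the sketch below uses a minimal global-best particle swarm optimizer (PSO, one of the metaheuristics mentioned for hybrid schemes in this section) to fit the three modified Gompertz parameters (A, Dm, λ) by minimizing RMSE. The synthetic data, parameter bounds, and PSO coefficients are hypothetical; in principle the same loop could search ANN weights instead of kinetic parameters.

```python
import math
import random


def gompertz(t, A, Dm, lam):
    """Modified Gompertz cumulative biogas (A: ultimate yield, Dm: max rate, lam: lag)."""
    return A * math.exp(-math.exp(Dm * math.e / A * (lam - t) + 1.0))


def rmse_loss(params, t_obs, y_obs):
    """RMSE between observed cumulative biogas and the Gompertz curve for params = (A, Dm, lam)."""
    A, Dm, lam = params
    sq = [(gompertz(t, A, Dm, lam) - y) ** 2 for t, y in zip(t_obs, y_obs)]
    return math.sqrt(sum(sq) / len(sq))


def pso(loss, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO; positions are clipped to the box `bounds`.

    Returns (best_params, best_loss, history), where history holds the best loss
    after initialization and after every iteration.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


# Synthetic, noise-free data generated with A=500, Dm=40, lam=3 (all hypothetical).
t_obs = list(range(31))
y_obs = [gompertz(t, 500.0, 40.0, 3.0) for t in t_obs]
best, best_rmse, history = pso(lambda p: rmse_loss(p, t_obs, y_obs),
                               bounds=[(100.0, 1000.0), (1.0, 100.0), (0.0, 10.0)])
print("A, Dm, lam =", [round(v, 2) for v in best], "RMSE =", round(best_rmse, 3))
```

On real, noisy data the fitted parameters would carry the equifinality issues discussed earlier, so repeating the search with several seeds is a cheap robustness check.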
Table 5. Comparative Analysis between Classical and ML Models’ Performance Metrics for Predicting Biogas Production from Co-digestion Systems (Abdel Daiem et al. 2021)
RESEARCH GAPS AND AVAILABLE FUTURE EXTENSIONS
Following the preceding review of the mathematical modelling of the AD process, several research gaps emerge that are promising candidates for future extensions. These gaps are summarized as follows.
Future Extensions: Actionable Directions for AD
Recent practice in AD has introduced dosing of conductive materials (e.g., biochar, Fe₃O₄) to stimulate direct interspecies electron transfer (DIET) (Lo et al. 2010). A natural extension is to augment cumulative kinetic models (e.g., Chen–Hashimoto, modified Gompertz) with a conductivity/DIET factor,
(15)
where ϕ denotes the mass fraction of conductive additive and d a representative particle size. This formulation preserves parameter interpretability while explicitly linking additive dosing to performance. Calibration requires only routine operational data (biogas rate, temperature) supplemented with two readily available proxies: oxidation–reduction potential (ORP) and slurry conductivity. Toxic inhibition (e.g., free NH₃, sulfide, LCFA) can be included multiplicatively via Haldane-type terms, allowing operators to evaluate when inhibitory effects offset DIET benefits and to adjust set-points accordingly (Lo et al. 2010).
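The sketch below does not implement the DIET factor of Eq. 15; it treats that factor as an opaque, user-supplied multiplier and shows only how a Haldane-type term and a simple non-competitive inhibition term could be stacked multiplicatively on a base production rate. Function names and constants are hypothetical.

```python
import math


def haldane(s, ks, ki):
    """Haldane (substrate-inhibition) function s / (ks + s + s^2 / ki)."""
    return s / (ks + s + s * s / ki)


def haldane_multiplier(s, ks, ki):
    """Haldane term scaled to [0, 1] by its maximum, reached at s* = sqrt(ks * ki)."""
    peak = 1.0 / (1.0 + 2.0 * math.sqrt(ks / ki))
    return haldane(s, ks, ki) / peak


def noncompetitive(c, ki):
    """Non-competitive inhibition factor ki / (ki + c) for a toxicant concentration c."""
    return ki / (ki + c)


def inhibited_rate(base_rate, diet_factor=1.0, factors=()):
    """Apply a user-supplied DIET multiplier and any number of multiplicative inhibition terms."""
    rate = base_rate * diet_factor
    for f in factors:
        rate *= f
    return rate


# Hypothetical example: 10 L/d base rate, 20 % DIET enhancement,
# a Haldane term for VFA (ks=100, ki=400 mg/L) and a free-NH3 term (ki=50 mg/L).
rate = inhibited_rate(10.0, diet_factor=1.2,
                      factors=(haldane_multiplier(600.0, 100.0, 400.0), noncompetitive(80.0, 50.0)))
print(round(rate, 2))
```

Comparing `inhibited_rate` with and without the DIET multiplier at a given inhibitor level is one way to see when inhibition cancels the additive benefit.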
For control-oriented applications, the process can be represented by two coupled states, hydrolysis/acidogenesis and methanogenesis, driven by measurable or soft-sensed variables. The following equations define a minimal state-space model,
$$\mathbf{x} = [S_{\mathrm{VFA}},\; X_{\mathrm{meth}}] \qquad (16)$$
$$\dot{\mathbf{x}} = f(\mathbf{x}, \mathrm{OLR}, T, \mathrm{pH}), \qquad (17)$$
with outputs including biogas flow and a soft VFA/ALK indicator derived from pH, alkalinity, and gas rate. An extended Kalman filter or moving-horizon estimator can fuse SCADA data with the soft sensor to reconstruct unmeasured states and provide 1- to 3-day acidification risk bands, enabling operators to connect forecasts to actionable levers (e.g., OLR ramping, temporary set-point changes, co-substrate throttling) (Schroer and Just 2023).
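As a hedged illustration of Eqs. 16–17, the sketch below implements one possible right-hand side f(x, OLR, T, pH) for x = [S_VFA, X_meth] and integrates it with explicit Euler steps. The kinetics (Monod-type VFA uptake by methanogens, an Arrhenius-type temperature factor θ^(T−35), an ADM1-style lower-pH inhibition term, first-order decay and dilution) and every parameter value are assumptions chosen for illustration; the EKF or moving-horizon estimator itself is not shown.

```python
import math

# Hypothetical parameter set (concentrations in g COD/L, rates per day).
PARAMS = {
    "q_max": 2.0,         # max specific VFA uptake by methanogens (1/d)
    "ks": 0.5,            # half-saturation constant for VFA (g/L)
    "yield_x": 0.05,      # methanogen yield on VFA (g/g)
    "decay": 0.02,        # methanogen decay rate (1/d)
    "dilution": 0.05,     # 1/HRT (1/d), i.e. HRT = 20 d
    "vfa_per_olr": 1.0,   # VFA formed per unit OLR (lumped hydrolysis/acidogenesis)
    "theta": 1.07,        # temperature coefficient, reference 35 °C
    "ph_ul": 7.0,         # no pH inhibition at or above this value
    "ph_ll": 6.0,         # strong pH inhibition near this value
    "gas_per_vfa": 0.35,  # L CH4 per g COD consumed (theoretical value at STP)
}


def ph_inhibition(ph, ph_ul, ph_ll):
    """ADM1-style lower pH inhibition: 1 at or above ph_ul, exp(-3((pH-ul)/(ul-ll))^2) below."""
    if ph >= ph_ul:
        return 1.0
    return math.exp(-3.0 * ((ph - ph_ul) / (ph_ul - ph_ll)) ** 2)


def f(x, olr, temp_c, ph, p=PARAMS):
    """x_dot = f(x, OLR, T, pH) for x = [S_VFA, X_meth]; also returns the methane flow."""
    s_vfa, x_meth = x
    phi_t = p["theta"] ** (temp_c - 35.0)
    uptake = (p["q_max"] * phi_t * ph_inhibition(ph, p["ph_ul"], p["ph_ll"])
              * x_meth * s_vfa / (p["ks"] + s_vfa))
    ds_vfa = p["vfa_per_olr"] * olr - uptake - p["dilution"] * s_vfa
    dx_meth = p["yield_x"] * uptake - (p["decay"] + p["dilution"]) * x_meth
    return (ds_vfa, dx_meth), p["gas_per_vfa"] * uptake


def simulate(x0, olr, temp_c, ph, days=30, dt=0.01):
    """Explicit Euler with constant inputs; returns one (S_VFA, X_meth, gas) tuple per day."""
    x = list(x0)
    steps_per_day = round(1.0 / dt)
    daily = []
    for k in range(days * steps_per_day + 1):
        (ds, dx), gas = f(x, olr, temp_c, ph)
        if k % steps_per_day == 0:
            daily.append((x[0], x[1], gas))
        x[0] += dt * ds
        x[1] += dt * dx
    return daily


traj = simulate([1.0, 1.0], olr=3.0, temp_c=35.0, ph=7.2)
print("day 30: S_VFA=%.2f g/L, X_meth=%.2f g/L, CH4=%.2f L/(L d)" % traj[-1])
```

An estimator would wrap `f` as its process model and correct the predicted states with measured gas flow and the soft VFA/ALK signal.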
Given the prevalence of small, noisy datasets, plant-level kinetic parameters should be treated as random effects, e.g., (A, λ, Dm)_j ~ N(μ, Σ) for plant j. Partial pooling stabilizes estimates in data-scarce settings while retaining site-specific behaviour. Multi-facility fitting with leave-one-plant-out validation quantifies transferability and yields plant-specific posterior distributions with credible intervals. These can be propagated into risk-aware dashboards and sustainability KPIs (e.g., GWP per kWh, LCOE), ensuring that uncertainty is explicitly visible in decision-making (Galal 2021).
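Partial pooling can be illustrated with a deliberately simplified normal–normal model for a single parameter, with the within-plant and between-plant variances assumed known (a full hierarchical fit of (A, λ, Dm) with unknown Σ would normally use MCMC or similar). The sketch shrinks each plant's estimate toward a precision-weighted grand mean, more strongly for plants with fewer runs. Names and numbers are hypothetical.

```python
def partial_pool(plant_means, plant_ns, sigma2, tau2):
    """Shrink plant-level estimates toward a precision-weighted grand mean.

    plant_means : per-plant estimates of one parameter (e.g., lag time, days)
    plant_ns    : number of batch runs behind each estimate
    sigma2      : within-plant (run-to-run) variance, assumed known
    tau2        : between-plant variance, assumed known
    Returns (mu, pooled_estimates, posterior_variances given mu).
    """
    se2 = [sigma2 / n for n in plant_ns]              # sampling variance of each plant mean
    prec = [1.0 / (tau2 + v) for v in se2]            # marginal precision of each plant mean
    mu = sum(w * m for w, m in zip(prec, plant_means)) / sum(prec)
    pooled, post_var = [], []
    for m, v in zip(plant_means, se2):
        w = tau2 / (tau2 + v)                         # weight on the plant's own data
        pooled.append(w * m + (1.0 - w) * mu)
        post_var.append(w * v)                        # = 1 / (1/tau2 + 1/v)
    return mu, pooled, post_var


# Hypothetical lag-time estimates from three plants with 10, 10 and 1 batch runs.
mu, pooled, post_var = partial_pool([2.0, 3.0, 6.0], [10, 10, 1], sigma2=1.0, tau2=0.5)
for est, var in zip(pooled, post_var):
    print(f"{est:.2f} ± {1.96 * var ** 0.5:.2f}")   # approximate 95 % interval, ignores uncertainty in mu
```

The single-run plant is pulled furthest toward the group mean, which is the stabilizing effect described above.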
For forecasting with tree- or sequence-based ML models, embedding domain constraints is essential: monotonicity of biogas rate with OLR (within safe ranges), positive correlation of VFA with OLR, and soft penalties for mass-balance violations. Residual-based change-point detection (e.g., CUSUM, Bayesian online methods) can flag operational regime shifts (feedstock change, mixer outage). These triggers initiate lightweight re-tuning and widen predictive intervals, transforming ML from a static predictor into an operator-safe assistant (Ling et al. 2024).
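A two-sided tabular CUSUM on forecast residuals is one simple way to raise the regime-shift flags described above. The sketch assumes the residuals have already been standardized by their standard deviation over a stable reference period; k = 0.5 and h = 4 are common textbook defaults, not plant-tuned values, and the function name is hypothetical.

```python
def cusum_alarms(residuals, k=0.5, h=4.0):
    """Two-sided tabular CUSUM on standardized residuals; returns alarm indices.

    k : allowance (slack) in residual standard deviations
    h : decision threshold; both statistics reset to zero after each alarm
    """
    s_hi = s_lo = 0.0
    alarms = []
    for i, r in enumerate(residuals):
        s_hi = max(0.0, s_hi + r - k)   # accumulates sustained positive deviations
        s_lo = max(0.0, s_lo - r - k)   # accumulates sustained negative deviations
        if s_hi > h or s_lo > h:
            alarms.append(i)
            s_hi = s_lo = 0.0
    return alarms


# Hypothetical standardized residuals: in control for 20 days, then a +1.5 sd shift.
residuals = [0.0] * 20 + [1.5] * 10
print(cusum_alarms(residuals))  # -> [24, 29]
```

An alarm would then trigger the lightweight re-tuning and wider predictive intervals mentioned above.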
Finally, the experimental design can be optimized to reduce the cost of BMP and pilot trials. Starting from a Latin hypercube design over feed ratios and pre-treatments, cumulative or hybrid models are fitted, and the next experimental point is selected by maximizing the expected reduction in parameter uncertainty under safety constraints (e.g., VFA/ALK ≤ threshold). This adaptive loop accelerates the development of decision-quality models for novel feedstock mixtures while minimizing resource requirements (Tiwari et al. 2025).
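The Latin hypercube starting design can be generated in a few lines. The sketch below (hypothetical function name and factor ranges) places exactly one sample in each of n equal-width strata per factor, spreading an initial BMP campaign across the design space; the subsequent uncertainty-driven point selection is not shown.

```python
import random


def latin_hypercube(n, bounds, seed=None):
    """Draw n points with exactly one sample per equal-width stratum in every dimension.

    bounds : list of (low, high) per factor, e.g. co-substrate fraction, pre-treatment time
    """
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        # one uniform draw inside each of the n strata, then shuffle the stratum order
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)
        columns.append([lo + u * (hi - lo) for u in strata])
    return [tuple(col[j] for col in columns) for j in range(n)]


# Hypothetical factors: co-substrate mass fraction (0-1) and pre-treatment time (0-60 min).
for point in latin_hypercube(8, [(0.0, 1.0), (0.0, 60.0)], seed=42):
    print(point)
```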
Incorporating Parameter Uncertainty
Estimating the model parameters is one of the main objectives when simulating biogas production in the AD process using mathematical modelling. However, if the same AD process is repeated several times, these parameters are expected to vary slightly from run to run. Few studies have estimated the ranges of model parameters to investigate these variations. For example, Kumar et al. (2004) conducted a qualitative assessment of methane emission data from different municipal solid waste disposal sites; Danner (2006) considered parameter uncertainty for several growth models; and Budiyono et al. (2010) estimated the parameter ranges of the modified Gompertz equation used to simulate biogas production from cattle manure.
Mathematically, to express these parameters more accurately, they may be described as random variables rather than deterministic ones. In such a case, a general parameter p can be expressed by the following equation (Ghanem and Spanos 2003),
$$p(\theta) = \bar{p} + \varepsilon\,\gamma(\theta) \qquad (18)$$
where p̄ is the deterministic value of the parameter, ε is a controlling factor for the random part, and γ(θ) is a random variable that describes the expected uncertainty in the deterministic value p̄. The random variable γ(θ) is a real-valued measurable function, γ: Θ → ℝ, defined on the probability space (Θ, ℱ, P), where Θ is the sample space, ℱ the σ-algebra of events, and P the probability measure.
This random variable can be fully characterized by repeating the AD process a relatively large number of times and estimating the model parameters each time. For each parameter, the obtained values can then be plotted to determine its probability distribution and statistical moments (mean, variance, skewness, and kurtosis), giving a complete description of the uncertain parameter. Repeating the AD process many times to determine parameter uncertainty requires short processes and many reactors operating simultaneously. Moreover, when the parameter uncertainty is more complicated and is expected to fluctuate more strongly over time, the random part can be expressed as a random process, as in Eq. 19,
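$$p(t;\theta) = \bar{p} + \varepsilon\,\gamma(t;\theta) \qquad (19)$$
where p̄ is the deterministic value of the parameter and γ(t;θ) is a second-order random process with finite variance. Using the Karhunen–Loève (K-L) expansion, this random process can be expanded into a series of uncorrelated random variables multiplied by deterministic functions of time, as in Eq. 20 (Ghanem and Spanos 2003),
$$\gamma(t;\theta) = \bar{\gamma}(t) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\,\xi_i(\theta)\,f_i(t) \qquad (20)$$
where γ̄(t) is the mean value of γ(t;θ), {ξ_i(θ)}_{i=1}^∞ is a set of uncorrelated zero-mean, unit-variance random variables, and λ_i and f_i(t) are the eigenvalues and eigenfunctions, respectively. Both can be evaluated by solving the integral equation, Eq. 21,
$$\int_D C(t_1,t_2)\,f_i(t_2)\,dt_2 = \lambda_i\,f_i(t_1) \qquad (21)$$
where D is the time domain over which γ(t;θ) is defined and C(t_1,t_2) is the covariance function of γ(t;θ).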
Including this parameter uncertainty in the model equation yields a probability distribution of biogas production at every time instant. This provides the expected value (mean), variance, quartiles, the probability of meeting required threshold values, and other statistical moments of biogas production, giving a clearer picture of the AD process. Such stochastic approaches could also incorporate sensitivity analysis to identify the dominant parameters influencing biogas yield variability. This concept has been applied successfully in many fields (Galal 2013, 2021) and could provide designers with the system's random response to these uncertain parameters.
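A hedged Monte Carlo sketch of this idea follows. The modified Gompertz parameters (A, Dm, λ) are drawn from independent Gaussian distributions (an assumption; distributions measured from repeated runs could replace them), each draw is pushed through B(t), and the resulting sample is summarized by its mean, variance, skewness, excess kurtosis, quartiles, and the probability of falling below a threshold. The same `sample_moments` helper could summarize repeated parameter estimates from replicate AD runs. All numerical values are hypothetical.

```python
import math
import random
from statistics import fmean, quantiles


def gompertz(t, A, Dm, lam):
    """Modified Gompertz cumulative biogas; returns 0 when the double exponent would overflow."""
    u = Dm * math.e / A * (lam - t) + 1.0
    return 0.0 if u > 700.0 else A * math.exp(-math.exp(u))


def sample_moments(values):
    """Mean, unbiased variance, moment-based skewness and excess kurtosis."""
    n = len(values)
    m = fmean(values)
    d = [v - m for v in values]
    m2 = sum(x * x for x in d) / n
    m3 = sum(x ** 3 for x in d) / n
    m4 = sum(x ** 4 for x in d) / n
    return {"mean": m, "variance": m2 * n / (n - 1),
            "skewness": m3 / m2 ** 1.5, "excess_kurtosis": m4 / m2 ** 2 - 3.0}


def propagate(t, nominal, sd, threshold=None, n=5000, seed=1):
    """Monte Carlo propagation of independent Gaussian parameter scatter through B(t).

    nominal : deterministic (A, Dm, lam); sd : their standard deviations (hypothetical).
    Non-positive draws are rejected and redrawn so that all parameters stay positive.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        params = []
        for p, s in zip(nominal, sd):
            v = rng.gauss(p, s)
            while v <= 0.0:
                v = rng.gauss(p, s)
            params.append(v)
        out.append(gompertz(t, *params))
    summary = sample_moments(out)
    summary["quartiles"] = tuple(quantiles(out, n=4))
    if threshold is not None:
        summary["p_below_threshold"] = sum(b < threshold for b in out) / n
    return summary


res = propagate(10.0, nominal=(500.0, 40.0, 3.0), sd=(25.0, 4.0, 0.5), threshold=250.0)
print(res)
```

Repeating `propagate` over a grid of times gives the time-varying distribution (e.g., a quartile band) of cumulative biogas described above.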
Multidimensional Mathematical Models
Existing models usually describe biogas production over time under fixed conditions, such as the operating temperature, mixing ratio, or heavy metal concentration. This yields a single curve of biogas production versus time for each realization of these conditions. However, these models can be extended to two or more dimensions. This multidimensional extension can be carried out through a corresponding number of curve-fitting steps. To implement it, consider a mathematical model with three parameters A, b, and k, and several variables such as the time, which is defined as , the mixing ratio defined as