
Enhancing unconfined compressive strength of stabilized soil with lime and cement prediction through a robust hybrid machine learning approach utilizing Naive Bayes Algorithm


The unconfined compressive strength (UCS) of soil stabilized with lime and cement is a crucial mechanical parameter for developing accurate geomechanical models. Traditionally, determining UCS has required laborious laboratory testing of core samples or complex well-log analysis, both of which are resource-intensive. To improve efficiency, this study introduces a method for real-time UCS prediction based on Naive Bayes (NB) predictive models strengthened by two reliable meta-heuristic algorithms: smell agent optimization (SAO) and the Dynamic Arithmetic Optimization Algorithm (DAOA). Coupling NB with these algorithms improves prediction precision while streamlining the process. The models are validated on UCS samples from various soil types obtained in earlier stabilization tests. Three models are developed: a single NB model and two hybrid models, NBDA and NBSA. Together, they show how each optimizer affects UCS prediction accuracy. The approach advances UCS prediction methodology, offering a fast and effective method with wide-ranging implications for geomechanical applications and opening up new possibilities for real-time UCS estimation across various geological scenarios. The NBDA model performs best: it achieves an R2 of 0.992 in the testing phase and the lowest RMSE, 108.69, in the training phase. It consistently shows better generalization and predictive ability than the NB and NBSA models, highlighting its usefulness in practical applications.



Geotechnical engineering \(({\text{GE}})\) is no exception to the recent proliferation of machine learning \(({\text{ML}})\) techniques across many industries [1]. This interest covers a broad range of tasks, from landslide detection to material property prediction. Predictive models tailored to \({\text{GE}}\) problems have proven both technically sophisticated and highly effective [2, 3]. Their importance grows as the demand for complex predictive models increases across geotechnical domains, and ongoing developments continue to improve their ability to handle the wide range of geotechnical complexities [4, 5].

At the heart of many engineering projects lies the compaction of loose soils, a key process that increases the unit weight of the soil in structures such as earth dams and highway embankments. Compaction does more than increase strength: it improves soil durability, increases load-bearing capacity, and helps stabilize embankment slopes and limit settlement [6]. It also reduces porosity, void volume, and permeability, increases density, and improves waterproofing. Together, these improvements enhance soil quality and its ability to support structural loads. The unconfined compressive strength \(({\text{UCS}})\), a cornerstone of geomechanical models, is a key measure of the mechanical behavior of rock [7, 8]. \({\text{UCS}}\) is the maximum compressive stress a rock can withstand under controlled uniaxial loading before failure occurs. Rock mechanics, which combines theoretical concepts with real-world applications, explains how rocks respond to different stress conditions [9,10,11]. Rock failure leads to problems such as solids production and wellbore instability, which are particularly relevant in petroleum engineering. Access to \({\text{UCS}}\) data from subsurface formations is therefore critical in drilling operations: these data guide bit hydraulics, determine optimal mud weights, help control drilling costs, and support the optimization of drilling performance [12].

Laboratory tests conducted on extracted core samples, providing insights into actual stress state conditions and mechanical attributes, constitute the cornerstone of directly assessing mechanical rock properties [13]. These tests encompass an array of evaluations, spanning uniaxial and triaxial compressive strength tests, scratch tests, Schmidt hammer tests, and point load tests. These methods represent the gold standard for property determination [14]. However, the seamless profiling of \({\text{UCS}}\) along wellbores is impeded by challenges associated with acquiring representative core samples, such as high costs and time-intensive procedures. To circumvent this limitation, indirect methodologies are devised, bridging gaps by correlating rock characteristics with petrophysical well-log data [15].

The significance of \({\text{UCS}}\) extends beyond rocks to materials such as soils and industrial byproducts, and it strongly affects foundation design, slope stability analysis, and structural resilience. For stabilized materials, \({\text{UCS}}\) is a key factor governing the design and performance of pavements [16]. However, the \({\text{UCS}}\) of a material depends on many variables, including its physicochemical properties, the type of cementitious admixture, and the curing time. Determining it requires carefully designed laboratory investigations and specialized equipment [17], and the reliability of these tests depends on precise control of factors such as specimen dimensions. Because these tests are demanding and resource-intensive, and representative samples can be hard to obtain, alternative methods for determining the \({\text{UCS}}\) of stabilized materials such as pond ashes are being sought [18].

Over the past few decades, soft computing methods have emerged, with artificial neural networks \(({\text{ANNs}})\) becoming the dominant approach for \({\text{UCS}}\) prediction of soil and rock materials [19,20,21]. These studies demonstrate the strength of neural network models. Central to this approach is the backpropagation neural network algorithm, whose performance depends on learning parameters such as the learning rate, the momentum, the number of hidden-layer nodes, and the number of layers [22]. Careful control of the number of training iterations helps ensure robust predictive performance and guards against overtraining [23, 24].

Literature review

Majdi and Rezaei [12] used artificial neural network \(({\text{ANN}})\) and multivariable regression analysis \(({\text{MVRA}})\) models to predict \({\text{UCS}}\) from a database of 93 rock samples. Model performance was evaluated with R2, variance accounted for \(({\text{VAF}})\), mean absolute error \(({\text{MAE}})\), and mean relative error \(({\text{MRE}})\), and the \({\text{ANN}}\) model clearly outperformed the \({\text{MVRA}}\) model in predicting \({\text{UCS}}\). Ceryan et al. [25] used a Levenberg-Marquardt algorithm-based artificial neural network \(({\text{LM}}-{\text{ANN}})\) to predict \({\text{UCS}}\). The LM-ANN model outperformed the multiple linear regression (REG) model, with an R2 of \(0.884\) and an \({\text{RMSE}}\) of 1.11 kN/m2. Hoque et al. [26] applied the Random Forest \(({\text{RF}})\) method to predict the \({\text{UCS}}\) of polypropylene-stabilized soft soil. Their RF model, assessed with several metrics, achieved an R2 of \(0.8942\) and an \({\text{RMSE}}\) of \(0.250\) kN/m2. The results indicate that the proposed RF and sequential models effectively predict the UCS of polypropylene-stabilized soft soil and are more convenient and less time-consuming than traditional labor-intensive laboratory procedures.

Onyelowe et al. [27] used AI-based bi-input predictive models to forecast the properties of black cotton soil (BCS) treated with waste-based high silica content densified ash (HSDA). A-7 group BCS was treated with varying percentages of HSDA, and desiccation tests monitored changes in weight, diameter, height, and crack development over 30 days. XRF and SEM analyses revealed enhanced pozzolanic strength and increased ettringite and gel formation in the treated samples. Intelligent models (ANN, GP, and EPR) predicted bulk density, linear shrinkage, crack width, and volumetric shrinkage. \({\text{EPR}}\) was the most accurate in predicting bulk density and crack width (98.2% and 92.7%), while ANN was the most accurate for linear and volumetric shrinkage (98.8% and 99.3%). Ebid et al. [28] addressed the challenges of unsaturated soils used in construction, particularly their unfavorable response to seasonal swell and shrink cycles. Additive stabilization was employed to improve the volume change characteristics of the soil, and supplementary binders made from solid waste powders were introduced to mitigate the environmental hazards of conventional cement use. Because of equipment limitations, intelligent prediction techniques, including genetic programming \(({\text{GP}})\) and \({\text{ANN}}\), were used to forecast consistency limits. The binder was a hybrid cement (HC) blending nanostructured quarry fines with hydrated-lime-activated nanostructured rice husk ash (HANRHA), and experimental data for varying HANRHA dosages formed the prediction database. Stabilization substantially improved the soil properties, and ANN outperformed GP in accuracy. The nanostructured binder contributed both to successful soil improvement and to efficient model predictions. Onyelowe et al. [29] emphasized the critical role of soil stability and durability in foundation construction, which makes soil stabilization necessary to meet design requirements. Their research focused on predicting the UCS of unsaturated lateritic soil treated with HC using GP. HC, a blend of nanotextured quarry fines and hydrated-lime-activated nanotextured rice husk ash, served as the binder material, and GP models of varying complexity were used to forecast UCS. The four-level complexity GP model performed best, demonstrating robustness, flexibility, and high accuracy (SSE 2.4%, R2 = 0.991).


This investigation focuses on predicting key soil properties, particularly \({\text{UCS}}\), with an \({\text{ML}}\) methodology. Because empirical \({\text{UCS}}\) data are difficult to acquire, the study builds on the Naive Bayes \(({\text{NB}})\) algorithm. The performance of the \({\text{NB}}\) model depends on careful tuning of its parameters, so two metaheuristic algorithms, the smell agent optimization \(({\text{SAO}})\) and the Dynamic Arithmetic Optimization Algorithm \(({\text{DAOA}})\), are each coupled with \({\text{NB}}\) to optimize those parameters. Using a dataset of \({\text{UCS}}\) values, the study carries out comparative analyses to gauge the effectiveness of the proposed framework. The findings offer guidance for accurately estimating \({\text{UCS}}\) in civil engineering projects, which can improve the design and construction of \({\text{UCS}}\)-dependent structures in the infrastructure sector. In this way, the research provides both practical directions and useful knowledge for \({\text{UCS}}\) prediction, a key aspect of soil behavior in civil engineering.


Data gathering

A comprehensive approach was taken to evaluate the \({\text{UCS}}\) of stabilized soil. The data were divided into three sets: training \((70\%)\), validation \((15\%)\), and testing \((15\%)\); this split has been found empirically to support good predictive performance. The dataset contains 187 samples described by six input variables that cover soil properties and the relative composition of the mixture: soil, cement, lime, liquid limit \(({\text{LL}})\), plastic limit \(({\text{PL}})\), and plasticity index \(({\text{PI}})\), with \({\text{UCS}}\) as the output. The samples were obtained from laboratory tests reported in [9, 30,31,32], and the statistical properties of the variables are summarized in Table 1.

  • Soil: the percentage composition of soil in the sample being tested.

  • Cement: the percentage of cement in the mixture. Cement is often added to improve the strength and durability of the soil, making it an important parameter in soil stabilization.

  • Lime: the percentage of lime in the mixture. Lime is another common soil-stabilization additive, used to enhance properties such as workability and strength.

  • LL: the moisture level at which soil under typical circumstances changes from a plastic to a liquid form is known as the liquid limit. It is an important property for understanding the behavior of soil.

  • PL: the PL is the moisture content at which the soil starts behaving in a plastic manner and can be molded without breaking. It is another crucial parameter in soil mechanics.

  • PI: the PI is the difference between the LL and the PL. It provides a measure of the plasticity of the soil, indicating its ability to undergo deformation without cracking.

  • UCS: this is a measure of the ability of a soil sample to withstand axial loads without confinement. It is a critical parameter in geotechnical engineering, reflecting the soil's strength under unconfined conditions.
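Because PI is derived directly from LL and PL, each dataset row can be checked for internal consistency. The following sketch assumes all three values are water contents in percent; the tolerance of 0.5 percentage points and the sample values are hypothetical and not taken from the study's data.

```python
def plasticity_index(liquid_limit, plastic_limit):
    """PI = LL - PL, with both limits given as water contents in percent."""
    if plastic_limit > liquid_limit:
        raise ValueError("plastic limit cannot exceed liquid limit")
    return liquid_limit - plastic_limit


def pi_is_consistent(ll, pl, pi, tol=0.5):
    """True if a recorded PI matches LL - PL within `tol` percentage points (assumed tolerance)."""
    return abs(plasticity_index(ll, pl) - pi) <= tol


# Hypothetical values, not taken from the study's dataset.
print(plasticity_index(45.0, 20.0))        # 25.0
print(pi_is_consistent(45.0, 20.0, 25.3))  # True
```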

Table 1 Statistical properties of input and UCS

To reduce the risk of overfitting or underfitting, the data were randomly permuted (randperm). Normalization was also applied to limit the influence of outliers, and k-fold cross-validation was used to ensure robust model performance and assess generalizability. The test dataset is presented in Table 5 in the Appendix.
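The preprocessing steps described above (random permutation, the 70/15/15 split, normalization, and k-fold cross-validation) can be sketched in plain Python. The text does not state the random seed, how fractional split sizes are rounded, which normalization formula is used, or the number of folds, so the seed of 42, Python's `round`, min-max scaling, and k = 5 are all assumptions.

```python
import random


def shuffle_and_split(n_samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Randomly permute sample indices, then split them 70/15/15 by default."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random permutation of the rows
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])  # the test set takes the remainder


def fit_min_max(rows):
    """Per-column minima and maxima, computed from the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, mins, maxs)]
            for row in rows]


def k_fold_indices(n_samples, k=5):
    """Yield (train, validation) position lists for k contiguous folds.

    The rows are assumed to be shuffled already; fold sizes differ by at most one.
    """
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


train_idx, val_idx, test_idx = shuffle_and_split(187)
print(len(train_idx), len(val_idx), len(test_idx))  # 131 28 28

rows = [[10.0, 2.0], [20.0, 4.0], [30.0, 6.0]]  # hypothetical (cement %, lime %) rows
mins, maxs = fit_min_max(rows)
print(apply_min_max(rows, mins, maxs))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Fitting the min-max statistics on the training rows only, and reusing them for the validation and test rows, keeps information from the held-out data out of the preprocessing.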

Figure 1 presents the correlation matrix of the soil-property variables: soil \((\%)\), cement \((\%)\), lime \((\%)\), \(\mathrm{LL}\ (\%)\), \(\mathrm{PL}\ (\%)\), \(\mathrm{PI}\ (\%)\), and UCS \(({\text{kN}}/{{\text{m}}}^{2})\). Each cell contains the correlation coefficient between the variables in the corresponding row and column, which measures the strength and direction of the linear association between them. The coefficients range from \(-1\) to \(1\):

  • A positive value indicates a positive correlation: one variable tends to increase as the other increases.

  • A negative value indicates a negative correlation: one variable tends to decrease as the other increases.

  • A value close to zero indicates little or no linear relationship between the variables.

Fig. 1

The correlation plot between input and output
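Since the text describes the coefficients as measuring linear association, they are presumably Pearson coefficients; the following plain-Python sketch computes such a matrix. The column names and values are hypothetical and only illustrate the sign conventions listed above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return the column names and the matrix of pairwise Pearson coefficients."""
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]


# Hypothetical values, for illustration only.
data = {
    "cement": [2.0, 4.0, 6.0, 8.0],
    "ucs": [300.0, 520.0, 790.0, 1010.0],
    "pi": [30.0, 25.0, 21.0, 15.0],
}
names, matrix = correlation_matrix(data)
for name, row in zip(names, matrix):
    print(name, [round(r, 3) for r in row])
```

Here cement and UCS rise together (coefficient close to 1), while PI falls as cement rises (negative coefficient).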

In addition, Fig. 2 shows the histogram distributions of the input and output variables.

Fig. 2

Histogram distribution for the input and output variables

Naive Bayes (NB)

The NB model is a probabilistic classifier based on Bayes' theorem and a strong (naive) assumption that the features are conditionally independent given the class. Its main advantage is its simple design, which requires no complex iterative parameter estimation. Das et al. further point out that NB is robust to noise and irrelevant features [33]. The NB decision rule is:

$$y=\underset{y_{k}}{\arg\max}\;P(y_{k})\prod_{i=1}^{n}P(x_{i}\mid y_{k})$$

where \(P(y_{k})\) is the prior probability of class \(y_{k}\), \(n\) is the number of features, and \(P(x_{i}\mid y_{k})\) is the class-conditional probability (likelihood) of feature \(x_{i}\). For continuous features, it is computed with a Gaussian density:

$$P(x_{i}\mid y_{k})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}\right)$$

where \(\mu\) and \(\sigma\) are the mean and standard deviation of \(x_{i}\) within class \(y_{k}\). Figure 3 shows the structure of NB.

Fig. 3

Structure of NB
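To make the NB decision rule concrete, here is a minimal Gaussian NB classifier in plain Python. It scores each class by its log prior plus the sum of log Gaussian densities; working in logs avoids numerical underflow from multiplying many small probabilities. The text does not state how the continuous UCS target is turned into class labels, so the sketch uses hypothetical "low"/"high" labels and toy data; the variance floor is also an assumption.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes classifier (a sketch, not the study's implementation)."""

    def fit(self, X, y, var_floor=1e-9):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(y)
        self.priors, self.means, self.vars = {}, {}, {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            self.priors[label] = len(rows) / n                   # prior P(y)
            self.means[label] = [sum(c) / len(c) for c in cols]  # per-feature mean
            self.vars[label] = [                                 # per-feature variance
                max(sum((v - m) ** 2 for v in c) / len(c), var_floor)
                for c, m in zip(cols, self.means[label])
            ]
        return self

    def _log_posterior(self, row, label):
        # log P(y) + sum_i log P(x_i | y), up to a constant shared by all classes
        total = math.log(self.priors[label])
        for x, mu, var in zip(row, self.means[label], self.vars[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.priors, key=lambda k: self._log_posterior(row, k)) for row in X]


X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [5.0, 8.0], [5.2, 7.9], [4.8, 8.3]]
y = ["low", "low", "low", "high", "high", "high"]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[1.1, 2.1], [5.1, 8.1]]))  # ['low', 'high']
```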

Dynamic Arithmetic Optimization Algorithm (DAOA)

The core arithmetic optimization algorithm \(({\text{AOA}})\) has been upgraded with a novel accelerator function that integrates two dynamic attributes intended to improve its efficacy. The dynamic variant adjusts the search phase and the candidate solutions during optimization, modifying the balance between exploration and exploitation. A notable trait of \({\text{DAOA}}\) is that, unlike many contemporary metaheuristics, it does not require initial parameter tuning.

DAOA’s dynamic accelerated function

In a dynamic environment, the dynamic accelerated function \(({\text{DAF}})\) has a significant impact on the search phase of the arithmetic optimization algorithm. In the original \({\text{AOA}}\), the \({\text{Min}}\) and \({\text{Max}}\) values of the accelerated function must be adjusted for each problem. \({\text{DAOA}}\) instead replaces it with a decreasing function, the \({\text{DAF}}\), so that the algorithm does not depend on adjustable internal parameters. The modification factor of the optimization algorithm is defined as follows:


Here, \({\text{It}}\) is the current iteration number, \({{\text{It}}}_{{\text{Max}}}\) is the maximum number of iterations, and \(a\) is a constant. The function decreases with each iteration of the algorithm.

Dynamic candidate solution for DAOA

This section presents the dynamic candidate solutions of \({\text{DAOA}}\). Exploration and exploitation are both crucial phases of a metaheuristic algorithm, and maintaining a proper balance between them is essential for its success. The dynamic version of the algorithm aims to improve both phases by updating the position of each solution based on the best solution found so far during the optimization process. In the enhanced version, the dynamic candidate solution \(({\text{DCS}})\) function is incorporated into Eqs. (4) and (5) as follows:

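$$x_{i,j}(It+1)=\begin{cases}best(x_{j})\div(DCS+\epsilon)\times\left((UB_{j}-LB_{j})\times\mu+LB_{j}\right), & r_{2}<0.5\\ best(x_{j})\times DCS\times\left((UB_{j}-LB_{j})\times\mu+LB_{j}\right), & \text{otherwise}\end{cases}$$
$$x_{i,j}(It+1)=\begin{cases}best(x_{j})-DCS\times\left((UB_{j}-LB_{j})\times\mu+LB_{j}\right), & r_{3}<0.5\\ best(x_{j})+DCS\times\left((UB_{j}-LB_{j})\times\mu+LB_{j}\right), & \text{otherwise}\end{cases}$$

where \(best(x_{j})\) is the \(j\)-th variable of the best solution found so far, \(UB_{j}\) and \(LB_{j}\) are the upper and lower bounds of the \(j\)-th variable, \(\mu\) is a control parameter, \(\epsilon\) is a small positive constant that prevents division by zero, and \(r_{2}\) and \(r_{3}\) are random numbers between \(0\) and \(1\).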

The \({\text{DCS}}\) function is introduced to account for the gradual reduction of the step size in the candidate solutions; its value decreases by 1% at each iteration:

$$DCS\left(t+1\right)=DCS(t)\times 0.99$$
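The DCS decay and the two DCS-based position-update rules can be sketched in plain Python as follows. The DAF equation is not reproduced in this section, so the sketch leaves the choice between the exploration rule (division/multiplication) and the exploitation rule (subtraction/addition) to the caller through a hypothetical `explore` flag. The values of \(\mu\), \(\epsilon\), and the initial DCS, as well as clipping candidates to the bounds, are assumptions.

```python
import random

EPSILON = 1e-12  # small constant preventing division by zero (assumed value)
MU = 0.5         # control parameter mu (hypothetical value; not given in this section)


def update_dcs(dcs):
    """DCS(t + 1) = 0.99 * DCS(t): the factor shrinks by 1% per iteration."""
    return dcs * 0.99


def update_position(best, lb, ub, dcs, explore, rng):
    """Build a new candidate from the best solution using the DCS-based update rules.

    explore=True applies the division/multiplication rule (r2 test);
    explore=False applies the subtraction/addition rule (r3 test).
    The DAF-based choice between the two rules is left to the caller.
    """
    candidate = []
    for b, lo, hi in zip(best, lb, ub):
        scale = (hi - lo) * MU + lo
        r = rng.random()  # plays the role of r2 or r3 for this dimension
        if explore:
            value = b / (dcs + EPSILON) * scale if r < 0.5 else b * dcs * scale
        else:
            value = b - dcs * scale if r < 0.5 else b + dcs * scale
        candidate.append(min(max(value, lo), hi))  # clip to bounds (an assumption)
    return candidate


dcs = 1.0  # hypothetical initial DCS value
for _ in range(100):
    dcs = update_dcs(dcs)
print(round(dcs, 4))  # 0.366

rng = random.Random(0)
print(update_position([4.0, 6.0], [0.0, 0.0], [10.0, 10.0], dcs, explore=False, rng=rng))
```

After 100 iterations the DCS factor has fallen to about 37% of its initial value, so later candidates stay closer to the best solution found so far.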

Empirical results over many search agents and iterations show that the dynamic candidate solutions in DAOA substantially accelerate the convergence of AOA and improve the quality of the solutions obtained. The ability of a metaheuristic algorithm to operate without problem-specific parameters is generally regarded as an advantage. Through its adaptive parameters, DAOA reduces the parameters that need to be tuned to just two: the maximum number of iterations and the population size, whereas competing algorithms require parameter adjustments for different problems. One shortcoming of the method is that its adaptive mechanism relies on the iteration counter rather than on fitness improvement. Algorithm 1 gives the DAOA pseudo-code, and Fig. 4 shows the DAOA flowchart.


Algorithm 1 Pseudo-Code of DAOA

Fig. 4

Flowchart of DAOA

Smell agent optimization (SAO)

Salawudeen et al. [34] introduced the \({\text{SAO}}\) algorithm, a modern optimization technique based on the relationship between smell-emitting objects and olfactory agents. In the sniffing mode [35], \({\text{SAO}}\) models the agent's ability to detect and locate smell molecules and to decide how to search for their source [36]. After these decisions are made in the sniffing mode, the agent follows the scent molecules toward the source in the trailing mode [37]. SAO also includes a random mode that prevents the agent from becoming trapped in local optima [38, 39].

Sniffing mode

Smell molecules are initialized using Eq. (8).

$$X_{i}^{(m)}=\begin{bmatrix}x_{(1,1)} & x_{(1,2)} & \cdots & x_{(1,D)}\\ \vdots & \vdots & \ddots & \vdots\\ x_{(N,1)} & x_{(N,2)} & \cdots & x_{(N,D)}\end{bmatrix}$$

Here, \(D\) is the number of decision variables, \(N\) is the number of smell molecules, and \(m\) is the iteration index. The initial positions of the molecules are generated with Eq. (9).

$$X_{i}^{(m)}=LB_{i}+r_{0}\times(UB_{i}-LB_{i})$$

where \(r_{0}\) is a random value between \(0\) and \(1\), and \({\text{UB}}\) and \({\text{LB}}\) are the upper and lower bounds, respectively. The velocities at which the smell molecules disperse from their source are given by Eq. (10).

$$v_{i}^{(m)}=\begin{bmatrix}v_{(1,1)} & v_{(1,2)} & \cdots & v_{(1,D)}\\ \vdots & \vdots & \ddots & \vdots\\ v_{(N,1)} & v_{(N,2)} & \cdots & v_{(N,D)}\end{bmatrix}$$

Equation (11) updates the positions of the dispersed molecules, which move in a Brownian manner.

$${x}_{i}^{m+1}={x}_{i}^{m}+{v}_{i}^{m+1}\times \Delta t$$

where \(\Delta t\) is assumed to be \(1\).
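The initialization in Eq. (9) and the position update in Eq. (11) can be sketched in plain Python as follows: molecules are placed uniformly between the bounds and then moved by one step with \(\Delta t = 1\). Because the velocity-update equations (12) and (13) are not reproduced here, the velocities below are hypothetical placeholders, as are the bounds and the number of molecules.

```python
import random


def init_molecules(n_molecules, lb, ub, rng):
    """Eq. (9): x = LB + r0 * (UB - LB), with r0 drawn uniformly from [0, 1) per entry."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lb, ub)]
            for _ in range(n_molecules)]


def move_molecules(positions, velocities, dt=1.0):
    """Eq. (11): x(m+1) = x(m) + v(m+1) * dt, with dt = 1 as in the text."""
    return [[x + v * dt for x, v in zip(xs, vs)]
            for xs, vs in zip(positions, velocities)]


rng = random.Random(7)
lb, ub = [0.0, 0.0, 0.0], [1.0, 2.0, 5.0]  # hypothetical bounds, D = 3
X = init_molecules(5, lb, ub, rng)          # N = 5 molecules
V = [[0.1, -0.2, 0.0] for _ in X]           # placeholder velocities
X_next = move_molecules(X, V)
```

Each row of `X` corresponds to one row of the molecule matrix in Eq. (8), and each column to one decision variable.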

Equation (12) updates the velocity of the smell molecules.


The updated velocity component, \(v\), is obtained using Eq. (13).


where \(T\), \(M\), and \(k\) denote the temperature, the molecular mass, and the smell constant, respectively.

Trailing mode

In the trailing mode, the agent's movement toward the smell source, which reflects its search behavior, is modeled by Eq. (14).

$${x}_{i}^{m+1}={x}_{i}^{m}+{r}_{2}\times olf\times \left({x}_{{\text{agent}}}^{m}-{x}_{i}^{m}\right)-{r}_{3}\times olf\times ({x}_{{\text{worst}}}^{m}-{x}_{i}^{m})$$

where \(r_{2}\) and \(r_{3}\) are random values between \(0\) and \(1\) that scale the influence of the olfaction capacity \(olf\) on the terms involving the agent position \({x}_{{\text{agent}}}^{m}\) and the worst position \({x}_{{\text{worst}}}^{m}\), respectively.
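A minimal sketch of the trailing-mode update in Eq. (14) follows. Drawing a single \(r_{2}\) and \(r_{3}\) per molecule, rather than per dimension, is an assumption, and the positions and the \(olf\) value are hypothetical.

```python
import random


def trailing_step(x, x_agent, x_worst, olf, rng):
    """Eq. (14): move toward the agent position and away from the worst position."""
    r2, r3 = rng.random(), rng.random()  # one draw each per molecule (an assumption)
    return [xi + r2 * olf * (a - xi) - r3 * olf * (w - xi)
            for xi, a, w in zip(x, x_agent, x_worst)]


# Hypothetical 2-D example: agent at the origin, worst position at (2, 2).
x_new = trailing_step([1.0, 1.0], [0.0, 0.0], [2.0, 2.0], olf=0.5, rng=random.Random(3))
print(x_new)
```

With the agent at the origin and the worst position at (2, 2), both terms move the molecule from (1, 1) toward the origin; setting `olf` to zero leaves it where it is.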

Random mode

Equation (15) describes the random motion of the smell agent.


where \({\text{SL}}\) is the step length and \({r}_{4}\) is a random value that scales its effect [40]. Figure 5 shows the flowchart of SAO.

Fig. 5

Flowchart of SAO

Performance evaluation methods

In this study, several evaluation criteria covering both correlation and error are used to assess the hybrid models: the mean absolute error \(({\text{MAE}})\), the coefficient of determination (R2), the Nash-Sutcliffe efficiency \(({\text{NSE}})\), the root mean square error \(({\text{RMSE}})\), and \(U95\). Their equations are listed below. A model that performs well in the training, validation, and testing stages has an R2 value close to \(1\), and \({\text{NSE}}\) is likewise best when close to \(1\). Lower values of \({\text{RMSE}}\), \({\text{MAE}}\), and \(U95\) are preferred because they indicate lower model error.

Coefficient of determination (R2)

$$R^{2}=\left(\frac{\sum_{i=1}^{N}(h_{i}-\overline{h})(z_{i}-\overline{z})}{\sqrt{\left[\sum_{i=1}^{N}(h_{i}-\overline{h})^{2}\right]\left[\sum_{i=1}^{N}(z_{i}-\overline{z})^{2}\right]}}\right)^{2}$$

Root mean square error \(({\text{RMSE}})\)

$$RMSE=\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{\left({z}_{i}-{h}_{i}\right)}^{2}}$$

Mean absolute error \(({\text{MAE}})\)

$$MAE=\frac{1}{N}\sum_{i=1}^{N}\left|z_{i}-h_{i}\right|$$


Uncertainty 95% \(({\text{U}}95)\)

$$U_{95}=\frac{1.96}{N}\sqrt{\sum_{i=1}^{N}(h_{i}-z_{i})^{2}+\sum_{i=1}^{N}(h_{i}-\overline{z})^{2}}$$

Nash-Sutcliffe efficiency \(({\text{NSE}})\)

$$NSE=1-\frac{{\sum }_{i=1}^{N}{({h}_{i}-{z}_{i})}^{2}}{{\sum }_{i=1}^{N}{({z}_{i}-\overline{z })}^{2}}$$

Here, \(N\) is the number of samples, \(h_{i}\) and \(z_{i}\) are the predicted and measured values of the \(i\)-th sample, and \(\overline{h}\) and \(\overline{z}\) are the mean predicted and measured values, respectively.
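The five metrics can be computed in plain Python as follows, with `z` holding measured values and `h` predicted values. The second sum in the \(U95\) formula uses an index \(j\) that is not defined in the text; the sketch assumes it denotes the mean measured value \(\overline{z}\). The data are hypothetical.

```python
import math


def r2(z, h):
    """Squared Pearson correlation between measured z and predicted h."""
    n = len(z)
    zm, hm = sum(z) / n, sum(h) / n
    cov = sum((hi - hm) * (zi - zm) for hi, zi in zip(h, z))
    return cov ** 2 / (sum((hi - hm) ** 2 for hi in h) * sum((zi - zm) ** 2 for zi in z))


def rmse(z, h):
    return math.sqrt(sum((zi - hi) ** 2 for zi, hi in zip(z, h)) / len(z))


def mae(z, h):
    return sum(abs(zi - hi) for zi, hi in zip(z, h)) / len(z)


def nse(z, h):
    zm = sum(z) / len(z)
    return 1 - sum((hi - zi) ** 2 for hi, zi in zip(h, z)) / sum((zi - zm) ** 2 for zi in z)


def u95(z, h):
    """U95 with the mean measured value in the second sum (an assumption, see above)."""
    zm = sum(z) / len(z)
    total = sum((hi - zi) ** 2 for hi, zi in zip(h, z)) + sum((hi - zm) ** 2 for hi in h)
    return 1.96 / len(z) * math.sqrt(total)


z = [100.0, 200.0, 300.0, 400.0]  # hypothetical measured UCS (kN/m2)
h = [110.0, 190.0, 310.0, 390.0]  # hypothetical predictions
print(rmse(z, h), mae(z, h))      # 10.0 10.0
print(round(r2(z, h), 4), round(nse(z, h), 4), round(u95(z, h), 2))
```

Note that R2 and NSE differ here: R2 only measures how linearly related the predictions are to the measurements, whereas NSE also penalizes systematic offsets between them.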


Table 2 presents the results of hyperparameter tuning for the three models, NB, NBDA, and NBSA. The tuned hyperparameters are alpha and binarize. In the Bernoulli NB formulation in which these hyperparameters typically appear, alpha is an additive (Laplace/Lidstone) smoothing parameter that prevents zero probability estimates, and binarize is the threshold used to convert input feature values into binary (0/1) values. The best settings are alpha = 1 and binarize = 0 for the NB model, alpha = 1 and binarize = 0.70274 for the NBDA model, and alpha = 2 and binarize = 0.927 for the NBSA model.

Table 2 Result of hyperparameter for developed models
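Hyperparameters named `alpha` and `binarize` match those of scikit-learn's `BernoulliNB` class, although the text does not say which implementation was used. The following from-scratch sketch shows what the two parameters do: feature values above the `binarize` threshold become 1, and `alpha` is added to the counts when estimating \(P(x_{i}=1\mid y)\). The toy data are hypothetical; alpha = 1 and binarize = 0.70274 are the settings reported for NBDA. This sketch requires alpha > 0.

```python
import math
from collections import defaultdict


class BernoulliNaiveBayes:
    """Bernoulli NB with additive smoothing (alpha > 0) and a feature threshold (binarize)."""

    def __init__(self, alpha=1.0, binarize=0.0):
        self.alpha = alpha
        self.binarize = binarize

    def _to_binary(self, row):
        # Values strictly above the threshold become 1, all others 0.
        return [1 if v > self.binarize else 0 for v in row]

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(self._to_binary(row))
        self.log_prior, self.p_one = {}, {}
        for label, rows in rows_by_class.items():
            n_c = len(rows)
            self.log_prior[label] = math.log(n_c / len(y))
            # Smoothed estimate of P(x_i = 1 | y): (ones + alpha) / (n_c + 2 * alpha)
            self.p_one[label] = [(sum(col) + self.alpha) / (n_c + 2 * self.alpha)
                                 for col in zip(*rows)]
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            bits = self._to_binary(row)
            scores = {
                label: self.log_prior[label] + sum(
                    math.log(p if b else 1.0 - p) for b, p in zip(bits, self.p_one[label]))
                for label in self.log_prior
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# Hypothetical normalized features; alpha and binarize as reported for NBDA.
X = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.3], [0.1, 0.9], [0.2, 0.8], [0.3, 0.85]]
y = ["high", "high", "high", "low", "low", "low"]
model = BernoulliNaiveBayes(alpha=1.0, binarize=0.70274).fit(X, y)
print(model.predict([[0.85, 0.2], [0.2, 0.9]]))  # ['high', 'low']
```

With alpha = 1, a feature that is 1 in all three "high" rows gets a probability of 0.8 rather than 1.0, so no single feature can veto a class outright.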

Results and discussion

As indicated in Table 3, three models were used to forecast \({\text{UCS}}\): \({\text{NB}}\), \({\text{NBDA}}\), and \({\text{NBSA}}\). Their performance was assessed in three phases, training \((70\%)\), validation \((15\%)\), and testing \((15\%)\), a split chosen on the basis of empirical evidence of good model performance and intended to ensure fair assessment. With this configuration, the study produced more accurate and reliable UCS estimates, which can improve soil analyses and support decision-making in engineering and construction projects. Five statistical metrics were used to evaluate and compare the models: \({\text{NSE}}\), \({\text{MAE}}\), R2, \({\text{RMSE}}\), and \({\text{U}}95\). A pivotal aspect of the assessment was the R2 value, which indicates how much of the variance in the dependent variable is explained by the independent variables. In the testing phase, the \({\text{NBDA}}\) model performed best, with an R2 of \(0.992\), outperforming its counterparts.

Table 3 Performance indices of proposed models

By contrast, the \({\text{NB}}\) model had a slightly lower testing R2 of \(0.972\). Beyond R2, the study examined additional error indicators; the \({\text{RMSE}}\) ranged from \(108.69\) to \(237.40\). The \({\text{NB}}\) model had the highest \({\text{RMSE}}\) during testing, while the \({\text{NBDA}}\) model had the lowest during training. Likewise, the \({\text{NB}}\) model had the highest \({\text{U}}95\) value, \(542.44\), during validation, whereas the \({\text{NBDA}}\) model had the lowest, \(300.52\), during training. For \({\text{MAE}}\), the \({\text{NB}}\) model had the highest value, \(167.89\), while the \({\text{NBDA}}\) model had the most favorable values. For \({\text{NSE}}\), the \({\text{NBDA}}\) model achieved the highest value, \(0.991\), in both the training and testing stages. Although the \({\text{NB}}\) model showed promising results in some respects, the overall findings consistently establish the superiority of the \({\text{NBDA}}\) model over \({\text{NB}}\) and \({\text{NBSA}}\) across multiple phases. These outcomes indicate that incorporating \({\text{DAOA}}\) optimization substantially improved the \({\text{UCS}}\) prediction capability of the \({\text{NB}}\) model, making \({\text{NBDA}}\) the best choice among the evaluated alternatives.

Table 4 compares the present study with the published studies of Hoque et al. [26], Ceryan et al. [25], and Sharma and Singh [41] in terms of the R2 and RMSE metrics.

Table 4 Comparison between the current and published article

Figure 6 shows scatter plots comparing the predicted and actual values for the two hybrid models, NBDA and NBSA, and the single NB model. Each plot contains a central reference line and three overlaid linear regression lines for the training, validation, and testing phases. All three models show a strong positive correlation between predicted and actual values, indicating considerable predictive accuracy. The plots also highlight the superiority of NBDA: its data points cluster tightly around the regression lines, reflecting its high accuracy. In contrast, the data points of NB and NBSA are more dispersed. The regression lines of the NBSA and NB models have similar slopes and intercepts, indicating comparable predictive capabilities.

Fig. 6

The scatter plot for developed hybrid models

Figure 7 presents column plots for the two hybrid models, \({\text{NBDA}}\) and \({\text{NBSA}}\), and the individual \({\text{NB}}\) model, all used to predict \({\text{UCS}}\). The figure consists of three subplots showing the R2, \({\text{RMSE}}\), and \({\text{MAE}}\) values of each model in each phase. In the R2 subplot, the \({\text{NBDA}}\) model achieves the highest values in the training and testing stages, whereas the \({\text{NB}}\) model records the lowest. In the \({\text{RMSE}}\) subplot, the \({\text{NBDA}}\) model attains the lowest \({\text{RMSE}}\) in the training phase, indicating better predictive accuracy than the other models.

Fig. 7

The parameters comparison of developed models

In contrast, the \({\text{NB}}\) model has the highest \({\text{RMSE}}\) in the same training phase. In the \({\text{MAE}}\) subplot, the \({\text{NBDA}}\) model outperforms the other models in all three phases. In summary, Fig. 7 visualizes the predictive performance of the hybrid models (\({\text{NBDA}}\) and \({\text{NBSA}}\)) and the individual \({\text{NB}}\) model in forecasting \({\text{UCS}}\), and shows that the \({\text{NBDA}}\) model consistently achieves the highest predictive accuracy across evaluation metrics and phases.

Figure 8 compares the predicted and measured values. In this format, the closer the predicted curve lies to the measured curve, the better the prediction; perfect predictions would make the two coincide. The NB model deviates most markedly from the measured values, which increases its percentage error. Of the two hybrid models, NBSA shows the larger overall discrepancy, whereas NBDA, the most accurate model, has the lowest percentage error.

Fig. 8

Comparison between the predicted and measured UCS

Figure 9 shows histogram-density plots of the percentage errors of the models in the training, validation, and testing phases. The NBDA model was the most accurate, keeping the lowest errors in all stages, within the range of −10% to 10%. In contrast, the error of the NB model ranged from −20% to 20% in the training phase. Despite these differences, all three models predicted UCS with commendable accuracy.
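Percentage errors of the kind plotted in Fig. 9, and the share of predictions falling inside a band such as ±10%, can be obtained as sketched below. The sign convention (positive means over-prediction), the function names, and the sample values are assumptions for illustration only.

```python
def percentage_errors(measured, predicted):
    """Signed percentage error of each prediction; positive values mean over-prediction."""
    return [100.0 * (p - m) / m for m, p in zip(measured, predicted)]


def share_within(errors, bound):
    """Fraction of predictions whose absolute percentage error is at most `bound`."""
    return sum(1 for e in errors if abs(e) <= bound) / len(errors)


# Hypothetical values for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
errors = percentage_errors(measured, predicted)
print([round(e, 2) for e in errors])  # [10.0, -5.0, 3.33, -2.5]
print(share_within(errors, 5.0))      # 0.75
```

The same absolute error of 10 gives a larger percentage error for a small measured value, which is why percentage-error histograms tend to widen at low UCS.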

Fig. 9

Histogram-density plots of the percentage errors of the developed models

Figure 10 shows a half-violin plot of the percentage errors of the models developed in this study. In the training phase, the NBDA model had a mean error of 0%, with an approximately normal, narrowly dispersed error distribution and all errors below 20%. In contrast, the NB model showed wider dispersion in both phases, with a symmetric, approximately normal error curve; despite this dispersion, its errors remained below 50%. Of the three models, NBSA showed the largest and most varied discrepancies.

Fig. 10

Half-violin plot of the errors of the developed models

Notably, a single outlier with an error exceeding 30% was identified during the assessment stage, which is uncommon. In terms of dispersion, the NB model stood out, with a wider error range than the other two models and fewer errors close to zero. Overall, all three models performed satisfactorily, but NBDA produced the best results.
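A numeric summary can accompany a half-violin plot such as Fig. 10: the mean and median locate the error distribution, the standard deviation measures its spread, and points beyond a chosen threshold (30% here, echoing the outlier discussed above) can be flagged. The function `summarize_errors`, its `outlier_threshold` parameter, and the sample errors are hypothetical.

```python
import statistics


def summarize_errors(errors, outlier_threshold=30.0):
    """Location, spread, and threshold exceedances of a set of percentage errors."""
    return {
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "stdev": statistics.stdev(errors),  # sample standard deviation
        # Indices of predictions whose absolute error exceeds the threshold.
        "outliers": [i for i, e in enumerate(errors) if abs(e) > outlier_threshold],
    }


# Hypothetical percentage errors for one model, for illustration only.
errors = [-4.0, 1.0, 2.0, -1.0, 35.0]
summary = summarize_errors(errors)
print(summary["median"], summary["outliers"])  # 1.0 [4]
```

The single outlier pulls the mean well away from the median, which is why both are worth reporting alongside a violin plot.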

Figure 11 presents a Taylor diagram for the NB, NBSA, and NBDA models, summarizing how closely each model's predictions match the measured UCS in terms of the correlation coefficient and standard deviation. The correlation coefficient measures the strength of the linear relationship between two variables and ranges from −1 to 1. A value of 1 indicates a perfect positive linear relationship, a value of −1 indicates a perfect negative (inverse) linear relationship, and a value of 0 indicates no linear correlation. In the diagram, NBDA shows the largest difference in standard deviation compared with the other models, and NB has the lowest correlation coefficient. Overall, NBSA achieved the most suitable combination of standard deviation and correlation coefficient among the developed models.
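The statistics on a Taylor diagram are linked: the centered root-mean-square difference \(E'\) between a model and the measurements satisfies \(E'^2 = \sigma_m^2 + \sigma_o^2 - 2\sigma_m\sigma_o R\), where \(\sigma_m\) and \(\sigma_o\) are the standard deviations of the model and the observations and \(R\) is their correlation coefficient. This law-of-cosines relation is what allows a single point to display all three statistics. The sketch below computes them for one model, assuming population standard deviations (division by n); the function name and sample values are hypothetical.

```python
import math


def taylor_stats(obs, model):
    """Return (sd_obs, sd_model, corr, crmsd) for one model on a Taylor diagram."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_m = sum(model) / n
    dev_o = [o - mean_o for o in obs]
    dev_m = [m - mean_m for m in model]
    # Population standard deviations (division by n).
    sd_o = math.sqrt(sum(d * d for d in dev_o) / n)
    sd_m = math.sqrt(sum(d * d for d in dev_m) / n)
    # Pearson correlation coefficient, bounded by -1 and 1.
    corr = sum(a * b for a, b in zip(dev_o, dev_m)) / (n * sd_o * sd_m)
    # Centered RMS difference: RMS of the differences after removing each mean.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_m)) / n)
    return sd_o, sd_m, corr, crmsd


# Hypothetical values for illustration only.
obs = [100.0, 200.0, 300.0, 400.0]
model = [110.0, 190.0, 310.0, 390.0]
sd_o, sd_m, corr, crmsd = taylor_stats(obs, model)
# Law-of-cosines check: the two printed values should agree up to rounding.
print(round(crmsd ** 2, 6), round(sd_m ** 2 + sd_o ** 2 - 2 * sd_m * sd_o * corr, 6))
```

On the diagram, a model whose point lies closest to the reference point (same standard deviation, correlation 1) has the smallest centered RMS difference.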

Fig. 11

Taylor diagram for the developed models

Limitations of the study

The geological contexts covered in this study may not capture all variations, which limits the generalizability of the predictive models across diverse scenarios. In addition, although the sensitivity of the algorithms is acknowledged, the study does not comprehensively explore the challenges of fine-tuning the NB models, leaving uncertainty about whether optimal robustness has been achieved.

The suggestions for real-time implementation do not examine challenges such as computational efficiency and responsiveness in detail, which hinders a thorough understanding of the practical obstacles in real-world use. Although long-term performance assessment is proposed, the study provides no clear framework for tracking the models over time, which may make such an assessment ambiguous. Likewise, the recommendation to incorporate external factors for greater model realism offers no specific guidance on integration methods, limiting insight into how these factors affect UCS prediction.

Future study

To enhance the applicability and generalizability of predictive models, future geomechanics research should consider more diverse geological contexts by incorporating datasets from various regions and geological formations. Broader coverage of geological scenarios would help refine the predictive models and improve their adaptability across diverse contexts.

Further research is also needed to develop improved meta-heuristic algorithms or refine existing ones, particularly for NB models. Comparative studies of multiple optimization algorithms could help identify the combinations that most improve the robustness and performance of these models. Because the current study concentrates on developing and validating predictive models, future research should also address their real-time implementation in practice, including computational efficiency and responsiveness for on-the-fly UCS prediction.

To ensure the reliability and longevity of the proposed models, longitudinal studies tracking their performance under varying environmental conditions over extended periods are recommended; such studies would show whether the models remain accurate and adaptable over time. Finally, external factors such as environmental changes, weathering effects, or the presence of contaminants should be incorporated to make the models more realistic. Integrating these factors into the predictive framework would provide a more holistic understanding of UCS prediction in geomechanical applications.

Conclusions

This study introduces an approach for accurately predicting unconfined compressive strength (UCS) values using machine learning (ML) techniques, specifically Naive Bayes (NB) algorithms. The approach offers a cost-effective alternative to laboratory testing and substantially reduces the time needed to obtain UCS values. The core of the UCS prediction framework is an ML model based on the NB algorithm, and this study illustrates its potential to improve UCS prediction. To enhance accuracy and minimize errors, two meta-heuristic algorithms, DAOA and SAO, were applied, resulting in three distinct models: NBDA, NBSA, and an individual NB model. Laboratory samples from published articles were used in the training, validation, and testing stages to validate these models, and model performance was compared using a set of evaluation metrics: R², RMSE, MAE, NSE, and U95. The results showed that the NBDA model consistently achieved the highest R² values, demonstrating superior predictive capability.

  • In comparison, the standalone NB model had the lowest R² value, with a marginal difference of 1.2% from NBDA. Across all phases, NBDA forecast UCS more precisely than the other methods, with a 57% lower RMSE and a 50% lower MAE than NB.
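Relative reductions such as the 57% (RMSE) and 50% (MAE) figures are presumably expressed relative to the NB value. This section does not report the underlying NB metrics, so the sketch below uses hypothetical numbers purely to illustrate the arithmetic; the function name `percent_reduction` is also hypothetical.

```python
def percent_reduction(baseline, improved):
    """Reduction of an error metric relative to a baseline, in percent of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - improved) / baseline


# Hypothetical RMSE values, not results from the study:
# a baseline of 200.0 reduced to 86.0 is a 57% reduction.
print(percent_reduction(200.0, 86.0))  # 57.0
```

Expressing the reduction relative to the baseline keeps the result comparable across metrics with different magnitudes, such as RMSE and MAE.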

Although NB and NBSA performed less well than NBDA on all statistical indices, their results were still acceptable according to the evaluation criteria. The NBDA model consistently performed best in the training, validation, and testing phases. In conclusion, ML models offer a reliable alternative to experimental techniques for predicting UCS, saving substantial time and effort. In particular, coupling the NB model with the DAOA optimizer yielded accurate UCS predictions. These savings in time and effort underline the practical relevance of the approach for geomechanical applications and its potential for wider adoption in real-world scenarios.
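NSE and U95 are listed among the evaluation metrics but not defined in this section. A minimal sketch follows, assuming the standard Nash–Sutcliffe efficiency and one commonly used form of the 95% expanded uncertainty, \(U_{95} = 1.96\sqrt{SD^2 + RMSE^2}\), with SD taken as the sample standard deviation of the prediction errors; the study's exact definitions may differ, and the sample values are hypothetical. Note that this NSE has the same form as the 1 − SSE/SST definition of R².

```python
import math
import statistics


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / SST, where SST uses the mean of the observations."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


def u95(obs, pred):
    """Assumed form of the 95% expanded uncertainty: 1.96 * sqrt(SD**2 + RMSE**2)."""
    errors = [p - o for o, p in zip(obs, pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    sd = statistics.stdev(errors)  # sample standard deviation of the errors
    return 1.96 * math.sqrt(sd ** 2 + rmse ** 2)


# Hypothetical values for illustration only.
obs = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 310.0, 390.0]
print(round(nse(obs, pred), 3), round(u95(obs, pred), 2))
```

An NSE of 1 indicates perfect predictions, while lower U95 values indicate a narrower band of prediction uncertainty.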

Availability of data and materials

Data can be shared upon request.

References

  1. Bera A, Ghosh A (2011) Regression model for prediction of optimum moisture content and maximum dry unit weight of fine grained soil. Int J Geotech Eng 5(3):297–305

  2. Meyerhof GG (1976) Application of a continuum numerical model for pile driving analysis and comparison with a real case. J Geotech Eng Div 102(3):197–228

  3. Farahzadi L, Kioumarsi M (2023) Application of machine learning initiatives and intelligent perspectives for CO2 emissions reduction in construction. J Clean Prod 384:135504

  4. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255–260

  5. Livingston F (2005) Implementation of Breiman’s random forest machine learning algorithm. ECE591Q Mach Learn J Paper 1–13

  6. Hossein Alavi A, Hossein Gandomi A, Mollahassani A, Akbar Heshmati A, Rashed A (2010) Modeling of maximum dry density and optimum moisture content of stabilized soil using artificial neural networks. J Plant Nutr Soil Sci 173(3):368–379

  7. Park S-S (2011) Unconfined compressive strength and ductility of fiber-reinforced cemented sand. Constr Build Mater 25(2):1134–1138

  8. Ruffolo RM, Shakoor A (2009) Variability of unconfined compressive strength in relation to number of test samples. Eng Geol 108(1–2):16–23

  9. Das SK, Samui P, Sabat AK (2011) Application of artificial intelligence to maximum dry density and unconfined compressive strength of cement stabilized soil. Geotech Geol Eng 29:329–342

  10. Sathyapriya S, Arumairaj PD, Ranjini D (2017) Prediction of unconfined compressive strength of a stabilised expansive clay soil using ANN and regression analysis (SPSS). Asian J Res Soc Sci Humanit 7(2):109–123

  11. Behnam Sedaghat G, Tejani G, Kumar S (2023) Predict the maximum dry density of soil based on individual and hybrid methods of machine learning. Adv Eng Intell Syst 002(3)

  12. Majdi A, Rezaei M (2013) Prediction of unconfined compressive strength of rock surrounding a roadway using artificial neural network. Neural Comput Appl 23:381–389

  13. Ghazavi M, Roustaie M (2010) The influence of freeze–thaw cycles on the unconfined compressive strength of fiber-reinforced clay. Cold Reg Sci Technol 61(2–3):125–131

  14. Narendra BS, Sivapullaiah PV, Suresh S, Omkar SN (2006) Prediction of unconfined compressive strength of soft grounds using computational intelligence techniques: a comparative study. Comput Geotech 33(3):196–208

  15. Onyelowe KC, Ebid AM, Onyia ME, Amanamba EC (2022) Estimating the swelling potential of non-carbon–based binder (NCBB)-treated clayey soil for sustainable green subgrade using AI (GP, ANN and EPR) techniques. Int J Low-Carbon Technol 17:807–815

  16. Naeini SA, Naderinia B, Izadi E (2012) Unconfined compressive strength of clayey soils stabilized with waterborne polymer. KSCE J Civil Eng 16:943–949

  17. Nazir R, Momeni E, Armaghani DJ, Amin MFM (2013) Correlation between unconfined compressive strength and indirect tensile strength of limestone rock samples. Electron J Geotech Eng 18(1):1737–1746

  18. Onyelowe KC, Ebid AM, Aneke FI, Nwobia LI (2023) Different AI predictive models for pavement subgrade stiffness and resilient deformation of geopolymer cement-treated lateritic soil with ordinary cement addition. Int J Pave Res Technol 16(5):1113–1134

  19. Das SK (2013) Artificial neural networks in geotechnical engineering: modeling and application issues. In: Yang X-S, Gandomi AH, Talatahari S, Alavi AH (eds) Metaheuristics in water, geotechnical and transport engineering. Elsevier, Oxford, pp 231–270

  20. Onyelowe KC, Ebid AM, Nwobia L (2021) Evolutionary prediction of soil loss from observed rainstorm parameters in an erosion watershed using genetic programming. Appl Environ Soil Sci 2021:1–15

  21. Onyelowe KC, Gnananandarao T, Ebid AM (2022) Estimation of the erodibility of treated unsaturated lateritic soil using support vector machine-polynomial and-radial basis function and random forest regression techniques. Clean Mater 3:100039

  22. Sahoo K, Sarkar P, Robin Davis P (2016) Artificial neural networks for prediction of compressive strength of recycled aggregate concrete

  23. Onyelowe KC, Ebid AM, Nwobia L, Dao-Phuc L (2021) Prediction and performance analysis of compression index of multiple-binder-treated soil by genetic programming approach. Nanotechnol Environ Eng 6(2):28

  24. Onyelowe KC, Ebid AM, Nwobia LI (2021) Predictive models of volumetric stability (durability) and erodibility of lateritic soil treated with different nanotextured bio-ashes with application of loss of strength on immersion; GP ANN and EPR performance study. Clean Mater 1:100006

  25. Ceryan N, Okkan U, Kesimal A (2013) Prediction of unconfined compressive strength of carbonate rocks using artificial neural networks. Environ Earth Sci 68:807–819

  26. Hoque MdI, Hasan M, Islam MS, Houda M, Abdallah M, Sobuz MdHR (2023) Machine learning methods to predict and analyse unconfined compressive strength of stabilised soft soil with polypropylene columns. Cogent Eng 10(1):2220492

  27. Onyelowe KC, Aneke FI, Onyia ME, Ebid AM, Usungedo T (2023) AI (ANN, GP, and EPR)-based predictive models of bulk density, linear-volumetric shrinkage & desiccation cracking of HSDA-treated black cotton soil for sustainable subgrade. Geomech Geoeng 18(6):497–516

  28. Ebid AM, Nwobia LI, Onyelowe KC, Aneke FI (2021) Predicting nanobinder-improved unsaturated soil consistency limits using genetic programming and artificial neural networks. Appl Comput Intell Soft Comput 2021:1–13

  29. Onyelowe KC, Ebid AM, Onyia ME, Nwobia LI (2021) Predicting nanocomposite binder improved unsaturated soil UCS using genetic programming. Nanotechnol Environ Eng 6(2):39

  30. Suman S, Mahamaya M, Das SK (2016) Prediction of maximum dry density and unconfined compressive strength of cement stabilised soil using artificial intelligence techniques. Int J Geosynth Ground Eng 2:1–11

  31. Alavi AH, Gandomi AH, Gandomi M, Sadat Hosseini SS (2009) Prediction of maximum dry density and optimum moisture content of stabilised soil using RBF neural networks. IES J Part A Civil Struct Eng 2(2):98–106

  32. Alavi AH, Gandomi AH, Mollahasani A (2012) A genetic programming-based approach for the performance characteristics assessment of stabilized soil. In: Variants of evolutionary algorithms for real-world applications, pp 343–376

  33. Das I, Stein A, Kerle N, Dadhwal VK (2012) Landslide susceptibility mapping along road corridors in the Indian Himalayas using Bayesian logistic regression models. Geomorphology 179:116–125

  34. Salawudeen AT, Mu’azu MB, Yusuf A, Adedokun AE (2021) A novel smell agent optimization (SAO): an extensive CEC study and engineering application. Knowl Based Syst 232:107486

  35. Salawudeen AT, Mu’azu MB, Sha’aban YA, Adedokun EA (2018) On the development of a novel smell agent optimization (SAO) for optimization problems. In: 2nd International Conference on Information and Communication Technology and its Applications (ICTA 2018), Minna

  36. Meadows OA, Mu’Azu MB, Salawudeen AT (2022) A smell agent optimization approach to capacitated vehicle routing problem for solid waste collection. In: 2022 IEEE Nigeria 4th International Conference on Disruptive Technologies for Sustainable Development (NIGERCON). IEEE, New York, pp 1–5

  37. Vishnoi S, Nikolovski S, Raju M, Kirar MK, Rana AS, Kumar P (2023) Frequency stabilization in an interconnected micro-grid using smell agent optimization algorithm-tuned classical controllers considering electric vehicles and wind turbines. Energies (Basel) 16(6):2913

  38. Bankole AT, Moses SO, Ibitoye TY (2022) Smell agent optimization based supervisory model predictive control for energy efficiency improvement of a cold storage system. In: 2022 IEEE Nigeria 4th International Conference on Disruptive Technologies for Sustainable Development (NIGERCON). IEEE, New York, pp 1–5

  39. Wang S, Hussien AG, Kumar S, AlShourbaji I, Hashim FA (2023) A modified smell agent optimization for global optimization and industrial engineering design problems. J Comput Des Eng 10(6):2147–2176

  40. Salawudeen AT, Mu’azu MB, Yusuf A, Adedokun EA (2018) From smell phenomenon to smell agent optimization (SAO): a feasibility study. In: Proceedings of ICGET

  41. Sharma LK, Singh TN (2018) Regression-based models for the prediction of unconfined compressive strength of artificially structured soil. Eng Comput 34(1):175–186


Acknowledgements

There are no individuals or organizations to acknowledge for contributions to this work.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations



The first draft of the manuscript was written by W. W., who also commented on previous versions of the manuscript. The author read and approved the final manuscript.

Corresponding author

Correspondence to Weiqing Wan.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Table 5 Discussion of the test dataset

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Wan, W. Enhancing unconfined compressive strength of stabilized soil with lime and cement prediction through a robust hybrid machine learning approach utilizing Naive Bayes Algorithm. J. Eng. Appl. Sci. 71, 84 (2024).
