Enhancing unconfined compressive strength of stabilized soil with lime and cement prediction through a robust hybrid machine learning approach utilizing Naive Bayes Algorithm
Journal of Engineering and Applied Science volume 71, Article number: 84 (2024)
Abstract
The unconfined compressive strength (UCS) of soil stabilized with lime and cement is a crucial mechanical factor in developing accurate geomechanical models. In the past, determining UCS required laborious laboratory testing of core samples or complex well-log analysis, both of which consumed many resources. This study introduces a novel method for real-time UCS prediction while acknowledging the need for efficiency. This method makes use of specific Naive Bayes (NB) predictive models that are strengthened by smell agent optimization (SAO) and the Dynamic Arithmetic Optimization Algorithm (DAOA), two reliable metaheuristic algorithms. Combining these algorithms improves prediction precision while streamlining the process. The models are validated on UCS samples from various soil types obtained from earlier stabilization tests. This study develops three different models: NBDA, NBSA, and a single NB. The individual insights each model provides work in concert to increase the overall UCS prediction accuracy. This approach represents a significant advancement in UCS prediction methodologies, revealing a quick and effective method with wide-ranging implications for various geomechanical applications. Metaheuristic algorithms combined with particular NB models produce promising results, opening up new possibilities for real-time UCS estimation across various geological scenarios. Especially noteworthy are the NBDA model's performance metrics: it achieves an R^{2} value of 0.992 during testing, and its RMSE of 108.69 during the training phase is the lowest among the models. It consistently exhibits commendable generalization and predictive abilities that outperform those of the developed NB and NBSA models, highlighting its usefulness and effectiveness in practical applications.
Introduction
Background
Geotechnical engineering \(({\text{GE}})\) is no exception to the recent proliferation of machine learning \(({\text{ML}})\) techniques that have permeated many industries [1]. This increase in interest covers a broad range of tasks, from landslide detection to material property prediction. Developing predictive models specifically designed to tackle \({\text{GE}}\) challenges denotes the fusion of technological sophistication and exceptional efficacy [2, 3]. The imperative significance of using these models becomes more evident as the demand for complex predictive models increases across numerous geotechnical domains. This implementation journey is further supported by ongoing developmental steps that improve their ability to deal with the complex range of geotechnical intricacies [4, 5].
At the heart of engineering endeavors lies the compaction of loose soils, a pivotal process that increases the unit weight of the soil in structures like earth dams and highway embankments. This compaction venture transcends mere strength enhancement; it fortifies soil endurance, augments load-bearing capacity, and stabilizes embankment slopes against settlement issues [6]. Beyond bolstering strength, compaction offers many benefits, including improvements in porosity, volume, permeability, density, and waterproofing. These improvements collectively enhance soil quality, amplifying its ability to support structural loads. The linchpin of geomechanical models, the unconfined compressive strength \(({\text{UCS}})\), stands as a pivotal metric in the realm of mechanical rock behavior [7, 8]. \({\text{UCS}}\) signifies the maximum compressive stress a rock can withstand under controlled, uniaxial loading before failure occurs. Rock mechanics, fusing theoretical concepts with real-world applications, illuminates how rocks respond to diverse stress scenarios [9,10,11]. The ramifications of rock failure extend to issues like solids production and wellbore instability, particularly pertinent in petroleum contexts. Access to \({\text{UCS}}\) data from subsurface formations holds critical importance in drilling operations. These data steer bit hydraulics, determine optimal mud weights for drilling, regulate drilling costs, and optimize drilling performance [12].
Laboratory tests conducted on extracted core samples, providing insights into actual stress state conditions and mechanical attributes, constitute the cornerstone of directly assessing mechanical rock properties [13]. These tests encompass an array of evaluations, spanning uniaxial and triaxial compressive strength tests, scratch tests, Schmidt hammer tests, and point load tests. These methods represent the gold standard for property determination [14]. However, the continuous profiling of \({\text{UCS}}\) along wellbores is impeded by challenges associated with acquiring representative core samples, such as high costs and time-intensive procedures. To circumvent this limitation, indirect methodologies have been devised that correlate rock characteristics with petrophysical well-log data [15].
The significance of \({\text{UCS}}\) extends beyond rocks to encompass various materials like soils and industrial by-products, exerting a profound impact on foundation design, slope stability analysis, and structural resilience. \({\text{UCS}}\) emerges as a pivotal factor in the realm of stabilized materials, shaping the condition and functionality of pavements [16]. However, determining the \({\text{UCS}}\) of a material involves an array of variables, such as physicochemical properties, types of cementitious admixtures, and curing time. These variables demand carefully controlled laboratory investigations and specialized equipment [17]. The credibility of these tests hinges on precision, including in specimen dimensions. The quest for alternative methodologies for determining the \({\text{UCS}}\) of stabilized materials, such as pond ashes, is spurred by the demanding nature of these tests, coupled with resource-intensive requirements and the challenge of locating representative samples [18].
A new era of soft computing methodologies has arisen over the past few decades, with artificial neural networks \(({\text{ANNs}})\) dominating \({\text{UCS}}\) prediction for soil and rock materials [19,20,21]. These studies underscore the potency of neural network paradigms. The backpropagation neural network-based modeling algorithm is central to this paradigm and depends on a set of learning parameters, including learning rate, momentum, the number of hidden layer nodes, and layer depth [22]. Prudent management of training iterations ensures robust predictive performance, guarding against the perils of overtraining [23, 24].
Literature review
Majdi and Rezaei [12] utilized artificial neural network \(({\text{ANN}})\) and multivariable regression analysis \(({\text{MVRA}})\) models to predict \({\text{UCS}}\), employing a database comprising 93 different rock samples. Comparison metrics, including R^{2}, variance accounted for \(({\text{VAF}})\), mean absolute error \(({\text{MAE}})\), and mean relative error \(({\text{MRE}})\), were used to evaluate model performance. The results clearly indicated the superior performance of the \({\text{ANN}}\) model compared to the \({\text{MVRA}}\) model in predicting \({\text{UCS}}\). Ceryan et al. [25] focused on employing the Levenberg-Marquardt algorithm-based artificial neural network \(({\text{LM-ANN}})\) to predict \({\text{UCS}}\). The developed LM-ANN model demonstrated superior performance compared to the multiple linear regression (REG) model, with an R^{2} of \(0.884\) and an \({\text{RMSE}}\) of 1.11 kN/m^{2}. Hoque et al. [26] introduced the Random Forest \(({\text{RF}})\) method for predicting the \({\text{UCS}}\) of polypropylene-stabilized soft soil. The model's accuracy was assessed using various metrics, and their RF model achieved an R^{2} of \(0.8942\) and an \({\text{RMSE}}\) of \(0.250\) kN/m^{2}. The results indicate that the proposed RF and sequential models effectively predict the UCS of polypropylene-stabilized soft soil. Notably, this approach proves to be more convenient and less time-consuming than traditional labor-intensive laboratory procedures.
Onyyelowe et al. [27] employed \({\text{AI}}\)-based bi-input predictive models to forecast the properties of black cotton soil \((BCS)\) treated with waste-based high silica content densified ash (HSDA). The A-7 group BCS was treated with varying percentages of HSDA. Desiccation tests monitored changes in weight, diameter, height, and crack development over 30 days. XRF and SEM analyses revealed enhanced pozzolanic strength and increased ettringite and gel formation in treated samples. Intelligent models (ANN, GP, EPR) predicted bulk density, linear shrinkage, crack width, and volumetric shrinkage. \({\text{EPR}}\) demonstrated superior accuracy in predicting bulk density and crack width (98.2% and 92.7%), while ANN excelled in predicting linear and volumetric shrinkages (98.8% and 99.3%). Ebid et al. [28] addressed challenges in unsaturated soils used for construction, particularly their unfavorable reactions to seasonal swell and shrink cycles. Additive stabilization processes were employed to enhance soil volume change characteristics. Supplementary binders from solid waste powder materials were introduced to mitigate environmental hazards associated with conventional cement use. Despite equipment limitations, intelligent prediction techniques, including genetic programming \(({\text{GP}})\) and \({\text{ANN}}\), were used to forecast consistency limits. Hybrid cement (HC), a blend of nanostructured quarry fines and hydrated-lime-activated nanostructured rice husk ash, was employed. Experimental data on varying HANRHA dosages formed the prediction database. Stabilization exercises showed substantial soil property improvements, and ANN outperformed GP in accuracy assessments. Nanostructuring the binder material contributed to successful soil improvement and efficient model predictions. Onyyelowe et al. [29] emphasized the critical role of soil stability and durability in foundation constructions, necessitating soil stabilization to meet design requirements.
The research focused on predicting the UCS of unsaturated lateritic soil treated with HC using GP. HC, a blend of nanotextured quarry fines and hydrated-lime-activated nanotextured rice husk ash, served as the binder material. GP was employed to forecast UCS with varying complexities. Results revealed the superiority of the four-level complexity GP model, demonstrating robustness and flexibility in predicting engineering problems with high accuracy (SSE 2.4%, R^{2} = 0.991).
Objective
The core focus of this investigation is the prediction of vital soil attributes, with particular emphasis on \({\text{UCS}}\), utilizing an \({\text{ML}}\) methodology. In light of the challenges linked to acquiring empirical data, this study centers on the Naive Bayes \(({\text{NB}})\) algorithm. However, to elevate the performance of the \({\text{NB}}\) model, careful parameter tuning is indispensable. To address this, two optimization algorithms, namely, smell agent optimization \(({\text{SAO}})\) and the Dynamic Arithmetic Optimization Algorithm \(({\text{DAOA}})\), are employed. The study highlights the positive influence that refining the design and construction of \({\text{UCS}}\)-related structures can exert on the infrastructure sector. By amassing an extensive dataset of \({\text{UCS}}\) values, the study facilitates comprehensive comparative analyses aimed at gauging the effectiveness of the proposed framework. The insights from the study's outcomes offer valuable guidance for the accurate anticipation of \({\text{UCS}}\) within civil engineering ventures. The study's approach hinges on integrating the \({\text{NB}}\) algorithm into the \({\text{ML}}\) strategy, thereby forming the bedrock for \({\text{UCS}}\) prediction. The difficulty of procuring empirical \({\text{UCS}}\) data is mitigated by optimizing the parameters of the \({\text{NB}}\) model with the \({\text{DAOA}}\) and \({\text{SAO}}\) algorithms. In essence, this research furnishes both pragmatic directives and the knowledge needed for tackling UCS prediction, an indispensable facet of soil behavior in civil engineering endeavors.
Methods
Data gathering
A comprehensive approach has been taken to evaluate the \({\text{UCS}}\) of soil, considering various factors. The data are divided into three distinct sets: training \((70\%)\), validation \((15\%)\), and testing \((15\%)\). This distribution has been empirically validated and consistently proven to enhance the performance of predictive models. The method incorporates six input variables that cover a wide range of soil properties and the relative composition of soil components: soil, cement, lime, liquid limit \(({\text{LL}})\), plastic limit \(({\text{PL}})\), and Plasticity Index \(({\text{PI}})\). The utilized dataset contains 187 samples. These samples underwent rigorous laboratory tests, which are succinctly described in Table 1 [9, 30,31,32].

Soil: the percentage composition of soil in the sample being tested.

Cement: this is an important parameter in soil stabilization, where cement is often added to improve the strength and durability of the soil.

Lime: lime is another common additive used in soil stabilization to enhance properties such as workability and strength.

LL: the moisture level at which soil under typical circumstances changes from a plastic to a liquid form is known as the liquid limit. It is an important property for understanding the behavior of soil.

PL: the PL is the moisture content at which the soil starts behaving in a plastic manner and can be molded without breaking. It is another crucial parameter in soil mechanics.

PI: the PI is the difference between the LL and the PL. It provides a measure of the plasticity of the soil, indicating its ability to undergo deformation without cracking.

UCS: this is a measure of the ability of a soil sample to withstand axial loads without confinement. It is a critical parameter in geotechnical engineering, reflecting the soil's strength under unconfined conditions.
In order to mitigate the risks of overfitting or underfitting, a randomized permutation of the data (randperm) was performed. Furthermore, normalization techniques were applied to mitigate the impact of data outliers. Subsequently, to ensure robust model performance and assess generalizability, k-fold cross-validation was employed. Additionally, Table 5 in the Appendix discusses the test dataset.
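The preprocessing described above (random permutation, a 70/15/15 split, and normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_indices`, `min_max_fit`, and `min_max_apply` are hypothetical, min-max scaling is assumed because the text does not name the normalization technique, and fitting the scaling range on the training rows only is likewise an assumption.

```python
import random


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # random permutation of the rows
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def min_max_fit(column):
    """Return the (min, max) range of one feature column."""
    return min(column), max(column)


def min_max_apply(column, lo, hi):
    """Scale a column with a previously fitted range (assumed: min-max scaling)."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


# 187 samples, as stated in the text.
train_idx, val_idx, test_idx = split_indices(187)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fitting the range on the training rows and reusing it for the validation and test rows keeps information from the held-out sets out of the model inputs.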
Figure 1 presents a correlation matrix in which each cell contains the correlation coefficient between the variables named in the corresponding row and column. The matrix represents the relationships among the soil property-related variables: soil \((\%)\), cement \((\%)\), lime \((\mathrm{\%})\), \(\mathrm{LL }(\mathrm{\%}),\) \(\mathrm{PL }(\mathrm{\%})\), \(\mathrm{PI }(\mathrm{\%})\), and UCS \(({\text{kN}}/{{\text{m}}}^{2})\). These coefficients measure the strength and direction of a linear association between pairs of variables. The values fall within the range of \(-1\) to \(1\):

A positive number indicates a positive correlation, meaning that there is a tendency for both variables to grow when one increases.

On the other hand, a negative number indicates a negative correlation, meaning that one variable tends to decrease as the other increases.

A number that gets close to zero indicates that there is little to no linear connection between the variables.
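As a brief illustration of how each cell of such a matrix is obtained, the sketch below computes the Pearson correlation coefficient for one pair of variables. The function name `pearson_r` and the toy values are hypothetical and are not taken from the study's dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy values only: a variable that rises with another, and one that falls.
rising = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
falling = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
print(rising, falling)
```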
In addition, Fig. 2 shows the histogram distribution for the input and output variables.
Naive Bayes (NB)
Based on Bayes' theorem and a strong assumption of feature independence, the NB model is probabilistic in nature. Its main advantage is its simple design, which eliminates the need for complex iterative parameter estimation methods. Das et al. further point out that NB is robust to noise and irrelevant features [33]. The following equation is the basis of the NB:
\[P\left({y}_{i}\mid {x}_{i}\right)=\frac{P\left({x}_{i}\mid {y}_{i}\right)P\left({y}_{i}\right)}{P\left({x}_{i}\right)}\quad (1)\]
where \(P({y}_{i})\) is the prior probability of \({y}_{i}\) and \(P({x}_{i}\mid {y}_{i})\) is the class-conditional likelihood of \({x}_{i}\), which is computed using:
\[P\left({x}_{i}\mid {y}_{i}\right)=\frac{1}{\sqrt{2\pi {\sigma }^{2}}}\mathrm{exp}\left(-\frac{{\left({x}_{i}-\mu \right)}^{2}}{2{\sigma }^{2}}\right)\quad (2)\]
where \(\mu\) is the mean and \(\sigma\) is the standard deviation of \({x}_{i}\). Figure 3 shows the structure of NB.
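To make the roles of the prior, the mean \(\mu\), and the standard deviation \(\sigma\) concrete, the following is a minimal Gaussian NB sketch in plain Python. It is an illustration only: the function names, the toy feature rows, and the "low"/"high" strength labels are hypothetical, and this copy of the text does not state how the authors adapted NB to a continuous UCS target.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    model = {}
    for label, members in groups.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # Population variance with a tiny floor so a constant feature
        # does not cause a division by zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Return the class with the largest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
    return max(scores, key=scores.get)


# Hypothetical toy data: two features, two strength classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["low", "low", "low", "high", "high", "high"]
nb_model = fit_gaussian_nb(X, y)
print(predict(nb_model, [1.1, 2.0]), predict(nb_model, [5.1, 6.1]))
```

Working with log probabilities avoids numerical underflow when many per-feature likelihoods are multiplied.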
Dynamic Arithmetic Optimization Algorithm (DAOA)
The core arithmetic optimization algorithm has been upgraded with a novel accelerator function that integrates two dynamic attributes intended to amplify its efficacy. The dynamic variant adjusts the search phase and potential solutions within the optimization procedure, modifying the balance between exploration and exploitation. A noteworthy trait of \({\text{DAOA}}\) is its freedom from the need for initial parameter refinement, distinguishing it from contemporary metaheuristics.
DAOA's dynamic accelerated function
The dynamic accelerated function \(({\text{DAF}})\) has a significant impact on the search phase of the arithmetic optimization algorithm \(({\text{AOA}})\). In the original \({\text{AOA}}\), the initial \({\text{Min}}\) and \({\text{Max}}\) values of the accelerated function must be tuned. Because an algorithm that does not depend on adjustable internal parameters is preferable, the \({\text{DAF}}\) replaces that function with a new descending function. The modification factor of the optimization algorithm is presented as:
Here, \({\text{It}}\) represents the current iteration number, \({{\text{It}}}_{{\text{Max}}}\) denotes the maximum number of iterations, and \(a\) is a constant value. This function is reduced with each iteration of the algorithm.
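Since the equation itself is not reproduced in this copy, the sketch below assumes one form that is consistent with the definitions above, namely the ratio of the maximum to the current iteration raised to the constant \(a\). Both this form and the value \(a = 0.5\) are assumptions for illustration and should be checked against the original DAOA publication.

```python
def daf(it, it_max, a):
    """Assumed DAF form: (it_max / it) ** a, with iterations counted from 1."""
    if not 1 <= it <= it_max:
        raise ValueError("it must lie in 1..it_max")
    return (it_max / it) ** a


# Hypothetical settings: 100 iterations and a = 0.5.
daf_values = [daf(it, 100, 0.5) for it in range(1, 101)]
print(daf_values[0], daf_values[-1])
```

Under this assumed form the factor falls monotonically and reaches 1 at the final iteration, which matches the statement that the function is reduced with each iteration.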
Dynamic candidate solution for DAOA
The dynamic properties of potential \(DAOA\) solutions are presented in this section. The exploitation and exploration phases are crucial for metaheuristic algorithms, and maintaining a proper balance between them is essential for the algorithm's success. By dynamically updating the positions of each solution based on the best solution so far found during the optimization process, the proposed dynamic version of the algorithm seeks to improve the exploitation and exploration phases. Equations (4) and (5) each have the dynamic candidate solution \(({\text{DCS}})\) function added to them in the enhanced version.
To consider the effects of the reducing percentage in candidate solutions, the \({\text{DCS}}\) function is introduced. As shown below, its value decreases with each algorithm iteration.
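The decreasing behavior of the \({\text{DCS}}\) factor can be illustrated with a simple geometric schedule. The starting value and the decay factor below are hypothetical placeholders, because the defining equations are not reproduced in this copy; only the monotone decrease is taken from the text.

```python
def dcs_schedule(n_iterations, start=0.9, decay=0.99):
    """Return a DCS value per iteration, shrinking by a fixed factor each time.

    `start` and `decay` are illustrative assumptions, not values from the study.
    """
    values = []
    current = start
    for _ in range(n_iterations):
        values.append(current)
        current *= decay
    return values


dcs_values = dcs_schedule(50)
print(dcs_values[0], dcs_values[-1])
```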
Empirical evidence from many search agents and iterations shows that the inclusion of dynamic candidate solutions in DAOA dramatically accelerates the convergence rate of AOA. The quality of the solutions that are obtained improves as a result. The ability of metaheuristic algorithms to operate without the need for any parameters is frequently regarded as advantageous. By utilizing adaptive parameters, the algorithm reduces the number of parameters that need to be tuned to just two: maximum iteration and population size. In contrast, competing algorithms call for parameter adjustments for different problems. One of the shortcomings of this method is its adaptive mechanism, which relies on the iteration counter instead of fitness improvement. Algorithm 1 displays the DAOA pseudocode, whereas Fig. 4 displays the DAOA flowchart.
Smell agent optimization (SAO)
Salawudeen et al. [34] introduced the \({\text{SAO}}\) algorithm as a modern optimization technique, focusing on the relationship between smell-emitting objects and olfactory agents. In the sniffing mode [35], the agent detects and locates smell molecules and decides whether to search for the source of these molecules [36]. In the trailing mode, the agent tracks the scent molecules based on the decisions made during the sniffing mode [37]. Moreover, the SAO algorithm includes a random mode that prevents the agent from becoming stuck in local optimal solutions [38, 39].
Sniffing mode
Smell molecules are initialized using Eq. (8).
The terms \(D\), \(N\), and \(m\) denote the total number of variables, iterations, and decision variables, respectively. One may use Eq. (9) to determine the ideal location of the agent.
Here, \({r}_{0}\) is a random value between \(0\) and \(1\), and \({\text{UB}}\) and \({\text{LB}}\) are the upper and lower bounds, respectively. The speed at which scent molecules disperse from their source or origin is given by Eq. (10).
Equation (11) is used to update the velocity of the dispersed molecules in a Brownian form.
Assuming \(\Delta t\) is equal to \(1\).
Equation (12) calculates the velocity update in the smell molecules.
The updated velocity component, \(v\), is obtained using Eq. (13).
\(T\), \(M\), and \(k\) represent the temperature, the molecule mass, and the smell constant, respectively.
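The sniffing-mode steps can be sketched as below. The equations are not reproduced in this copy, so the update forms are assumptions based on the commonly cited SAO formulation: positions are initialized uniformly between the bounds, the velocity grows by a random fraction of \(\sqrt{3kT/M}\), and the position moves by the new velocity with \(\Delta t = 1\). All function names and constant values are hypothetical.

```python
import math
import random


def init_molecules(n_molecules, dim, lower, upper, rng):
    """Place smell molecules uniformly at random between the bounds."""
    return [[lower + rng.random() * (upper - lower) for _ in range(dim)]
            for _ in range(n_molecules)]


def sniff_step(positions, velocities, k, temperature, mass, rng):
    """One assumed sniffing update: v <- v + r1*sqrt(3kT/M); x <- x + v (dt = 1)."""
    scale = math.sqrt(3.0 * k * temperature / mass)
    new_positions, new_velocities = [], []
    for x, v in zip(positions, velocities):
        v_next = [vi + rng.random() * scale for vi in v]
        x_next = [xi + vi for xi, vi in zip(x, v_next)]
        new_velocities.append(v_next)
        new_positions.append(x_next)
    return new_positions, new_velocities


rng = random.Random(1)
start_positions = init_molecules(5, 2, -10.0, 10.0, rng)
start_velocities = [[0.0, 0.0] for _ in start_positions]
# Hypothetical constants for k, T, and M.
next_positions, next_velocities = sniff_step(
    start_positions, start_velocities, k=1.0e-3, temperature=0.5, mass=0.9, rng=rng)
```

A full implementation would also clip positions to the bounds and evaluate the fitness of each molecule to select the agent, which this sketch omits.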
Trailing mode
In this mode, the agent's travel toward scent sources is modeled by Eq. (14), which describes the agent's search behavior.
Random values between \(0\) and \(1\), denoted by \({r}_{2}\) and \({r}_{3}\), are used to reduce the impact of the olfaction capacity \(olf\) on \({x}_{{\text{agent}}}^{m}\) and on \({x}_{{\text{worst}}}^{m}\).
Random mode
Equation (15) is used to illustrate the smell agent's erratic motion.
\({\text{SL}}\) refers to the step length, while \({r}_{4}\) is a random value that reduces its impact [40]. Figure 5 shows the flowchart of SAO.
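The trailing and random modes can be sketched in the same spirit. The update forms below are assumptions based on the commonly cited SAO formulation: attraction toward the agent and repulsion from the worst molecule, each scaled by \(olf\) and a random value, and a random step of at most \({\text{SL}}\). Pairing \({r}_{2}\) with the agent and \({r}_{3}\) with the worst position is also an assumption, and the function names are hypothetical.

```python
import random


def trail_step(x, x_agent, x_worst, olf, rng):
    """Assumed trailing update: pull toward the agent, push away from the worst."""
    r2, r3 = rng.random(), rng.random()
    return [xi + r2 * olf * (a - xi) - r3 * olf * (w - xi)
            for xi, a, w in zip(x, x_agent, x_worst)]


def random_step(x, step_length, rng):
    """Assumed random-mode update: move every coordinate by r4 * SL."""
    r4 = rng.random()
    return [xi + r4 * step_length for xi in x]


rng = random.Random(7)
# The molecule sits on the worst position, so only the pull toward the agent acts.
moved = trail_step([0.0], [1.0], [0.0], olf=0.5, rng=rng)
jittered = random_step([2.0, 3.0], step_length=0.1, rng=rng)
print(moved, jittered)
```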
Performance evaluation methods
In this study, various evaluation criteria for hybrid models are presented, emphasizing their correlation and error rates. The evaluation metrics include mean absolute error \(({\text{MAE}})\), coefficient of correlation (R^{2}), Nash-Sutcliffe efficiency \(({\text{NSE}})\), root mean square error \(({\text{RMSE}})\), and \(U95\). The mathematical equations for each of these metrics are listed below. An algorithm that excels in the train, validation, and test stages is one with R^{2} and \({\text{NSE}}\) values that are close to \(1\). Lower values of metrics like \({\text{RMSE}}\), \({\text{MAE}}\), and \(U95\), on the other hand, are preferred because they denote a lower level of model error.
Coefficient of correlation (R^{2})
Root mean square error \(({\text{RMSE}})\)
Mean absolute error \(({\text{MAE}})\)
Uncertainty 95% \(({\text{U}}95)\)
Nash-Sutcliffe efficiency \(({\text{NSE}})\)
Here, \(N\) stands for the number of samples, \({h}_{i}\) and \({z}_{i}\) stand for the predicted and measured values, respectively, and \(\overline{h }\) and \(\overline{z }\) stand for the mean predicted and mean measured values, respectively.
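Because the metric equations are not reproduced in this copy, the sketch below uses their standard definitions with \({h}_{i}\) as predicted and \({z}_{i}\) as measured values. Two points are assumptions: R^{2} is computed as the squared Pearson correlation, in line with the name "coefficient of correlation", and \(U95\) is taken as \(1.96\sqrt{{\text{SD}}^{2}+{\text{RMSE}}^{2}}\) with SD the standard deviation of the errors, a common form that may differ from the authors' exact expression. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(measured, predicted):
    """Return R2, RMSE, MAE, U95, and NSE for paired measured/predicted values."""
    n = len(measured)
    z_bar = sum(measured) / n
    h_bar = sum(predicted) / n
    errors = [z - h for z, h in zip(measured, predicted)]
    sse = sum(e * e for e in errors)
    sst = sum((z - z_bar) ** 2 for z in measured)
    ssh = sum((h - h_bar) ** 2 for h in predicted)
    cov = sum((h - h_bar) * (z - z_bar) for h, z in zip(predicted, measured))
    rmse = math.sqrt(sse / n)
    e_bar = sum(errors) / n
    sd = math.sqrt(sum((e - e_bar) ** 2 for e in errors) / n)
    return {
        "R2": cov ** 2 / (ssh * sst),          # squared Pearson correlation
        "RMSE": rmse,
        "MAE": sum(abs(e) for e in errors) / n,
        "U95": 1.96 * math.sqrt(sd ** 2 + rmse ** 2),  # assumed form
        "NSE": 1.0 - sse / sst,
    }


# Toy values only: every prediction is one unit too high.
scores = evaluate([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(scores)
```

The toy case shows why R^{2} and NSE can disagree: a constant offset leaves the correlation perfect while NSE, which penalizes bias, drops below zero.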
Hyperparameter
Table 2 presents the results of the hyperparameter tuning for three different models: NB, NBDA, and NBSA. The alpha and binarize parameters are the hyperparameters that were tuned. The alpha parameter is a smoothing factor that is used to control the fuzziness of the predictions. The binarize parameter is a threshold that is used to convert the fuzzy predictions into binary predictions. The results show that the best hyperparameter settings for the NB model are alpha = 1 and binarize = 0. For the NBDA model, the best hyperparameter settings are alpha = 1 and binarize = 0.70274. For the NBSA model, the best hyperparameter settings are alpha = 2 and binarize = 0.927.
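The two hyperparameter names match those of the Bernoulli Naive Bayes implementation in scikit-learn. If that correspondence holds, which is an assumption since the text does not name the implementation, alpha is an additive (Laplace) smoothing term in the feature probability estimate and binarize is a threshold applied to the input features. The plain-Python sketch below mirrors that behavior; the function names and sample values are hypothetical.

```python
def binarize_row(row, threshold):
    """Map each feature to 1 if it is strictly above the threshold, else 0."""
    return [1 if value > threshold else 0 for value in row]


def smoothed_feature_prob(count_ones, n_class_samples, alpha):
    """Additively smoothed estimate of P(feature = 1 | class)."""
    return (count_ones + alpha) / (n_class_samples + 2 * alpha)


# Threshold reported for the NBDA model, applied to hypothetical normalized features.
binary_row = binarize_row([0.2, 0.8, 0.70274], threshold=0.70274)

# A feature never observed as 1 in 10 class samples keeps a nonzero probability,
# and a larger alpha pulls the estimate further toward 0.5.
p_alpha_1 = smoothed_feature_prob(0, 10, alpha=1)
p_alpha_2 = smoothed_feature_prob(0, 10, alpha=2)
print(binary_row, p_alpha_1, p_alpha_2)
```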
Results and discussion
The three models used in the study to forecast \({\text{UCS}}\) were \({\text{NB}}\), \({\text{NBDA}}\), and \({\text{NBSA}},\) as indicated in Table 3. Three distinct evaluation phases were used to assess their performance: train \((70\%)\), validation \((15\%)\), and test \((15\%)\). This distribution was chosen to guarantee fair assessments. The study produced more accurate and reliable UCS estimates using this configuration and cutting-edge methodology, which increased the accuracy of soil analyses and enabled better decision-making for various engineering and construction projects. These percentages were chosen based on empirical data consistently showing improved model performance in this framework. To further evaluate and compare the algorithms, the evaluation process used five statistical metrics: \({\text{NSE}}\), \({\text{MAE}}\), R^{2}, \({\text{RMSE}}\), and \({\text{U}}95\). A pivotal aspect of the assessment was the R^{2} values of the models, which indicate the degree to which the independent variables explain the variance in the dependent variable. Notably, the testing phase underscored the \({\text{NBDA}}\) model's superiority, with a predictive accuracy reflected in an R^{2} value of \(0.992\), higher than that of its counterparts.
Conversely, the \({\text{NB}}\) model exhibited a slightly lower R^{2} value during testing, measuring \(0.972\). Beyond R^{2} values, the study examined additional error indicators, notably the \({\text{RMSE}}\), which ranged from \(108.69\) to \(237.40\). During testing, the \({\text{NB}}\) model displayed the highest \({\text{RMSE}}\), while the \({\text{NBDA}}\) model showed the lowest during training. Likewise, the \({\text{U}}95\) metric indicated the \({\text{NB}}\) model's peak value of \(542.44\) during validation, whereas the \({\text{NBDA}}\) model achieved the lowest value of \(300.52\) during training. In terms of \({\text{MAE}}\), the \({\text{NB}}\) model exhibited the highest value at \(167.89\), while the \({\text{NBDA}}\) model presented the most favorable \({\text{MAE}}\) values. As for \({\text{NSE}}\), the \({\text{NBDA}}\) model demonstrated the highest and most favorable value of \(0.991\) during both the train and test stages. Despite the \({\text{NB}}\) model's promising metrics in certain aspects, the findings established the \({\text{NBDA}}\) model's superiority over \({\text{NB}}\) and \({\text{NBSA}}\) across multiple phases. Ultimately, these outcomes strongly imply that the incorporation of \({\text{DAOA}}\) optimization significantly bolstered the \({\text{NB}}\) model's \({\text{UCS}}\) prediction capabilities, positioning the \({\text{NBDA}}\) model as the optimal choice among the evaluated alternatives.
Table 4 compares the present study with published studies, namely Hoque et al. [26], Ceryan et al. [25], and Sharma and Singh [41]. The models were compared using the R^{2} and RMSE metrics.
Figure 6 depicts a scatter plot that contrasts the predicted values of the two hybrid models, NBDA and NBSA, and the single NB model with their corresponding actual values. The plot's central line and the three overlaid linear regressions represent the training, validation, and testing phases. The findings highlight a notably strong positive correlation between the predicted and actual values across all three models, underscoring the considerable predictive accuracy of these models. Notably, the scatter plot accentuates the superiority of NBDA over the other models. This is evident in the tight clustering of data points around the linear regression lines, underscoring its accuracy. In contrast, both NB and NBSA exhibit greater dispersion of data points. The linear regression lines of the NBSA and NB models exhibit similar slope and intercept values, indicating comparable predictive capabilities between these two models.
Figure 7 shows a column plot of the two hybrid models, \({\text{NBDA}}\) and \({\text{NBSA}}\), alongside the individual \({\text{NB}}\) model, all used for predicting \({\text{UCS}}\). The plot consists of three subplots representing R^{2}, \({\text{RMSE}}\), and \({\text{MAE}}\) scores for each model across their respective developmental phases. In the R^{2} subplot, it is evident that the \({\text{NBDA}}\) model achieves the highest and most favorable values during the train and test stages. Conversely, the \({\text{NB}}\) model records the lowest R^{2} values in the train and test stages. Shifting focus to the \({\text{RMSE}}\) subplot, it becomes apparent that the \({\text{NBDA}}\) model attains the lowest \({\text{RMSE}}\) value during the training phase, signifying its superior predictive accuracy in comparison to the other models.
On the contrary, the \({\text{NB}}\) model exhibits the highest \({\text{RMSE}}\) value within the same training phase. Examining the \({\text{MAE}}\) subplot, the \({\text{NBDA}}\) model consistently outperforms the other models across all three developmental stages, securing the best \({\text{MAE}}\) values. In summary, Fig. 7 provides a comprehensive visualization of the predictive performance of the hybrid models (\({\text{NBDA}}\) and \({\text{NBSA}}\)) and the individual \({\text{NB}}\) model in forecasting \({\text{UCS}}\). The plots reveal that the \({\text{NBDA}}\) model consistently demonstrates the highest predictive accuracy across various evaluation metrics and developmental phases.
Figure 8 compares the predicted and measured values. In this format, optimal alignment is achieved when the lines of predicted values coincide with the measured values. Notably, the NB model exhibits a pronounced dissimilarity, which increases its percentage of error. Among the hybrid models, the most substantial disparity is observed in the overall performance of the NBSA. In contrast, the NBDA, characterized by its superior accuracy, shows the lowest percentage of error.
The distribution of error percentages among the presented models during the train, validation, and test phases is shown in a histogram-density plot in Fig. 9. The \({\text{NBDA}}\) model, in particular, showed remarkable accuracy, maintaining the lowest error rates throughout all stages, ranging from \(-10\%\) to \(10\%\). In contrast, the error rate for the NB model varied between \(-20\%\) and \(20\%\) throughout the training phases. Despite this discrepancy, all three models showed commendable accuracy in making predictions.
A half-violin plot representing the error percentages of this study's models is shown in Fig. 10. The NBDA model had a mean error rate of 0% throughout the training phase. Its error distribution was well-formed, with little dispersion and a normal curve. The distribution of errors was consistently favorable, maintaining values that remained below the \(20\%\) threshold. In contrast, the \({\text{NB}}\) model showed dispersion in both phases, featuring a normally distributed curve that was symmetrical and uniform. Despite this dispersion, the model managed to maintain its error percentage below \(50\mathrm{\%}\). Among the three models, \({\text{NBSA}}\) showed the most notable and varied discrepancies.
Interestingly, a single outlier data point with an error of over \(30\%\) was identified during the assessment stage. In terms of dispersion, the \({\text{NB}}\) model stood out, displaying a wider range compared to the other two models, with fewer occurrences near zero. Overall, all three models demonstrated satisfactory performance; however, \({\text{NBDA}}\) exhibited superior outcomes.
Figure 11 illustrates the correlation among NB, NBSA, and NBDA. The correlation coefficient measures the extent of correlation between two variables, with values ranging from -1 to 1. A coefficient of 1 signifies a perfect positive correlation, indicating complete synchronization between the two variables. Conversely, a coefficient of -1 denotes a perfect negative correlation, representing complete inverse synchronization. A coefficient of 0 suggests no correlation between the variables. In the presented graph, NBDA showed the largest difference in standard deviation compared to the other models, and NB had the lowest correlation coefficient. In general, NBSA achieved the most suitable values of standard deviation and correlation coefficient among the developed models.
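The three reference cases of the correlation coefficient described above can be checked with a short sketch. It assumes the Pearson correlation coefficient, which the article does not name explicitly, and it also reports the standard deviation of each series, the second quantity compared in the figure. The function name and all values are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # deviations from the mean
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

# Hypothetical series, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
cases = {
    "perfect positive": [2.0, 4.0, 6.0, 8.0],   # rises in step with measured
    "perfect negative": [8.0, 6.0, 4.0, 2.0],   # falls as measured rises
    "no correlation":   [1.0, -1.0, -1.0, 1.0],  # no linear relationship
}

for name, predicted in cases.items():
    r = pearson_r(measured, predicted)
    sd = float(np.std(predicted))  # population standard deviation
    print(f"{name}: r = {r:+.2f}, standard deviation = {sd:.3f}")
```

In a comparison of this kind, a model is judged by how close its correlation coefficient is to 1 and how close its standard deviation is to that of the measured data.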
Limitation of study
The study's exploration of diversified geological contexts may not fully capture all variations, limiting the generalizability of the predictive models across diverse scenarios. While acknowledging algorithm sensitivity, the study lacks a comprehensive exploration of the challenges associated with fine-tuning specific Naive Bayes models, leaving uncertainties in achieving optimal model robustness.
The suggestions for real-time implementation lack a detailed examination of challenges such as computational efficiency and responsiveness, which hinders a thorough understanding of practical obstacles in real-world scenarios. Although a long-term performance assessment is proposed, the study lacks a clear framework for tracking models over time, leading to potential ambiguity in the assessment process. The encouragement to incorporate external factors for model realism lacks specific guidance on integration methods, limiting the depth of understanding of these factors' impact on UCS prediction.
Future study
In order to enhance the applicability and generalizability of predictive models, future research endeavors within geomechanics should focus on diversifying the geological contexts considered. This involves incorporating datasets from various regions and geological formations to broaden the scope of understanding. A more comprehensive exploration of geological scenarios will contribute significantly to refining predictive models and their adaptability across diverse contexts.
Moreover, further research is needed on optimizing metaheuristic algorithms or refining existing ones, particularly in the context of NB models. Comparative studies that explore multiple optimization algorithms can offer valuable insights, helping to identify the most effective combinations for enhancing the robustness and performance of these models. As the current study concentrates on the development and validation of predictive models, future research should extend its focus to the real-time implementation of these models in practical scenarios. This involves considerations of computational efficiency and responsiveness, particularly for on-the-fly predictions of UCS.
To ensure the reliability and longevity of proposed models, longitudinal studies tracking their performance under varying environmental conditions over extended periods are recommended. This approach will provide valuable insights into the sustained accuracy and adaptability of the models over time. Lastly, the incorporation of external factors to enhance model realism, such as environmental changes, weathering effects, or the presence of contaminants, should be considered. Integrating these factors into the predictive framework will contribute to a more holistic understanding of UCS prediction in geomechanical applications.
Conclusions
This study introduces an innovative approach to accurately predict unconfined compressive strength (UCS) values. The methodology leverages machine learning (ML) techniques, specifically Naive Bayes (NB) algorithms. This approach provides a cost-effective alternative while significantly reducing the time needed for UCS predictions. The core of the UCS prediction framework is a novel ML model based on the NB algorithm. This study illustrates how this model has the potential to revolutionize UCS prediction. To enhance accuracy and minimize errors, two metaheuristic algorithms, DAOA and SAO, were applied. This effort resulted in three distinct models: NBDA, NBSA, and an individual NB model. Laboratory samples from published articles were used in the train, validation, and test stages to validate these models. An array of evaluation metrics, including \(R^{2}\), RMSE, MAE, NSE, and U95, was used to compare model performance. The study's results demonstrated that the NBDA model consistently achieved the highest \(R^{2}\) values, showcasing superior predictive capability.

➢ In comparison, the standalone NB model exhibited the lowest \(R^{2}\) value, with a marginal difference of 1.2%. Throughout all phases, NBDA consistently outperformed the other methods in forecasting UCS, as evidenced by significantly lower error rates: a remarkable 57% lower RMSE and a 50% lower MAE compared to NB.
While NB and NBSA demonstrated lower performance on all statistical indices, their results were still deemed acceptable based on the assessment criteria. In contrast, the NBDA model consistently exhibited the most favorable performance during the training, validation, and testing phases. In conclusion, ML models offer a reliable alternative to experimental techniques for predicting UCS, resulting in substantial time and effort savings. This study underscored the effectiveness of combining the NB model with the DAOA optimizer, a synergistic partnership that yields accurate UCS predictions. The study's emphasis on the real-world applicability of ML models, particularly the NBDA model optimized with DAOA, underscores the potential for these models to serve as reliable alternatives for predicting UCS. The observed substantial reductions in time and effort contribute to the method's practical relevance in geomechanical applications, highlighting its potential for widespread adoption in various real-world scenarios.
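The evaluation metrics named in these conclusions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: \(R^{2}\) is taken as the squared Pearson correlation, NSE as the Nash-Sutcliffe efficiency, and U95 as \(1.96\sqrt{SD^{2} + RMSE^{2}}\) with SD the standard deviation of the prediction errors, a form commonly used in similar studies. The function names and all numbers are hypothetical.

```python
import numpy as np

def evaluate(measured, predicted):
    """Return R2, RMSE, MAE, NSE, and U95 for a set of predictions.

    The formulas are assumptions (see the text above); the article
    excerpt does not spell them out.
    """
    y = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - y
    rmse = np.sqrt(np.mean(err**2))
    mae = np.mean(np.abs(err))
    # Nash-Sutcliffe efficiency: 1 minus error variance over observed variance.
    nse = 1.0 - np.sum(err**2) / np.sum((y - y.mean())**2)
    # R2 taken here as the squared Pearson correlation coefficient.
    r2 = np.corrcoef(y, p)[0, 1] ** 2
    # U95: expanded uncertainty at the 95% level (assumed form).
    u95 = 1.96 * np.sqrt(np.std(err)**2 + rmse**2)
    return {"R2": float(r2), "RMSE": float(rmse), "MAE": float(mae),
            "NSE": float(nse), "U95": float(u95)}

def percent_lower(reference, value):
    """How much lower `value` is than `reference`, in percent."""
    return 100.0 * (reference - value) / reference

# Hypothetical UCS values (kPa), for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [120.0, 180.0, 320.0, 380.0]

scores = evaluate(measured, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")

# Hypothetical RMSE values showing how a "57% lower" figure is obtained.
print(percent_lower(200.0, 86.0))
```

Comparing two models then amounts to calling `evaluate` on each model's predictions for the same phase and applying `percent_lower` to the error metrics.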
Availability of data and materials
Data can be shared upon request.
Acknowledgements
I would like to take this opportunity to acknowledge that there are no individuals or organizations that require acknowledgment for their contributions to this work.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Contributions
The first draft of the manuscript was written by W W and the author commented on previous versions of the manuscript. The author read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wan, W. Enhancing unconfined compressive strength of stabilized soil with lime and cement prediction through a robust hybrid machine learning approach utilizing Naive Bayes Algorithm. J. Eng. Appl. Sci. 71, 84 (2024). https://doi.org/10.1186/s44147024004088
DOI: https://doi.org/10.1186/s44147024004088