The unconfined compressive strength estimation of rocks using a novel hybridization technique based on the regulated Gaussian processor
Journal of Engineering and Applied Science volume 71, Article number: 101 (2024)
Abstract
The unconfined compressive strength (UCS) of rocks is a key parameter in geotechnical engineering and plays a central role in civil engineering works such as tunnel construction, mining operations, and foundation design. Accurate UCS prediction is essential to the safety and stability of these projects. This article introduces a methodology for UCS prediction that combines Gaussian process regression (GPR) with two recent optimization techniques: sand cat swarm optimization (SCSO) and the equilibrium slime mould algorithm (ESMA). Conventional techniques for UCS prediction frequently suffer from slow convergence and the risk of becoming trapped in local minima. In this investigation, GPR serves as the base predictive model because of its ability to handle nonlinear relationships within the dataset. Coupling GPR with these optimizers is intended to improve both the accuracy and the speed of UCS predictions.
A collection of rock samples, each accompanied by UCS measurements, is used to assess the proposed methodology. The performance of the hybrid GPSC (GPR with SCSO) and GPES (GPR with ESMA) models is compared with that of the conventional GPR technique. The findings reveal that incorporating the SCSO and ESMA optimizers into GPR noticeably improves UCS prediction accuracy and speeds up convergence. Notably, the GPSC model performs best, with an R^{2} value of 0.995 and an RMSE value of 1.913. These findings highlight the GPSC model’s potential as a promising tool for engineers and geologists. It offers a robust and dependable method for UCS prediction that can improve the safety and efficiency of civil engineering projects.
Introduction
Background
At the heart of many engineering initiatives lies the densification of loose soils, which increases the mass per unit volume in constructions such as earth dams and highway embankments. Compaction does more than increase strength; it improves the soil’s resilience, raises its load-bearing capability, and stabilizes embankment slopes to mitigate settlement issues [1]. Compaction also changes volume, porosity, density, and permeability. Together, these changes improve the soil quality and its ability to sustain structural loads. The unconfined compressive strength (UCS), a fundamental component of geomechanical models, is pivotal in mechanical rock behavior [2, 3]. UCS is the highest compressive stress a rock can withstand under controlled, uniaxial loading before it fails. The field of rock mechanics, which combines theoretical foundations with practical implementations, describes the response of rocks to diverse stress conditions [4, 5]. Rock failure matters in areas such as the production of solids and the stability of wellbores, especially in petroleum operations. UCS data derived from subsurface formations is highly important in drilling activities: it informs bit hydraulics, determines the ideal mud weights for drilling, helps manage drilling costs, and improves drilling efficiency [6].
The unconfined compression test (UCT) follows a standardized procedure endorsed by both the American Society for Testing and Materials (ASTM) and the International Society for Rock Mechanics (ISRM) [7,8,9]. Measuring UCS directly in the laboratory consumes significant time and money and requires careful preparation of core samples. Meeting the latter requirement becomes difficult when working with weak, thinly layered, or heavily fractured rock formations. Numerous researchers therefore advocate indirect methodologies for UCS prediction to avoid the difficulties of core sample preparation and testing. Indirect tests are rapid, easy to perform, portable, and cost-efficient. Techniques such as the point load index test (\({I}_{s(50)}\)), the Brazilian tensile strength test (BTS), and the ultrasonic test (\(Vp\)) are commonly used for forecasting UCS. They place less rigorous requirements on sample preparation than the UCS test. These index tests, especially when coupled with engineering knowledge, can provide a valuable initial assessment of UCS [3, 4, 10, 11].
Laboratory tests on extracted core samples, which offer insight into genuine stress conditions and mechanical traits, are the basis for directly appraising mechanical rock properties [8]. These assessments cover a spectrum of tests, including uniaxial and triaxial compressive strength tests, scratch tests, Schmidt hammer tests, and point load tests. Collectively, these methodologies set the standard for property assessment [12]. Nevertheless, acquiring a continuous UCS profile along wellbores is hindered by the difficulty of procuring representative core samples, including substantial costs and time-intensive procedures. To overcome this limitation, indirect methodologies have been formulated that relate rock properties to petrophysical well-log data [13]. The significance of UCS extends beyond rocks to other materials, including soils and industrial byproducts, with a substantial impact on foundation design, slope stability analysis, and the resilience of structures. UCS also plays a pivotal role in the structural integrity and operational performance of pavements [7].
Nonetheless, determining a material’s UCS involves many variables, including physicochemical characteristics, the type of cementitious additive, and the duration of curing. These factors call for carefully designed laboratory investigations and specialized equipment [14, 15]. The validity of these evaluations depends on precision, including strict specifications for the specimens employed [16, 17]. Because such assessments are demanding, resource-intensive, and dependent on representative samples that are hard to acquire, alternative methodologies for determining the UCS of stabilized materials, such as pond ashes, have been explored [18, 19].
Literature review
Momeni’s research [20] presents a PSO-based model for predicting the UCS of granite and limestone. The model outperforms traditional methods in accuracy and was validated with 66 sample sets. Its inputs are the point load index, rebound number, P-wave velocity, and dry density, and its predictions are most sensitive to dry density and rebound number. However, its applicability beyond the granite and limestone types studied is cautioned, indicating a need for further refinement. Jahed Armaghani’s study [21] introduces an adaptive neuro-fuzzy inference system (ANFIS) for forecasting the UCS and Young’s modulus (E) of granite, which surpasses conventional methods such as multiple regression analysis (MRA) and artificial neural networks (ANN) in accuracy. ANFIS achieves R^{2} values of 0.985 for UCS and 0.990 for E, with low root-mean-square error (RMSE) and high variance accounted for (VAF) percentages, reducing uncertainties in rock engineering projects. Armaghani’s subsequent study [22] focuses on sandstone samples from Malaysia and employs a hybrid ICA-ANN model to predict UCS. This model demonstrates high accuracy and offers practical applicability in estimating UCS for sandstone.
Asteris’s research [23] enhances Schmidt hammer rebound number analysis by integrating N- and L-type measurements into a comprehensive database. Models utilizing back-propagating neural networks (BPNN), genetic programming (GP), and third-order linear equations achieve superior prediction accuracy, with the BPNN 1–71 model notably precise. The study proposes the a20-index as a preferable performance indicator over the Pearson correlation factor (R). Armaghani’s study [24] compares nonlinear prediction models for estimating the UCS of granitic rocks and identifies ANFIS as the best predictor. However, caution is advised regarding universal application, as the models are suited primarily to similar rock types. Soft computing methods such as ANFIS consistently outperform ANN and NLMR in UCS estimation based on rock index properties. Yagiz’s research [25] explores the correlation between Schmidt hardness rebound values and the UCS of rock, offering empirical equations for preliminary design assessments. Context-specific equations are recommended because of variations across geological formations. Yagiz’s subsequent research [26] investigates how varying the number of cycles in the slake durability index test affects the assessment of intact rock behavior. ANN proves effective in estimating rock properties such as UCS and modulus of elasticity (E), highlighting its utility in material characterization and geotechnical engineering research.
Objective
This study introduces a machine-learning methodology aimed at accurate predictive results in geotechnical engineering applications. The hybridization approach focuses on enhancing the performance of Gaussian process regression (GPR) models. Two efficient optimizers, sand cat swarm optimization (SCSO) and the equilibrium slime mould algorithm (ESMA), are incorporated, and the resulting hybrid models surpass the performance of the conventional method. Model results were evaluated with established performance metrics such as R^{2} and RMSE, which help mitigate potential biases and give a more precise understanding of the models’ effectiveness. The increased precision of the hybrid models can support well-informed decision-making in geotechnical engineering projects, reducing the risks associated with inaccurate estimates of unconfined compressive strength (UCS). Using GPR as the base predictive model for handling nonlinear relationships within rock datasets offers several advantages. GPR’s flexibility enables it to capture intricate nonlinear relationships, providing more accurate predictions than linear regression techniques.
Additionally, GPR offers uncertainty quantification, which is crucial in geological studies where data may be uncertain or noisy, and supports better decision-making through probabilistic predictions. It copes well with small datasets, mitigating the overfitting issues commonly encountered with traditional regression methods, while its interpretability offers insights into spatial correlations, which is vital for geological exploration. Moreover, GPR’s adaptability to varying scales supports accurate predictions across datasets with different units, making it a potent tool for predictive modeling in geotechnical engineering and related fields. SCSO and ESMA were selected for hybridization with GPR in UCS prediction because of their effectiveness in addressing slow convergence and entrapment in local minima. SCSO is a swarm-based approach inspired by sand cat behavior that explores solution spaces efficiently, while ESMA, inspired by slime mould behavior, dynamically adjusts search parameters to navigate complex landscapes. Combining SCSO and ESMA with GPR equips the hybrid models to tackle these challenges, giving more robust and reliable UCS predictions in geotechnical engineering applications.
Methods
Gaussian process regression (GPR)
The probabilistic regression approach of GPR starts from a training dataset, denoted as \(D=\{({y}_{w}, {x}_{w}), w=1,2,3,\dots ,W\}\), consisting of W pairs of vector inputs \({x}_{w} \in {\mathbb{R}}^{L}\) and noisy scalar outputs \({y}_{w}\). From this training dataset, GPR constructs a model that can extend the output distribution to new input locations. The uncertainty in the outputs is attributed to external factors such as truncation or observational errors. This noise is assumed to be additive, zero-mean, stationary, and normally distributed [27].
GPR employs a Gaussian process (GP) to describe the latent variables \(f\), with x serving as the index for these variables. The analysis is restricted to functions whose values are jointly Gaussian: any finite set {\(f({x}_{1}),\dots ,f({x}_{k})\)} with distinct indices follows a consistent Gaussian distribution. This amounts to placing a GP prior over functions within a Bayesian framework. The process is fully described by its mean function v(x) and covariance function \(k(x, {x}{^\prime})\), which makes it straightforward to predict function values for new inputs, even with limited training data. The variance \({s}_{noise}^{2}\) represents the model’s noise.
The E[.] represents the expectation. Typically, the mean function is chosen to be 0, with the primary focus on the unobserved region of the input space. The behavior of the process is solely influenced by the covariance function, which, by definition, is symmetric positive semidefinite when evaluated for any pair of input space points [28]. The covariance function typically encompasses multiple hyperparameters that dictate the prior distribution of f(x). The squared exponential covariance function is a frequently employed choice [29].
In this context, \(\Vert \cdot \Vert\) represents a norm defined within the input space. As the distance between input pairs x and \({x}{^\prime}\) increases, the covariance function diminishes rapidly, signifying weaker correlations between \(f(x)\) and \(f({x}{^\prime})\). There are three hyperparameters at play: \({q}_{1}\) determines the upper limit for covariance, \({q}_{2}\) is a strictly positive hyperparameter dictating the rate at which correlation diminishes with increasing point separation, and \({q}_{3}\) represents the unknown variance \({s}_{noise}^{2}\) in Eq. (1), even though it is not explicitly mentioned in Eq. (2). These hyperparameters are assembled into a vector denoted as (q), which is treated as a realization of a random vector (Q). The realization that provides the closest fit to the training dataset is selected for generating predictions. Assuming the hyperparameters are known, the following joint Gaussian distribution can be derived, with the vector of training latent variables represented as f and the vector of test latent variables as \({f}^{*}\):
The symmetric covariance matrix K is generated by computing the covariance between the \(i\)th variable in the group denoted by the first subscript and the \(j\)th variable in the group represented by the second subscript (where * is used as an abbreviation for \({f}^{*}\)). This computation involves the covariance function \(k(.,.)\) from Eq. (4) and the associated hyperparameters [30]. Figure 1 presents the GPR flowchart.
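The prediction step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the kernel is assumed to take the common form q1 · exp(−‖x − x′‖² / (2 q2²)), the noise variance q3 is added to the diagonal of the training covariance matrix, the mean function is zero, and the helper names (`sq_exp_kernel`, `solve_spd`, `gpr_predict`) are hypothetical.

```python
import math

def sq_exp_kernel(x, x_prime, q1, q2):
    """Assumed squared exponential form: q1 * exp(-||x - x'||^2 / (2 * q2^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return q1 * math.exp(-sq_dist / (2.0 * q2 ** 2))

def cholesky(A):
    """Lower-triangular factor L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_spd(A, b):
    """Solve A z = b via Cholesky factorization and two triangular solves."""
    L = cholesky(A)
    n = len(b)
    u = [0.0] * n
    for i in range(n):  # forward substitution: L u = b
        u[i] = (b[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T z = u
        z[i] = (u[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def gpr_predict(X_train, y_train, X_test, q1=1.0, q2=1.0, q3=1e-2):
    """Posterior mean and variance of a zero-mean GP at each test input."""
    n = len(X_train)
    # Training covariance with the noise variance q3 on the diagonal.
    K = [[sq_exp_kernel(X_train[i], X_train[j], q1, q2) + (q3 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve_spd(K, y_train)
    means, variances = [], []
    for x_star in X_test:
        k_star = [sq_exp_kernel(x_star, x_t, q1, q2) for x_t in X_train]
        means.append(sum(k * w for k, w in zip(k_star, weights)))
        v = solve_spd(K, k_star)
        prior_var = sq_exp_kernel(x_star, x_star, q1, q2)
        variances.append(prior_var - sum(k * vi for k, vi in zip(k_star, v)))
    return means, variances
```

With very small noise, the posterior mean reproduces the training targets and the variance collapses there, while far from the data the prediction falls back to the zero prior mean with variance q1.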
Sand cat swarm optimization (SCSO)
SCSO is an algorithm based on swarm behavior, taking inspiration from the hunting tactics of sand cats for its convergence to a solution. The primary stages of SCSO can be outlined as follows.

❖ Step 1: Initializing the algorithm. Within the optimization problem, each sand cat corresponds to an array of dimension 1 × Dim (where Dim is the number of decision variables), as illustrated in Fig. 2 [31]. Per this diagram, each Pos value must fall within the specified lower and upper bounds. First, an initialization matrix is generated according to the problem’s dimension (\(n\times Dim\)) [32, 33]. The solution associated with each sand cat is taken as its output value, and in each subsequent iteration it is replaced whenever a better value is found. If no improved value is found during the current iteration, the stored solution remains unchanged.

❖ Step 2: Exploration phase (hunting for prey). Sand cats can detect low frequencies below 2 kHz, and SCSO exploits this keen sense of hearing [34]. The auditory acuity of the sand cat is denoted as \({R}_{G}\) and expressed by Eq. (5).

$${R}_{G}= {S}_{M}-\left(\frac{{S}_{M}\times Iter}{{Max}_{Iter}}\right)$$(5)
$$R=2\times {R}_{G}\times rand \left(0, 1\right)-{R}_{G}$$(6)
In Eq. (6), parameter R serves as the control factor for regulating the exploration and exploitation phases of the algorithm. The value of \({S}_{M}\) is set to 2. During the search, the sand cat stumbles upon a new position randomly within the sensitivity range. The sensitivity range (r) undergoes random variations to prevent getting stuck in local solutions.
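The control quantities above are simple to compute. The sketch below assumes the usual SCSO form in which \({R}_{G}\) falls linearly from \({S}_{M}\) to 0 and R is re-centred around zero; the per-cat sensitivity range r = \({R}_{G}\) × rand(0, 1) is an assumed form, since its defining equation is not reproduced in this text. All function names are hypothetical.

```python
import random

def general_sensitivity(iteration, max_iter, s_m=2.0):
    """Eq. (5): R_G decreases linearly from s_m (first iteration) to 0 (last)."""
    return s_m - (s_m * iteration / max_iter)

def phase_control(r_g, rng=random):
    """Eq. (6): R = 2 * R_G * rand(0, 1) - R_G, a value in [-R_G, R_G]."""
    return 2.0 * r_g * rng.random() - r_g

def sensitivity_range(r_g, rng=random):
    """Assumed form of the per-cat sensitivity range: r = R_G * rand(0, 1)."""
    return r_g * rng.random()
```

Because R is drawn symmetrically around zero with a shrinking amplitude, large values of |R| become rarer as the iterations progress.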
In this equation, the parameter \({R}_{G}\) serves as a guide for the sensitivity range, denoted as r. Each sand cat’s position is represented by \({P}_{i}\). The sand cat seeks the prey’s location relative to the best candidate position \(({P}_{bc})\), the current position \(({P}_{c}^{t})\), and the sensitivity range (r), as described in Eq. (8):

❖ Step 3: Exploitation phase (attacking the prey). When simulating the sand cat’s attack, the distance between the sand cat and its prey is expressed by Eq. (8) [35]. In the attack model, the sensitivity range is assumed to form a circle, and the direction of the sand cat’s movement is given by a random angle (α) chosen through the Roulette wheel selection function. Because α is selected at random within the [0°, 360°] range, its cosine ranges from − 1 to 1. As illustrated in Fig. 2, this circular motion lets the sand cat move in various directions around the circle, so it can reach the hunting location more quickly. The process of attacking prey is defined by Eq. (10) [36].

❖ Step 4: Executing the SCSO algorithm. As previously stated, the balance between the exploration and exploitation phases is governed by R and \({R}_{G}\). As per Eq. (6), R takes on a random value within [− \({R}_{G}\), \({R}_{G}\)], a range that narrows from [− 2, 2] toward 0 as \({R}_{G}\) decreases from 2 to 0, as indicated in Eq. (5). Hence, in accordance with Eq. (10), when the absolute value of R is less than or equal to 1, the sand cat attacks the prey; otherwise, it searches for the prey within the global domain.
In line with Eq. (11), the sand cat’s position undergoes updates during the exploration and exploitation.
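The four steps can be put together in a compact loop. The following is a hedged sketch rather than the authors’ code: the update rules follow the commonly published SCSO form (exploitation: best − r · |rand · best − current| · cos α; exploration: r · (best − rand · current)), the angle is drawn uniformly instead of through Roulette wheel selection, and the function name `scso_minimize` and its parameters are hypothetical.

```python
import math
import random

def scso_minimize(objective, lower, upper, n_cats=20, max_iter=100, seed=0):
    """Minimal SCSO-style search; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    dim = len(lower)
    # Step 1: random initialization inside the bounds (n_cats x dim matrix).
    cats = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
            for _ in range(n_cats)]
    best = min(cats, key=objective)[:]
    best_fit = objective(best)
    for it in range(max_iter):
        r_g = 2.0 - 2.0 * it / max_iter          # Eq. (5) with S_M = 2
        for i in range(n_cats):
            R = 2.0 * r_g * rng.random() - r_g   # Eq. (6)
            r = r_g * rng.random()               # assumed sensitivity range
            new_pos = []
            for d in range(dim):
                if abs(R) <= 1.0:
                    # Step 3: exploitation, move around the best position.
                    alpha = math.radians(rng.uniform(0.0, 360.0))
                    dist = abs(rng.random() * best[d] - cats[i][d])
                    value = best[d] - r * dist * math.cos(alpha)
                else:
                    # Step 2: exploration, search relative to the best candidate.
                    value = r * (best[d] - rng.random() * cats[i][d])
                new_pos.append(min(max(value, lower[d]), upper[d]))
            cats[i] = new_pos
            fit = objective(new_pos)
            if fit < best_fit:                   # keep the best-so-far solution
                best, best_fit = new_pos[:], fit
    return best, best_fit
```

In a GPSC-style hybrid, `objective` would map a vector of GPR hyperparameters to a validation error such as RMSE; here it is left generic so the loop can be tried on any test function.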
Equilibrium slime mould algorithm (ESMA)
The foraging behavior of slime mould presents a promising source of inspiration for developing effective and efficient optimization methods [37]. The starting position vector of each slime mold is randomly initialized through a randomization process.
The positioning model for the ith slime mould, represented as \({X}_{i}\) (\(i=1,2,...,N\)), in the next iteration (t + 1) is established using the slime mould algorithm (SMA) as follows:
The \({\overrightarrow{{\text{X}}}}_{Gbest}\) denotes the global best position achieved across iterations one to t. Additionally, the variables \({r}_{1}\) and \({r}_{2}\) correspond to random values within the range of [0, 1].
A probability denoted by \(z\) governs the elimination and dispersal of slime moulds. In this study, z is a constant value of 0.03 [38]. Equation (14) sorts the fitness values in ascending order.
Equation (15) is employed to calculate \(\overrightarrow{{\text{U}}}\).
A random number, \({r}_{3}\), uniformly distributed within the range of [0, 1], is utilized. The local worst and best fitness values acquired during the current iteration are denoted by \(f_{Lworst}\) and \(f_{Lbest}\), respectively. Equations (15–16) are employed to calculate these fitness values.
and
Below is the formula that defines the variable \({P}_{i}\), which represents the probability of selecting the trajectory of the ith slime mold:
For each \(i = 1, 2, \dots , N\), the fitness value of the ith slime mould \({X}_{i}\) is given by \(f\left({X}_{i}\right)\). The global best fitness value from the first iteration up to the current one is represented by \({f}_{Gbest}\). The step size \({\overrightarrow{step}}_{a}\) is drawn from a uniform distribution ranging from − a to a. Similarly, the step size \({\overrightarrow{step}}_{b}\) is drawn from a uniform distribution ranging from − b to b. The values of a and b are determined by Eq. (19), which is a function of the current iteration t and the maximum iteration T:
Despite the SMA’s promising results, there is still room for improvement in the search process, as indicated by Eq. (24). Incorporating random slime moulds can alter the trajectory of the search, and local minima can limit the efficacy of the search when the individuals \({\overrightarrow{X}}_{D}\) and \({\overrightarrow{X}}_{C}\) are selected from a sample of N slime moulds. This section introduces an optimization technique called the equilibrium slime mould algorithm (ESMA). The algorithm replaces the position vector \({\overrightarrow{X}}_{A}\) with a vector drawn from an equilibrium pool built from the four best position vectors and their average, following the equilibrium optimizer (EO) concept. Equation (21) defines the components of the equilibrium pool.
The equilibrium pool, represented by \({\overrightarrow{X}}_{eq,pool}\), therefore consists of five position vectors.
In ESMA, the position vector of the ith slime mould, \({X}_{i}\) (\(i = 1, 2, \dots , N\)), in the new iteration (t + 1) is given by the following:
The position vector \({\overrightarrow{X}}_{eq}\) is obtained by randomly selecting a vector from the equilibrium pool. The parameter \(z\) supports exploration during the search and helps ESMA avoid becoming trapped in local minima. An experimentally determined threshold value of 0.03 is used for this purpose. ESMA updates the position vector for the following iteration by combining the global best position, the local best position obtained from the best-so-far equilibrium pool, and a random vector. This approach allows for a balanced exploration-exploitation trade-off.
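The equilibrium pool itself is easy to illustrate. The sketch below builds the pool from the four best position vectors plus their average and then draws one candidate at random, as described above. It assumes a minimization problem (lower fitness is better, as with RMSE), and the function names are hypothetical; the full ESMA position update is not reproduced here.

```python
import random

def build_equilibrium_pool(positions, fitness):
    """Return the four best position vectors plus their average (five in total)."""
    # Ascending sort of fitness: the best candidates come first (minimization).
    order = sorted(range(len(positions)), key=lambda i: fitness[i])
    top_four = [positions[i] for i in order[:4]]
    dim = len(positions[0])
    average = [sum(p[d] for p in top_four) / len(top_four) for d in range(dim)]
    return top_four + [average]

def sample_equilibrium(pool, rng=random):
    """Pick X_eq uniformly at random from the equilibrium pool."""
    return rng.choice(pool)
```

Drawing from several good candidates, rather than always following a single leader, is what is meant to reduce the chance of the whole population collapsing onto one local minimum.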
Data gathering
The data collection process comprised the acquisition of 106 samples from a diverse range of locations to ensure a representative cross section. These samples were subjected to systematic laboratory testing to analyze their composition, characteristics, and properties. The findings are summarized in Table 1, with references provided for further scrutiny and validation [1, 39, 40]. Table 1 lists the input variables required for predicting the unconfined compressive strength (UCS) of rocks. The following procedures were carried out to obtain the dataset:

1. Bulk density (BD): Bulk density values were determined from laboratory measurements of the mass and volume of rock samples representative of the study scope.
2. Brazilian tensile strength (BTS): Measuring Brazilian tensile strength requires specialized laboratory equipment. Rock specimens were carefully prepared and subjected to tensile stress, with the resulting values recorded.
3. Dry density (DD): Dry density data was obtained through systematic laboratory assessments, which involved determining the mass and volume of rock samples after drying.
4. P-wave velocity (\(Vp\)): P-wave velocity was derived from seismic investigations. Field-based seismic surveys were conducted, or existing seismic data for the relevant rock types was used.
5. Schmidt hammer rebound number (\(SRn\)): Rebound numbers were obtained from laboratory Schmidt hammer tests on rock specimens, with the outcomes recorded for each sample.
6. Point load index (\({I}_{s(50)}\)): The point load strength index, corrected to a 50 mm core diameter, was determined via point load tests, in which rock samples were loaded between two conical platens until failure occurred.
7. Unconfined compressive strength (UCS): The target variable, unconfined compressive strength (UCS), was obtained through unconfined compression tests. These tests applied an axial load until the rock sample failed, allowing the UCS values to be determined.
The data acquisition process followed strict quality control measures to ensure the dataset’s integrity, consistency, and accuracy, as illustrated in Fig. 3. This dataset, comprising diverse rock types and conditions, serves as the foundation for developing a robust machine-learning model for UCS prediction. The full dataset used in this study is provided in Table 5 in the Appendix.
Performance evaluators
This section gives an overview of the metrics used to assess the performance of the hybrid models, with emphasis on quantifying errors and correlations. The metrics include root-mean-squared error (RMSE), coefficient of determination (R^{2}), mean-squared error (MSE), mean relative absolute error (MRAE), and the ratio of RMSE to standard deviation (RSR) [41]. The accuracy and reliability of the proposed models (GPSC and GPES) in predicting UCS were therefore validated not only with R^{2} and RMSE but also with the supplementary metrics MSE, MRAE, and RSR, which together give a more complete picture of the models’ predictive capabilities.

Coefficient of determination (R^{2}):

Root-mean-square error (RMSE):

Mean square error (MSE):

Mean relative absolute error (MRAE):

The ratio of RMSE to standard deviation (RSR):
Respectively, the variables can be articulated as follows:

The predicted value is denoted as \({b}_{i}\).

m̅ and b̅ represent the averages of the measured and predicted values, respectively.

The recorded value is indicated as \({m}_{i}\).

n signifies the sample size.

The critical value from the t-distribution relies on the selected confidence level and degrees of freedom, represented as t.
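The five metrics can be computed directly from the measured values \({m}_{i}\) and predicted values \({b}_{i}\). Since the defining equations are not reproduced in this text, the sketch below relies on commonly used forms that should be treated as assumptions: R^{2} as the squared Pearson correlation between measured and predicted values, MRAE as the mean of |b_i − m_i| / |m_i|, and RSR as RMSE divided by the population standard deviation of the measured values. The function name `evaluate` is hypothetical.

```python
import math

def evaluate(measured, predicted):
    """Return R2, RMSE, MSE, MRAE, and RSR under the assumed definitions."""
    n = len(measured)
    m_bar = sum(measured) / n
    b_bar = sum(predicted) / n
    errors = [b - m for m, b in zip(measured, predicted)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Squared Pearson correlation between measured and predicted values.
    cov = sum((m - m_bar) * (b - b_bar) for m, b in zip(measured, predicted))
    ss_m = sum((m - m_bar) ** 2 for m in measured)
    ss_b = sum((b - b_bar) ** 2 for b in predicted)
    r2 = cov ** 2 / (ss_m * ss_b)

    # Mean relative absolute error (requires nonzero measured values).
    mrae = sum(abs(e) / abs(m) for e, m in zip(errors, measured)) / n

    # RMSE scaled by the population standard deviation of the measurements.
    rsr = rmse / math.sqrt(ss_m / n)

    return {"R2": r2, "RMSE": rmse, "MSE": mse, "MRAE": mrae, "RSR": rsr}
```

Reporting an error scaled by the spread of the data (RSR) alongside absolute errors (RMSE, MSE) makes results easier to compare across datasets with different UCS ranges.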
Results and discussion
Results of hyperparameters and convergence curves
In contrast to parameters, hyperparameters are predefined settings that are not learned from the dataset. These external configurations, such as learning rates and regularization strengths, shape the behavior of a machine-learning model. Achieving optimal model performance relies heavily on fine-tuning hyperparameters, which requires rigorous experimentation and suitable optimization methods. Table 2 lists the hyperparameter values of the GPSC and GPES models, namely n_restarts, length_scale, and alpha. For example, the alpha value was 0.2 for GPSC and 0.26 for GPES.
Figure 4 depicts a graph demonstrating the evolution of RMSE across iterations. The horizontal axis signifies the iteration number, ranging from 0 to 200, while the vertical axis denotes the corresponding RMSE values, which fall within the range of 0 to 10. The graphs started with a higher RMSE and progressively decreased with each subsequent iteration, finally converging to a lower RMSE value by the 200th iteration. Analyzing the convergence curves, the GPSC model commenced with an initial RMSE of 9.1, whereas the GPES model started slightly lower at RMSE = 8.1. Throughout the convergence process, both models demonstrated consistent reductions in RMSE, ultimately reaching values below 4 by the 200th iteration. In summary, while both models exhibited improvement, the GPSC model outperformed with a final RMSE value of 2.6.
Comparison of models’ performance
In this section, the results produced by the proposed models are compared under two frameworks: the single model and the hybrid models. More precisely, the hybrid variants are formulated by combining GPR with sand cat swarm optimization (GPSC) and with the equilibrium slime mould algorithm (GPES). For these models, 70% of the dataset was assigned to the training phase, with the remaining 30% split into 15% for validation and 15% for testing. To evaluate the outcomes comprehensively and impartially, a suite of metrics, including R^{2}, RMSE, MSE, MRAE, and RSR, was utilized. For the R^{2} metric, values nearing 1 signify excellent results. Conversely, for the error indicators, values approaching 0 indicate precise outcomes.
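A 70/15/15 partition of this kind can be sketched as follows. The shuffling, the seed, and the rounding rule are assumptions made for illustration, as the text does not state how the samples were assigned; `split_indices` is a hypothetical helper.

```python
import random

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    # Whatever remains after training and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Under this rounding, 106 samples would be divided into 74 training, 16 validation, and 16 testing samples; the exact counts used in the study may differ.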
Table 3 presents the outcomes as assessed by the models using the specified metrics. The performance of the GPR model is considered subpar, as indicated by its RMSE values of 4.145 in training and 6.313 in testing. Nevertheless, the integration of optimizers has resulted in a notable enhancement in the precision of the GPR model. Among the hybrid models, the GPSC model stands out with the highest accuracy, achieving an R^{2} value of 0.995 and an RMSE value of 1.913 in training. Furthermore, the GPES model has produced moderate results, achieving an R^{2} value of 0.988 in training. While both optimizers have improved the performance of the conventional GPR, the GPSC has yielded exceptionally accurate results.
The figures that follow give a more detailed picture of the models’ ability to predict UCS. Figure 5 shows a scatter plot of the models’ performance, together with their R^{2} and RMSE values. Circles of different colors denote the training, validation, and testing phases. The points are distributed around a central line that represents the ideal outcome, an R^{2} of 1. The GPR model shows noticeable dispersion, which reflects its limited accuracy.
By comparison, the GPSC and GPES models perform better than the standalone GPR model. The data points for GPSC are closely grouped around the central line, indicating the most favorable outcome, whereas those for GPES are somewhat more widely dispersed.
Figure 6 shows the relationship between the predicted and measured values for the GPR-based models. In this illustration, the black lines represent the measured values; the more closely the predicted values follow them, the more accurate the model.
As shown in Fig. 6, the predictions of the GPR model differ noticeably from the measured values, especially during the testing phase. The GPSC model, by contrast, is remarkably accurate, with a nearly perfect alignment between its predicted and measured values, particularly during the testing phase. The GPES model shows a notable lack of precision between sample numbers 20 and 40, which makes it less accurate than the GPSC model.
Examining the percentage error of the proposed models gives further insight into their performance and helps identify the best one. Figure 7 illustrates the normal distribution of errors in the models. The errors of the GPR model are the most widely spread, covering a range of −80% to 80%. By contrast, the error distribution of the GPSC model is strongly concentrated, with a high frequency of errors clustering near 0%. The GPES model shows moderate performance relative to the others, with errors from around −10% to 10%. Among these models, the GPSC model stands out for its precision and consistently reliable results.
For a fuller view of the errors across the models, refer to the scatter interval illustrated in Fig. 8. The accuracy of the GPSC model is evident from its closely grouped data points, which fall predominantly within the error range of −10% to 10%. This clustering within a relatively narrow error range indicates consistent and precise predictions. In contrast, the wider dispersion in the error distributions of the other models indicates greater variability in their predictions.
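The error percentages discussed for Figs. 7 and 8 can be computed from paired measured and predicted values. The sketch below is illustrative only: it assumes the error is defined as (predicted − measured) / measured × 100, and the sample values and function names are hypothetical rather than taken from the study's dataset.

```python
def percent_errors(measured, predicted):
    """Signed error of each prediction as a percentage of the measured value."""
    return [100.0 * (p - m) / m for m, p in zip(measured, predicted)]


def share_within(errors, limit=10.0):
    """Fraction of errors whose magnitude does not exceed `limit` percent."""
    return sum(1 for e in errors if abs(e) <= limit) / len(errors)


# Hypothetical UCS values, for illustration only.
measured = [50.0, 100.0, 200.0, 80.0]
predicted = [55.0, 90.0, 260.0, 82.0]

errors = percent_errors(measured, predicted)
print(errors)                # [10.0, -10.0, 30.0, 2.5]
print(share_within(errors))  # 0.75
```

A model whose points cluster inside a narrow band, such as the ±10% range reported for GPSC, would give a share close to 1 for that limit, while a widely dispersed model would give a noticeably lower share.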
A comparison: present investigation versus prior studies
Table 4 summarizes the findings of previous studies on UCS prediction, which provide a baseline against which the results of the present study can be compared. As the table shows, several models, such as ANN and RF, have been used to predict UCS. Among these, the RF model of Hoque et al. [42] registered the lowest error but also the lowest R^{2}. Conversely, Model I of Sharma and Singh [43] registered both the highest R^{2} and the highest RMSE. In the present study, by contrast, the GPSC model combined a high R^{2} of 0.995 with a low RMSE of 1.913.
Conclusions
Predicting the unconfined compressive strength (UCS) of rocks with machine learning has emerged as a promising area of research in geotechnical engineering. It has significant implications for a wide range of civil engineering projects, including tunneling, mining, and foundation design, where an accurate assessment of UCS is crucial for the safety and stability of structures. This study highlights the role of machine learning, and of hybrid models in particular, in improving the accuracy and reliability of UCS predictions. Conventional methods struggle to capture the complex relationships in rock data, which leads to slow convergence and entrapment in local minima. Gaussian process regression (GPR) coupled with advanced optimizers, namely sand cat swarm optimization (SCSO) and the equilibrium slime mould algorithm (ESMA), which give the GPSC and GPES models, addresses these challenges effectively. Evaluating the models with several performance metrics, including R^{2}, RMSE, MSE, MRAE, and RSR, provides a comprehensive picture of their capabilities. Based on the results, the following conclusions can be drawn:

Integrating the SCSO and ESMA optimization algorithms into the GPR model led to notable improvements in R^{2}. In particular, SCSO contributed a 2.1% improvement, and ESMA contributed a 1.3% improvement. This underscores the effectiveness of incorporating these optimization techniques into the GPR framework to increase the accuracy of UCS predictions.

Among all the models assessed, the GPR model performed least favorably. It recorded the highest error, with an RMSE of 6.313 during validation, and the lowest R^{2}, 0.955 during validation. Taken together, these metrics suggest that the standalone GPR model is the least accurate in predicting UCS values.

The findings highlight the higher precision of the GPSC model relative to the GPR and GPES models. This is apparent in its RMSE of 1.913, the lowest obtained, and its R^{2} of 0.995, the highest obtained.

The analysis indicates that the prediction models developed for estimating UCS successfully incorporate the parameters under investigation. Comparison of the experimental findings with the predictions of the proposed models shows that the models forecast UCS values with a notable level of accuracy. These results imply that the proposed models are effective in predicting UCS and hold promise for diverse applications in geotechnical engineering. However, the applicability of these models across varying soil conditions may be limited by the specificity of the dataset used to develop them. Furthermore, the complexity introduced by using multiple models, and the sensitivity of the results to the choice of metaheuristic algorithm, call for careful consideration when evaluating their interpretability and practical utility.
Availability of data and materials
Data can be shared upon request.
References
Hossein Alavi A, Hossein Gandomi A, Mollahassani A, Akbar Heshmati A, Rashed A (2010) Modeling of maximum dry density and optimum moisture content of stabilized soil using artificial neural networks. J Plant Nutr Soil Sci 173(3):368–379
Park SS (2011) Unconfined compressive strength and ductility of fiber-reinforced cemented sand. Constr Build Mater 25(2):1134–1138
Ruffolo RM, Shakoor A (2009) Variability of unconfined compressive strength in relation to number of test samples. Eng Geol 108(1–2):16–23
Das SK, Samui P, Sabat AK (2011) Application of artificial intelligence to maximum dry density and unconfined compressive strength of cement stabilized soil. Geotech Geol Eng 29:329–342
Sathyapriya S, Arumairaj PD, Ranjini D (2017) Prediction of unconfined compressive strength of a stabilised expansive clay soil using ANN and regression analysis (SPSS). Asian J Res Soc Sci Humanit 7(2):109–123
Majdi A, Rezaei M (2013) Prediction of unconfined compressive strength of rock surrounding a roadway using artificial neural network. Neural Comput Appl 23:381–389
Naeini SA, Naderinia B, Izadi E (2012) Unconfined compressive strength of clayey soils stabilized with waterborne polymer. KSCE J Civ Eng 16:943–949
Ghazavi M, Roustaie M (2010) The influence of freeze–thaw cycles on the unconfined compressive strength of fiber-reinforced clay. Cold Reg Sci Technol 61(2–3):125–131
Sedaghat B, Tejani GG, Kumar S (2023) Predict the maximum dry density of soil based on individual and hybrid methods of machine learning. Adv Eng Intell Syst 002(03). https://doi.org/10.22034/aeis.2023.414188.1129
Suman S, Mahamaya M, Das SK (2016) Prediction of maximum dry density and unconfined compressive strength of cement stabilised soil using artificial intelligence techniques. Int J Geosynth Gr Eng 2:1–11
Ceryan N, Okkan U, Kesimal A (2013) Prediction of unconfined compressive strength of carbonate rocks using artificial neural networks. Environ Earth Sci 68:807–819
Narendra BS, Sivapullaiah PV, Suresh S, Omkar SN (2006) Prediction of unconfined compressive strength of soft grounds using computational intelligence techniques: a comparative study. Comput Geotech 33(3):196–208
Sivrikaya O, Togrol E, Komur M (2004) Determination of unconfined compressive strength by artificial neural network. 10th national congress of soil mechanics and foundation engineering, Istanbul, Turkey
Nazir R, Momeni E, Armaghani DJ, Amin MFM (2013) Correlation between unconfined compressive strength and indirect tensile strength of limestone rock samples. Electron J Geotech Eng 18(1):1737–1746
Yılmaz I, Sendır H (2002) Correlation of Schmidt hardness with unconfined compressive strength and Young’s modulus in gypsum from Sivas (Turkey). Eng Geol 66(3–4):211–219
Kelessidis VC (2011) Rock drillability prediction from in situ determined unconfined compressive strength of rock. J South African Inst Min Metall 111(6):429–436
Armaghani DJ et al (2021) Predicting the unconfined compressive strength of granite using only two nondestructive test indexes. Geomech Eng 25(4):317–330
Grima MA, Babuška R (1999) Fuzzy model for the prediction of unconfined compressive strength of rock samples. Int J Rock Mech Min Sci 36(3):339–349
Zaid M, Sadique MR, Samanta M (2020) Effect of unconfined compressive strength of rock on dynamic response of shallow unlined tunnel. SN Appl Sci 2(12):2131
Momeni E, Armaghani DJ, Hajihassani M, Amin MFM (2015) Prediction of uniaxial compressive strength of rock samples using hybrid particle swarm optimization-based artificial neural networks. Measurement 60:50–63
Jahed Armaghani D, Tonnizam Mohamad E, Momeni E, Narayanasamy MS, Mohd Amin MF (2015) An adaptive neuro-fuzzy inference system for predicting unconfined compressive strength and Young’s modulus: a study on Main Range granite. Bull Eng Geol Environ 74:1301–1319
Armaghani DJ, Amin MFM, Yagiz S, Faradonbeh RS, Abdullah RA (2016) Prediction of the uniaxial compressive strength of sandstone using various modeling techniques. Int J Rock Mech Min Sci 85:174–186
Asteris PG et al (2021) Soft computing based closed form equations correlating L and N-type Schmidt hammer rebound numbers of rocks. Transp Geotech 29:100588
Jahed Armaghani D, Tonnizam Mohamad E, Hajihassani M, Yagiz S, Motaghedi H (2016) Application of several nonlinear prediction tools for estimating uniaxial compressive strength of granitic rocks and comparison of their performances. Eng Comput 32:189–206
Yagiz S (2009) Predicting uniaxial compressive strength, modulus of elasticity and index properties of rocks using the Schmidt hammer. Bull Eng Geol Environ 68:55–63
Yagiz S, Sezer EA, Gokceoglu C (2012) Artificial neural networks and nonlinear regression techniques to assess the influence of slake durability cycles on the prediction of uniaxial compressive strength and modulus of elasticity for carbonate rocks. Int J Numer Anal Methods Geomech 36(14):1636–1650
Mehdipour P et al (2014) Application of Gaussian process regression (GPR) in estimating under-five mortality levels and trends in Iran 1990–2013, study protocol
Rasmussen CE, Williams CKI (2005) Gaussian processes for machine learning (adaptive computation and machine learning). The MIT Press, Cambridge, MA, USA, pp 69–106
Wang B, Chen T (2015) Gaussian process regression with multiple response variables. Chemom Intell Lab Syst 142:159–165
Wan ZY, Sapsis TP (2017) Reduced-space Gaussian process regression for data-driven probabilistic forecast of chaotic dynamical systems. Phys D Nonlinear Phenom 345:40–55
Wu D, Rao H, Wen C, Jia H, Liu Q, Abualigah L (2022) Modified sand cat swarm optimization algorithm for solving constrained engineering optimization problems. Mathematics 10(22):4350
Seyyedabbasi A, Kiani F (2023) Sand cat swarm optimization: a nature-inspired algorithm to solve global optimization problems. Eng Comput 39(4):2627–2651
Sedaghat B, Javadzade Khiavi A, Naeim B, Khajavi E, Taghavi Khanghah AR, Sedaghat H (2023) The utilization of a Naïve Bayes model for predicting the energy consumption of buildings. J Artif Intell Syst Model 01(01). https://doi.org/10.22034/JAISM.2023.422292.1003
Li Y, Wang G (2022) Sand cat swarm optimization based on stochastic variation with elite collaboration. IEEE Access 10:89989–90003
Aghaei VT, SeyyedAbbasi A, Rasheed J, Abu-Mahfouz AM (2023) Sand cat swarm optimization-based feedback controller design for nonlinear systems. Heliyon 9(3):191
Qtaish A, Albashish D, Braik M, Alshammari MT, Alreshidi A, Alreshidi EJ (2023) Memory-based sand cat swarm optimization for feature selection in medical diagnosis. Electronics 12(9):2042
Yin S, Luo Q, Zhou Y (2022) EOSMA: an equilibrium optimizer slime mould algorithm for engineering design problems. Arab J Sci Eng 47(8):10115–10146
Naik MK, Panda R, Abraham A (2021) An entropy minimization based multilevel colour thresholding technique for analysis of breast thermograms using equilibrium slime mould algorithm. Appl Soft Comput 113:107955
Taffese WZ, Abegaz KA (2022) Prediction of compaction and strength properties of amended soil using machine learning. Buildings 12(5):613
Alavi AH, Gandomi AH, Gandomi M, Sadat Hosseini SS (2009) Prediction of maximum dry density and optimum moisture content of stabilised soil using RBF neural networks. IES J Part A Civ Struct Eng 2(2):98–106
Botchkarev A (2018) Performance metrics (error measures) in machine learning regression, forecasting and prognostics: properties and typology. arXiv preprint arXiv:1809.03006 14:45–79
Hoque MI, Hasan M, Islam MS, Houda M, Abdallah M, Sobuz MHR (2023) Machine learning methods to predict and analyse unconfined compressive strength of stabilised soft soil with polypropylene columns. Cogent Eng 10(1):2220492. https://doi.org/10.1080/23311916.2023.2220492
Sharma LK, Singh TN (2018) Regression-based models for the prediction of unconfined compressive strength of artificially structured soil. Eng Comput 34(1):175–186. https://doi.org/10.1007/s00366-017-0528-8
Acknowledgements
This work was sponsored in part by Scientific Research Fund of Hunan Provincial Education Department (23C0343) and Guiding Science and Technology program (2022YZKJZD007).
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Contributions
All authors contributed to the study’s conception and design. Data collection, simulation, and analysis were performed by LH, SL, and EG. The first draft of the manuscript was written by LH, and all authors commented on previous versions of the manuscript. All authors have read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Huang, L., Li, S. & Guo, E. The unconfined compressive strength estimation of rocks using a novel hybridization technique based on the regulated Gaussian processor. J. Eng. Appl. Sci. 71, 101 (2024). https://doi.org/10.1186/s44147-024-00416-8
DOI: https://doi.org/10.1186/s44147-024-00416-8