
Innovative compressive strength prediction for recycled aggregate/concrete using K-nearest neighbors and meta-heuristic optimization approaches


This paper presents a method for predicting the compressive strength (Fc) of recycled aggregate concrete (RAC) through the application of K-nearest neighbors (KNN) analysis. Designing mixture proportions to achieve a desired Fc can be remarkably difficult, owing to the intricate interplay among the components involved. Machine learning (ML) algorithms have exhibited considerable promise in tackling this complexity effectively. In pursuit of enhanced prediction accuracy, this research introduces a semi-empirical approach that integrates data-driven modeling with meta-heuristic optimization. Two meta-heuristic methods, the Fire Hawk optimizer (FHO) and Runge–Kutta optimization (RUK), are incorporated to enhance model accuracy. The research yields three separate models: KNFH, KNRK, and a single KNN model, each providing valuable insights for precise Fc prediction. Remarkably, the KNFH model stands out as the top performer, with an R2 value of 0.994 and an RMSE of 1.122. These findings validate the accuracy and reliability of the KNFH model and highlight its effectiveness in predicting Fc. This approach holds great promise for precise Fc forecasting in the construction industry: integrating meta-heuristic algorithms significantly improves model accuracy, leading to more reliable forecasts with practical implications for construction projects and their outcomes. This research marks a significant advancement in predicting Fc using ML, offering valuable tools for engineers and builders.


Compressive strength (\({F}_{c}\)) stands as a pivotal parameter within structural engineering and construction materials. It functions as a fundamental gauge of a material’s ability to endure axial loads, those forces that compress or shorten it. More precisely, \({F}_{c}\) quantifies the maximum axial stress a material, typically concrete, can withstand without succumbing to failure or collapse [1,2,3,4]. This characteristic carries immense significance in the planning and construction of vital structures like buildings, bridges, dams, and various infrastructure projects. A comprehensive grasp of \({F}_{c}\) proves indispensable to engineers and architects, as it holds direct sway over structural integrity and safety. Factors such as the composition of concrete mixes, curing conditions, and environmental factors wield substantial influence over \({F}_{c}\). Consequently, researchers and professionals are continually refining their comprehension and predictive methodologies in this realm. Recent years have borne witness to the deployment of advanced techniques such as machine learning, finite element analysis, and non-destructive testing, all aimed at augmenting the precision of \({F}_{c}\) predictions. Moreover, the evolution of concrete technology, encompassing the incorporation of supplementary cementitious materials and alternative aggregates, has ushered in an era of more sustainable construction practices. Notably, these innovations have not come at the expense of compressive strength; in many instances, they have improved it [5,6,7]. In essence, the study of \({F}_{c}\) serves as the linchpin in guaranteeing the durability and reliability of civil engineering structures. The ceaseless march of research and innovation in this field redefines the future of construction materials and practices, ensuring that they align with the demands of an ever-expanding construction industry while considering environmental sustainability [8,9,10].

The relentless expansion of the construction industry necessitates vast quantities of aggregates, primarily employed as one of the primary constituents in concrete production. In stark contrast, the demolition of aging structures begets an abundance of discarded concrete, often occupying precious landfill space, engendering severe environmental concerns such as land depletion. This predicament has spurred the exploration of recycling and repurposing demolished concrete as an eco-friendly alternative to non-renewable virgin aggregates [11,12,13]. The utilization of recycled concrete aggregate (RCA), derived from the crushing of demolished concrete, has emerged as a promising solution, capable of ameliorating the sustainability of natural resources while mitigating the adverse environmental repercussions associated with the mere disposal of demolished concrete. Nevertheless, it is essential to acknowledge that RCA differs in properties from natural aggregate (NA).

Consequently, the physical and mechanical attributes of RAC crafted from RCA exhibit disparities compared to their natural aggregate concrete (NAC) counterparts. These distinctions chiefly arise from the higher porosity and water absorption exhibited by RCA in contrast to NA [14,15,16]. One pivotal mechanical property in the concrete industry, the elastic modulus, which gauges a material’s resistance to deformation, is particularly noteworthy. RAC generally demonstrates a lower elastic modulus than NAC formed with an equivalent water-to-cement ratio (w/c). Various researchers have proposed equations aimed at correlating the elastic modulus of concrete with other properties, such as \({F}_{c}\). However, it is essential to acknowledge that these equations are primarily rooted in experimental data gathered from NAC, casting doubt upon their applicability to RAC.

A more nuanced approach is indispensable in light of the complex and multifaceted nature of experimental trials, particularly those teeming with a myriad of parameters, some of which exert only marginal influence on outcomes [17]. Computer scientists have responded to this challenge by crafting selection algorithms founded on data-driven models [18]. These algorithms exhibit a remarkable capacity to discern the most pivotal independent variables, promptly trimming the dimensionality of the input matrix and, in turn, enhancing efficiency. The domain of engineering components, systems, and materials is experiencing an escalating demand for soft computing tools in predictive modeling. This upward trajectory underscores the continued prominence of machine learning (ML) models, particularly artificial neural networks (ANNs), lauded for their adeptness in generating precise outcome predictions closely mirroring empirical observations [19,20,21]. In an era marked by the relentless march of technology, these data-driven tools are revolutionizing our capacity to predict the \({F}_{c}\) of RAC, providing invaluable insights into the behavior of this environmentally conscious construction material.

This study is dedicated to refining the accuracy of predictions of \({F}_{c}\) development in RAC by improving the K-nearest neighbors (KNN) model. Realizing the full predictive potential of KNN, however, necessitates the meticulous optimization of its parameters. To tackle this challenge, the study integrates two competent optimization algorithms: the Fire Hawk optimizer (FHO) and Runge–Kutta optimization (RUK). This combination aims to improve the efficiency of processes associated with the design and production of RAC, ultimately conferring benefits upon the infrastructure sector and the built environment. To validate the robustness of the proposed framework, an extensive dataset of \({F}_{c}\) measurements is employed. A comprehensive comparative analysis is conducted to establish its superiority over conventional optimization methods. Established statistical metrics, including R2, RMSE, and MSE, are harnessed to assess the performance of the ML models incorporated in this research.


Data gathering

The study thoroughly investigated the compressive strength (\({F}_{c}\)) of recycled aggregate concrete (RAC) while considering multiple variables. To enhance the efficiency of the analysis, the dataset was partitioned into three distinct subsets: a training set (70%), a validation set (15%), and a testing set (15%). To predict \({F}_{c}\) behavior with a KNN model, the study drew on the input variables crucial to concrete production analyzed in Table 1; understanding and controlling these factors is central to the quality of the final concrete product. The dataset used in this study comprises 441 observations, ensuring robust statistical properties. Each variable is explained below:
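The 70/15/15 partition described above can be sketched in a few lines. The following is an illustrative NumPy sketch (not the authors' pipeline), using synthetic data with the same shape as the 441-observation, six-input dataset:

```python
# Illustrative sketch (not the authors' code): a 70/15/15 split of a
# 441-row dataset like the one described, using NumPy only.
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle indices and carve out 70% train, 15% validation, 15% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.70 * len(X))
    n_val = int(0.15 * len(X))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_val]
    te = idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

# Synthetic stand-in with the same shape: 441 samples, 6 input features.
X = np.random.rand(441, 6)
y = np.random.rand(441)
train, val, test = split_dataset(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 308 66 67
```

Note that the remainder after the integer 70% and 15% cuts (67 rows here) falls into the test set.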

Table 1 The statistical properties of the input variables of Fc
1. Water-to-cement ratio (w/c)

This variable represents the proportion of water to cement in the concrete mix, ranging from 0.30 to 1.03. It has an average of 0.55 and a standard deviation of 0.15. A lower value indicates reduced water content, typically leading to stronger concrete.

2. Coarse aggregate to cement ratio (CA/C)

This variable denotes the ratio of coarse aggregates to cement, spanning from 1.00 to 7.40, with an average of 3.32 and a standard deviation of 1.21. CA/C significantly influences the structural properties of concrete.

3. Cement fineness (r)

This variable measures the fineness of cement particles, with values ranging from 0.00 to 1.00. The average is 0.52, with a standard deviation of 0.39. Finer cement particles can enhance both workability and strength.

4. Fine aggregate to total aggregate ratio (FA/TA)

The ratio of fine aggregates to total aggregates varies from 0.00 to 0.58, with an average of 0.40 and a standard deviation of 0.07. This ratio significantly impacts concrete workability and long-term durability.

5. Specific gravity of saturated surface-dry aggregates (SG)

The specific gravity of saturated surface-dry aggregates ranges from 0.00 to 6.23. The average is 2.28, with a standard deviation of 0.76, reflecting aggregate density.

6. Water absorption of aggregates (\({\text{W}}{\text{a}}\))

This variable represents the water absorption capacity of aggregates, with values spanning from 0.00 to 28.00. The average is 3.71, with a standard deviation of 3.06. Lower water absorption is desirable for concrete quality.

Based on a dataset of 441 observations, this in-depth analysis of these variables offers vital insights for optimizing concrete mix designs to achieve desired strength and performance characteristics. The statistical properties provide invaluable information for quality assurance and determining variability in concrete production [22]. Marginal histograms, which are visual representations of the distributions of specific variables along the edges of a scatter plot or two-dimensional graph, are shown in Fig. 1. They give a quick overview of the distribution of the data, making it easier to spot trends, outliers, and patterns within each variable while also visualizing how those variables relate to one another.

Fig. 1 The marginal histogram plots for the input and output variables

K-nearest neighbors (KNN)-based model

The \(KNN\) algorithm makes predictions based on the most frequently occurring feedback from the \(K\) data points nearest the test point. Before applying the algorithm, it is essential to normalize the input parameters using Eq. (1):

$$x{\prime}=\frac{x-{x}_{min}}{{x}_{max}-{x}_{min}}$$

Afterward, Eq. (2) computes the Euclidean distance between the test point and each data point:

$$H=\sqrt{\sum_{h=1}^{m}{\left({x}_{ih}-{x}_{jh}\right)}^{2}}$$

Equation (2) calculates the distance \(H\) between an original data point \(({x}_{i})\) and the test point \(({x}_{j})\), where \(m\) is the number of input parameters [23]. However, different parameters affect thermal comfort unequally even for numerically equal changes; for example, a \(1^\circ C\) change in air temperature has a greater impact than a \(1\%\) change in air humidity. To remove these inconsistent effects of the indoor thermal parameters, the Euclidean distance is modified with per-parameter weights using Eq. (3):

$$H=\sqrt{\sum_{h=1}^{m}{w}_{h}{\left({x}_{ih}-{x}_{jh}\right)}^{2}}$$

Here, \({w}_{h}\) is the weight assigned to each indoor thermal parameter that affects thermal comfort [24]. These distances determine the \(K\) data points closest to the test point [25]. The feedback occurring most frequently among these \(K\) data points is then taken as the feedback at the current test point. Cross-validation can be used to determine \(K\), which sets the number of neighbors considered. It is crucial to pick a \(K\) value between the two extremes: if \(K\) is too small, the model may be overly sensitive to sample points close to the test point, admitting excessive interference from noise points; if \(K\) is too large, the model’s accuracy might suffer. The flowchart of \(KNN\) is shown in Fig. 2.
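The normalization, weighted-distance, and nearest-neighbor steps of Eqs. (1)–(3) can be condensed into a minimal KNN sketch. This is illustrative only: it assumes a regression-style prediction (averaging the targets of the \(K\) nearest points, as is natural for \({F}_{c}\)), and the feature and strength values are made up.

```python
# Minimal weighted-KNN regression sketch following Eqs. (1)-(3):
# min-max normalization, weighted Euclidean distance, then the mean
# target of the K nearest training points. Illustrative only.
import numpy as np

def min_max_normalize(X, lo, hi):
    """Eq. (1): scale each feature to [0, 1] using given column bounds."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def knn_predict(X_train, y_train, x_test, k=3, w=None):
    """Eqs. (2)-(3): weighted Euclidean distance, then mean of K nearest."""
    w = np.ones(X_train.shape[1]) if w is None else np.asarray(w)
    d = np.sqrt((w * (X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]        # indices of the K closest points
    return y_train[nearest].mean()     # regression: average their targets

# Tiny made-up data: two mix features (e.g. w/c and CA/C) vs. an
# invented strength value.
X = np.array([[0.3, 1.0], [0.5, 3.0], [0.9, 7.0], [0.6, 4.0]])
y = np.array([60.0, 40.0, 15.0, 35.0])
lo, hi = X.min(axis=0), X.max(axis=0)
Xn = min_max_normalize(X, lo, hi)
xq = min_max_normalize(np.array([0.55, 3.5]), lo, hi)  # query, same scaling
print(knn_predict(Xn, y, xq, k=2))  # 37.5 (mean of the two nearest targets)
```

The query point is normalized with the training bounds, mirroring how Eq. (1) would be applied at prediction time.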

Fig. 2 The flowchart of the KNN model

Fire Hawk optimizer (FHO)

This section introduces the \(FHO\) steps. The starting population \(X\) of \(FHO\) is initialized with \(N\) solutions, each of \(D\) dimensions [26]. This procedure is shown in Eq. (4).

$${X}_{ij}=rand\times \left({U}_{j}-{L}_{j}\right)+{L}_{j}, j=\mathrm{1,2},\dots ,D$$

\({U}_{j}\) and \({L}_{j}\) in Eq. (4) represent the boundaries of the search domain at dimension \(j\), and \(rand\) denotes a random value in \([0, 1]\). The fitness value of each solution \({X}_{i}\) is then calculated, and the best one (\({X}_{b}\)) is the solution with the highest fitness value. The best \(n\) solutions are then taken as the Fire Hawks \(({FH}_{l}, l=1, 2,\dots ,n)\), while the rest are the prey \(({PR}_{k}, k=1, 2,\dots ,m)\). The distance between each \(FH\) and \(PR\) is then calculated as follows:

$${D}_{lk}=\sqrt{{({x}_{2}-{x}_{1})}^{2}+{({y}_{2}-{y}_{1})}^{2} , } l=\mathrm{1,2},\dots ,n, k=\mathrm{1,2},\dots ,m$$

The following equation is then used to update the position of each \(FH\).

$${FH}_{l}\left(t+1\right)={FH}_{l}\left(t\right)+\left({r}_{1}\times {X}_{b}-{r}_{2}\times {FH}_{n}\left(t\right)\right), l=\mathrm{1,2},\dots ,n$$

where \({FH}_{n}\left(t\right)\) is one of the Fire Hawks, and \({r}_{1}\) and \({r}_{2}\) are random values in the range \([0, 1]\). The safe prey area is then allocated; the safe position (\({SP}_{l}\)) inside the \(l\)th Fire Hawk’s territory is found using the formula below [27].

$${SP}_{l}=\frac{\sum_{q=1}^{r}{PR}_{q}}{r}, q=\mathrm{1,2},\dots ,r , l=\mathrm{1,2},\dots ,n$$

The next step involves simulating animal behavior via \(PR\) movement within the \(FH\) territory. This simulation updates the prey’s position as follows:

$${PR}_{q}\left(t+1\right)={PR}_{q}\left(t\right)+\left({r}_{3}\times {FH}_{l}-{r}_{4}\times {SP}_{l}\left(t\right)\right), l=\mathrm{1,2},\dots ,n, q=\mathrm{1,2},\dots ,r$$

After that, the following formula updates the safe location outside the \(l\)th \(FH\) territory.

$$SP=\frac{\sum_{k=1}^{m}{PR}_{k}}{m}, k=\mathrm{1,2},\dots ,m$$

The prey then changes its location based on the calculation below.

$${PR}_{q}\left(t+1\right)={PR}_{q}\left(t\right)+\left({r}_{5}\times {FH}_{a}-{r}_{6}\times SP\left(t\right)\right), l=\mathrm{1,2},\dots ,n, q=\mathrm{1,2},\dots ,r$$

Here, \({FH}_{a}\) denotes an alternative Fire Hawk. The stopping criteria are then checked; if they are satisfied, the best solution is returned as the output of \(FHO\); otherwise, the updating process is repeated [28].
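The FHO steps above can be condensed into a small sketch. This is a simplified, illustrative reading of Eqs. (4)–(10) for minimizing a test function, not the reference implementation: territory assignment is reduced to nearest-hawk distance, and survivor selection is greedy with elitism.

```python
# Simplified Fire Hawk optimizer sketch following the update rules above.
# Illustrative assumptions: nearest-hawk territories, greedy elitist
# selection, and minimization of the objective.
import numpy as np

def fho_minimize(f, dim=2, lb=-5.0, ub=5.0, n_pop=20, n_hawks=4,
                 iters=150, seed=1):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_pop, dim))                  # Eq. (4)
    for _ in range(iters):
        fit = np.array([f(x) for x in X])
        X = X[np.argsort(fit)]                             # best first
        x_best = X[0].copy()
        hawks, prey = X[:n_hawks], X[n_hawks:]
        # Eq. (5): assign each prey to its nearest hawk's territory.
        d = np.linalg.norm(prey[:, None, :] - hawks[None, :, :], axis=2)
        owner = d.argmin(axis=1)
        cand = [x.copy() for x in X]                       # elitism
        for l, fh in enumerate(hawks):
            r1, r2 = rng.random(2)
            other = hawks[rng.integers(n_hawks)]
            cand.append(fh + (r1 * x_best - r2 * other))   # Eq. (6): hawk move
            mine = prey[owner == l]
            if len(mine) == 0:
                continue
            sp_l = mine.mean(axis=0)                       # Eq. (7): local safe place
            sp = prey.mean(axis=0)                         # Eq. (9): global safe place
            for pr in mine:
                r3, r4, r5, r6 = rng.random(4)
                fh_a = hawks[rng.integers(n_hawks)]        # an alternative hawk
                cand.append(pr + (r3 * fh - r4 * sp_l))    # Eq. (8)
                cand.append(pr + (r5 * fh_a - r6 * sp))    # Eq. (10)
        cand = np.clip(np.array(cand), lb, ub)
        cand_fit = np.array([f(x) for x in cand])
        X = cand[np.argsort(cand_fit)[:n_pop]]             # greedy survivors
    fit = np.array([f(x) for x in X])
    return X[fit.argmin()], float(fit.min())

best, val = fho_minimize(lambda x: float((x ** 2).sum()))
print(best.shape, val >= 0.0)  # (2,) True
```

With elitist selection, the best objective value can only improve across iterations, which makes the sketch's behavior easy to sanity-check on a convex test function.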

Runge–Kutta optimization (RUN)

The \(RUN\) optimization algorithm is based on the Runge–Kutta method \((RKM)\), originally employed to compute solutions of first-order differential equations. The mathematical formulation of the \(RUN\) algorithm comprises a series of stages, elaborated upon below:

  • The initialization stage involves creating the initial solutions for \(N\) agents based on the search space’s boundaries \([LB,UB]\). This is accomplished by employing the subsequent Eq. (11):

$$\begin{array}{c}{Z}_{ij}={LB}_{j}+{r}_{1}\times \left({UB}_{j}-{LB}_{j}\right)\\ i=\mathrm{1,2}\dots ,N,j=\mathrm{1,2},\dots ,P\end{array}$$

The formula accounts for the dimension of the problem, denoted by \(P\); \({LB}_{j}\) and \({UB}_{j}\) signify the lower and upper limits of the \(j{\text{th}}\) variable in the solution set \({Z}_{ij}\), where \(i\) ranges from 1 to \(N\), the total number of search agents [29].

  • During the solution refinement stage, the \(RUN\) algorithm employs a search mechanism \((SM)\) that utilizes the \(RKM\) to modify the current solution’s position at every iteration [30, 31]. This mechanism is expressed as follows:

$${Z}_{i}= \left\{\begin{array}{c}{Z}_{CF}{+S}_{FM}+\mu \times randn\times {Z}_{mc} , if rand \le 0.5\\ {Z}_{mF}+{S}_{FM}+\mu \times randn\times {Z}_{ra} , otherwise\end{array}\right.$$

In Eq. (11), \({Z}_{CF}={Z}_{c}+r\times SF\times g\times {Z}_{c}\), \({S}_{FM}=SF\times SM\), \({Z}_{ra}={Z}_{r1}-{Z}_{r2}\), \({Z}_{mF}={Z}_{m}+r\times SF\times g\times {Z}_{m}\), and \({Z}_{mc}={Z}_{m}-{Z}_{c}\). The integer \(r\), taking values between −1 and 1, is utilized to alter the direction of the search process. The symbols \(g\) and \(\mu\) are random numbers in the ranges \([0, 2]\) and \([0, 1]\), respectively. The adaptive factor \(SF\) is specified as follows:

$$\begin{array}{c}SF=2\times \left(0.5-rand\right)\times f\\ f=a\times {\text{exp}}\left(-b\times rand\times \left(\frac{t}{tmax}\right)\right)\end{array}$$

The total number of iterations is represented by \(tmax\). The values of \({Z}_{c}\) and \({Z}_{m}\) used in Eq. (14) are defined as follows:

$${Z}_{c}= \varphi \times {Z}_{i}+(1-\varphi )\times {Z}_{r1}$$
$${Z}_{m}= \varphi \times {Z}_{b}+(1-\varphi )\times {Z}_{pb}$$

Equation (15) includes a randomly generated number \(\varphi\), which lies between 0 and 1. Here, \({Z}_{b}\) and \({Z}_{pb}\) denote the best agent at each iteration and the best-so-far agent, respectively. The \(SM\) parameter mentioned in Eq. (11) is updated using the following formula:

$$\begin{array}{c}SM=\frac{1}{6}{Z}_{RK}\Delta Z\\ {Z}_{RK}={k}_{1}+2{k}_{2}+2{k}_{3}+{k}_{4}\\ {k}_{1}=\frac{1}{2\Delta Z}\left(rand\times {Z}_{w}-u\times {Z}_{b}\right)\\ {k}_{2}=\frac{1}{2\Delta Z}\left(rand\times \left({Z}_{w}+{rand}_{1}\times {k}_{1}\times \Delta Z\right)-UZ\right)\\ {k}_{3}=\frac{1}{2\Delta Z}\left(rand\times \left({Z}_{w}+{rand}_{1}\times \left(\frac{1}{2}{k}_{2}\right)\times \Delta Z\right)-{UZ}_{b}\right)\\ {k}_{4}=\frac{1}{2\Delta Z}\left(rand\times \left({Z}_{w}+{rand}_{1}\times {k}_{3}\times \Delta Z\right)-{UZ}_{b2}\right)\\ u=round\left(1+rand\right)\times \left(1-rand\right)\\ UZ=u\times {Z}_{b}+{rand}_{2}\times {k}_{1}\times \Delta Z\\ {UZ}_{b}=u\times {Z}_{b}+{rand}_{2}\times \left(\frac{1}{2}{k}_{2}\right)\times \Delta Z\\ {UZ}_{b2}=u\times {Z}_{b}+{rand}_{2}\times {k}_{3}\times \Delta Z\end{array}$$

The symbols \({rand}_{1}\) and \({rand}_{2}\) represent random numbers. The \(\Delta Z\) value is calculated as follows:

$$\begin{array}{c}\Delta Z=2\times rand\times \left|Stp\right|;\\ Stp=rand\times (\left({Z}_{b}-rand\times {Z}_{avg}\right)+y)\\ y=rand({Z}_{n}-rand\times \left(u-l\right))\times {\text{exp}}(-4\times \frac{t}{tmax})\end{array}$$

The values of \({Z}_{w}\) (the worst solution) and \({Z}_{b}\) (the best solution) are updated according to the following equations:

$${Z}_{b}=\left\{\begin{array}{cc}{Z}_{i}& if\, f\left({Z}_{i}\right)<f\left({Z}_{b}\right)\\ {Z}_{b}& otherwise\end{array}\right.$$

$${Z}_{w}=\left\{\begin{array}{cc}{Z}_{i}& if\, f\left({Z}_{i}\right)>f\left({Z}_{w}\right)\\ {Z}_{w}& otherwise\end{array}\right.$$
• During the enhanced solution quality stage, various operators are employed to improve the convergence rate and avoid local optima. The objective is to enhance the quality of solutions, which is achieved through the following process:

$$\begin{array}{c}{Z}_{new2}= \left\{\begin{array}{c}{Z}_{new1}+r\times \omega \times \left|\left({Z}_{new1}-{Z}_{avg}\right)+randn\right| , if \omega <1\\ \left({Z}_{new1}\times {Z}_{avg}\right)+r\times \omega \times {Z}_{na} otherwise\end{array}\right.\\ {Z}_{na}=\left|\left(u\times {Z}_{new1}-{Z}_{avg}\right)+randn\right|, c=5\times rand \\ \omega =rand\left(\mathrm{0,2}\right).{\text{exp}}\left(-c\left(\frac{t}{tmax}\right)\right),{Z}_{avg}=\frac{{Z}_{r1}+{Z}_{r2}+{Z}_{r3}}{3}\end{array}$$
$${Z}_{new1}= \delta \times {Z}_{avg}+(1-\delta )\times {Z}_{b}$$

The formula in Eq. (19) involves a random number \(\delta\), which lies between 0 and 1, and an integer number \(r\) that can take the values \(1\), \(0\), or \(-1\). According to [30], if the fitness value of \({Z}_{new2}\) is not superior to that of \({Z}_{i}\), there is another opportunity to update \({Z}_{i}\). This is achieved using Eq. (20):

$$\begin{array}{c}{Z}_{new3}=\left({Z}_{new2}-{r}_{1}\times {Z}_{new2}\right)+SF\times {D}_{Z}\\ {D}_{Z}=({r}_{2}\times {Z}_{RK}+\left(v\times {Z}_{b}-{Z}_{new2}\right))\end{array}$$

This equation involves random values \({r}_{1}\), \({r}_{2}\), and \({r}_{3}\). The value of \(v\) is computed as twice the difference between \({r}_{3}\) and \(0.5\), where \({r}_{3}\) is a random number in the range \([0, 1]\).
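RUN's defining feature, an RK4-style weighted average of four slope estimates built from the best and worst solutions, can be illustrated with a heavily simplified sketch. The enhanced-solution-quality stage and the full adaptive-factor schedule are omitted; everything beyond the \(\frac{1}{6}(k_1+2k_2+2k_3+k_4)\) weighting is an illustrative assumption, not the algorithm's exact formulation.

```python
# Heavily simplified sketch of RUN's core idea: an RK4-style weighted
# average of four "slope" estimates, built from the best and worst
# agents, drives each agent's move. Illustrative only.
import numpy as np

def rk4_search_step(z, z_best, z_worst, dz, rng):
    """One RK4-flavoured move for a single agent (illustrative)."""
    def slope(offset):
        # Slope estimate from the worst/best gap, perturbed by the
        # running offset, mirroring the k1..k4 structure above.
        return (rng.random() * (z_worst + offset)
                - rng.random() * z_best) / (2.0 * dz)
    k1 = slope(0.0)
    k2 = slope(0.5 * k1 * dz)
    k3 = slope(0.5 * k2 * dz)
    k4 = slope(k3 * dz)
    sm = (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0  # the 1/6 RK4 weighting
    return z - sm * dz                           # move against the slope

def run_minimize(f, dim=2, lb=-5.0, ub=5.0, n=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.uniform(lb, ub, (n, dim))            # initialization stage
    for _ in range(iters):
        fit = np.array([f(z) for z in Z])
        zb = Z[fit.argmin()].copy()              # best agent this iteration
        zw = Z[fit.argmax()].copy()              # worst agent this iteration
        dz = 2.0 * rng.random() + 1e-9           # simplified step scale
        for i in range(n):
            cand = np.clip(rk4_search_step(Z[i], zb, zw, dz, rng), lb, ub)
            if f(cand) < fit[i]:                 # greedy acceptance
                Z[i] = cand
    fit = np.array([f(z) for z in Z])
    return Z[fit.argmin()], float(fit.min())

best, val = run_minimize(lambda z: float((z ** 2).sum()))
print(best.shape, val >= 0.0)  # (2,) True
```

Greedy acceptance keeps each agent's objective monotonically non-increasing, which is enough for the sketch to behave sensibly on a convex test function.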

Performance evaluation methods

This study introduces several criteria for evaluating the hybrid models according to their correlations and error rates. The evaluation metrics include the root mean square error (RMSE), the mean absolute relative error (MARE), the coefficient of determination (R2), the mean square error (MSE), and the expanded uncertainty (U95). The relevant formulas are given below. An algorithm that achieves an R2 value near 1 performs excellently across the training, validation, and testing phases. In contrast, lower values of the error metrics, such as RMSE and MSE, are preferred because they indicate that the model has less error.

$${R}^{2}={\left(\frac{{\sum }_{i=1}^{M}\left({p}_{i}-\overline{p }\right)\left({l}_{i}-\overline{l }\right)}{\sqrt{\left[{\sum }_{i=1}^{M}{\left({p}_{i}-\overline{p }\right)}^{2}\right]\left[{\sum }_{i=1}^{M}{\left({l}_{i}-\overline{l }\right)}^{2}\right]}}\right)}^{2}$$
$$RMSE=\sqrt{\frac{1}{M}{\sum }_{i=1}^{M}{\left({l}_{i}-{p}_{i}\right)}^{2}}$$
$$MSE=\frac{1}{M}{\sum }_{i=1}^{M}{\left({l}_{i}-{p}_{i}\right)}^{2}$$
$$MARE=\frac{1}{M}\sum_{i}^{M}\frac{\left|{l}_{i}-{p}_{i}\right|}{\left|\overline{l }-\overline{p }\right|}$$
$${U}_{95}=\frac{1.96}{M}\sqrt{ \sum_{i=1}^{M}{\left({l}_{i}-{p}_{i}\right)}^{2}+\sum_{j=1}^{M}{\left({l}_{i}-{p}_{j}\right)}^{2}}$$

Equations (21)–(25) use \(M\) to indicate the number of samples, \({p}_{i}\) and \({l}_{i}\) to represent the predicted and measured values, respectively, and \(\overline{p }\) and \(\overline{l }\) to denote the mean predicted and measured values.
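The metrics above can be written out as plain functions so the formulas can be checked numerically (with \(l\) the measured and \(p\) the predicted values). MARE is implemented exactly as printed, with the constant denominator \(|\overline{l}-\overline{p}|\); U95 is omitted here because the printed formula's second summation index is ambiguous.

```python
# Evaluation metrics written out as plain NumPy functions.
import numpy as np

def r2(l, p):
    """Coefficient of determination, as the squared correlation above."""
    num = np.sum((p - p.mean()) * (l - l.mean()))
    den = np.sqrt(np.sum((p - p.mean()) ** 2) * np.sum((l - l.mean()) ** 2))
    return float((num / den) ** 2)

def rmse(l, p):
    """Root mean square error."""
    return float(np.sqrt(np.mean((l - p) ** 2)))

def mse(l, p):
    """Mean square error."""
    return float(np.mean((l - p) ** 2))

def mare(l, p):
    """MARE as printed above: absolute errors scaled by the constant
    |mean(l) - mean(p)| (note this differs from the more common
    per-sample relative error |l_i - p_i| / l_i)."""
    return float(np.mean(np.abs(l - p)) / abs(l.mean() - p.mean()))

l = np.array([1.0, 2.0, 3.0, 4.0])   # measured values (made up)
p = np.array([1.2, 2.1, 3.3, 4.0])   # predicted values (made up)
print(round(r2(l, p), 3), round(rmse(l, p), 3), round(mse(l, p), 3))
# 0.991 0.187 0.035
```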

Results and discussion

Findings and detailed explanation for Table 2

Table 2 The results of the developed KNN-based models

The study employed three distinct models, namely KNN, KNFH, and KNRK, to forecast compressive strength (\({F}_{c}\)) of recycled aggregate concrete. These models underwent comprehensive evaluation across three phases: training, validation, and testing, with careful data partitioning to ensure fairness. The evaluation process incorporated five vital statistical metrics, including R2, RMSE, MARE, U95, and MSE, to facilitate a detailed comparison of model performance. Table 2 shows the results of the developed models, and the comparison between the models is as follows:

  • The primary focus of the evaluation centered on R2 values, which indicate the extent to which the independent variable explains variations in the dependent variable. Notably, the KNFH model demonstrated exceptional predictive accuracy, achieving a superior R2 value of 0.994 during training and consistently outperforming the alternative models. In contrast, the KNN model yielded slightly lower R2 values of 0.977 during training.

  • Furthermore, an in-depth analysis of other error indicators, particularly RMSE, revealed a range spanning from 1.122 to 2.529. Impressively, the KNFH model exhibited the lowest error, while the KNN model exhibited relatively higher errors.

  • During the training phase, the KNFH model displayed the lowest MARE value of 0.028, suggesting its superiority. In contrast, the KNN and KNRK models exhibited higher MARE values of 0.052 and 0.044, respectively.

  • In terms of MSE and U95 during training, the KNFH model also produced the lowest values, with an MSE of 1.259 and a U95 of 3.110. Interestingly, in the training phase, the MSE and U95 values for the KNN model were the highest.

The study’s findings undeniably demonstrated that the KNFH model outperformed the KNN and KNRK models in specific phases. However, when selecting a model for real-world applications, it is vital to consider additional factors such as model complexity, computational efficiency, and ease of implementation. In conclusion, the results provide compelling evidence that FHO optimization successfully enhanced the KNN model’s predictive capabilities in predicting \({F}_{c}\).
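The hybridization idea, an optimizer selecting KNN's hyperparameters against validation error, can be sketched generically. The code below uses a plain random search as a stand-in for FHO/RUK and synthetic data; every name, value, and structural choice here is an illustrative assumption, not the study's actual pipeline.

```python
# Generic sketch of meta-heuristic KNN tuning: a stand-in random search
# (not FHO/RUK themselves) picks K and feature weights by minimizing
# validation RMSE. Data and names are illustrative assumptions.
import numpy as np

def knn_val_rmse(X_tr, y_tr, X_val, y_val, k, w):
    """Validation RMSE of a weighted-distance KNN regressor."""
    preds = []
    for x in X_val:
        d = np.sqrt((w * (X_tr - x) ** 2).sum(axis=1))
        preds.append(y_tr[np.argsort(d)[:k]].mean())
    return float(np.sqrt(np.mean((np.array(preds) - y_val) ** 2)))

def tune_knn(X_tr, y_tr, X_val, y_val, trials=150, seed=0):
    """Stand-in optimizer: random search over K and feature weights."""
    rng = np.random.default_rng(seed)
    best_err, best_k, best_w = np.inf, None, None
    for _ in range(trials):
        k = int(rng.integers(1, 16))          # candidate neighbourhood size
        w = rng.random(X_tr.shape[1])         # candidate feature weights
        err = knn_val_rmse(X_tr, y_tr, X_val, y_val, k, w)
        if err < best_err:
            best_err, best_k, best_w = err, k, w
    return best_err, best_k, best_w

# Synthetic stand-in data: 6 features, "strength" driven mostly by the first.
rng = np.random.default_rng(42)
X = rng.random((300, 6))
y = 80.0 * (1.0 - X[:, 0]) + 5.0 * X[:, 1] + rng.normal(0.0, 1.0, 300)
err, k, w = tune_knn(X[:200], y[:200], X[200:], y[200:])
print(err > 0.0 and 1 <= k <= 15)  # True
```

A meta-heuristic such as FHO or RUK would replace the random proposals with its own population updates, but the fitness function (validation RMSE of the candidate KNN configuration) plays the same role.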

Enhanced presentation of figures in the results section

Figure 3 displays a scatter plot that evaluates the performance of hybrid models during three stages: training, validation, and testing. The evaluation is based on two crucial criteria, R2 and RMSE. R2 measures the similarity between predicted and observed values, while RMSE quantifies the prediction error dispersion. The KNFH model’s data points were closely grouped around the central line, indicating its outstanding accuracy across all three phases. The tight clustering between predicted and actual values suggests minimal dispersion and a high level of agreement. On the other hand, the KNRK and KNN models had data points that were more evenly spread around the central line, indicating similar performance levels. However, compared to the KNFH model, this broader dispersion suggests a higher error and somewhat lower accuracy in the KNRK and KNN models.

Fig. 3 The dispersion plots of the developed hybrid models

In Fig. 4, a line plot compares the projected and observed values of \({F}_{c}\) of RAC. This visual representation is divided into three main sections: training, validation, and testing. The accuracy of a model is judged by how closely its projected behavior matches the observed behavior. The KNFH model predicts values only slightly different from the actual measurements, with minor variations in performance across the three phases. The KNN and KNRK models also track the measured points but are less precise than the KNFH model, showing a more noticeable gap between projected and measured values.

Fig. 4 The comparison of predicted and measured values

Figure 5 presents a drop-line plot depicting the error percentages of the models developed in this study. For KNFH, the majority of data points cluster around the 14.96% mark, underscoring it as the model with the lowest error rate. In contrast, both KNN and KNRK exhibit broader ranges of error percentages, with values reaching as high as 37.94% and 19.13%, respectively. Notably, the right-skewed distributions of KNN and KNRK highlight data points with significantly higher error percentages. This observation underscores KNFH’s superior accuracy and provides a visual summary of the error distributions of the developed models.

Fig. 5 The vertical drop-line plot of error rate percentages for the models

Figure 6 presents a scatter interval plot that illustrates the error percentages of the models examined in this study. Notably, KNFH emerges as the top performer, with a mean error rate close to 0%. Its error distribution consistently remains below the 10% threshold, and the data display minimal dispersion, closely resembling a normal distribution curve. In contrast, KNN’s errors are dispersed across all phases; this model exhibits a more symmetrical and uniform distribution, with error percentages not exceeding 25%. The behavior of KNRK stands out for its unique characteristics, showing the most pronounced and diverse discrepancies among the three, including a single outlier with an error above 15%, an unusual occurrence in this analysis. This further emphasizes the distinct quality of KNFH’s performance.

Fig. 6 The scatter interval plot comparing the errors of the proposed models


Conclusion

Experimental studies aimed at comprehending the distinct properties of the compressive strength (\({F}_{c}\)) of recycled aggregate concrete (RAC) have increased significantly in recent years. Due to its complex and nonlinear nature, it has been challenging to establish a precise correlation between the composition variables and \({F}_{c}\) using conventional statistical methods. Solving this problem requires a robust and sophisticated methodology that can glean valuable information from the vast amount of experimental data; such a strategy ought to offer precise estimation methods and insight into the complex issues involved in nonlinear materials science. Machine learning (ML), a potent tool capable of revealing hidden patterns within complex datasets, plays a crucial role here. With these considerations in mind, this study is dedicated to harnessing the capabilities of ML, particularly the K-nearest neighbors (KNN) model, to predict \({F}_{c}\) of RAC. The foundation of this endeavor rests upon a meticulously curated dataset comprising 441 test experiments and 6 input parameters extracted from an extensive compilation of published literature. To enhance the predictive potential of the KNN model, two meta-heuristic algorithms, namely the Fire Hawk optimizer (FHO) and Runge–Kutta optimization (RUK), have been integrated. The effectiveness and predictive prowess of these models in estimating \({F}_{c}\) of RAC is quantified through a range of performance evaluation metrics, elaborated upon in a dedicated section. The following vital outcomes emerge from this comprehensive evaluation:

  • Among the proposed models, the KNFH variants demonstrate remarkable outcomes, yielding the highest R2 values. Although the KNN model had a slightly lower R2 score, the difference was negligible. Regarding error rates, KNFH outperforms KNN and KNRK, exhibiting a significant 1.7% reduction. The elevated R2 values and reduced error rates underscore the impressive predictive capabilities of KNFH.

  • Notably, the KNFH model consistently displays the lowest RMSE values across all phases, highlighting its remarkable dependability and accuracy in forecasting \({F}_{c}\). KNFH’s RMSE is noticeably 77% lower than that of the KNN model, clearly demonstrating the model’s improved prediction accuracy.

The findings unequivocally establish KNFH as the superior performer, outshining KNN and earning the top model accolade in this study due to its exceptional performance.

Availability of data and materials

Data can be shared upon request.


References

  1. Shah HA et al (2022) Application of machine learning techniques for predicting compressive, splitting tensile, and flexural strengths of concrete with metakaolin. Materials 15(15):5435

  2. Shi H, Xu B, Zhou X (2009) Influence of mineral admixtures on compressive strength, gas permeability and carbonation of high performance concrete. Constr Build Mater 23(5):1980–1985

  3. Morel J-C, Pkla A, Walker P (2007) Compressive strength testing of compressed earth blocks. Constr Build Mater 21(2):303–309

  4. Moutassem F, Chidiac SE (2016) Assessment of concrete compressive strength prediction models. KSCE J Civ Eng 20:343–358

  5. Ni H-G, Wang J-Z (2000) Prediction of compressive strength of concrete by neural networks. Cem Concr Res 30(8):1245–1250

  6. Sadowski Ł, Nikoo M, Nikoo M (2018) Concrete compressive strength prediction using the imperialist competitive algorithm. Comput Concr 22(4):355–363

  7. Nikoo M, Torabian Moghadam F, Sadowski Ł (2015) Prediction of concrete compressive strength by evolutionary artificial neural networks. Adv Mater Sci Eng 2015

  8. Asteris PG, Skentou AD, Bardhan A, Samui P, Pilakoutas K (2021) Predicting concrete compressive strength using hybrid ensembling of surrogate machine learning models. Cem Concr Res 145:106449

  9. Duan Z-H, Kou S-C, Poon CS (2013) Prediction of compressive strength of recycled aggregate concrete using artificial neural networks. Constr Build Mater 40:1200–1206

  10. Mousavi SM, Aminian P, Gandomi AH, Alavi AH, Bolandi H (2012) A new predictive model for compressive strength of HPC using gene expression programming. Adv Eng Softw 45(1):105–114

  11. Folino P, Xargay H (2014) Recycled aggregate concrete – mechanical behavior under uniaxial and triaxial compression. Constr Build Mater 56:21–31

  12. Shi C, Li Y, Zhang J, Li W, Chong L, Xie Z (2016) Performance enhancement of recycled concrete aggregate – a review. J Clean Prod 112:466–472

  13. Wardeh G, Ghorbel E, Gomart H (2015) Mix design and properties of recycled aggregate concretes: applicability of Eurocode 2. Int J Concr Struct Mater 9:1–20

  14. Lovato PS, Possan E, Dal Molin DCC, Masuero ÂB, Ribeiro JLD (2012) Modeling of mechanical properties and durability of recycled aggregate concretes. Constr Build Mater 26(1):437–447

  15. Duan ZH, Poon CS (2014) Properties of recycled aggregate concrete made with recycled aggregates with different amounts of old adhered mortars. Mater Des 58:19–29

  16. Xu JJ, Zhao XY, Chen ZP, Liu JC, Xue JY, Elchalakani M (2019) Novel prediction models for composite elastic modulus of circular recycled aggregate concrete-filled steel tubes. Thin-Walled Struct 144:106317

  17. Zhou ZH (2021) Machine learning. Springer Nature

  18. Wang H, Lei Z, Zhang X, Zhou B, Peng J (2016) Machine learning basics. In: Deep learning, pp 98–164

  19. Ceryan N, Okkan U, Kesimal A (2013) Prediction of unconfined compressive strength of carbonate rocks using artificial neural networks. Environ Earth Sci 68:807–819

  20. Akbulut S, Kalkan E, Celik S (2003) Artificial neural networks to estimate the shear strength of compacted soil samples. In: Int Conf New Dev Soil Mech Geotech Eng, pp 285–290

  21. Sahoo K, Sarkar P, Robin Davis P (2016) Artificial neural networks for prediction of compressive strength of recycled aggregate concrete

  22. Golafshani EM, Behnood A (2018) Automatic regression methods for formulation of elastic modulus of recycled aggregate concrete. Appl Soft Comput 64:377–400

  23. Xiong L, Yao Y (2021) Study on an adaptive thermal comfort model with K-nearest-neighbors (KNN) algorithm. Build Environ 202:108026

  24. Uddin S, Haque I, Lu H, Moni MA, Gide E (2022) Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci Rep 12(1):6256

  25. Abu Alfeilat HA et al (2019) Effects of distance measure choice on k-nearest neighbor classifier performance: a review. Big Data 7:221–248

  26. Azizi M, Talatahari S, Gandomi AH (2023) Fire Hawk optimizer: a novel metaheuristic algorithm. Artif Intell Rev 56(1):287–363

  27. Shishehgarkhaneh MB, Azizi M, Basiri M, Moehler RC (2022) BIM-based resource tradeoff in project scheduling using fire hawk optimizer (FHO). Buildings 12(9):1472

  28. Hosseinzadeh M et al (2023) A cluster-based trusted routing method using fire hawk optimizer (FHO) in wireless sensor networks (WSNs). Sci Rep 13(1):13046

  29. Chen H, Ahmadianfar I, Liang G, Bakhsizadeh H, Azad B, Chu X (2022) A successful candidate strategy with Runge-Kutta optimization for multi-hydropower reservoir optimization. Expert Syst Appl 209:118383

  30. Ahmadianfar I, Heidari AA, Gandomi AH, Chu X, Chen H (2021) RUN beyond the metaphor: an efficient optimization algorithm based on Runge Kutta method. Expert Syst Appl 181:115079

  31. Yousri D et al (2022) Modified interactive algorithm based on Runge Kutta optimizer for photovoltaic modeling: justification under partial shading and varied temperature conditions. IEEE Access 10:20793–20815


Acknowledgements

The author declares that there are no individuals or organizations to acknowledge for contributions to this work.


Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations



All authors contributed to the study's conception and design. Data collection, simulation, and analysis were performed by Min Duan.

Corresponding author

Correspondence to Min Duan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Duan, M. Innovative compressive strength prediction for recycled aggregate/concrete using K-nearest neighbors and meta-heuristic optimization approaches. J. Eng. Appl. Sci. 71, 15 (2024).
