Research on road surface crack detection based on SegNet network


To enhance the precision and reliability of road crack detection, this study introduces an innovative neural network architecture. Strategies were implemented to effectively address the issue of overfitting resulting from the intricacy of the proposed SegCrackNet. Dropout layers, multi-level output fusion, and T-bridge block structures are employed in the network. This optimization allows for a more comprehensive exploitation of contextual information, demonstrating its instrumental role in the efficient detection of subtle variations. Experimental findings clearly demonstrate substantial improvements when compared to other network models. On the Crack500, Crack200, and pavement images datasets, remarkable enhancements in the average Intersection over Union (IoU) scores were observed, with increases of 4.3%, 9.4%, and 3.7%, respectively.


In the realm of road crack detection, machine vision technology has advanced significantly [1]. By analyzing image and video data [2], machine vision systems can detect and identify indicators of road cracks [3]. These systems play a central role in monitoring and early warning, helping to mitigate the risks posed by road cracks and to protect human lives and property [4]. Conventional techniques for detecting pavement cracks include those based on wavelet transformation [5], image thresholding [6, 7], and minimum-path methods [8]. To describe crack intensity and characteristics more precisely, some approaches have introduced concepts such as free-form anisotropy [9] and morphological filters [10, 11]. Ayenu-Prah combined two-dimensional empirical mode decomposition with Sobel edge detectors for crack detection [12]. Analyzing acquired pavement crack images with convolutional neural networks [13] handles surface details more effectively and improves accuracy in distinguishing surface cracks. Shi’s research team employed random forest techniques to smooth cracks, mitigating the impact of speckle noise [14]. Xie and colleagues incorporated feature pyramid techniques in conjunction with the HED (holistically nested edge detection) network [15], using a multistage refining algorithm to adjust the weight distribution from the top to the bottom layers, which improved recognition of challenging samples. Encoder-decoder structures have also proven highly beneficial for pavement crack problems, effectively resolving Seq2Seq problems.
Zou and fellow researchers harnessed the SegNet encoder-decoder architecture to construct a deep parsing network [16], fusing the convolutional features generated by the encoder and decoder networks to give the DeepCrack model. Schmugge and his team treated crack detection as a segmentation task, employing deep learning techniques to classify each pixel as crack or background [17]. Additionally, Zhang and colleagues applied generative adversarial networks to crack detection [18], enhancing the UNet [19] model. This approach can not only handle large-scale crack images but also address the “all-black” issue by implementing an asymmetric UNet structure. Yusof and colleagues adopted a classification approach for pavement crack detection, introducing deep neural networks (DNN) for pixel-level classification of crack images [20]. Li and their team achieved a significant enhancement in detection accuracy across three datasets [21]; their proposed network facilitates pixel-level, unsupervised, and reliable fusion for pavement crack detection. To further augment network performance [22], this study introduces a novel structure of the feature pyramid, which more effectively captures complex crack edge features, thereby improving overall network performance. However, pavement crack detection still faces many challenges arising from the physical morphology of cracks, such as uneven thickness, discontinuous distribution, and edge features that are difficult to detect, and detection accuracy needs further improvement.

To enhance the precision and reliability of road crack detection, a high-precision convolutional neural network for pavement crack images is presented in this paper. Our approach innovatively centers on developing a crack detection model utilizing the SegNet [23] network architecture. By strategically integrating dropout layers [24], optimizing receptive field [25] balance, and implementing multi-level output fusion [26] techniques, an advanced SegNet network was engineered, significantly enhancing the effectiveness of crack detection. The model demonstrates exceptional generalization capabilities, attaining remarkable accuracy in detection and exhibiting robust resistance to interference. Furthermore, it excels in handling diverse complex scenarios, enabling swift and accurate crack identification. The results from our experiments unequivocally establish that under identical conditions, our proposed methodology outperforms the other three methods to a significant degree.


SegNet is a widely recognized fully convolutional neural network extensively used for pixel-level image segmentation tasks. To enhance the precision and reliability of road crack detection, this paper harnesses the SegNet architecture as the foundational framework for a fully convolutional neural network named SegCrackNet. This network is intricately designed to manage complex datasets. The SegCrackNet architecture consists of an encoder, a bridge unit, and a decoder, depicted in Fig. 1. Its key strength resides in efficiently managing complex data while fully preserving the original information.

Fig. 1 SegCrackNet model architecture

Initially, the encoder network processes crack images to extract crucial feature information. After a series of convolution and pooling operations, the feature maps generated by the encoder are passed to the bridge unit. The primary function of the bridge unit is to preserve image resolution while expanding the receptive field. This enhances crack detection accuracy and enables a deeper comprehension of contextual information. These feature maps are then passed to the decoder network, whose decoding operations combine feature information from various hierarchical levels, culminating in the final feature representation.

The bridge block

In the Deeplabv2 semantic segmentation model, the ASPP (atrous spatial pyramid pooling) network, introduced in [27], utilizes dilated convolutions with varying dilation rates to capture multi-scale object information. Features from diverse scales are extracted separately and subsequently interconnected and merged through convolutional layers. This method enhances the receptive field size while minimizing resolution loss.

However, for the specific challenge of detecting narrow cracks, ASPP relies on numerous dilated convolutions with high dilation rates, so its coverage of multi-scale information is less exhaustive than intended. To tackle this issue and improve image resolution, this study builds on the DenseASPP [28] approach and adopts a dense connection scheme, depicted in Fig. 2. This approach covers a wider range of receptive fields and integrates dilated convolutions with varying dilation rates within the bridge unit. This, in turn, alleviates the scarcity of multi-scale perspective, leading to an enhanced image resolution.

Fig. 2 Internal convolution in bridge unit

Dilated convolutions expand the convolutional kernel by inserting zeros between its elements. If we denote \(\alpha\) as the dilation factor and \(k\) as the size of the original convolutional kernel, Eq. 1 gives the kernel size \(k'\) after dilation.

$$k' = k + (k - 1)(\alpha - 1)$$
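As a quick numerical check of Eq. 1, the short sketch below evaluates the dilated kernel size for a 3 × 3 kernel at dilation factors 1 to 4. The function name `dilated_kernel_size` is introduced here purely for illustration.

```python
def dilated_kernel_size(k, alpha):
    """Effective kernel size after dilation, following Eq. 1."""
    return k + (k - 1) * (alpha - 1)


# A 3 x 3 kernel evaluated at dilation factors 1, 2, 3 and 4.
sizes = {alpha: dilated_kernel_size(3, alpha) for alpha in (1, 2, 3, 4)}
print(sizes)  # {1: 3, 2: 5, 3: 7, 4: 9}
```

A dilation factor of 1 leaves the kernel unchanged, and each further step widens a 3 × 3 kernel by two pixels per side pair.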

Within the bridge unit, the network structure incorporates a series of 3 × 3 convolutional layers with \(\alpha\) taking on values of 1, 2, 3, and 4. When \(\alpha\) is 1, the layer is a standard convolution. When \(\alpha\) is 2, the convolution has a dilation factor of 2, giving an effective kernel of 5 × 5. When \(\alpha\) is 3, the effective kernel is 7 × 7, and when \(\alpha\) is 4, it is 9 × 9. Furthermore, 1 × 3 and 3 × 1 convolutions are integrated with the dilated convolutions in the latter part to enhance the network’s nonlinear fitting capacity and reduce the output channels. Together, these layers form the network structure illustrated in Fig. 2.
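The dense connection of dilated convolutions described above can be sketched in PyTorch roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `DenseDilatedBridge`, the `growth` channel count, the batch normalization and ReLU layers, and the exact placement of the 1 × 3 and 3 × 1 convolutions are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class DenseDilatedBridge(nn.Module):
    """Densely connected 3x3 dilated convolutions (hypothetical sketch)."""

    def __init__(self, in_channels, growth=32, rates=(1, 2, 3, 4)):
        super().__init__()
        self.branches = nn.ModuleList()
        channels = in_channels
        for rate in rates:
            # padding == dilation keeps the spatial size for a 3x3 kernel
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3,
                          padding=rate, dilation=rate),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            # dense connection: the next branch sees all earlier outputs
            channels += growth
        # 1x3 then 3x1 convolutions reduce the channels back to in_channels
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, in_channels, kernel_size=(1, 3), padding=(0, 1)),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        features = x
        for branch in self.branches:
            # concatenate each new feature map onto everything seen so far
            features = torch.cat([features, branch(features)], dim=1)
        return self.reduce(features)
```

Because every convolution pads to preserve the spatial size, a `(N, C, H, W)` input leaves the block with the same shape, which matches the stated goal of enlarging the receptive field without losing resolution.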

The receptive field of the current layer is calculated as shown in Eq. (2).

$${R_i} = ({R_{i - 1}} - 1) \times {S_i} + {a_i}({K_i} - 1) + 1$$

In the equation, \({R_i}\) represents the receptive field, \({S_i}\) represents the stride, \({a_i}\) represents the dilation rate, and \({K_i}\) represents the size of the convolutional kernel. The specific size of the receptive field is calculated based on the network’s structure and parameters. This formula is used to help determine the perceptual range of the current layer’s neurons regarding the input image. When \({a_i}\) equals 1, dilated convolution is equivalent to regular convolution, and the receptive field formula is as shown in Eq. (3).

$${R_i} = ({R_{i - 1}} - 1) \times {S_i} + {K_i}$$

By employing Eq. (2) and Eq. (3), data regarding the receptive field size for the dense connection segment can be acquired. The precise numerical values can be found in Table 1.

Table 1 Receptive fields of each layer

Based on the data in Table 1, it is evident that using dense connections can theoretically extend the width of the receptive field to 21 × 21. However, an excessively large receptive field might result in feature loss. Thus, based on experimental observations, receptive field sizes linked to dilation rates of 1, 2, 3, and 4 prove more appropriate as they sustain the perceptual range without causing excessive information loss. This selection aids in achieving a favorable balance between receptive field size and network performance.
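The 21 × 21 figure can be reproduced directly from Eq. (2). The sketch below assumes four stacked 3 × 3 convolutions with dilation rates 1, 2, 3, and 4, a stride of 1 in every layer, and a starting receptive field of 1 (a single input pixel); these settings are assumptions, and the function name `receptive_fields` is illustrative.

```python
def receptive_fields(kernel_sizes, dilations, strides):
    """Apply Eq. (2) layer by layer and return each layer's receptive field."""
    fields = []
    r = 1  # assumed receptive field before any convolution: one pixel
    for k, a, s in zip(kernel_sizes, dilations, strides):
        r = (r - 1) * s + a * (k - 1) + 1  # Eq. (2)
        fields.append(r)
    return fields


print(receptive_fields([3, 3, 3, 3], [1, 2, 3, 4], [1, 1, 1, 1]))  # [3, 7, 13, 21]
```

Under these assumptions the last layer reaches a width of 21, in line with the value quoted in the text.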

Multi-level output fusion

Many fully convolutional neural networks rely solely on the output of the final layer to determine the detection outcome. Yet, as information flows from high-level to low-level layers, crucial features may be lost. Additionally, differences in lighting conditions among images and the imbalance between the number of crack pixels and background pixels can exacerbate the vanishing gradient problem, which may slow model training and compromise overall performance.

To address this challenge, our approach is inspired by the enhancement principles outlined in HED. This method calculates a loss function at the final convolutional layer of each stage within the decoder. The strength of this approach lies in its ability to gather multi-scale and multi-level feature information. SegCrackNet integrates output feature data from all decoder layers to form the ultimate feature representation. Because each unit has a different receptive field size, the final representation consolidates feature information in a more focused way.

The network structure depicted in Fig. 3 employs a multi-level output fusion strategy. In this configuration, the output of each unit incorporates 1 × 1 convolutional layers designed to minimize the propagation distance of feature information. Subsequently, the resulting probability maps are resized to align with the dimensions of the input image and then merged. Finally, the ultimate output is generated using a 1 × 1 convolutional kernel. This approach efficiently preserves essential edge information in the image without excessive data sampling, achieving a delicate balance between speed and accuracy. For multi-level output fusion, both the channels of the convolution and deconvolution operations are configured to 1 to maintain computational efficiency.
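The fusion steps above can be sketched in PyTorch as follows. This is a hedged illustration rather than the paper's code: the class name `MultiLevelFusion` and the channel counts are hypothetical, and bilinear interpolation is used as a stand-in for the deconvolution that the text uses to resize the side outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelFusion(nn.Module):
    """Side outputs from each decoder stage fused by a 1x1 convolution (sketch)."""

    def __init__(self, stage_channels):
        super().__init__()
        # one 1x1 convolution per decoder stage, each giving a 1-channel map
        self.side = nn.ModuleList(
            [nn.Conv2d(c, 1, kernel_size=1) for c in stage_channels]
        )
        # final 1x1 convolution merges the stacked side outputs
        self.fuse = nn.Conv2d(len(stage_channels), 1, kernel_size=1)

    def forward(self, stage_features, out_size):
        side_maps = [
            # resize every side output to the input image size
            F.interpolate(conv(f), size=out_size, mode="bilinear",
                          align_corners=False)
            for conv, f in zip(self.side, stage_features)
        ]
        fused = self.fuse(torch.cat(side_maps, dim=1))
        return fused, side_maps
```

Returning the side maps alongside the fused map lets a training loop compute a loss on each stage as well as on the final output, in the spirit of the HED-style supervision described earlier.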

Fig. 3 Multi-level output fusion network architecture

The dropout layer

SegNet comprises two versions, with the alternative version referred to as Bayesian SegNet, depicted in Fig. 4. Both networks display similar overall architectures, as evident from the figure. However, in terms of network structure, Bayesian SegNet improves upon the SegNet model by integrating a dropout layer with a dropout probability of 0.5 in both the encoding and decoding modules. This addition aims to mitigate overfitting of the model's weights, thereby strengthening the network’s learning capacity. The rest of the structure remains unchanged.

Fig. 4 Bayesian SegNet network architecture

In this study, inspired by the Bayesian SegNet network, we incorporate a dropout layer with a probability of 0.3 into the pooling index process of SegNet. This addition aims to alleviate network overfitting, thereby enhancing the network’s ability to generalize.

Network training

Training and testing datasets

Because image resolutions vary across the Crack500, Crack200, and pavement images datasets, all images are resized to a consistent 480 × 320 resolution for model training and evaluation. Data preprocessing also applies augmentation techniques such as rotation and cropping, which mitigate overfitting and prevent the model from becoming overly adapted to the training data. Standardizing resolutions and applying data augmentation reinforce the model’s robustness, enabling it to detect road cracks accurately and stably across a broader range of resolutions, perspectives, and real-world scenarios.

The learning rate

Choosing an appropriate learning rate is critical for successful model training. Setting it too high can lead to exploding gradients, while setting it too low leads to slow convergence. A learning rate schedule with a decaying rate, as given by the decay formula below, addresses this issue.

Gradually reducing the learning rate during training enhances the model’s convergence. Typically, the initial learning rate is set high to enable swift convergence in the early training phases. Then, it is methodically reduced to refine the model’s parameters and prevent overshooting. This method achieves a balance between swift convergence initially and precise parameter adjustments as training advances. This process significantly mitigates common issues in learning rate selection, fostering a more stable and effective training process.

$$new\_lr = lr \times \frac{1}{1 + decay\_rate \times decay\_step}$$

This learning rate update formula is a widely used method for dynamically reducing the learning rate throughout training [29]. Its components are as follows: new_lr is the updated learning rate after decay, lr is the current learning rate, decay_rate is fixed at 0.98 and sets the speed of learning rate reduction over time, and decay_step defines the intervals at which the learning rate is updated.
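The update can be illustrated with a few lines of Python. This sketch assumes that the denominator of the update is 1 + decay_rate × decay_step, which is how the description of the components reads, and it uses a hypothetical decay_step of 1 together with the initial learning rate of 0.04 reported in the experimental setup. The function name `decay_learning_rate` is introduced only for this example.

```python
def decay_learning_rate(lr, decay_rate=0.98, decay_step=1):
    """One update: new_lr = lr * 1 / (1 + decay_rate * decay_step)."""
    return lr * 1.0 / (1.0 + decay_rate * decay_step)


lr = 0.04  # initial learning rate from the experimental setup
schedule = []
for _ in range(3):
    # apply the update three times to show the downward trend
    lr = decay_learning_rate(lr)
    schedule.append(lr)
print(schedule)
```

Each application shrinks the learning rate by the same factor, so the schedule starts with large steps and moves to progressively finer parameter adjustments.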

The loss function

Choosing the Tversky loss function proves to be a prudent strategy for handling imbalanced data, providing an effective means to balance performance in such situations. This function is tailored for scenarios with uneven class distributions, enabling us to prioritize either precision or recall based on the specific problem requirements.

$$\text{Loss}(t,p) = \alpha \cdot \text{Tversky}(t,p) + (1-\alpha) \cdot L_{1}(t,p)$$

The experimental results in Table 2 offer insight into the relationship between the loss function and model performance. Here, “Loss” represents the current loss function value, “t” is the target (ground truth), “p” is the model’s prediction, and \(\alpha\) weights the Tversky term against the \(L_1\) term. Analyzing these outcomes improves our understanding of the model’s efficiency and of how the chosen loss function influences its performance.
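A minimal pure-Python sketch of such a combined loss is given below. It is not the authors' implementation, and it makes several assumptions: the Tversky term is taken as one minus the Tversky index, the false-positive and false-negative weights inside the index (`fp_weight`, `fn_weight`) are hypothetical values of 0.5, and \(L_1\) is the mean absolute error. Inputs are flat lists of ground-truth labels and predicted probabilities.

```python
def tversky_index(target, pred, fp_weight=0.5, fn_weight=0.5, eps=1e-7):
    """Tversky index on flat lists; fp_weight and fn_weight are assumed values."""
    tp = sum(t * p for t, p in zip(target, pred))
    fp = sum((1 - t) * p for t, p in zip(target, pred))
    fn = sum(t * (1 - p) for t, p in zip(target, pred))
    return (tp + eps) / (tp + fp_weight * fp + fn_weight * fn + eps)


def l1_loss(target, pred):
    """Mean absolute error between target and prediction."""
    return sum(abs(t - p) for t, p in zip(target, pred)) / len(target)


def combined_loss(target, pred, alpha):
    """alpha * Tversky loss + (1 - alpha) * L1 loss."""
    tversky_loss = 1.0 - tversky_index(target, pred)
    return alpha * tversky_loss + (1.0 - alpha) * l1_loss(target, pred)


truth = [1, 0, 1, 0]
print(combined_loss(truth, [0.9, 0.2, 0.7, 0.1], alpha=0.5))
```

Raising `fn_weight` relative to `fp_weight` would penalize missed crack pixels more heavily, which is how a Tversky-style term trades precision against recall on imbalanced data.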

Table 2 reports the mean Intersection over Union (MIoU) obtained with the baseline SegNet framework for different values of \(\alpha\), which were tuned to find the optimal setting. These outcomes inform decisions about the model’s architecture, training parameters, and loss function. Specifically, the most favorable outcomes were observed when the MIoU value reached 0.7, and an MIoU of 0.7 is therefore used as a benchmark for evaluating model performance across datasets and model variations.

Table 2 MIoU values for different \(\alpha\) in SegNet

Detection accuracy evaluation metrics

This paper adopts standard segmentation and edge detection metrics to evaluate road crack detection performance. These metrics include precision, F1 score, recall, and Mean Intersection over Union (MIoU). They are valuable tools for assessing the model’s performance from multiple perspectives, covering accuracy, recall, and edge detection precision. Precision and recall provide insights into road crack identification accuracy, while MIoU quantifies edge detection quality. Using this metric combination enables a comprehensive evaluation of the model’s performance in road crack detection.
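For reference, these metrics can be computed from binary masks as in the sketch below. The function name `crack_metrics` is illustrative, and averaging IoU over the crack and background classes is an assumption, since the text does not state whether MIoU is averaged over classes or over images.

```python
def crack_metrics(target, pred):
    """Precision, recall, F1 and mean IoU from flat binary (0/1) masks."""
    tp = sum(1 for t, p in zip(target, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(target, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(target, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(target, pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # IoU per class, then the mean over crack and background (assumed)
    iou_crack = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    iou_background = tn / (tn + fp + fn) if tn + fp + fn else 0.0
    miou = (iou_crack + iou_background) / 2

    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


print(crack_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

In the toy example there are two true positives, one false positive, one false negative, and two true negatives, so precision, recall, and F1 all equal 2/3 and the mean IoU is 0.5.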

Results and discussion

All experiments in this research were conducted using the PyTorch deep learning framework within a 64-bit Windows 10 environment. The experiments were performed on a server equipped with an Intel E5-2650 v4 processor (2.20 GHz), 80 GB RAM, and an Nvidia GeForce GTX 1080 Ti (11 GB) graphics card. The software environment utilized Python 3.7, and PyCharm served as the coding environment. These hardware and software configurations provided the necessary resources for conducting deep learning experiments.

In this paper, SegNet served as the fundamental network, and we compared it with U-Net, ResUNet, the DeepCrack network, and the newly introduced network. U-Net, characterized by a U-shaped architecture, shares structural similarities with the baseline network in this paper. ResUNet, on the other hand, is a U-Net model designed as a complete residual network aiming for a balance between speed and performance, with six residual learning units and two convolutional layers. DeepCrack, a recently introduced crack detection model, is based on SegNet’s foundational structure and integrates multi-scale features from various layers in both the encoder and decoder, leading to favorable detection outcomes. Additionally, ablative experiments evaluated the performance enhancements of these three models. These comparisons enhance our understanding of how diverse network architectures perform in road crack detection tasks.

Experimental platform and parameter configuration

Using mini-batch stochastic gradient descent (SGD), we utilized a batch size of 6 and fine-tuned the initial learning rate to 0.04. The model underwent extensive training for 1000 epochs. After this rigorous training, we comprehensively analyzed and evaluated the proposed convolutional neural network using three distinct road crack detection image datasets: Crack200, Crack500, and pavement images. This comparative analysis aims to assess the model’s performance across a range of diverse datasets.

The Crack200 dataset includes 206 RGB pavement images with a size of 800 × 600. Of these, 20 images were used for testing, and the remaining 186 images were used as the training set. The Crack500 dataset contains 500 pavement crack images with a size of about 2000 × 1500 pixels, split into 250 training images, 50 validation images, and 200 test images. Because of the size, quality, and usability of the images, each image was divided into 16 non-overlapping regions, and only regions containing more than 1000 crack pixels were kept. This greatly expanded the Crack500 dataset, yielding 1896 training images, 348 validation images, and 1124 test images. The pavement images dataset contains a total of 7237 pavement images, of which 5789 were used for training and 1448 for testing.
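The Crack500 splitting step can be sketched as follows. The sketch assumes that the 16 regions form a 4 × 4 grid and that a region is kept only when it contains more than 1000 crack pixels; both the grid layout and the function name `split_and_filter` are assumptions made for illustration. The mask is a 2-D list of 0/1 labels.

```python
def split_and_filter(mask, grid=4, min_crack_pixels=1000):
    """Return grid positions (row, col) of regions with enough crack pixels."""
    height, width = len(mask), len(mask[0])
    tile_h, tile_w = height // grid, width // grid
    kept = []
    for gr in range(grid):
        for gc in range(grid):
            rows = mask[gr * tile_h:(gr + 1) * tile_h]
            # count crack pixels inside this region
            count = sum(sum(row[gc * tile_w:(gc + 1) * tile_w]) for row in rows)
            if count > min_crack_pixels:
                kept.append((gr, gc))
    return kept


# Toy 200 x 200 mask: one region is fully cracked, the rest are empty.
toy_mask = [[0] * 200 for _ in range(200)]
for r in range(50):
    for c in range(50):
        toy_mask[r][c] = 1
print(split_and_filter(toy_mask))  # [(0, 0)]
```

Filtering out near-empty regions discards tiles that are almost entirely background, which would otherwise dominate the training set.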

Quantitative analysis of test results

The experimental results across the three datasets are thoroughly documented in Tables 3, 4, 5 and 6. These tables detail the average Intersection over Union (IoU), precision, recall, and F1 score for different models. This data exhaustively evaluates the performance of various models, enabling a comprehensive comparison to understand their efficacy in road crack detection tasks.

Table 3 MIoU metrics on three datasets
Table 4 Performance metrics on the Crack500 dataset
Table 5 Performance metrics on the Crack200 dataset
Table 6 Performance metrics on the pavement images dataset

Table 3 presents a quantitative analysis using the mean Intersection over Union (MIoU) as the metric. Compared to U-Net, the proposed model showed significant MIoU improvements of 0.043, 0.094, and 0.037 on the Crack500, Crack200, and pavement images dataset, respectively. Compared to ResUNet, the improvements were 0.079, 0.098, and 0.049. For DeepCrack, the enhancements were 0.090, 0.056, and 0.043, respectively. These results highlight the significant performance improvement achieved by the proposed model across all three datasets.

Table 4 presents the quantitative analysis results for the network from this paper on the Crack500 dataset. Compared to U-Net, the precision (Pr), recall (Re), and F1 score improved by 1.22%, 1.41%, and 1.32%, respectively. Compared to ResUNet, there were improvements of 1.22%, 1.41%, and 1.62% in these metrics. Particularly noteworthy is that against DeepCrack, there were even more substantial improvements, with increases of 2.42%, 2.81%, and 2.61%, respectively. These results highlight significant performance improvements in precision, recall, and F1 score for the proposed model on the Crack500 dataset.

Table 5 exhibits the quantitative analysis results for the network in this paper on the Crack200 dataset. Compared to U-Net, precision (Pr), recall (Re), and F1 score showed substantial improvements of 4.84%, 4.99%, and 4.91%, respectively. Against the ResUNet network, these metrics showed even more significant enhancements at 8.99%, 9.19%, and 9.10%. Furthermore, compared to the DeepCrack network, there were improvements of 1.88%, 1.72%, and 1.81%, respectively. These outcomes underscore significant performance enhancements in precision, recall, and F1 score for the proposed model on the Crack200 dataset.

Table 6 presents the quantitative analysis of this study on the pavement images dataset. Compared to U-Net, our method showed considerable improvements in precision, recall, and F1 score, with increases of 3.18%, 3.10%, and 3.11%, respectively. Against ResUNet, these metrics showed substantial enhancements, with improvements of 12.07%, 10.70%, and 11.43%. Additionally, compared to the DeepCrack network, improvements of 2.70%, 3.24%, and 2.97% were observed. These findings highlight significant performance enhancements in precision, recall, and F1 score for our method on the pavement images dataset.

DeepCrack performs better on Crack200 but worse on Crack500 because the cracks in the Crack500 dataset have rougher, coarser edges rather than smooth ones. This indicates that the designed SegCrackNet network has stronger feature extraction performance for crack images with rough edges.

Ablation experiments

Table 7 displays the outcomes of ablation experiments performed using SegCrackNet on the pavement images dataset. The fundamental network architecture is based on the SegNet model. During these experiments, integrating dropout layers within the skip connections improved model performance, resulting in a noteworthy 1.27% increase in Mean Intersection over Union (MIoU). Additionally, combining receptive field balancing with dropout layers led to a remarkable 6.69% enhancement in MIoU. These findings confirm the pivotal significance of thoughtfully designed network receptive fields in improving model performance.

Table 7 Ablation experiment metrics on the pavement images dataset

The importance of network receptive fields was notably highlighted through the implementation of multi-level output fusion techniques. Among these techniques, the use of dropout layers demonstrated the most commendable performance. Additionally, using receptive field-balanced networks yielded the most promising results, demonstrating improvements in MIoU and F1 score by 4.13% and 3.16%, respectively. These findings highlight the significant importance of receptive fields and multi-level output fusion in enhancing the effectiveness of models intended for road crack detection.

Network test results

Figures 5, 6, and 7 visually depict the results of the ablation experiments, where subfigures (a) to (d) represent SegCrackNet, SegCrackNet with dropout layers, SegCrackNet with bridge units, and SegCrackNet enhanced with multi-level output fusion techniques, respectively. A comparative analysis of these detection result images emphasizes the clear superiority of the network model developed in this paper, showcasing improved accuracy in crack detection and superior generalization capability. These results offer additional evidence of the effectiveness of our approach, particularly regarding its performance on the pavement images dataset.

Fig. 5 Test results on the Crack500 dataset

Fig. 6 Test results on the Crack200 dataset

Fig. 7 Test results on the pavement images dataset


This manuscript presents a neural network architecture designed to enhance the precision and reliability of road crack detection. Taking the characteristics of crack images into account, the receptive field of the network is enlarged by introducing densely connected dilated convolutions with different dilation rates, so that the network can not only attend to subtle changes but also make effective use of contextual information. In addition, multi-level output fusion helps to identify objects of various sizes more accurately, thereby improving the segmentation performance of the network. In subsequent research, a dataset of specific surface materials in a defined area, covering changing environmental conditions or structural changes, will be constructed and used for training, with the aim of extending the method to those materials and improving its generalization ability in the field of road crack detection.

Availability of data and materials

The data used to support the findings of this study can be shared upon request.


  1. D. Yin, B. Zhang, J. Yan, Y. Luo, T. Zhou, and J. Qin, “CoWNet: a correlation weighted network for geological hazard detection,” Knowledge-Based Systems, p. 110684, 2023.

  2. Cao J, Zhang Z, Du J, Zhang L, Song Y, Sun G (2020) Multi-geohazards susceptibility mapping based on machine learning—a case study in Jiuzhaigou, China. Nat Hazards 102:851–871

    Article  Google Scholar 

  3. Ma J et al (2022) A comprehensive comparison among metaheuristics (MHs) for geohazard modeling using machine learning: insights from a case study of landslide displacement prediction. Eng Appl Artif Intell 114:105150

    Article  MathSciNet  Google Scholar 

  4. Ma Z, Mei G (2021) Deep learning for geological hazards analysis: data, models, applications, and opportunities. Earth Sci Rev 223:103858

    Article  Google Scholar 

  5. Ranjbar S, Nejad FM, Zakeri H (2021) An image-based system for pavement crack evaluation using transfer learning and wavelet transform. Int J Pavement Res Technol 14:437–449

    Article  Google Scholar 

  6. Wang W et al (2019) Pavement crack image acquisition methods and crack extraction algorithms: a review. J Traffic Transportation Eng (English Edition) 6(6):535–556

    Article  MathSciNet  Google Scholar 

  7. Hoang N-D, Nguyen Q-L (2019) A novel method for asphalt pavement crack classification based on image processing and machine learning. Eng Comput 35:487–498

    Article  Google Scholar 

  8. R. Salini, B. Xu, and P. Paplauskas, “Pavement distress detection with picucha methodology for area-scan cameras and dark images,” Stavebni obzor-Civil Engineering Journal, vol. 26, no. 1, 2017.

  9. S. Bhat, S. Naik, M. Gaonkar, P. Sawant, S. Aswale, and P. Shetgaonkar, “A survey on road crack detection techniques,” in 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), 2020: IEEE, pp. 1–6.

  10. Cubero-Fernandez A, Rodriguez-Lozano FJ, Villatoro R, Olivares J, Palomares JM (2017) Efficient pavement crack detection and classification. EURASIP J Image Video Process 2017:1–11

    Article  Google Scholar 

  11. Sun B-C, Qiu Y-J (2007) Automatic identification of pavement cracks using mathematic morphology. Int Conf Transportation Eng 2007:1783–1788

    Article  MathSciNet  Google Scholar 

  12. Ayenu-Prah A, Attoh-Okine N (2008) Evaluating pavement cracks with bidimensional empirical mode decomposition. EURASIP J Adv Signal Process 2008:1–7

    Article  Google Scholar 

  13. Hsieh Y-A, Tsai YJ (2020) Machine learning for crack detection: review and model performance comparison. J Comput Civ Eng 34(5):04020038

    Article  Google Scholar 

  14. Shi Y, Cui L, Qi Z, Meng F, Chen Z (2016) Automatic road crack detection using random structured forests. IEEE Trans Intell Transp Syst 17(12):3434–3445

    Article  Google Scholar 

  15. S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1395–1403.

  16. Zou Q, Zhang Z, Li Q, Qi X, Wang Q, Wang S (2018) Deepcrack: learning hierarchical convolutional features for crack detection. IEEE Trans Image Process 28(3):1498–1512

    Article  ADS  MathSciNet  Google Scholar 

  17. S. J. Schmugge, L. Rice, J. Lindberg, R. Grizziy, C. Joffey, and M. C. Shin, “Crack segmentation by leveraging multiple frames of varying illumination,” in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), 2017: IEEE, pp. 1045–1053.

  18. Zhang K, Zhang Y, Cheng H-D (2020) CrackGAN: pavement crack detection using partially accurate ground truths based on generative adversarial learning. IEEE Trans Intell Transp Syst 22(2):1306–1319

    Article  Google Scholar 

  19. H. Huang et al., “Unet 3+: a full-scale connected unet for medical image segmentation,” in ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), 2020: IEEE, pp. 1055–1059.

  20. Yusof N et al (2019) Deep convolution neural network for crack detection on asphalt pavement. J Phys Conf Ser 1349(1):012020

  21. Li H, Song D, Liu Y, Li B (2018) Automatic pavement crack detection by multi-scale image fusion. IEEE Trans Intell Transp Syst 20(6):2025–2036

  22. Zhang A et al (2017) Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Comput-Aided Civil Infrastructure Eng 32(10):805–819

  23. Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495

  24. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958

  25. Luo W, Li Y, Urtasun R, Zemel R (2016) Understanding the effective receptive field in deep convolutional neural networks. Adv Neural Inf Process Syst 29

  26. Wang G, Zhang N, Liu W, Chen H, Xie Y (2022) MFST: a multi-level fusion network for remote sensing scene classification. IEEE Geosci Remote Sens Lett 19:1–5

  27. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848

  28. Fan L, Wang W-C, Zha F, Yan J (2018) Exploring new backbone and attention module for semantic segmentation in street scenes. IEEE Access 6:71566–71580

  29. Wu X et al (2023) An end-to-end multiple side-outputs fusion deep supervision network based remote sensing image change detection algorithm. Signal Process 213:109203



This work was supported in part by the New Engineering Research and Practice Project under contract 2023-LGYXGK-09.

Author information

CG proposed the concept and design scheme and wrote the original draft. WG analyzed the experimental data and managed the project. DZ was a major contributor in writing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Cunge Guo.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Guo, C., Gao, W. & Zhou, D. Research on road surface crack detection based on SegNet network. J. Eng. Appl. Sci. 71, 54 (2024).
