A learning-based approach to fault detection and fault-tolerant control of permanent magnet DC motors

In the context of Industry 4.0, which prioritizes intelligent and efficient solutions for industrial systems, this paper introduces an innovative methodology for fault detection and fault-tolerant control of DC motors. Leveraging the capabilities of machine learning and reinforcement learning, our approach aims to achieve optimal performance while maintaining a low computational burden. At the heart of our strategy lies a reinforcement learning-enhanced proportional-integral controller meticulously designed for precise positioning of DC motors. Through extensive comparative analysis, we establish the superiority of this controller in terms of precision, efficiency, and user accessibility when compared to traditional techniques. To ensure robust fault detection, we synergize a model-based observer with Mahalanobis distance-based outlier analysis, creating a swift and accurate diagnostic method for sensor faults. In cases of sensor malfunctions, an internal model-based control strategy comes into play, enabling the system to uphold its effectiveness despite disruptions. The effectiveness of our proposed methods is vividly demonstrated through simulations in the MATLAB environment, utilizing a DC motor subjected to sensor failures. The results unequivocally highlight the advantages of our approach, showcasing improved precision, faster operation, cost-effectiveness, and streamlined simplicity. As such, our approach finds suitability for industrial applications. In our quest to strike a delicate balance between performance and complexity, our techniques are purposefully crafted to provide intelligent yet pragmatic solutions that promote reliability, safety, and sustainability. This paper contributes to the evolving landscape of intelligent industrial solutions by offering a comprehensive framework that optimizes performance while minimizing complexity and costs. In doing so, we lay the foundation for a more efficient and resilient industrial ecosystem.


Introduction
The fourth industrial revolution has ushered in a transformative era in industrial control, marked by the integration of artificial intelligence (AI). This integration not only empowers machines with autonomous decision-making but also enhances the adaptability and flexibility of industrial control systems. By leveraging AI algorithms and machine learning techniques, these systems achieve improved efficiency, reduced human errors, and optimized performance. The copious data generated by industrial processes are effectively harnessed by AI for tasks such as performance optimization, early anomaly detection, and enhanced product quality. This fusion of AI with industrial control streamlines processes, enhances reliability, and propels businesses towards heightened competitiveness and profitability [1][2][3].

Motivations and related works
Fault detection (FD) and fault-tolerant control (FTC) are fundamental functions within industrial control systems, ensuring safety and reliability. FD identifies deviations or malfunctions in the control system and the equipment it governs, facilitating timely intervention to mitigate potential disruptions. FTC, on the other hand, enables control systems to operate seamlessly in the presence of faults or malfunctions, often through adaptive control strategies or redundancy mechanisms. These components collectively enhance system reliability, minimize downtime, and optimize maintenance costs [4,5].
The convergence of AI, FD, and FTC has captivated researchers, driving the integration of these fields. Recent years have seen a concerted effort to amalgamate AI techniques with fault management strategies, resulting in intelligent systems capable of diagnosing and compensating for faults. This review examines contemporary studies that interlace AI methodologies with FD and FTC across diverse industrial systems. From robotic manipulators to unmanned aerial vehicles, injection molding processes, and DC servo motors, these studies offer a comprehensive exploration of neural networks, deep learning, reinforcement learning, iterative learning control, and other AI methodologies. They illustrate how AI can enhance fault detection accuracy and facilitate fault tolerance, even in intricate industrial systems fraught with uncertainties and time delays. The synthesis of AI and fault management holds promise to revolutionize industrial control systems by imbuing them with intelligent fault-handling capabilities.
Within this context, the present paper embarks on a journey to unlock the transformative potential of AI for fault management in industrial control systems. Armed with an array of methodologies, we delve into AI's capacity to revolutionize fault detection and tolerance, contributing to the efficiency, resilience, and reliability of industrial processes. By scrutinizing existing literature, this review aims to illuminate the promise and possibilities that AI offers to the realm of fault management in industrial settings. Subsequent sections delve into AI-driven approaches for fault diagnosis and fault-tolerant control across diverse industrial domains. By bridging the gap between AI and fault management, this review seeks to enrich the knowledge landscape and propel the industrial sector into a new era of intelligent fault management.
Recent research has yielded a spectrum of fault detection and diagnosis (FDD) strategies, showcasing their potential to tackle the intricate challenges posed by industrial systems. Notable contributions include the following. Qian et al. [6] offer an encompassing overview of autoencoder (AE)-based representation learning for FDD in industrial processes. Their review surveys state-of-the-art monitoring strategies and future research prospects. Sun and Ma [7] introduce an enhanced kernel learning data-driven (EKLDD) algorithm for identifying multiple faults in industrial systems. Incorporating dynamic features and considering measurement noise, the EKLDD method employs the FARMAX technique to capture variable interactions in time series data. The authors propose a monitoring scheme grounded in fault lines and angle statistics, validating its applicability through simulations and real-world case studies.
In [8], a novel deep learning framework named ADL-FDI4 is presented for fault diagnosis in Industry 4.0 applications. ADL-FDI4 adeptly combines long short-term memory (LSTM), convolutional neural networks (CNN), and graph CNN (GNN) to handle heterogeneous data. A branch-and-bound procedure is employed for parameter optimization, leading to superior detection rates, reduced running times, and enhanced energy efficiency compared to baseline solutions. Xueyu Li et al. [9] propose an off-policy reinforcement learning (RL) algorithm for fault-tolerant control in industrial processes. This model-free RL algorithm learns from system trajectory data and solves a linear quadratic zero-sum game using the game algebraic Riccati equation (GARE). Simulation results on an injection molding process showcase the algorithm's potential as an effective and efficient fault-tolerant control approach.
Leveraging iterative learning control (ILC), [10] combines ILC with fault-tolerant control to mitigate process faults and guide the plant to a predefined reference trajectory. The effectiveness of this method is demonstrated through simulation of a DC servo motor. Additionally, [11] presents an incremental learning approach based on CNN-AE for novelty fault detection in rotary systems. The method is validated through an experimental case study on a fault machinery simulator, showcasing its potential to adapt to complex real-world manufacturing environments.
By harnessing Bayesian deep learning models, [12] proposes an advanced approach to FDD that accounts for prediction uncertainty. This entails using Automatic Differentiation Variational Inference (ADVI) for Bayesian inference, extracting prediction uncertainty information, and integrating it into a risk function. Experimental validation on both open-source datasets and real case studies demonstrates the superiority of this approach over classical deep learning models.
Furthermore, [13] introduces a neural network state observer-based robust adaptive quantized iterative learning output feedback control (RAQILOFC) strategy for rigid-flexible coupled robotic systems (RFCRSs). The neural network state observer provides accurate estimates of angular velocities, which are crucial for precise trajectory tracking. Wu, Kang, and Yao [14] propose a learning observer for fault diagnosis and fault-tolerant control of manipulators, comprising both fault estimation and sliding mode fault-tolerant control. Verification through simulation underscores its effectiveness.
Fukai Zhang et al. [15] present a learning-based active fault-tolerant control (FTC) scheme for robot manipulators, accommodating uncertainties and actuator faults. By employing dynamic learning theory and radial basis function networks, the FTC method demonstrates effective fault detection and isolation, alongside fault compensation for improved control performance.
In [16], a fault-tolerant control approach is presented for faulty fixed-wing unmanned aerial vehicles (UAVs). This strategy combines prescribed performance functions and a PID-type filter to mitigate tracking errors in the presence of actuator faults. A composite learning algorithm incorporating neural networks and disturbance observers enhances fault tolerance for non-linearities, verified through Lyapunov stability analysis and experimental results.
These contributions collectively illustrate the diverse approaches and methodologies employed in modern FDD strategies, highlighting the rapid evolution and ongoing innovation within this critical field of study.
The industrial landscape resonates with the prevalence of permanent magnet DC (PMDC) motors, a pivotal class of electric motors that employ permanent magnets to generate magnetic fields. Designed for precision control of speed and torque, PMDC motors find applications in domains such as robotics, machine tools, and automated manufacturing equipment. These motors, characterized by a simple construction comprising a stator and rotor housing permanent magnets and windings respectively, are seamlessly integrated with DC power sources and controllers. This synergy results in an efficient, durable, and easily controlled mechanism that orchestrates precise motion control.
The versatility and longevity of PMDC motors have rendered them indispensable in sectors where precise motion control is imperative. However, this ubiquity also necessitates robust control mechanisms capable of maintaining performance in the presence of faults and failures. It is within this context that the exploration of fault detection and fault-tolerant control strategies for PMDC motors assumes significance.
A plethora of methodologies has emerged to address the challenges of fault management in PMDC motors. Dilmi et al. [17] present a hybrid control strategy for the active fault-tolerant control of brushless DC motors (BLDCMs). Their approach integrates interval type-2 fuzzy logic control with second-order sliding mode control, enhancing both static and dynamic performance. Notably, their strategy includes a fault detection algorithm that promptly identifies phase current imbalances and short-circuit faults, enabling timely fault mitigation.
Advancing the domain of fault tolerance, G. Sajitha et al. [18] propose a novel fault-tolerant control (FTC) strategy tailored for brushless DC (BLDC) motor drives in electric vehicle applications. Their scheme adeptly employs direct torque control (DTC) under normal conditions and smoothly transitions to field-oriented control (FOC) in the event of voltage sensor failure. This approach ensures continuous drive system operation, enhancing reliability.
In pursuit of robust fault-tolerant strategies for PMDC motors, Umm-e-Aimen et al. [19] introduce a method that combines multiple models switching and tuning (MMST) with linear quadratic tracking (LQT). This approach, characterized by its robustness, effectively tracks time-varying sinusoidal reference signals despite actuator faults. The combination of robust fault detection and isolation (FDI) strategies and efficient decision-making mechanisms culminates in a comprehensive fault-tolerant solution.
The integration of AI and fault management strategies is evident in the work of Chu et al. [20], who employ neural network techniques for fault diagnosis in BLDC motors. Their exploration of neural network variants, including convolutional neural networks (CNNs) and deep neural networks (DNNs), reports fault identification accuracy surpassing 95%.
Recent research [21] in BLDC motor control introduces an adaptive fractional order PID (FOPID) controller, enhanced by the Artificial Bee Colony (ABC) algorithm. This controller is designed to improve BLDC motor performance under various speed and load conditions, addressing challenges like settling time, steady-state error fluctuations, power instability, and nonlinearity. The controller prioritizes enhanced controllability. Additionally, [21] focuses on optimizing the FOPID controller through integration with the ABC algorithm within a self-tuned regulator framework. This approach fine-tunes the controller's performance by minimizing a predefined objective function while adhering to critical inequality constraints. The study acknowledges Hall effect sensor limitations and proposes employing a Kalman filter for speed estimation, mitigating sensor-related complexities. Simulation-based assessments underscore the superiority of the ABC-tuned FOPID controller in terms of time-domain behavior, control effort, and performance indices compared to traditional tuning methods. Experimental validation confirms the practical utility of this approach within real-world operational scenarios.
Vanchinathan and Valluvan [22] introduce an inventive strategy centered around the Bat algorithm (BA) for optimal tuning of fractional-order proportional integral derivative (FOPID) controllers aimed at regulating rotor speed in sensorless brushless direct current (BLDC) motors. By integrating the BA into this context, the study introduces a pioneering optimization algorithm capable of fine-tuning various FOPID controller parameters, including Kp, Ki, Kd, lambda, and mu. The core focus of [22] is achieving desired speed control and robust performance through meticulously tuned FOPID closed-loop speed controllers, leveraging the Bat algorithm. The study extensively evaluates dynamic system behavior, emphasizing critical time-domain specifications such as peak time, overshoot percentage, settling time, rise time, and steady-state error. Compared with the Artificial Bee Colony (ABC) optimization method and modified genetic algorithm (MGA), the proposed Bat algorithm-based FOPID controller excels in enhancing transient characteristics and reducing steady-state error, showcasing its potential to elevate control performance.
Study [23] introduces an innovative approach termed the Whale Optimization Algorithm (WOA) for optimal tuning of a fractional-order proportional integral and integral-order controller (FOPI). The research focuses on the sensorless speed control of a permanent magnet brushless DC (PMBLDC) motor powered by solar photo-voltaic (PV) systems. Operating under varying conditions, including speed changes, solar PV output voltage fluctuations, load variations, integrated scenarios, and uncertain controller parameters, the study addresses challenges associated with uncertainties, nonlinearity, and poor controllability. To enhance motor control performance, optimization algorithms, namely Bat algorithm (BA), Grey Wolf Optimization (GWO), and WOA, are proposed for FOPI controller optimization. The study uses MATLAB 2019a/Simulink to develop and compare the effectiveness of proposed controllers, demonstrating that the WOA-optimized FOPI controller outperforms other methods in terms of various performance metrics and control efforts, making it an attractive choice for solar PV-fed sensorless speed control of PMBLDC motors.
These are but a few examples from the dynamic landscape of AI-driven control and fault management strategies for DC motors. Each study encapsulates a unique facet of fault detection, diagnosis, and tolerance, underscoring the transformative potential of AI in enhancing control and fault management across diverse industrial domains. Through the exploration of these innovative methodologies, this review seeks to unravel the complexities of AI-empowered fault management and its implications for industrial control systems.
Within this context, our paper introduces a pioneering approach to address the intricacies of fault detection and fault-tolerant control in PMDC motors. This innovative framework capitalizes on the amalgamation of reinforcement learning, machine learning techniques, and established industrial control strategies, engendering a paradigm shift in how these systems are managed.
Distinguishing itself from earlier studies, our approach embarks on a twin-pronged journey. The first facet involves swift and precise detection of sensor failures, orchestrated through a learning-enhanced observer. This observer augments traditional sensing mechanisms with learning-driven insights, enabling rapid identification of sensor malfunctions even in complex operational scenarios.
Complementing this proactive detection, the second facet of our approach revolves around maintaining consistent and reliable performance post-fault. To achieve this, we harness the power of internal model-based fault-tolerant control. This strategy deftly adapts to fault-induced perturbations, mitigating their impact and ensuring continuous, safe, and optimal operation.
Notably, the proposed methodology is not limited to PMDC motors; it is versatile enough to address complex system faults across industries. By merging the sophistication of reinforcement learning, the acumen of machine learning techniques, and the robustness of industrial control strategies, we forge a comprehensive solution that optimizes performance, precision, and cost-effectiveness. This amalgamation is designed with pragmatism in mind, ensuring that the proposed approach retains its utility even within the dynamic realm of Industry 4.0.

Key contributions
This paper rests on two pillars of innovation. Firstly, we harness the power of cutting-edge artificial intelligence tools, particularly reinforcement learning, to chart a new course in the landscape of fault detection and fault-tolerant control of PMDC motors. This integration transcends the confines of traditional methodologies, venturing into a realm of adaptability and intelligence that befits the complexity of industrial systems.
Secondly, our approach is characterized by its unwavering commitment to practicality. We recognize the imperatives of real-world implementation and, therefore, engineer our framework with computational efficiency and simplicity at its core. This emphasis on minimizing complexity and cost renders our approach well-suited for adoption within the industrial arena, particularly under the banner of Industry 4.0. By doing so, we endeavor not only to bolster fault management strategies but also to catalyze the transformation of industrial control processes into intelligent, accessible, and forward-looking endeavors.

PMDC motor system
Direct current (DC) motors are electromechanical systems that convert electrical energy into rotational mechanical energy. With attributes like high torque, controllable speed range, portability, and compatibility with various control methods, DC motors find wide applications in control systems. Among them, permanent magnet DC motors utilize a permanent magnet to generate the stator's magnetic field. A schematic of a PMDC motor is depicted in Fig. 1.
Here, we present a linear approximation of the equations governing a PMDC motor system [24].
The motor air-gap flux $\phi(t)$ is directly proportional to the field current, as expressed in Eq. (1) [24].
We assume that the motor torque $T_m(t)$ is linearly related to both $\phi(t)$ and the armature current $i_a(t)$. If a constant field current is sustained in the field coil, the Laplace transform of the motor torque is proportional to that of the armature current through the motor constant $K_m$, as given in Eq. (2) [24]. Here, $K_m$ depends on the permeability of the magnetic material. The input voltage applied to the armature determines the armature current, as given by Eq. (3) [24].
Furthermore, the relationship between the load torque $T_L(s)$ and the other torque components is captured by Eq. (7) [24].
Here, $J$, $b$, and $T_d(s)$ represent the rotor inertia, the friction coefficient, and the disturbance torque, respectively. Based on Eqs. (2) and (5)-(7), we derive the position control dynamics of the PMDC motor, which are represented by the transfer function in Eq. (8) [24]. This equation relates the motor parameters to the control variables and gives a concise mathematical description of the PMDC motor's position control dynamics. A graphical representation of these dynamics is presented in Fig. 2.
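The equations referenced in this section are not reproduced in the text above. As a reading aid, the block below sketches the standard linear armature-controlled DC motor model that matches the verbal description. It is an assumption based on the usual textbook form, not a verbatim copy of Eqs. (1)-(8), and the ordering may differ from the original numbering. The symbols $K_f$, $i_f$, $R_a$, $L_a$, $K_b$, $V_a$, $V_b$, and $\omega$ (field constant, field current, armature resistance and inductance, back-EMF constant, armature voltage, back-EMF voltage, and angular velocity) are introduced here for illustration.

```latex
\begin{align}
\phi(t) &= K_f\, i_f(t) \\
T_m(s) &= K_m\, I_a(s) \\
V_a(s) &= (R_a + L_a s)\, I_a(s) + V_b(s) \\
V_b(s) &= K_b\, \omega(s) \\
I_a(s) &= \frac{V_a(s) - K_b\, \omega(s)}{R_a + L_a s} \\
T_m(s) &= T_L(s) + T_d(s) \\
T_L(s) &= J s^2\, \theta(s) + b s\, \theta(s) \\
\frac{\theta(s)}{V_a(s)} &= \frac{K_m}{s\left[(R_a + L_a s)(J s + b) + K_b K_m\right]}
\qquad (T_d = 0)
\end{align}
```

The last line follows by substituting the armature current into the torque balance with $\omega(s) = s\,\theta(s)$ and no disturbance torque.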

Reinforcement learning framework and agent-environment interaction
The implementation of a reinforcement learning framework involves several key components that collectively drive the learning process: environment, agent, state, action, and reward. In this section, we explain how these components interact within our study to achieve fault-tolerant control of permanent magnet DC motors.

Agent and environment
In our proposed approach, the environment represents the PMDC motor system, while the agent takes on the role of the PI controller. This alignment establishes an interactive loop where the agent, as the controller, interacts with the environment, aiming to optimize the control strategy. The dynamic interplay between the agent and the environment is at the heart of the reinforcement learning paradigm, enabling the agent to learn optimal control policies through trial and error.

Agent-environment interaction
The interaction between the agent and the environment is a pivotal aspect of the RL process. At each time step, the agent selects an action to be applied to the environment, akin to a control signal. This action directly influences the motor's behavior, affecting its response to the control effort exerted by the agent. The agent's decision-making process is guided by its observations and learned policy, with the objective of steering the motor's behavior toward the desired trajectory.

States and observations
The state of the environment, as perceived by the agent, is encapsulated in its observations. These observations consist of two components: the reference tracking error ($\theta_{ref} - \theta$) and the integral of this error ($\int (\theta_{ref} - \theta)\,dt$). The reference tracking error is the difference between the desired reference position ($\theta_{ref}$) and the actual position of the motor ($\theta$). These observations allow the agent to make informed decisions about the action to take.

Rewards and learning
The agent's action triggers a response from the environment, which generates a reward signal. This reward guides the agent's learning process. The reward function is designed to balance minimizing the tracking error ($\theta_{ref} - \theta$) against limiting the control effort ($u(t)$), where the control effort is the amplitude of the control signal applied by the agent. Penalizing the control effort keeps control actions within practical limits, which supports the stability and feasibility of the controlled system.
In conclusion, the reinforcement learning framework leverages the interactive relationship between the agent and the environment to optimize control strategies. Through the cyclical process of action, observation, and reward, the agent refines its policy to improve position tracking accuracy and keep control signal magnitudes manageable. This interplay between the key RL components underpins the fault-tolerant control of permanent magnet DC motors.
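The action-observation-reward cycle described above can be sketched as a short simulation loop. This is a minimal illustration under stated assumptions, not the authors' Simulink model: the motor parameters are generic illustrative values, the motor is integrated with a forward-Euler step, the gains `kp` and `ki` are fixed by hand rather than learned, and the reward form (negated weighted magnitudes of error and control effort) is an assumed stand-in for the reward function, which is not reproduced in the text.

```python
# Illustrative PMDC motor parameters (assumed values, not taken from the paper).
J = 0.01   # rotor inertia [kg m^2]
B = 0.1    # viscous friction coefficient [N m s/rad]
K = 0.01   # torque constant and back-EMF constant [N m/A, V s/rad]
R = 1.0    # armature resistance [ohm]
L = 0.5    # armature inductance [H]


def run_episode(kp, ki, theta_ref=1.0, t_end=10.0, dt=1e-3):
    """Roll out one episode of a PI agent acting on the motor environment."""
    theta = omega = current = 0.0   # environment state
    err_int = 0.0                   # running integral of the tracking error
    total_reward = 0.0
    for _ in range(int(round(t_end / dt))):
        # Observation: tracking error and its integral.
        err = theta_ref - theta
        err_int += err * dt
        # Action: PI control signal, applied as the armature voltage.
        u = kp * err + ki * err_int
        # Reward (assumed form): penalise tracking error and control effort.
        total_reward += -(2.0 * abs(err) + 0.01 * abs(u)) * dt
        # Environment step: forward-Euler update of the linear motor model.
        d_current = (u - R * current - K * omega) / L
        d_omega = (K * current - B * omega) / J
        theta += omega * dt
        omega += d_omega * dt
        current += d_current * dt
    return theta, total_reward


final_theta, episode_reward = run_episode(kp=20.0, ki=5.0)
print(f"final position: {final_theta:.3f} rad, return: {episode_reward:.3f}")
```

In an actual RL setup the gains would be adjusted between episodes so that the accumulated reward increases; here they are held constant to keep the loop readable.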

Primary control strategy
Reinforcement learning was selected as the approach for fault-tolerant control in this study due to its unique advantages over other advanced control algorithms. RL stands out as a powerful methodology for learning optimal control policies in intricate and uncertain environments, rendering it particularly well-suited for fault-tolerant control of permanent magnet DC motors. This technique is proficient in managing non-linear systems and addressing non-convex optimization challenges, which often prove difficult or infeasible for traditional control algorithms. Furthermore, RL offers the ability to adapt to dynamic conditions and minimize computational complexity by learning optimal control policies directly from empirical data. While various advanced algorithms exist in the field of control, RL has showcased promising outcomes across a broad spectrum of applications, including the realm of fault-tolerant control for diverse systems. Hence, RL was deliberately chosen as the preferred methodology for fault-tolerant control in this study, capitalizing on its distinctive strengths to tackle the complexities associated with systems like permanent magnet DC motors.
Our primary control strategy is based on the proportional-integral (PI) control method. We evaluate two techniques for adjusting the controller coefficients: the Ziegler-Nichols tuning method and reinforcement learning-based tuning. The Ziegler-Nichols method is a widely adopted, pragmatic approach to tuning PID controllers in industry. In this study, we specifically employ the closed-loop Ziegler-Nichols frequency tuning method. The PI controller has the transfer function $K_p + K_i/s$ (Eq. (9)), where $K_p$ and $K_i$ denote the proportional and integral gains, respectively. The Ziegler-Nichols approach starts with the system in a closed-loop configuration under proportional control with an almost negligible gain. The controller gain is then increased gradually until the output exhibits sustained oscillations at a specific frequency. The gain at which this occurs is the ultimate gain ($K_u$), and the oscillation period is the ultimate period ($T_u$). The controller coefficients are then computed from these two values using the Ziegler-Nichols rules (Eq. (10)).
As illustrated in Fig. 3, the RL-based fault-tolerant control methodology employs the twin-delayed deep deterministic policy gradient (TD3) algorithm to tune the PI controller gains $K_p$ and $K_i$. The RL framework was developed and implemented in the MATLAB/Simulink environment. TD3 was chosen over its counterparts because it is model-free, so it requires no model of the environment to make decisions. Additionally, its online learning capacity enables real-time policy updates based on incoming observations. The TD3 algorithm enhances stability and learning speed by employing two critic networks instead of one. The TD3 agent comprises three components: an actor network and two critic networks that approximate the long-term reward. The actor network determines appropriate actions in response to observed states, while the critic networks evaluate the long-term consequences of those actions. The critic networks improve their predictions of the long-term reward by minimizing the mean-squared error between their estimates and targets computed from the observed rewards.
Fig. 3 Reinforcement learning-based control framework
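The Ziegler-Nichols step described above, computing the PI gains from the ultimate gain and ultimate period, can be sketched as follows. The coefficients are the classic textbook Ziegler-Nichols values for a PI controller and are assumed here, since the tuning formula itself is not reproduced in the text; the function name and the example inputs are hypothetical.

```python
def ziegler_nichols_pi(k_u, t_u):
    """Classic closed-loop Ziegler-Nichols rule for a PI controller.

    k_u: ultimate gain, t_u: ultimate period in seconds.
    Returns (K_p, K_i) for the parallel form K_p + K_i / s.
    """
    k_p = 0.45 * k_u
    t_i = t_u / 1.2      # integral time
    k_i = k_p / t_i      # equivalent to 0.54 * k_u / t_u
    return k_p, k_i


# Hypothetical measurements, for illustration only.
k_p, k_i = ziegler_nichols_pi(k_u=10.0, t_u=2.0)
print(k_p, k_i)
```

Gains obtained this way are usually treated as a starting point that is refined afterwards, which matches the role the text assigns to the method.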
The TD3 algorithm extends DDPG (deep deterministic policy gradient), addressing certain limitations of the latter. The extension relies on two critic networks, delayed updates of the actor relative to the critics, and target networks for both the actor and the critics. These adjustments stabilize the learning process and improve overall algorithmic performance. TD3 has demonstrated notable efficacy across diverse applications, spanning control problems, robotics, and video game contexts, rendering it a suitable choice for addressing the challenge at hand. Further insights into this algorithm can be found in [25].
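The mechanisms listed above can be illustrated with the target value that both critics regress toward. The sketch below is a scalar, framework-free illustration of the general TD3 recipe (clipped double-Q target, smoothing noise on the target action, delayed actor updates); it is not the MATLAB implementation used in the study, and the hyperparameter defaults and the toy actor and critic functions are assumptions chosen for readability.

```python
import random


def clip(x, lo, hi):
    """Limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_obs, done, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, noise_std=0.2,
               noise_clip=0.5, act_limit=1.0):
    """Bootstrapped target shared by both critics."""
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    action = clip(actor_target(next_obs) + noise, -act_limit, act_limit)
    # Clipped double-Q: use the smaller of the two target-critic estimates.
    q_next = min(critic1_target(next_obs, action),
                 critic2_target(next_obs, action))
    return reward + gamma * (0.0 if done else 1.0) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor every policy_delay critic steps."""
    return step % policy_delay == 0


# Toy target networks (hypothetical stand-ins for trained models).
rng = random.Random(0)
actor = lambda obs: 0.5 * obs
q1 = lambda obs, act: obs + act
q2 = lambda obs, act: obs - act
y = td3_target(1.0, 0.3, False, actor, q1, q2, rng)
y_terminal = td3_target(1.0, 0.3, True, actor, q1, q2, rng)
print(y, y_terminal)
```

Taking the minimum of the two critics counteracts the value overestimation that a single critic tends to accumulate, which is the main stability argument for TD3 over DDPG.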
The architecture and hyperparameters of the actor network and critic networks employed in the TD3 algorithm were chosen to perform well in the MATLAB/Simulink environment. The actor network selects actions based on the observed state, with the gains of the PI controller, $K_p$ and $K_i$, represented by the absolute values of the actor-network weights.
The network layers are designed to capture the relationships within the control problem efficiently. The hyperparameters, such as the number of neurons in each layer, the activation functions, and the learning rates, were tuned to ensure effective learning and convergence.
The RL agent comprises a policy for generating actions and a method for iteratively updating this policy to maximize a defined reward function. Observations from the environment, in this case the PMDC motor, are the input to the RL agent. These observations comprise the reference tracking error ($\theta_{ref} - \theta$) and its integral ($\int (\theta_{ref} - \theta)\,dt$). The primary objective of the RL agent is to maximize the reward by minimizing the position tracking error and the control effort. The policy for action generation is embodied in the actor network, a neural network that maps observations to actions. The gains of the PI controller, $K_p$ and $K_i$, are encoded as the absolute values of the actor-network weights. The reward function is given in Eq. (11) and comprises two primary terms. The first term, $2(\theta_{ref} - \theta)$, addresses the objective of reducing the difference between the desired reference position ($\theta_{ref}$) and the actual position ($\theta$) of the motor. This term drives precise and reliable position tracking, thereby enhancing the overall quality of control.
The second term, $0.01u(t)$, regulates the control effort. It encourages effective control actions while preventing the control signal magnitude from growing to impractical levels. The small weight of 0.01 reflects the intention to moderate the control effort without letting it dominate the tracking objective.
Together, the two terms of the reward function balance accurate position tracking against judicious use of control effort. Our intention is to create control strategies that not only perform well but also remain sustainable and practical for the controlled system.
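The text names the two reward terms but the full expression is not reproduced. One form consistent with the description, in which both terms are penalized so that maximizing the reward minimizes error and effort, is sketched below. The negative sign and the absolute values are assumptions; a quadratic penalty on each term would be an equally plausible reading.

```latex
r(t) = -\Big( 2\,\big|\theta_{ref}(t) - \theta(t)\big| + 0.01\,\big|u(t)\big| \Big)
```

Under this reading, one unit of tracking error costs 200 times as much as one unit of control effort, so the effort term mainly discourages very large control signals.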
Throughout the training process, carried out in the MATLAB environment, the actor and critic networks undergo iterative updates aimed at identifying the optimal control policy that maximizes the reward function. Notably, the critic networks play a pivotal role in evaluating the long-term implications of the control gains $K_p$ and $K_i$, achieving this by mapping observations to approximations of the anticipated long-term reward. This comprehensive training environment provides a robust platform for developing and fine-tuning the reinforcement learning-based fault-tolerant control strategy for permanent magnet DC motors.
For the PI controller, we adopt a simplified representation: a neural network with a single fully connected layer. The $K_p$ and $K_i$ gains of the PI controller are the absolute values of the weights of this actor network. The resulting gains are summarized in Table 1.
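The correspondence between a single fully connected layer and a PI control law can be sketched as follows. The class `PIActor`, the absence of a bias term, and the numeric weights are illustrative assumptions; they are not the trained values reported in Table 1.

```python
class PIActor:
    """Single fully connected layer without bias: u = |w_p| * e + |w_i| * integral(e)."""

    def __init__(self, w_p, w_i):
        self.w_p = w_p  # raw trainable weight behind K_p
        self.w_i = w_i  # raw trainable weight behind K_i

    @property
    def gains(self):
        # Taking absolute values keeps both PI gains non-negative,
        # whatever sign the raw weights take during training.
        return abs(self.w_p), abs(self.w_i)

    def act(self, error, error_integral):
        """Map the two observations (error, integral of error) to the control action."""
        k_p, k_i = self.gains
        return k_p * error + k_i * error_integral


# Illustrative weights only, not the gains from the paper.
actor = PIActor(w_p=-1.5, w_i=0.25)
k_p, k_i = actor.gains
u = actor.act(error=0.2, error_integral=0.4)
```

Because the layer has exactly two inputs and one output, reading the trained weights back out yields the PI gains directly, so the learned controller can be deployed as an ordinary PI loop.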
Figure 4 compares the step responses of the two controllers. These simulations were conducted in the MATLAB/Simulink environment using the model parameters listed in Table 2.
Table 3 gives a more detailed assessment, comparing the two controllers against several performance criteria.
Here, "RT," "TT," and "ST" denote rise time, transient time, and settling time, respectively. The results in Table 3 show that the reinforcement learning-based (RL) controller outperforms its counterpart. Consequently, the RL-based controller forms the basis of our fault-tolerant control approach, supporting stable operation even in the presence of faults or external disturbances.
The comparison with the Ziegler-Nichols (Z-N) tuning method is motivated by an industrial perspective, where a pragmatic one-size-fits-all approach is often favored over intricate model-based techniques. The Z-N method provides an initial estimate that makes the control loop operational, which can then be fine-tuned to the actual dynamics of the system. In practice, recurrent model updates can be infeasible; hence the need for a straightforward method that establishes good initial settings. The RL-based approach aims to overcome certain limitations of the Z-N method while keeping complexity manageable for industrial use.

Fault tolerant control
Figure 5 illustrates the proposed fault-tolerant control strategy, designed to identify system faults quickly and reconfigure the control system to maintain stability and performance. The strategy comprises three elements: residual generation, residual evaluation, and fault-tolerant control (FTC) logic.
For fault detection, we employ an observer-based technique inspired by the methods presented in [26]. The system output is compared with the observer output to generate a residual signal, which is then analyzed with a machine learning approach based on the Mahalanobis distance. The Mahalanobis distance measures the distance between a sample point and a distribution and is used for tasks such as multivariate anomaly detection, classification on imbalanced datasets, and one-class classification.
When the fault detection unit detects erroneous sensor data, the controller switches to its internal model to obtain dependable observations until the sensor fault is rectified. This process is detailed in the following sections.
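The switching behavior described above can be sketched in a few lines. The function names `select_feedback` and `run_ftc` and all numeric values are hypothetical; the sketch only assumes that a boolean fault flag is available for each sample.

```python
def select_feedback(sensor_value, model_value, fault_detected):
    """FTC logic: trust the sensor unless the detector has flagged a fault."""
    return model_value if fault_detected else sensor_value


def run_ftc(sensor_readings, model_outputs, fault_flags):
    """Build the feedback sequence the controller would see, sample by sample."""
    return [
        select_feedback(y, y_model, flag)
        for y, y_model, flag in zip(sensor_readings, model_outputs, fault_flags)
    ]


# Illustrative values: the sensor picks up a bias of 2 on the last two samples.
sensor = [1.00, 1.01, 3.00, 3.01]
model = [1.00, 1.00, 1.00, 1.01]
flags = [False, False, True, True]
feedback = run_ftc(sensor, model, flags)
```

Once the flag clears, the same function hands control back to the sensor, which corresponds to resuming normal operation after the fault is rectified.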

Observer design
To address sensor measurement noise, our approach employs a Kalman filter, a well-established method for state estimation in linear systems. The system dynamics are described by the linear discrete-time model of [5] (Eq. 12), where f(k) is the fault vector, v(k) is the measurement noise, and w(k) is the process noise. Both noise sources are assumed to be zero-mean white noise, with the covariance matrices specified in [5] (Eq. 13). The Kalman observer for this system takes the form given in [5] (Eq. 14). The observer gain K is obtained from Eq. (15), following [5]; computing K requires the symmetric positive semidefinite solution P of the discrete algebraic Riccati equation (Eq. 16), as detailed in [5]. To assess potential faults, a residual signal is defined (Eq. 17) and passed to the residual evaluation unit, which applies machine learning techniques.
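To make the roles of P, K, and the residual concrete, the following sketch treats a scalar system. It assumes the common predictor form of the steady-state Kalman observer and a residual defined as measured output minus estimated output; the plant parameters, noise variances, and function names are hypothetical and may differ from the formulation in [5].

```python
def steady_state_gain(a, c, q, r, iterations=500):
    """Iterate the scalar discrete Riccati recursion to a fixed point P, then return (K, P)."""
    p = q
    for _ in range(iterations):
        p = a * p * a - (a * p * c) ** 2 / (c * p * c + r) + q
    k = a * p * c / (c * p * c + r)
    return k, p


def observer_step(x_hat, u, y, a, b, c, k):
    """One observer update; returns (next state estimate, residual)."""
    residual = y - c * x_hat  # measured output minus estimated output
    return a * x_hat + b * u + k * residual, residual


# Hypothetical scalar plant and noise variances, chosen only for illustration.
A, B, C, Q, R = 0.9, 0.1, 1.0, 0.01, 0.1
K, P = steady_state_gain(A, C, Q, R)

x, x_hat, residuals = 0.0, 0.0, []
for step_index in range(100):
    fault = 2.0 if step_index >= 50 else 0.0  # sensor bias from sample 50 on
    y = C * x + fault                         # noise-free measurement for clarity
    x_hat, res = observer_step(x_hat, 1.0, y, A, B, C, K)
    residuals.append(res)
    x = A * x + B * 1.0                       # true plant driven by a constant input
```

In this noise-free run the residual stays at zero while the sensor is healthy and jumps at the sample where the bias appears, which is the behavior the residual evaluation unit relies on.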

Residual evaluation
The Mahalanobis distance is a versatile metric for outlier detection, widely used across research domains and practical applications. It is especially well suited to high-dimensional data with correlated variables and missing values. Its ability to account for correlations between variables, its scale invariance, its sensitivity to outliers, and its adaptability to missing data make it an effective outlier detection mechanism [27]. Given these advantages, we chose the Mahalanobis distance as the basis of our residual evaluation strategy, for its robust outlier detection and straightforward implementation. The Mahalanobis distance between a vector $x$ and a distribution with mean $\mu$ and covariance $\Sigma$ is
$$d_M = \sqrt{(x - \mu)\,\Sigma^{-1}\,(x - \mu)^T} \qquad (18)$$
This distance expresses, in units of standard deviations, how far $x$ deviates from the mean.
During the training phase, we use residual data recorded under healthy conditions to compute the Mahalanobis distance between each training sample and the distribution of the training set. For fault detection, the maximum Mahalanobis distance observed in training serves as the threshold score; this threshold can be tuned to the desired sensitivity. Residual evaluation based on the Mahalanobis distance thus enables robust fault detection and triggers a timely response from the control system to preserve performance.
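The training and thresholding procedure can be sketched for a two-dimensional feature vector. The choice of features (for instance, the residual and its first difference), the sample values, and the function names are assumptions made for illustration only; the 2x2 covariance inverse is written out by hand to keep the sketch free of external libraries.

```python
import math


def fit_gaussian_2d(samples):
    """Estimate the mean and sample covariance of 2-D healthy residual features."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_2d(x, mean, cov):
    """d_M = sqrt((x - mu) Sigma^{-1} (x - mu)^T) for a symmetric 2x2 covariance."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(max(quad, 0.0))  # guard against tiny negative rounding errors


# Hypothetical healthy residual features used for training.
healthy = [(0.01, 0.00), (-0.02, 0.01), (0.00, -0.01),
           (0.02, 0.02), (-0.01, -0.02), (0.01, -0.01)]
mean, cov = fit_gaussian_2d(healthy)

# Threshold: the largest distance observed on healthy data.
threshold = max(mahalanobis_2d(s, mean, cov) for s in healthy)


def is_faulty(sample):
    return mahalanobis_2d(sample, mean, cov) > threshold


flag_healthy = is_faulty((0.01, 0.00))
flag_biased = is_faulty((2.0, 2.0))  # residual after a bias fault of amplitude 2
```

Using the training maximum as the threshold means no healthy training sample is ever flagged; scaling the threshold up or down is one simple way to trade false alarms against detection sensitivity.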

Results and discussion
In this section, we evaluate the proposed fault-tolerant control strategy. We first consider the performance of the motor position tracking system under a reference input, illustrated in Fig. 6. This preliminary step confirms that the reinforcement learning-based controller tracks the reference accurately under healthy, fault-free conditions.
Fig. 6 System tracking healthy response with RL-based PI control
We then examine the system's behavior in the presence of faults. Two fault scenarios are considered: a bias fault and an additive sinusoidal fault, both affecting the position sensor measurements. In the first scenario, depicted in Fig. 7, a bias fault with an amplitude of 2 appears at t = 5 s, $f_1(t) = 2\,\mathrm{step}(t - 5)$. Although the fault's influence is evident, in the absence of a fault-tolerant control system the controller itself suppresses the fault's visible impact, a phenomenon known as fault hiding. This situation underscores the need for robust fault detection and fault-tolerant control to address such hard-to-perceive anomalies early.
In the second scenario, depicted in Fig. 8, an additive sinusoidal fault with an amplitude of 2 is introduced at t = 5 s, $f_2(t) = 2\sin(t)\,\mathrm{step}(t - 5)$. Here the controller cannot counteract the fault, and system performance is substantially impaired. Without a fault-tolerant control system, such a fault can lead to severe performance degradation and potential system failure, which underlines the need for a robust and efficient fault-tolerant control strategy to ensure the safety and reliability of the system.
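For reproducing the two scenarios, the fault signals can be written out directly. The sketch below assumes that both faults add to the position measurement and that the unit step equals 1 at the switching instant; the helper names are hypothetical.

```python
import math


def step(t):
    """Unit step: 0 before t = 0, 1 from t = 0 onward (convention assumed here)."""
    return 1.0 if t >= 0 else 0.0


def bias_fault(t):
    """f1(t) = 2 step(t - 5)."""
    return 2.0 * step(t - 5.0)


def sinusoidal_fault(t):
    """f2(t) = 2 sin(t) step(t - 5)."""
    return 2.0 * math.sin(t) * step(t - 5.0)


def faulty_measurement(theta, t, fault):
    """Additive sensor fault: the controller sees theta + f(t) instead of theta."""
    return theta + fault(t)


# Measured position for a true position of 1.0, before and after fault onset.
before = faulty_measurement(1.0, 3.0, bias_fault)
after = faulty_measurement(1.0, 6.0, bias_fault)
```

Passing `sinusoidal_fault` instead of `bias_fault` yields the second scenario with no other change to the simulation loop.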
The proposed fault-tolerant control strategy is then assessed in terms of residual generation, residual evaluation, and fault detection. Figures 9 and 10 present the outputs of the residual generation and evaluation subsystems, which together signal fault occurrences promptly. This signal enables the fault-tolerant control logic to switch the controller to its internal model.
It is important to note that this paper takes the perspective of industrial control applications, where simplicity, low computational demand, and ease of implementation are paramount. Advanced simulation and modeling techniques can offer deeper insight, but they also add significant system complexity and computational overhead. In industrial contexts, simple and manageable solutions are preferred, given the constraints on available computational resources.
The importance of fault detection and fault-tolerant control for the safety and reliability of industrial systems cannot be overstated. Nonetheless, such approaches must not introduce undue complexity, inflate costs, or hinder implementation and maintenance. The approach proposed in this paper strikes a balance by integrating learning-based techniques with conventional components in a straightforward yet effective manner. While advances in modeling, simulation, and computing open new possibilities, simplicity, cost-effectiveness, and user-friendliness remain pivotal in an industrial setting.

Conclusions
This study addressed fault diagnosis and fault-tolerant control for a permanent magnet DC motor affected by sensor faults. Using a reinforcement learning-based approach, we introduced a method for tuning the proportional-integral control gains for motor position control. A comparative analysis against the Ziegler-Nichols method showed that our approach performs better while remaining computationally efficient and easy to implement, making it accessible to users without extensive expertise.
An observer-based mechanism for residual generation, coupled with the Mahalanobis distance for rapid and accurate fault detection, is central to our strategy. To handle sensor faults, we used an internal model-based control approach. The methods are designed to be easy to understand and deploy without specialized knowledge, and this simplicity allows integration at minimal additional cost.
For future research, we propose extending the evaluation to nonlinear systems, examining other fault types such as actuator or system faults, considering cyber-attack scenarios, and refining the reward function to improve overall performance. These directions would further enhance the applicability and robustness of the fault-tolerant control strategy, contributing to safer and more reliable industrial systems.

Fig. 2 PMDC motor position control dynamics block diagram

Fig. 4 Comparison of the step responses of the two controllers

Fig. 12 Fault-tolerant control strategy response in sinusoidal sensor fault scenario

Sardashti and Nazari, Journal of Engineering and Applied Science (2023) 70:109

Table 1 The controllers' derived gains

Table 2 Network structure, hyperparameters, and DC motor parameters

Table 3 Comparison of the performance of the two controllers