Journal of Iranian Association of Electrical and Electronics Engineers

fa کنترل وضعیت تحمل‌پذیر عیب یکپارچه کوادروتور با استفاده از یادگیری تقویتی
Integrated Fault-Tolerant Attitude Control of Quadrotor Using Reinforcement Learning
Control
Research
This paper addresses the design of optimal attitude control for a quadrotor aircraft with nonlinear dynamics subject to actuator and component faults. The proposed fault-tolerant optimal attitude control system is based on reinforcement learning and is designed in an integrated manner, without prior knowledge of the vehicle dynamics; that is, it treats fault detection and controller redesign simultaneously. To solve the Hamilton-Jacobi-Bellman (HJB) equation online without knowledge of the vehicle dynamics, an approximation structure with two neural networks, an identifier and a critic, is used. The forgetting factor in the identifier network's update law is variable, a function of the state estimation error and the measurement noise estimate, which improves its transient and steady-state characteristics. In addition, to remove the learning process's need for an initial stabilizing controller, a stabilizing term is used in the critic network's update law. This allows learning to start from the previous optimal controller, which does not necessarily guarantee stability of the new faulty system. Moreover, in this structure fault detection requires no bank of models and relies solely on the residual of the HJB equation. The uniform global stability of the weights of both networks and, as a result, the convergence of the control law to the optimal solution are proved using Lyapunov's theorem, and the method's performance is evaluated in simulation.
This paper deals with the optimal attitude control problem for quadrotor unmanned aerial vehicles with unknown nonlinear dynamics subject to component and actuator faults. The proposed integrated optimal fault-tolerant control (FTC) scheme is based on a reinforcement learning (RL) algorithm and requires no prior knowledge of the system dynamics. To solve the Hamilton-Jacobi-Bellman (HJB) equation, an identifier-critic-based online RL strategy is employed with a dual neural network (NN) approximation structure.
The forgetting factor in the proposed identifier update law is variable and depends on the state estimation errors and the measurement noise estimate. Compared with a constant forgetting factor, this variable one increases the convergence speed and reduces the estimation error of the identifier NN weights while maintaining robustness. When a fault occurs, the system continues to operate under the former control policy until the fault is detected. Optimal control design in the RL framework, however, normally requires an initial stabilizing control. To make it possible to start the control learning process from the formerly applied FTC, this condition is relaxed by adding a stabilizing term to the critic update law. The uniform ultimate boundedness (UUB) of the identifier and critic NN weight errors and, as a result, the convergence of the control input to a neighborhood of the optimal solution are proved by Lyapunov theory. In the proposed method, changes in fault values are detected by comparing the HJB error with a predefined threshold. Finally, simulation results are given to validate the effectiveness of the developed method.
Quadrotor Attitude Control, Component and Actuator Faults, Fault-Tolerant Optimal Control, Fault Detection, Reinforcement Learning
3 16
http://jiaeee.com/browse.php?a_code=A-10-2739-2&slc_lang=fa&sid=1
Sajad Roshanravan سجاد روشن روان S_roshanravan@elec.iust.ac.ir 100319475328460012632 100319475328460012632 No Iran University of Science and Technology (IUST) دانشگاه علم و صنعت ایران
Saeed Shamaghdari سعید شمقدری Shamaghdari@iust.ac.ir 100319475328460012631 100319475328460012631 Yes Iran University of Science and Technology (IUST) دانشگاه علم و صنعت ایران
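The abstract's detection rule, comparing the HJB error with a predefined threshold, can be illustrated on a toy problem. The sketch below is not the paper's quadrotor model or its neural-network critic. It assumes a scalar linear system x' = a*x + b*u with cost integrand q*x^2 + r*u^2, for which the optimal value function V(x) = p*x^2 is known in closed form. All names (`riccati_p`, `hjb_residual`, `fault_detected`), the numeric parameters, and the threshold value are hypothetical choices for illustration. A loss of actuator effectiveness is modeled as a reduced `b`.

```python
import math


def riccati_p(a, b, q, r):
    """Positive root p of 2*a*p - (b**2 / r) * p**2 + q = 0.

    For x' = a*x + b*u with cost integrand q*x**2 + r*u**2, the optimal
    value function is V(x) = p*x**2 (standard scalar LQR result).
    """
    s = b * b / r
    return (a + math.sqrt(a * a + s * q)) / s


def hjb_residual(x, p, k, a, b, q, r):
    """HJB residual for V(x) = p*x**2 under the policy u = -k*x.

    (a, b) are the dynamics the system currently follows; in an
    identifier-critic scheme they would come from the identifier
    (an assumption of this sketch, not a detail taken from the paper).
    """
    u = -k * x
    return q * x * x + r * u * u + 2.0 * p * x * (a * x + b * u)


def fault_detected(residual, threshold):
    """Declare a fault when the residual magnitude exceeds the threshold."""
    return abs(residual) > threshold


# Hypothetical nominal plant and cost weights.
a_nom, b_nom, q, r = -1.0, 1.0, 1.0, 1.0
p = riccati_p(a_nom, b_nom, q, r)
k = b_nom * p / r  # optimal feedback gain for the healthy plant

x = 1.0
threshold = 0.05  # hypothetical, would be tuned against noise in practice

# Healthy plant: the old policy is still optimal, so the residual vanishes.
healthy = hjb_residual(x, p, k, a_nom, b_nom, q, r)

# Actuator fault: 50% loss of effectiveness, old policy still applied.
faulty = hjb_residual(x, p, k, a_nom, 0.5 * b_nom, q, r)

print("healthy residual:", healthy, "fault:", fault_detected(healthy, threshold))
print("faulty residual:", faulty, "fault:", fault_detected(faulty, threshold))
```

In this toy setting the residual is zero only while the value function and policy match the dynamics actually in effect, so a single scalar test needs no bank of fault models. A real threshold would have to be chosen large enough to cover approximation error and measurement noise.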