PROJECT DETAIL · 2025

Deep Reinforcement Learning for Airfoil Pitching Moment Control

Manuscript under review at Computers & Fluids on PPO-based deep reinforcement learning (DRL) control of a NACA0012 airfoil (Re_c = 3000, α = 10°) for quarter-chord pitching-moment trim using CFD-in-the-loop active flow control.

  • Current Focus
  • DRL
  • Active Flow Control
  • CFD
  • HPC

Problem Statement

Closed-loop aerodynamic trim in separated, unsteady flow is difficult because the pitching-moment dynamics are strongly coupled to vortex shedding, and the reward must encode both mean trim and low unsteadiness.

Why It Matters

This study demonstrates a reproducible DRL-CFD path for pitching-moment control using fluidic actuation, relevant to AFC-enabled control authority without conventional moving surfaces.

Methods / Numerical + ML Setup

  • Simulated a NACA0012 airfoil at Re_c = 3000 and α = 10° in PyFR using a high-order FR/DG discretization of the 2D compressible Navier–Stokes equations.
  • Used two AFC actuator families: wall-normal blowing/suction (including a zero-net-mass-flux, ZNMF, variant) and near-wall tangential Coanda-type blowing.
  • Trained PPO policies with TorchRL (and compatible Stable-Baselines3 workflows) using 156 velocity probes as feedback state.
  • Compared reward formulations based on moving-average C_m over windows of length T_a, T_p, and 2T_p, plus an added standard-deviation penalty term.
  • Executed training on 4 parallel CFD environments with GPU-backed runs and deterministic post-training policy evaluation.
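The moving-average reward comparison above can be sketched as follows; the window length (in control steps) and penalty weight are illustrative placeholders, not the manuscript's tuned values:

```python
from collections import deque

import numpy as np


def make_moving_average_reward(window_steps, std_weight=0.0):
    """Build a reward callable over a moving window of C_m samples.

    The mean term drives the windowed average pitching moment toward
    zero (trim); the optional standard-deviation term penalises
    instantaneous oscillations within the same window.
    """
    history = deque(maxlen=window_steps)

    def reward(cm_now):
        history.append(cm_now)
        cm = np.asarray(history)
        mean_term = -abs(cm.mean())            # reward mean trim
        std_term = -std_weight * cm.std()      # penalise unsteadiness
        return mean_term + std_term

    return reward
```

With `std_weight=0`, a policy that oscillates symmetrically around trim scores as well as a steady one; the penalty term breaks that tie, matching the role of the standard-deviation term described above.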

Key Results

  • Learned policies consistently drove the mean quarter-chord pitching moment toward trim (|⟨C_m⟩| ~ O(10⁻³)).
  • Long averaging windows achieved mean trim but allowed larger instantaneous C_m oscillations.
  • Adding a variance penalty improved instantaneous trim behavior while preserving control authority.
  • Selected policies also increased mean lift and reduced mean drag relative to baseline uncontrolled flow.
  • Coanda-type blowing achieved comparable trim with lower actuation mass-flux demand than wall-normal blowing/suction setups.
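The window-length trade-off noted above (mean trim achieved while instantaneous oscillations persist) can be illustrated by separating the two effects in a C_m time history; the synthetic signal below is illustrative only, not data from the study:

```python
import numpy as np


def trim_metrics(cm_series):
    """Summarise a C_m time history: magnitude of the mean
    (trim quality) and standard deviation (unsteadiness)."""
    cm = np.asarray(cm_series)
    return abs(cm.mean()), cm.std()


# A signal can trim on average yet still oscillate strongly:
t = np.linspace(0.0, 10.0, 1000)
cm = 0.001 + 0.05 * np.sin(2.0 * np.pi * t)  # near-zero mean, large swings
mean_mag, unsteadiness = trim_metrics(cm)
```

Here `mean_mag` is at the O(10⁻³) trim level while the standard deviation remains an order of magnitude larger, which is exactly the behaviour a mean-only reward cannot penalise.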

Toolchain

  • PyFR
  • TorchRL
  • Stable-Baselines3
  • PPO actor-critic networks
  • Python
  • Slurm
  • GPU HPC (NVIDIA V100)

Challenges and Lessons

  • Balancing sample efficiency against expensive CFD rollouts and long training horizons.
  • Designing rewards that enforce both mean trim and low instantaneous unsteadiness.
  • Starting from idealized dense velocity-probe sensing assumptions before moving to sparse, pressure-only observations.

Future Work

  • Shift from velocity-field probes to sparse pressure sensing with POMDP-oriented state design.
  • Extend to 3D and higher-Reynolds-number regimes for stronger physical realism.
  • Advance toward wind-tunnel-in-the-loop DRL and multi-objective rewards (trim, drag/lift, actuation cost).

Prasanna Thoguluva Rajendran · Ph.D. Student in Aerospace Engineering · University of Arizona, Tucson, AZ