Abstract: The evasive maneuver strategy of a fighter facing a medium-range air-to-air missile is crucial to aircraft survivability. This paper uses the deep deterministic policy gradient (DDPG) algorithm to train an agent to learn such a strategy. The parameters of the missile-aircraft engagement model serve as the input states, the aircraft control commands serve as the output actions, and the missile-aircraft pursuit-evasion model serves as the environment. The reward consists of a shaping reward, built from engagement-model parameters and flight parameters, and a sparse reward based on the engagement outcome. The trained agent realizes an end-to-end evasive maneuver strategy that maps the state parameters directly to the aircraft control variables. Simulations comparing its attack zone with those of four classic evasive maneuvers based on prior knowledge show that the agent's strategy performs second only to the tail-dive maneuver. However, it has the lowest dependence on specialized domain knowledge of missile evasion.
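The abstract does not give the reward formulas, so the following is only a minimal sketch of how a dense shaping reward and a sparse outcome reward might be combined into one per-step reward. Every state field, weight, threshold, and outcome label here (`range_m`, `aspect_deg`, `MIN_ALT_M`, `SPARSE_REWARD`, and so on) is a hypothetical choice for illustration, not the paper's design.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class EngagementState:
    range_m: float       # hypothetical: current missile-to-aircraft distance
    prev_range_m: float  # hypothetical: distance at the previous time step
    aspect_deg: float    # hypothetical: 0 = flying directly away from the missile
    altitude_m: float    # hypothetical: aircraft altitude

MIN_ALT_M = 500.0  # hypothetical altitude floor for the flight-parameter penalty

# Hypothetical sparse rewards, paid only when the engagement ends.
SPARSE_REWARD = {"evaded": 10.0, "hit": -10.0, "crashed": -10.0}

def shaping_reward(s: EngagementState, w_range: float = 0.001,
                   w_aspect: float = 0.1, alt_penalty: float = 1.0) -> float:
    """Dense reward from engagement and flight parameters (illustrative terms)."""
    r_range = w_range * (s.range_m - s.prev_range_m)          # reward opening the range
    r_aspect = w_aspect * math.cos(math.radians(s.aspect_deg))  # favor tail-on geometry
    r_alt = -alt_penalty if s.altitude_m < MIN_ALT_M else 0.0   # discourage flying too low
    return r_range + r_aspect + r_alt

def step_reward(s: EngagementState, outcome: Optional[str] = None) -> float:
    """Shaping reward every step, plus a sparse term when an outcome is reported."""
    return shaping_reward(s) + SPARSE_REWARD.get(outcome, 0.0)

if __name__ == "__main__":
    s = EngagementState(range_m=10100.0, prev_range_m=10000.0,
                        aspect_deg=0.0, altitude_m=3000.0)
    print(step_reward(s), step_reward(s, "evaded"))
```

Keeping the shaping terms small relative to the terminal reward is one common way to guide early exploration without letting the dense terms outweigh the actual engagement result. In a DDPG setup, this scalar would be the reward stored with each transition in the replay buffer.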