We will test whether RL outperforms a greedy strategy in scenarios where the PIW moves out of the region of highest probability. The evaluation uses a 20x20 grid environment with four drones, comparing both centralized and independent Deep Q-Network (DQN) controllers against a greedy baseline. Each method will be assessed under small dispersion with the probability of detection (POD) fixed at 1.
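To make the greedy baseline concrete, the following is a minimal sketch of a single drone greedily climbing a belief map over the 20x20 grid. The Gaussian prior, step limit, and helper names (`make_belief`, `greedy_step`) are illustrative assumptions, not the experimental implementation; it only shows why greedy search concentrates on the current high-probability region, and with POD = 1 a visited cell is simply cleared.

```python
import numpy as np

GRID = 20  # 20x20 search grid, matching the experimental setup


def make_belief(center=(10, 10), sigma=2.0):
    """Gaussian belief over the PIW location (small dispersion ~ small sigma).

    Illustrative prior only; the actual drift/dispersion model used in
    the experiments is not specified here.
    """
    ys, xs = np.mgrid[0:GRID, 0:GRID]
    b = np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma**2))
    return b / b.sum()


def greedy_step(pos, belief):
    """Move one cell toward the neighbor with the highest belief mass.

    With POD = 1, searching a cell detects the PIW if present, so the
    visited cell's probability is zeroed after each step.
    """
    y, x = pos
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]
    best = max(
        ((y + dy, x + dx) for dy, dx in moves
         if 0 <= y + dy < GRID and 0 <= x + dx < GRID),
        key=lambda p: belief[p],
    )
    belief[best] = 0.0  # POD = 1: a searched cell is cleared
    return best


belief = make_belief()
pos = (0, 0)
for _ in range(50):
    pos = greedy_step(pos, belief)
```

Because the policy only ever climbs the current belief, it has no mechanism to follow a PIW that drifts away from the high-probability region, which is exactly the failure mode the RL comparison targets.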
This hypothesis examines whether agents trained with independent neural networks converge faster than agents trained with a shared network. Using the same grid environment, we will deploy four drones to locate a PIW under both small and large dispersion, and compare the rate of convergence across these scenarios.
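The architectural difference being compared can be sketched as follows. This is a tabular Q-learning stand-in for a DQN gradient step (the update rule, learning rate, and state/action sizes are assumptions for illustration): with independent parameterization each drone owns its Q-function, while with a shared parameterization all four drones pool their experience into one set of parameters.

```python
import numpy as np

N_AGENTS = 4
N_STATES = 20 * 20   # one state per grid cell (illustrative)
N_ACTIONS = 5        # four moves plus stay (illustrative)

# Independent parameterization: each drone learns its own Q-function,
# so an update from one agent never affects another agent's policy.
q_independent = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]

# Shared (centralized) parameterization: all drones read and write a
# single Q-function, pooling their experience.
q_shared = np.zeros((N_STATES, N_ACTIONS))


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step; stands in for a DQN gradient step."""
    q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])


# One transition observed by drone 0: under independent training only
# q_independent[0] changes; under shared training every drone's policy changes.
q_update(q_independent[0], s=0, a=1, r=1.0, s_next=1)
q_update(q_shared, s=0, a=1, r=1.0, s_next=1)
```

The shared network sees four times the experience per environment step, while independent networks avoid interference between agents' updates; the convergence-rate comparison in this hypothesis weighs exactly that trade-off.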