
Autonomous-UAV-navigation-using-reinforcement-learning

UAV systems have attracted the attention of researchers and companies across a wide range of industries, particularly in the last decade, because they can operate in a coordinated manner in complex scenarios and can cover, and speed up, applications that are dangerous or tedious for humans. Search and rescue, inspection of facilities, delivery of commodities, and surveillance are just a few examples in the military, commercial, and government sectors. Path planning is an important research field for autonomous UAVs: it helps the vehicle find a path, avoid collisions, and navigate the environment. In this project, I have implemented the TD3 + PER algorithm to increase the performance of the autonomous vehicle. Compared with plain TD3, TD3 + PER showed greater training stability and improved the performance of the autonomous UAV.

Aim of the project

The goal of the project is to develop an autonomous UAV navigation system using reinforcement learning. Navigation here means finding a path through the environment without any collisions. This is done using TD3 (Twin Delayed Deep Deterministic Policy Gradient) combined with Prioritized Experience Replay (PER).

Objectives

  • To learn the fundamentals of unmanned aerial vehicle operation.
  • To investigate different path-finding systems.
  • To develop a reinforcement learning algorithm that can be used to build an autonomous navigation system for a UAV.
  • To explore various simulators used for drone training.
  • To evaluate the results that are obtained.

Research questions

  • Does the prioritized experience replay method improve navigation performance?
  • Does the reinforcement learning approach used in this project improve UAV navigation?

Project methodology

For this project, I am using Microsoft's AirSim simulator to provide the environment in which the proposed TD3 + PER reinforcement learning algorithm is trained.
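Before the agent can act, the training script has to attach to a running AirSim instance. Below is a minimal connection sketch using the standard `airsim` Python client; it assumes the simulator is already launched with a multirotor vehicle, and the repo's actual startup code may differ.

```python
import airsim

# Connect to a running AirSim instance (assumes the simulator has been
# started with a multirotor vehicle).
client = airsim.MultirotorClient()
client.confirmConnection()

# Take API control of the drone, arm it, and take off so the RL agent
# starts from a hovering state.
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()
```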

The proposed model uses the TD3 reinforcement learning method with PER to help it converge to the goal quickly. In the working model, when the TD3 agent takes an action in the environment (AirSim), the agent receives a reward. The states and rewards are then sent to the PER buffer. PER calculates the TD error for each transition and uses it to select which samples of actions and states go into the batch of experience. This batch carries the knowledge of the best actions and of how to earn the highest rewards. The experience is then shared with the TD3 agent, which improves the algorithm's performance by reducing wasted steps during training.

(Figure: working model of the TD3 + PER agent interacting with the AirSim environment)
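As a concrete illustration of the mechanism described above, here is a minimal sketch of a proportional prioritized replay buffer. The class and method names are illustrative, not the exact code used in this repo: transitions are sampled with probability proportional to |TD error|^α, and importance-sampling weights correct the bias this introduces.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional PER sketch: sampling probability ~ |TD error|**alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps  # keeps every priority strictly positive
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are
        # guaranteed to be replayed at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias that prioritized
        # sampling introduces into the gradient estimate.
        weights = (len(self.buffer) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # Called after the TD3 critic update with the fresh TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

After each TD3 critic update, `update_priorities` would be called with the fresh TD errors, so transitions the critics currently predict badly are replayed more often.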

Requirements

(Figure: project requirements)

The libraries used for the implementation are:

  • TensorFlow: used to build the neural networks.
  • PIL: an image-processing library used to collect image data from the environment (AirSim).
  • NumPy: a mathematical library used for matrix computation.
  • OpenAI Gym: provides the environment wrapper that connects AirSim to the RL algorithm (a sketch of such a wrapper follows this list).
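As a rough illustration of how these pieces fit together, here is a minimal sketch of a Gym-style environment wrapper around AirSim. The class name, reward shaping, and observation format are assumptions made for the example, not the repo's exact environment; the AirSim calls themselves come from the standard Python client.

```python
import airsim
import gym
import numpy as np
from gym import spaces
from PIL import Image

class AirSimDroneEnv(gym.Env):
    """Illustrative Gym wrapper around AirSim (names and reward
    shaping are assumptions, not this repo's exact environment)."""

    def __init__(self):
        self.client = airsim.MultirotorClient()
        self.client.confirmConnection()
        # Action: a normalised velocity command (vx, vy, vz).
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
        # Observation: an 84x84 grayscale image from the front camera.
        self.observation_space = spaces.Box(0, 255, shape=(84, 84, 1), dtype=np.uint8)

    def reset(self):
        self.client.reset()
        self.client.enableApiControl(True)
        self.client.armDisarm(True)
        self.client.takeoffAsync().join()
        return self._get_obs()

    def step(self, action):
        vx, vy, vz = (float(a) for a in action)
        self.client.moveByVelocityAsync(vx, vy, vz, duration=1.0).join()
        obs = self._get_obs()
        collided = self.client.simGetCollisionInfo().has_collided
        reward = -100.0 if collided else 1.0  # placeholder reward shaping
        return obs, reward, collided, {}

    def _get_obs(self):
        # One uncompressed scene image, downscaled with PIL (which is
        # also in the project's requirements list above).
        resp = self.client.simGetImages([airsim.ImageRequest(
            "0", airsim.ImageType.Scene, False, False)])[0]
        img = np.frombuffer(resp.image_data_uint8, dtype=np.uint8)
        img = img.reshape(resp.height, resp.width, 3)
        img = Image.fromarray(img).convert("L").resize((84, 84))
        return np.asarray(img)[..., None]
```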

Workflow

(Figure: project workflow)

Results

Average Q value: this measures the total anticipated return when the agent is in state s and takes action a, and it indicates how likely the model is to achieve its goal. The average Q values are logged to the stats.csv file.
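The logged values can be plotted directly from that file, for example with pandas and matplotlib; the column name `avg_q` below is an assumption, since the exact schema of stats.csv is not documented here.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Plot the average Q value per episode from the training log.
# The column name "avg_q" is assumed, not confirmed by the repo.
stats = pd.read_csv("stats.csv")
plt.plot(stats["avg_q"])
plt.xlabel("Episode")
plt.ylabel("Average Q value")
plt.title("Average Q value over training")
plt.show()
```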

TD3 + PER

(Plot: average Q value per episode, TD3 + PER)

Only TD3

(Plot: average Q value per episode, TD3 only)

The above plots show the average Q value of the TD3 algorithm and of the TD3 + PER algorithm. The proposed algorithm showed steady growth in performance, whereas plain TD3 behaved almost randomly. This suggests that PER increases the stability of TD3 and helps it converge faster.

Total rewards: through reward and punishment, the reward function tells the agent what is right and what is wrong. An RL agent's task is to maximise the sum of all rewards it can collect, and there are cases where it must give up a quick gain to obtain the greatest overall benefit. Most RL algorithms therefore use total reward as the metric for evaluating the model.
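This trade-off is usually formalised as the discounted return G = Σ_t γ^t r_t, which the agent maximises instead of any single immediate reward. A tiny illustrative sketch:

```python
import numpy as np

# Discounted return G = sum_t gamma^t * r_t. RL agents maximise this,
# which is why a quick gain now can be worth less than a larger reward later.
def discounted_return(rewards, gamma=0.99):
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * np.asarray(rewards)))

print(discounted_return([1, 0, 0]))  # take the quick gain:   1.0
print(discounted_return([0, 0, 5]))  # wait for the big one: ~4.9
```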

TD3 + PER

(Plot: total reward per episode, TD3 + PER)

Only TD3

(Plot: total reward per episode, TD3 only)

The above graphs indicate that TD3 + PER was able to collect more reward than TD3 alone. It can be concluded that TD3 + PER took more precise steps and converged to the goal more quickly. The sampling bias introduced by PER also had an impact on the algorithm's performance.
