
Comments (7)

MetallicaSPA commented on July 30, 2024

How often does it happen?

It happens every time I run that environment, usually before 50k steps. It never happened with basic or defend the center.
Info about my environment:

I'm running everything in Linux Mint 21.1 Vera, under Anaconda using Spyder IDE.
Vizdoom version: 1.2.0
Gymnasium version: 0.26.3
Stable-baselines3 version: 2.0.0a5

Let me know if you need any more information about my environment.

EDIT: Updated Gymnasium to 0.28.1, still getting the same problem.
Here's the traceback:

File ~/anaconda3/lib/python3.9/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)

File ~/TFM/Doom_RL/vizdoom_A2C.py:248
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/a2c/a2c.py:194 in learn
return super().learn(

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py:259 in learn
continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py:178 in collect_rollouts
new_obs, rewards, dones, infos = env.step(clipped_actions)

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py:171 in step
return self.step_wait()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/vec_transpose.py:95 in step_wait
observations, rewards, dones, infos = self.venv.step_wait()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py:69 in step_wait
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset()

File ~/anaconda3/lib/python3.9/site-packages/stable_baselines3/common/monitor.py:83 in reset
return self.env.reset(**kwargs)

File ~/TFM/Doom_RL/vizdoom_A2C.py:208 in reset
state = self.game.get_state().screen_buffer

AttributeError: 'NoneType' object has no attribute 'screen_buffer'

from vizdoom.

MetallicaSPA commented on July 30, 2024

So at the moment, I think the reason might be that your .cfg or .wad files were somehow modified and, for example, now allow the agent to be killed before the episode starts

Thanks for this! I modified my cfg file and set the episode start time to 1. After 100k steps it was still running smoothly.
It seems that, for some reason, you can get killed sooner in this scenario than in the others.
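
The fix above comes down to one value in the scenario config. Below is a minimal sketch of a pre-flight check for it, assuming the config uses plain `key = value` lines with `#` comments and that the relevant key is spelled `episode_start_time`. The helper names and the threshold of 10 tics are hypothetical and not part of ViZDoom.

```python
import re
from typing import Optional


def read_episode_start_time(cfg_text: str) -> Optional[int]:
    """Return the last episode_start_time value found in cfg_text, or None."""
    value = None
    for raw_line in cfg_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(r"episode_start_time\s*=\s*(\d+)", line)
        if match:
            value = int(match.group(1))
    return value


def start_time_looks_risky(cfg_text: str, max_tics: int = 10) -> bool:
    """Flag configs whose start delay exceeds max_tics (an arbitrary threshold)."""
    start = read_episode_start_time(cfg_text)
    return start is not None and start > max_tics
```

Running a check like this over the .cfg file before training would at least show whether the start delay differs from what you expect.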

mwydmuch commented on July 30, 2024

Hi @MetallicaSPA! I may need some help to fully understand what is happening. If you mean that from time to time, you get None from get_state(), then this is expected. In the original ViZDoom API get_state() will return None if the episode ends/reaches the terminal state. So you should always check if it's None or use the self.game.is_episode_finished() check.
If your problem is that self.game.new_episode() doesn't reset your episode then this is unexpected, but I would need a code sample to run to see what is happening.

Also, we now provide official wrappers for Gym and Gymnasium, so you don't need to implement them yourself! Check https://github.com/Farama-Foundation/ViZDoom/tree/master/examples/python directory for Gym, Gymnasium and StableBaselines examples.
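
The None check described in this comment can be sketched as follows. `StubGame` is a hypothetical stand-in that only mimics the two `DoomGame` calls mentioned here (`get_state()` and `is_episode_finished()`), so the snippet runs without ViZDoom installed. `observe` is also an illustrative helper and not part of the library.

```python
from types import SimpleNamespace


class StubGame:
    """Hypothetical stand-in for vizdoom.DoomGame, for illustration only."""

    def __init__(self, frames):
        self._frames = list(frames)

    def is_episode_finished(self):
        return not self._frames

    def get_state(self):
        # Mimics the behaviour described above: None once the episode has ended.
        if not self._frames:
            return None
        return SimpleNamespace(screen_buffer=self._frames[0])

    def advance(self):
        # Consume one frame; the episode ends when none are left.
        self._frames.pop(0)


def observe(game, fallback_frame):
    """Return (frame, finished), using fallback_frame when there is no state."""
    state = game.get_state()  # call once and reuse the result
    if state is None:
        return fallback_frame, True
    return state.screen_buffer, False
```

The same pattern applies inside a custom `reset()`: read the state once, and handle the None case explicitly instead of dereferencing `screen_buffer` directly.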

MetallicaSPA commented on July 30, 2024

If your problem is that self.game.new_episode() doesn't reset your episode then this is unexpected, but I would need a code sample to run to see what is happening.

That's what seems to happen. I tried several times and it fails at a different step each time, so it looks random to me.
Here's the full code:

import vizdoom as vzd
import numpy as np
import cv2
import os 

from vizdoom import *
from gymnasium import Env
from gymnasium.spaces import Discrete, Box
from stable_baselines3.common.callbacks import CallbackList, EvalCallback, ProgressBarCallback, CheckpointCallback
from stable_baselines3 import A2C

DEFAULT_CONFIG = "/home/joaquin/TFM/Doom_RL/scenarios/deadly_corridor.cfg"
SCENARIO_PATH = '/home/joaquin/TFM/Doom_RL/scenarios_official/deadly_corridor.wad'
CHECKPOINT_DIR = './train/train_deadly_corridor'
LOG_DIR = './logs/log_deadly_corridor'

render = False  # True shows the window while training; False hides it and makes training faster

class VizDoomGym(Env): 
    # Function that is called when we start the env
    def __init__(self, render=render): 
        # Inherit from Env
        super().__init__()
        # Setup the game 
        self.game = vzd.DoomGame()
        self.game.load_config(DEFAULT_CONFIG)
        self.game.set_doom_scenario_path(SCENARIO_PATH)
        
        self.game.set_doom_game_path("/home/joaquin/TFM/Doom_RL/DOOM2.WAD")
        self.game.set_render_hud(False)
        self.game.set_screen_resolution(vzd.ScreenResolution.RES_640X480)
        # self.game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)
        # Set cv2 friendly format.
        # self.game.set_screen_format(vzd.ScreenFormat.BGR24)
        
        # Enables labeling of the in game objects.
        self.game.set_labels_buffer_enabled(True)
        # Enables depth buffer (turned off by default).
        self.game.set_depth_buffer_enabled(True)
        
        # Render frame logic
        if render == False: 
            self.game.set_window_visible(False)
        else:
            self.game.set_window_visible(True)
        
        self.game.clear_available_game_variables()
        self.game.set_available_game_variables([
                                          vzd.GameVariable.AMMO0,
                                          vzd.GameVariable.AMMO1,
                                          vzd.GameVariable.AMMO2,
                                          vzd.GameVariable.AMMO3,
                                          vzd.GameVariable.AMMO4,
                                          vzd.GameVariable.AMMO5,
                                          vzd.GameVariable.AMMO6,
                                          vzd.GameVariable.AMMO7,
                                          vzd.GameVariable.AMMO8,
                                          vzd.GameVariable.AMMO9,
                                          vzd.GameVariable.ARMOR,
                                          vzd.GameVariable.HEALTH,
                                          vzd.GameVariable.POSITION_X,
                                          vzd.GameVariable.POSITION_Y,
                                          vzd.GameVariable.POSITION_Z,
                                          vzd.GameVariable.SELECTED_WEAPON,
                                          vzd.GameVariable.SELECTED_WEAPON_AMMO,
                                          vzd.GameVariable.WEAPON0,
                                          vzd.GameVariable.WEAPON1,
                                          vzd.GameVariable.WEAPON2,
                                          vzd.GameVariable.WEAPON3,
                                          vzd.GameVariable.WEAPON4,
                                          vzd.GameVariable.WEAPON5,
                                          vzd.GameVariable.WEAPON6,
                                          vzd.GameVariable.WEAPON7,
                                          vzd.GameVariable.WEAPON8,
                                          vzd.GameVariable.WEAPON9,
                                          vzd.GameVariable.DAMAGE_TAKEN,
                                          vzd.GameVariable.HITCOUNT
                                          ])
        
        # Start the game 
        self.game.init()
        
        # Get game variables:
        self.damage_taken = 0
        self.hitcount = 0
        self.ammo = 52
    
        # Create the action space and observation space
        self.observation_space = Box(low=0, high=255, shape=(160,120,1), dtype=np.uint8)
        self.action_space = Discrete(14)
        
    # This is how we take a step in the environment
    def step(self, action):
        # Specify action and take step 
        actions = np.identity(14)
        action_reward = self.game.make_action(actions[action], 4) 
        
        # Get all the other stuff we need to return 
        if self.game.get_state(): 
            state = self.game.get_state().screen_buffer
            state = self.grayscale(state)
            
            ammo0 = self.game.get_state().game_variables[0]
            ammo1 = self.game.get_state().game_variables[1]
            ammo2 = self.game.get_state().game_variables[2]
            ammo3 = self.game.get_state().game_variables[3]
            ammo4 = self.game.get_state().game_variables[4]
            ammo5 = self.game.get_state().game_variables[5]
            ammo6 = self.game.get_state().game_variables[6]
            ammo7 = self.game.get_state().game_variables[7]
            ammo8 = self.game.get_state().game_variables[8]
            ammo9 = self.game.get_state().game_variables[9]
            armor = self.game.get_state().game_variables[10]
            health = self.game.get_state().game_variables[11] 
            pos_x = self.game.get_state().game_variables[12]
            pos_y = self.game.get_state().game_variables[13]
            pos_z = self.game.get_state().game_variables[14]
            selected_weapon = self.game.get_state().game_variables[15] 
            selected_weapon_ammo = self.game.get_state().game_variables[16] 
            weapon0 = self.game.get_state().game_variables[17]
            weapon1 = self.game.get_state().game_variables[18]
            weapon2 = self.game.get_state().game_variables[19]
            weapon3 = self.game.get_state().game_variables[20]
            weapon4 = self.game.get_state().game_variables[21]
            weapon5 = self.game.get_state().game_variables[22]
            weapon6 = self.game.get_state().game_variables[23]
            weapon7 = self.game.get_state().game_variables[24]
            weapon8 = self.game.get_state().game_variables[25]
            weapon9 = self.game.get_state().game_variables[26]
            damage_taken = self.game.get_state().game_variables[27]
            hitcount = self.game.get_state().game_variables[28]
            
            info = {"ammo0":ammo0, "ammo1":ammo1, "ammo2":ammo2, "ammo3":ammo3,
                    "ammo4":ammo4,"ammo5":ammo5,"ammo6":ammo6,"ammo7":ammo7, "ammo8":ammo8,
                    "ammo9":ammo9, "armor":armor, "health":health, "pos_x":pos_x, 
                    "pos_y":pos_y, "pos_z":pos_z, "selected_weapon":selected_weapon, 
                    "selected_weapon_ammo":selected_weapon_ammo, "weapon0":weapon0,
                    "weapon1":weapon1,"weapon2":weapon2,"weapon3":weapon3,
                    "weapon4":weapon4,"weapon5":weapon5,"weapon6":weapon6,
                    "weapon7":weapon7,"weapon8":weapon8,"weapon9":weapon9, 
                    'damage_taken':damage_taken, 'hitcount':hitcount}
            
            # Calculate rewards:
            total_damage_taken = -damage_taken + self.damage_taken
            self.damage_taken = total_damage_taken
            total_hitcount = hitcount - self.hitcount
            total_ammo = ammo0 + ammo1 + ammo2 + ammo3 + ammo4 + ammo5 + ammo6 + ammo7 + ammo8 + ammo9 - self.ammo
            self.ammo = total_ammo
            
            reward = action_reward + total_damage_taken*10 + total_hitcount*200 + total_ammo*5
            
            truncated = False
        else: 
            state = np.zeros(self.observation_space.shape)
            info = 0
            reward = 0
            truncated = True
        
        info = {"info":info}
        done = self.game.is_episode_finished()
        
        return state, reward, done, truncated, info 
    
    # Define how to render the game or environment 
    def render(self):
        pass
    
    # What happens when we start a new game 
    def reset(self):
        self.game.new_episode()
        state = self.game.get_state().screen_buffer
        info = 0
        info = {"info":info}

        return self.grayscale(state), info
    
    
    # Grayscale the game frame and resize it 
    def grayscale(self, observation):
        gray = cv2.cvtColor(np.moveaxis(observation, 0, -1), cv2.COLOR_BGR2GRAY)
        resize = cv2.resize(gray, (160,120), interpolation=cv2.INTER_CUBIC)
        state = np.reshape(resize, (160,120,1))
        return state
    

    
    # Call to close down the game
    def close(self): 
        self.game.close()
        
# ENVIRONMENT CHECK:
# env = VizDoomGym(render=True)

# state = env.reset()

# env_checker.check_env(env)

# TRAIN MODEL

env = VizDoomGym()

checkpoint_callback = CheckpointCallback(save_freq=50000, save_path=CHECKPOINT_DIR, 
                                         save_replay_buffer=True, save_vecnormalize=True)
eval_callback = EvalCallback(env, best_model_save_path=CHECKPOINT_DIR, log_path=LOG_DIR, 
                             eval_freq=50000, deterministic=False, render=True, verbose=1)

callback = CallbackList([checkpoint_callback, eval_callback])

model = A2C('CnnPolicy', env, tensorboard_log=LOG_DIR, verbose=1, learning_rate=0.0001, n_steps=8192)
# model = A2C.load('/home/joaquin/TFM/Doom_RL/train/train_basic/best_model_1800000', env)
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)
model.save('vizdoom_A2C')
env.close()

mwydmuch commented on July 30, 2024

How often does it happen? I'm running your code using the Stable-Baselines3 2.0.0a5 alpha (the one with Gymnasium support), installed in the following way:

pip install "sb3_contrib>=2.0.0a1" --upgrade
pip install "stable_baselines3>=2.0.0a1" --upgrade

and I don't see any problem with the reset method after 200k timesteps. I'm afraid I will need more details to help you: details about your environment, and detailed instructions on how to reproduce the problem (and how it occurs).
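
One way to collect the environment details requested here is sketched below. It uses only the standard library's `importlib.metadata`. The helper name `package_versions` is hypothetical, and the distribution names passed to it are assumed to match the names used on PyPI.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or 'not installed'."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    wanted = ["vizdoom", "gymnasium", "stable-baselines3"]
    for name, version in package_versions(wanted).items():
        print(f"{name}: {version}")
```

Pasting this output into the report, together with the OS and Python version, avoids typing version numbers by hand.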

mwydmuch commented on July 30, 2024

@MetallicaSPA, I replicated your environment and ran a slightly modified script (I attached the modified version below). I've just changed the paths to the config/log/model files. After 3 million timesteps there was no error. I checked both the deathmatch and deadly corridor environments.

So at the moment, I think the reason might be that your .cfg or .wad files were somehow modified and, for example, now allow the agent to be killed before the episode starts. This is possible, for example, if episode_start_time in the config is set to a large number. If you are sure that your .cfg/.wad files were not modified, then I will need to ask you to prepare a Dockerfile that I can run to replicate the problem.

import vizdoom as vzd
import numpy as np
import cv2
import os 

from vizdoom import *
from gymnasium import Env
from gymnasium.spaces import Discrete, Box
from stable_baselines3.common.callbacks import CallbackList, EvalCallback, ProgressBarCallback, CheckpointCallback
from stable_baselines3 import A2C

SCENARIO = "deadly_corridor"
DEFAULT_CONFIG = os.path.join(scenarios_path, f"{SCENARIO}.cfg")
CHECKPOINT_DIR = f'./vizdoom_train/train_{SCENARIO}'
LOG_DIR = f'./vizdoom_logs/log_{SCENARIO}'

render = False  # True shows the window while training; False hides it and makes training faster

class VizDoomGym(Env): 
    # Function that is called when we start the env
    def __init__(self, render=render): 
        # Inherit from Env
        super().__init__()
        # Setup the game
        self.game = vzd.DoomGame()
        self.game.load_config(DEFAULT_CONFIG)
        
        self.game.set_doom_game_path("doom2.wad")
        self.game.set_render_hud(False)
        #self.game.set_screen_resolution(vzd.ScreenResolution.RES_640X480)
        self.game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)
        # Set cv2 friendly format.
        # self.game.set_screen_format(vzd.ScreenFormat.BGR24)
        
        # Enables labeling of the in game objects.
        self.game.set_labels_buffer_enabled(True)
        # Enables depth buffer (turned off by default).
        self.game.set_depth_buffer_enabled(True)
        
        # Render frame logic
        if render == False: 
            self.game.set_window_visible(False)
        else:
            self.game.set_window_visible(True)
        
        self.game.clear_available_game_variables()
        self.game.set_available_game_variables([
                                          vzd.GameVariable.AMMO0,
                                          vzd.GameVariable.AMMO1,
                                          vzd.GameVariable.AMMO2,
                                          vzd.GameVariable.AMMO3,
                                          vzd.GameVariable.AMMO4,
                                          vzd.GameVariable.AMMO5,
                                          vzd.GameVariable.AMMO6,
                                          vzd.GameVariable.AMMO7,
                                          vzd.GameVariable.AMMO8,
                                          vzd.GameVariable.AMMO9,
                                          vzd.GameVariable.ARMOR,
                                          vzd.GameVariable.HEALTH,
                                          vzd.GameVariable.POSITION_X,
                                          vzd.GameVariable.POSITION_Y,
                                          vzd.GameVariable.POSITION_Z,
                                          vzd.GameVariable.SELECTED_WEAPON,
                                          vzd.GameVariable.SELECTED_WEAPON_AMMO,
                                          vzd.GameVariable.WEAPON0,
                                          vzd.GameVariable.WEAPON1,
                                          vzd.GameVariable.WEAPON2,
                                          vzd.GameVariable.WEAPON3,
                                          vzd.GameVariable.WEAPON4,
                                          vzd.GameVariable.WEAPON5,
                                          vzd.GameVariable.WEAPON6,
                                          vzd.GameVariable.WEAPON7,
                                          vzd.GameVariable.WEAPON8,
                                          vzd.GameVariable.WEAPON9,
                                          vzd.GameVariable.DAMAGE_TAKEN,
                                          vzd.GameVariable.HITCOUNT
                                          ])
        
        # Start the game 
        self.game.init()
        
        # Get game variables:
        self.damage_taken = 0
        self.hitcount = 0
        self.ammo = 52
    
        # Create the action space and observation space
        self.observation_space = Box(low=0, high=255, shape=(160,120,1), dtype=np.uint8)
        self.action_space = Discrete(14)
        
    # This is how we take a step in the environment
    def step(self, action):
        # Specify action and take step 
        actions = np.identity(14)
        action_reward = self.game.make_action(actions[action], 4) 
        
        # Get all the other stuff we need to return 
        if self.game.get_state(): 
            state = self.game.get_state().screen_buffer
            state = self.grayscale(state)
            
            ammo0 = self.game.get_state().game_variables[0]
            ammo1 = self.game.get_state().game_variables[1]
            ammo2 = self.game.get_state().game_variables[2]
            ammo3 = self.game.get_state().game_variables[3]
            ammo4 = self.game.get_state().game_variables[4]
            ammo5 = self.game.get_state().game_variables[5]
            ammo6 = self.game.get_state().game_variables[6]
            ammo7 = self.game.get_state().game_variables[7]
            ammo8 = self.game.get_state().game_variables[8]
            ammo9 = self.game.get_state().game_variables[9]
            armor = self.game.get_state().game_variables[10]
            health = self.game.get_state().game_variables[11] 
            pos_x = self.game.get_state().game_variables[12]
            pos_y = self.game.get_state().game_variables[13]
            pos_z = self.game.get_state().game_variables[14]
            selected_weapon = self.game.get_state().game_variables[15] 
            selected_weapon_ammo = self.game.get_state().game_variables[16] 
            weapon0 = self.game.get_state().game_variables[17]
            weapon1 = self.game.get_state().game_variables[18]
            weapon2 = self.game.get_state().game_variables[19]
            weapon3 = self.game.get_state().game_variables[20]
            weapon4 = self.game.get_state().game_variables[21]
            weapon5 = self.game.get_state().game_variables[22]
            weapon6 = self.game.get_state().game_variables[23]
            weapon7 = self.game.get_state().game_variables[24]
            weapon8 = self.game.get_state().game_variables[25]
            weapon9 = self.game.get_state().game_variables[26]
            damage_taken = self.game.get_state().game_variables[27]
            hitcount = self.game.get_state().game_variables[28]
            
            info = {"ammo0":ammo0, "ammo1":ammo1, "ammo2":ammo2, "ammo3":ammo3,
                    "ammo4":ammo4,"ammo5":ammo5,"ammo6":ammo6,"ammo7":ammo7, "ammo8":ammo8,
                    "ammo9":ammo9, "armor":armor, "health":health, "pos_x":pos_x, 
                    "pos_y":pos_y, "pos_z":pos_z, "selected_weapon":selected_weapon, 
                    "selected_weapon_ammo":selected_weapon_ammo, "weapon0":weapon0,
                    "weapon1":weapon1,"weapon2":weapon2,"weapon3":weapon3,
                    "weapon4":weapon4,"weapon5":weapon5,"weapon6":weapon6,
                    "weapon7":weapon7,"weapon8":weapon8,"weapon9":weapon9, 
                    'damage_taken':damage_taken, 'hitcount':hitcount}
            
            # Calculate rewards:
            total_damage_taken = -damage_taken + self.damage_taken
            self.damage_taken = total_damage_taken
            total_hitcount = hitcount - self.hitcount
            total_ammo = ammo0 + ammo1 + ammo2 + ammo3 + ammo4 + ammo5 + ammo6 + ammo7 + ammo8 + ammo9 - self.ammo
            self.ammo = total_ammo
            
            reward = action_reward + total_damage_taken*10 + total_hitcount*200 + total_ammo*5
            
            truncated = False
        else: 
            state = np.zeros(self.observation_space.shape)
            info = 0
            reward = 0
            truncated = True
        
        info = {"info":info}
        done = self.game.is_episode_finished()
        
        return state, reward, done, truncated, info 
    
    # Define how to render the game or environment 
    def render(self):
        pass
    
    # What happens when we start a new game 
    def reset(self):
        self.game.new_episode()
        state = self.game.get_state().screen_buffer
        info = 0
        info = {"info":info}
        #print("Reseting!")

        return self.grayscale(state), info
    
    
    # Grayscale the game frame and resize it 
    def grayscale(self, observation):
        gray = cv2.cvtColor(np.moveaxis(observation, 0, -1), cv2.COLOR_BGR2GRAY)
        resize = cv2.resize(gray, (160,120), interpolation=cv2.INTER_CUBIC)
        state = np.reshape(resize, (160,120,1))
        return state
    

    
    # Call to close down the game
    def close(self): 
        self.game.close()
        
# ENVIRONMENT CHECK:
# env = VizDoomGym(render=True)

# state = env.reset()

# env_checker.check_env(env)

# TRAIN MODEL

env = VizDoomGym(render=True)

checkpoint_callback = CheckpointCallback(save_freq=50000, save_path=CHECKPOINT_DIR, 
                                         save_replay_buffer=True, save_vecnormalize=True)
eval_callback = EvalCallback(env, best_model_save_path=CHECKPOINT_DIR, log_path=LOG_DIR, 
                             eval_freq=50000, deterministic=False, render=True, verbose=1)

callback = CallbackList([checkpoint_callback, eval_callback])

model = A2C('CnnPolicy', env, verbose=1, learning_rate=0.0001, n_steps=8192)
# model = A2C.load('/home/joaquin/TFM/Doom_RL/train/train_basic/best_model_1800000', env)
model.learn(total_timesteps=3000000, callback=callback, progress_bar=True)
model.save('vizdoom_A2C')
env.close()

mwydmuch commented on July 30, 2024

Happy that we've figured this out! :)
