
Comments (8)

mfe7 commented on June 5, 2024

so the ./train.sh script first enters the virtualenv that was created when installing the dependencies, then starts the python training script. did ./install.sh complete successfully?

i would guess that the virtualenv's python is different from either of those two python versions. one way to check: enter the virtualenv with source venv/bin/activate (adjust the path to match your virtualenv), confirm that which python points to a python inside your virtualenv rather than a system-wide one, then start an interactive python session and try import numpy.
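The same check can be scripted. The following is a minimal sketch, not part of the repository: `describe_interpreter` is a hypothetical helper name, and it only reports which interpreter is running and whether a module such as numpy can be found from it.

```python
import importlib.util
import sys


def describe_interpreter(module_name="numpy"):
    """Report which Python is running and whether a module is importable from it."""
    # venv and recent virtualenv releases make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != base_prefix
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "in_virtualenv": in_virtualenv,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run it once after activating the virtualenv and once without; if `executable` or `module_origin` points outside the virtualenv directory in the first case, the training script is probably picking up a different python than expected.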

from rl_collision_avoidance.

BingHan0458 commented on June 5, 2024

Thank you very much! That bug is solved. But running ./train.sh TrainPhase1 now gives another error:

Entered virtualenv.

Running GA3C-CADRL gym-collision-avoidance training script (TrainPhase1)

Traceback (most recent call last):
  File "Run.py", line 74, in <module>
    Server().main()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 43, in __init__
    self.training_q = Queue(maxsize=Config.MAX_QUEUE_SIZE)
AttributeError: module 'Config' has no attribute 'MAX_QUEUE_SIZE'

But when I search Config.py, I find the line "self.MAX_QUEUE_SIZE = 100 # Max size of the queue", so I don't understand why the attribute is missing.


mfe7 commented on June 5, 2024

hmm. the Config object should get loaded from the Config.py in this repo, but it's possible it is loading the default Config.py from the gym_collision_avoidance directory if the path isn't set correctly. Could you add print(Config.__dict__) on the line above the one that raises this error? That will show all the attributes of the config object and help us debug whether it's loading the right class.

Also, when pasting any code/terminal logs, it's helpful to use this formatting:

Traceback (most recent call last):
  File "Run.py", line 74, in <module>
    Server().main()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 43, in __init__
    self.training_q = Queue(maxsize=Config.MAX_QUEUE_SIZE)
AttributeError: module 'Config' has no attribute 'MAX_QUEUE_SIZE'


BingHan0458 commented on June 5, 2024

when I add print(Config.__dict__) to Server.py and run ./train.sh TrainPhase1, the result is as follows:

{'__name__': 'Config',
'__doc__': None,
'__package__': '',
'__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f0deb8f2c50>,
'__spec__': ModuleSpec(name='Config', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f0deb8f2c50>, origin='/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Config.py'),
'__file__': '/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Config.py',
'__cached__': '/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/__pycache__/Config.cpython-36.pyc',
'builtins': {'name': 'builtins', 'doc': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the nil' object; Ellipsis represents ...' in slices.", 'package': '', 'loader': <class '_frozen_importlib.BuiltinImporter'>, 'spec': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>), 'build_class': , 'import': , 'abs': , 'all': , 'any': , 'ascii': , 'bin': , 'callable': , 'chr': , 'compile': , 'delattr': , 'dir': , 'divmod': , 'eval': , 'exec': , 'format': , 'getattr': , 'globals': , 'hasattr': , 'hash': , 'hex': , 'id': , 'input': , 'isinstance': , 'issubclass': , 'iter': , 'len': , 'locals': , 'max': , 'min': , 'next': , 'oct': , 'ord': , 'pow': , 'print': , 'repr': , 'round': , 'setattr': , 'sorted': , 'sum': , 'vars': , 'None': None, 'Ellipsis': Ellipsis, 'NotImplemented': NotImplemented, 'False': False, 'True': True, 'bool': <class 'bool'>, 'memoryview': <class 'memoryview'>, 'bytearray': <class 'bytearray'>, 'bytes': <class 'bytes'>, 'classmethod': <class 'classmethod'>, 'complex': <class 'complex'>, 'dict': <class 'dict'>, 'enumerate': <class 'enumerate'>, 'filter': <class 'filter'>, 'float': <class 'float'>, 'frozenset': <class 'frozenset'>, 'property': <class 'property'>, 'int': <class 'int'>, 'list': <class 'list'>, 'map': <class 'map'>, 'object': <class 'object'>, 'range': <class 'range'>, 'reversed': <class 'reversed'>, 'set': <class 'set'>, 'slice': <class 'slice'>, 'staticmethod': <class 'staticmethod'>, 'str': <class 'str'>, 'super': <class 'super'>, 'tuple': <class 'tuple'>, 'type': <class 'type'>, 'zip': <class 'zip'>, 'debug': True, 'BaseException': <class 'BaseException'>, 'Exception': <class 'Exception'>, 'TypeError': <class 'TypeError'>, 'StopAsyncIteration': <class 'StopAsyncIteration'>, 'StopIteration': <class 'StopIteration'>, 'GeneratorExit': <class 'GeneratorExit'>, 'SystemExit': <class 'SystemExit'>, 'KeyboardInterrupt': <class 'KeyboardInterrupt'>, 'ImportError': <class 'ImportError'>, 
'ModuleNotFoundError': <class 'ModuleNotFoundError'>, 'OSError': <class 'OSError'>, 'EnvironmentError': <class 'OSError'>, 'IOError': <class 'OSError'>, 'EOFError': <class 'EOFError'>, 'RuntimeError': <class 'RuntimeError'>, 'RecursionError': <class 'RecursionError'>, 'NotImplementedError': <class 'NotImplementedError'>, 'NameError': <class 'NameError'>, 'UnboundLocalError': <class 'UnboundLocalError'>, 'AttributeError': <class 'AttributeError'>, 'SyntaxError': <class 'SyntaxError'>, 'IndentationError': <class 'IndentationError'>, 'TabError': <class 'TabError'>, 'LookupError': <class 'LookupError'>, 'IndexError': <class 'IndexError'>, 'KeyError': <class 'KeyError'>, 'ValueError': <class 'ValueError'>, 'UnicodeError': <class 'UnicodeError'>, 'UnicodeEncodeError': <class 'UnicodeEncodeError'>, 'UnicodeDecodeError': <class 'UnicodeDecodeError'>, 'UnicodeTranslateError': <class 'UnicodeTranslateError'>, 'AssertionError': <class 'AssertionError'>, 'ArithmeticError': <class 'ArithmeticError'>, 'FloatingPointError': <class 'FloatingPointError'>, 'OverflowError': <class 'OverflowError'>, 'ZeroDivisionError': <class 'ZeroDivisionError'>, 'SystemError': <class 'SystemError'>, 'ReferenceError': <class 'ReferenceError'>, 'BufferError': <class 'BufferError'>, 'MemoryError': <class 'MemoryError'>, 'Warning': <class 'Warning'>, 'UserWarning': <class 'UserWarning'>, 'DeprecationWarning': <class 'DeprecationWarning'>, 'PendingDeprecationWarning': <class 'PendingDeprecationWarning'>, 'SyntaxWarning': <class 'SyntaxWarning'>, 'RuntimeWarning': <class 'RuntimeWarning'>, 'FutureWarning': <class 'FutureWarning'>, 'ImportWarning': <class 'ImportWarning'>, 'UnicodeWarning': <class 'UnicodeWarning'>, 'BytesWarning': <class 'BytesWarning'>, 'ResourceWarning': <class 'ResourceWarning'>, 'ConnectionError': <class 'ConnectionError'>, 'BlockingIOError': <class 'BlockingIOError'>, 'BrokenPipeError': <class 'BrokenPipeError'>, 'ChildProcessError': <class 'ChildProcessError'>, 
'ConnectionAbortedError': <class 'ConnectionAbortedError'>, 'ConnectionRefusedError': <class 'ConnectionRefusedError'>, 'ConnectionResetError': <class 'ConnectionResetError'>, 'FileExistsError': <class 'FileExistsError'>, 'FileNotFoundError': <class 'FileNotFoundError'>, 'IsADirectoryError': <class 'IsADirectoryError'>, 'NotADirectoryError': <class 'NotADirectoryError'>, 'InterruptedError': <class 'InterruptedError'>, 'PermissionError': <class 'PermissionError'>, 'ProcessLookupError': <class 'ProcessLookupError'>, 'TimeoutError': <class 'TimeoutError'>, 'open': , 'quit': Use quit() or Ctrl-D (i.e. EOF) to exit, 'exit': Use exit() or Ctrl-D (i.e. EOF) to exit, 'copyright': Copyright (c) 2001-2019 Python Software Foundation.
All Rights Reserved.
Copyright (c) 2000 BeOpen.com.
All Rights Reserved.
Copyright (c) 1995-2001 Corporation for National Research Initiatives.
All Rights Reserved.
Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved., 'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands for supporting Python development. See www.python.org for more information., 'license': Type license() to see the full license text, 'help': Type help() for interactive help, or help(object) for help about object., 'pybind11_internals_v3_gcc_libstdcpp_cxxabi1002': <capsule object NULL at 0x7f0c54f4c5d0>, 'pybind11_internals_v3': <capsule object NULL at 0x7f0c47298b70>},
'np': <module 'numpy' from '/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/numpy/__init__.py'>,
'sys': <module 'sys' (built-in)>,
'os': <module 'os' from '/usr/lib/python3.6/os.py'>,
'EnvConfig': <class 'gym_collision_avoidance.envs.config.Config'>,
'Train': <class 'Config.Train'>,
'TrainPhase1': <class 'Config.TrainPhase1'>,
'TrainPhase2': <class 'Config.TrainPhase2'>,
'TrainRegression': <class 'Config.TrainRegression'>
}

I think the Config object is being loaded from the Config.py in the same repo as train.sh (rl_collision_avoidance/ga3c/GA3C/Config.py), and there is no MAX_QUEUE_SIZE attribute in the result.


BingHan0458 commented on June 5, 2024

I'm sorry, I still don't know how to solve the error above and I need your help, thank you very much!


mfe7 commented on June 5, 2024

ok, thanks, that indicates that the config isn't being loaded from the right place. for me, in Server.py when I add print(Config.__dict__) I get:

{'MAX_NUM_AGENTS_IN_ENVIRONMENT': 4, 'MAX_NUM_AGENTS_TO_SIM': 4, 'STATES_IN_OBS': ['is_learning', 'num_other_agents', 'dist_to_goal', 'heading_ego_frame', 'pref_speed', 'radius', 'other_agents_states'], 'STATES_NOT_USED_IN_POLICY': ['is_learning'], 'MULTI_AGENT_ARCH_RNN': 0, 'MULTI_AGENT_ARCH_WEIGHT_SHARING': 1, 'MULTI_AGENT_ARCH_LASERSCAN': 2, 'MULTI_AGENT_ARCH': 0, 'MAX_NUM_OTHER_AGENTS_OBSERVED': 3, 'COLLISION_AVOIDANCE': True, 'continuous': 0, 'discrete': 1, 'ACTION_SPACE_TYPE': 0, 'ANIMATE_EPISODES': False, 'SHOW_EPISODE_PLOTS': False, 'SAVE_EPISODE_PLOTS': False, 'PLOT_CIRCLES_ALONG_TRAJ': True, 'ANIMATION_PERIOD_STEPS': 5, 'PLT_LIMITS': None, 'PLT_FIG_SIZE': (10, 8), 'USE_STATIC_MAP': False, 'TRAIN_MODE': True, 'PLAY_MODE': False, 'EVALUATE_MODE': False, 'REWARD_AT_GOAL': 1.0, 'REWARD_COLLISION_WITH_AGENT': -0.25, 'REWARD_COLLISION_WITH_WALL': -0.25, 'REWARD_GETTING_CLOSE': -0.1, 'REWARD_ENTERED_NORM_ZONE': -0.05, 'REWARD_TIME_STEP': 0.0, 'REWARD_WIGGLY_BEHAVIOR': 0.0, 'WIGGLY_BEHAVIOR_THRESHOLD': inf, 'COLLISION_DIST': 0.0, 'GETTING_CLOSE_RANGE': 0.2, 'SOCIAL_NORMS': 'none', 'DT': 0.2, 'NEAR_GOAL_THRESHOLD': 0.2, 'MAX_TIME_RATIO': 2.0, 'TEST_CASE_FN': 'get_testcase_random', 'TEST_CASE_ARGS': {'policy_to_ensure': 'learning_ga3c', 'policies': ['noncoop', 'learning_ga3c', 'static'], 'policy_distr': [0.05, 0.9, 0.05], 'speed_bnds': [0.5, 2.0], 'radius_bnds': [0.2, 0.8], 'side_length': [{'num_agents': [0, 5], 'side_length': [4, 5]}, {'num_agents': [5, inf], 'side_length': [6, 8]}]}, 'MAX_NUM_OTHER_AGENTS_IN_ENVIRONMENT': 3, 'PLOT_EVERY_N_EPISODES': 100, 'SENSING_HORIZON': inf, 'LASERSCAN_LENGTH': 512, 'LASERSCAN_NUM_PAST': 3, 'NUM_STEPS_IN_OBS_HISTORY': 1, 'NUM_PAST_ACTIONS_IN_STATE': 0, 'RVO_TIME_HORIZON': 5.0, 'RVO_COLLAB_COEFF': 0.5, 'RVO_ANTI_COLLAB_T': 1.0, 'TRAIN_SINGLE_AGENT': False, 'STATE_INFO_DICT': {'dist_to_goal': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [-inf, inf], 'attr': 'get_agent_data("dist_to_goal")', 'std': array([5.], 
dtype=float32), 'mean': array([0.], dtype=float32)}, 'radius': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [0, inf], 'attr': 'get_agent_data("radius")', 'std': array([1.], dtype=float32), 'mean': array([0.5], dtype=float32)}, 'heading_ego_frame': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [-3.141592653589793, 3.141592653589793], 'attr': 'get_agent_data("heading_ego_frame")', 'std': array([3.14], dtype=float32), 'mean': array([0.], dtype=float32)}, 'pref_speed': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [0, inf], 'attr': 'get_agent_data("pref_speed")', 'std': array([1.], dtype=float32), 'mean': array([1.], dtype=float32)}, 'num_other_agents': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [0, inf], 'attr': 'get_agent_data("num_other_agents_observed")', 'std': array([1.], dtype=float32), 'mean': array([1.], dtype=float32)}, 'other_agent_states': {'dtype': <class 'numpy.float32'>, 'size': 7, 'bounds': [-inf, inf], 'attr': 'get_agent_data("other_agent_states")', 'std': array([5., 5., 1., 1., 1., 5., 1.], dtype=float32), 'mean': array([0. , 0. , 0. , 0. , 0.5, 0. , 1. ], dtype=float32)}, 'other_agents_states': {'dtype': <class 'numpy.float32'>, 'size': (3, 7), 'bounds': [-inf, inf], 'attr': 'get_sensor_data("other_agents_states")', 'std': array([[5., 5., 1., 1., 1., 5., 1.],
       [5., 5., 1., 1., 1., 5., 1.],
       [5., 5., 1., 1., 1., 5., 1.]], dtype=float32), 'mean': array([[0. , 0. , 0. , 0. , 0.5, 0. , 1. ],
       [0. , 0. , 0. , 0. , 0.5, 0. , 1. ],
       [0. , 0. , 0. , 0. , 0.5, 0. , 1. ]], dtype=float32)}, 'laserscan': {'dtype': <class 'numpy.float32'>, 'size': (3, 512), 'bounds': [0.0, 6.0], 'attr': 'get_sensor_data("laserscan")', 'std': array([[5., 5., 5., ..., 5., 5., 5.],
       [5., 5., 5., ..., 5., 5., 5.],
       [5., 5., 5., ..., 5., 5., 5.]], dtype=float32), 'mean': array([[5., 5., 5., ..., 5., 5., 5.],
       [5., 5., 5., ..., 5., 5., 5.],
       [5., 5., 5., ..., 5., 5., 5.]], dtype=float32)}, 'is_learning': {'dtype': <class 'numpy.float32'>, 'size': 1, 'bounds': [0.0, 1.0], 'attr': 'get_agent_data_equiv("policy.str", "learning")'}, 'other_agents_states_encoded': {'dtype': <class 'numpy.float32'>, 'size': 100.0, 'bounds': [0.0, 1.0], 'attr': 'get_sensor_data("other_agents_states_encoded")'}}, 'MEAN_OBS': {'num_other_agents': array([1.], dtype=float32), 'dist_to_goal': array([0.], dtype=float32), 'heading_ego_frame': array([0.], dtype=float32), 'pref_speed': array([1.], dtype=float32), 'radius': array([0.5], dtype=float32), 'other_agents_states': array([[0. , 0. , 0. , 0. , 0.5, 0. , 1. ],
       [0. , 0. , 0. , 0. , 0.5, 0. , 1. ],
       [0. , 0. , 0. , 0. , 0.5, 0. , 1. ]], dtype=float32)}, 'STD_OBS': {'num_other_agents': array([1.], dtype=float32), 'dist_to_goal': array([5.], dtype=float32), 'heading_ego_frame': array([3.14], dtype=float32), 'pref_speed': array([1.], dtype=float32), 'radius': array([1.], dtype=float32), 'other_agents_states': array([[5., 5., 1., 1., 1., 5., 1.],
       [5., 5., 1., 1., 1., 5., 1.],
       [5., 5., 1., 1., 1., 5., 1.]], dtype=float32)}, 'AGENT_SORTING_METHOD': 'closest_first', 'game_grid': 0, 'game_ale': 1, 'game_collision_avoidance': 2, 'GAME_CHOICE': 2, 'USE_WANDB': False, 'WANDB_PROJECT_NAME': 'ga3c_cadrl', 'DEBUG': False, 'RANDOM_SEED_1000': 0, 'USE_IMAGE': False, 'NN_INPUT_AVG_VECTOR': array([1. , 0. , 0. , 1. , 0.5, 0. , 0. , 0. , 0. , 0.5, 0. , 1. , 0. ,
       0. , 0. , 0. , 0.5, 0. , 1. , 0. , 0. , 0. , 0. , 0.5, 0. , 1. ]), 'NN_INPUT_STD_VECTOR': array([1.  , 5.  , 3.14, 1.  , 1.  , 5.  , 5.  , 1.  , 1.  , 1.  , 5.  ,
       1.  , 5.  , 5.  , 1.  , 1.  , 1.  , 5.  , 1.  , 5.  , 5.  , 1.  ,
       1.  , 1.  , 5.  , 1.  ]), 'NN_INPUT_SIZE': 26, 'FIRST_STATE_INDEX': 1, 'HOST_AGENT_OBSERVATION_LENGTH': 4, 'OTHER_AGENT_OBSERVATION_LENGTH': 7, 'OTHER_AGENT_FULL_OBSERVATION_LENGTH': 7, 'HOST_AGENT_STATE_SIZE': 4, 'NUM_ACTIONS': 11, 'LOAD_RL_THEN_TRAIN_RL': 0, 'TRAIN_ONLY_REGRESSION': 1, 'LOAD_REGRESSION_THEN_TRAIN_RL': 2, 'NET_ARCH': 'NetworkVP_rnn', 'ALL_ARCHS': ['NetworkVP_rnn'], 'NORMALIZE_INPUT': True, 'USE_DROPOUT': False, 'USE_REGULARIZATION': True, 'AGENTS': 32, 'PREDICTORS': 2, 'TRAINERS': 2, 'DEVICE': '/cpu:0', 'DYNAMIC_SETTINGS': False, 'DYNAMIC_SETTINGS_STEP_WAIT': 20, 'DYNAMIC_SETTINGS_INITIAL_WAIT': 10, 'DISCOUNT': 0.97, 'TIME_MAX': 20, 'MAX_QUEUE_SIZE': 100, 'PREDICTION_BATCH_SIZE': 128, 'MIN_POLICY': 0.0, 'OPT_RMSPROP': 0, 'OPT_ADAM': 1, 'OPTIMIZER': 1, 'LEARNING_RATE_RL_START': 2e-05, 'LEARNING_RATE_RL_END': 2e-05, 'RMSPROP_DECAY': 0.99, 'RMSPROP_MOMENTUM': 0.0, 'RMSPROP_EPSILON': 0.1, 'BETA_START': 0.0001, 'BETA_END': 0.0001, 'USE_GRAD_CLIP': False, 'GRAD_CLIP_NORM': 40.0, 'LOG_EPSILON': 1e-06, 'TRAINING_MIN_BATCH_SIZE': 100, 'TENSORBOARD': True, 'TENSORBOARD_UPDATE_FREQUENCY': 100, 'SAVE_MODELS': True, 'SAVE_FREQUENCY': 50000, 'SPECIAL_EPISODES_TO_SAVE': [1490000, 1500000], 'PRINT_STATS_FREQUENCY': 1, 'STAT_ROLLING_MEAN_WINDOW': 1000, 'RESULTS_FILENAME': 'results.txt', 'NETWORK_NAME': 'network', 'TRAIN_VERSION': 2, 'LOAD_FROM_WANDB_RUN_ID': 'run-rnn', 'EPISODE_NUMBER_TO_LOAD': 0, 'EPISODES': 1500000, 'ANNEALING_EPISODE_COUNT': 1500000}

which includes MAX_QUEUE_SIZE as desired.

The gym_collision_avoidance/envs/__init__.py file is where the Config class is instantiated, with some hacking that allows users to choose which config class to use via 2 environment variables. For example, in the train.sh script, we have default values for these env variables (the path to the RL config.py file, and the name of the class within that file).
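As a rough illustration of that mechanism (not the actual code in gym_collision_avoidance/envs/__init__.py), the sketch below picks a config class from two environment variables. The variable names `EXAMPLE_CONFIG_PATH` and `EXAMPLE_CONFIG_CLASS`, the helper names, and the tiny config file written by `demo` are all hypothetical.

```python
import importlib.util
import os
import tempfile

# Hypothetical variable names, for illustration only.
PATH_VAR = "EXAMPLE_CONFIG_PATH"
CLASS_VAR = "EXAMPLE_CONFIG_CLASS"


def load_config_from_env():
    """Import the file named by PATH_VAR and instantiate the class named by CLASS_VAR."""
    config_path = os.environ[PATH_VAR]
    class_name = os.environ[CLASS_VAR]

    spec = importlib.util.spec_from_file_location("example_user_config", config_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    config_class = getattr(module, class_name)
    return config_class()  # return an instance, not the module or the class


def demo():
    """Write a tiny config file, point the env variables at it, and load it."""
    source = (
        "class TrainPhase1:\n"
        "    def __init__(self):\n"
        "        self.MAX_QUEUE_SIZE = 100\n"
    )
    with tempfile.TemporaryDirectory() as tmp_dir:
        config_path = os.path.join(tmp_dir, "Config.py")
        with open(config_path, "w") as handle:
            handle.write(source)
        os.environ[PATH_VAR] = config_path
        os.environ[CLASS_VAR] = "TrainPhase1"
        return load_config_from_env()


if __name__ == "__main__":
    config = demo()
    print(type(config).__name__, config.MAX_QUEUE_SIZE)
```

If either variable is unset or points at the wrong file or class, a loader like this fails or returns a different config, which is the kind of thing worth checking in the real __init__.py.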

I'm pretty sure something has gone wrong in that __init__.py, and you could try to debug that file (the last line is Config = config_class(), so you could check that Config object has the right attributes). Right now it seems like your Config object is referring to the whole Config.py python file as an object.
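To see why a module and an instance behave differently here, consider this small self-contained sketch. It builds a stand-in module in memory rather than importing the real Config.py, and it assumes (based on the `self.MAX_QUEUE_SIZE = 100` line quoted earlier) that the attribute is set inside a class's `__init__`.

```python
import types

# Stand-in for a Config.py file: the attribute is set on instances in __init__.
SOURCE = '''
class TrainPhase1:
    def __init__(self):
        self.MAX_QUEUE_SIZE = 100
'''

config_module = types.ModuleType("Config")
exec(SOURCE, config_module.__dict__)

# A plain `import Config` binds the module: it contains the class,
# but not the values that only exist on instances.
module_has_attr = hasattr(config_module, "MAX_QUEUE_SIZE")

# The training code needs an instance of the chosen class.
config_instance = config_module.TrainPhase1()
instance_has_attr = hasattr(config_instance, "MAX_QUEUE_SIZE")

print(type(config_module).__name__, module_has_attr)      # module False
print(type(config_instance).__name__, instance_has_attr)  # TrainPhase1 True
```

Printing `type(Config)` next to `Config.__dict__` in Server.py gives the same signal: `module` means the name is bound to the file, while a class name such as `TrainPhase1` means it is bound to an instantiated config.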


BingHan0458 commented on June 5, 2024

Thank you very much! That bug is solved; it came down to the import of Config. But running ./train.sh TrainPhase1 now gives another error:

Entered virtualenv.
--------------------------------------------------------------------------------------------------------
Running GA3C-CADRL gym-collision-avoidance training script (TrainPhase1)
--------------------------------------------------------------------------------------------------------
[Server] Making model...
[Server] Loading Regression Model then training RL.
[NetworkVPCore] Loading checkpoint file: /home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/checkpoints/regression/wandb/run-rnn/checkpoints/network_00000000
Traceback (most recent call last):
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: not an sstable (bad magic number)
	 [[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Run.py", line 77, in <module>
    Server().main()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 72, in __init__
    self.model.load(learning_method='regression')
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/NetworkVPCore.py", line 249, in load
    self.saver.restore(self.sess, filename)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: not an sstable (bad magic number)
	 [[node save/RestoreV2 (defined at /home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/RestoreV2':
  File "Run.py", line 77, in <module>
    Server().main()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 63, in __init__
    self.model = self.make_model()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/Server.py", line 89, in make_model
    return globals()[Config.NET_ARCH](Config.DEVICE, Config.NETWORK_NAME, self.num_actions) # TODO can probably change Config.NETWORK_NAME to Config.NET_ARCH
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/NetworkVP_rnn.py", line 41, in __init__
    super(self.__class__, self).__init__(device, model_name, num_actions)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/ga3c/GA3C/NetworkVPCore.py", line 60, in __init__
    self.saver = tf.compat.v1.train.Saver({var.name: var for var in vars}, max_to_keep=0)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/hanbin/catkin_ws/src/CADRL/rl_collision_avoidance/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

I guess it has something to do with the checkpoint files and the episode number. Is network_00000000 an already-trained network, or the network we still need to train? I don't have any solution for this. How can I solve it? Could you help me? Thank you very much!


mfe7 commented on June 5, 2024

network_00000000 has a network that was trained with regression only, so it provides a good starting point for training with RL. It does seem like this is an issue with loading that checkpoint, and maybe there are some clues here or here?
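One quick thing to rule out is that the files on disk are not real checkpoint data. The sketch below is a generic diagnostic, not something from this repository: `inspect_checkpoint` is a hypothetical helper that lists every file sharing the checkpoint prefix, with its size and whether it starts with the Git LFS pointer header. Whether this repository stores checkpoints with Git LFS is an assumption that this thread does not confirm.

```python
import glob
import os

# First bytes of a Git LFS pointer file (a small text stub, not real data).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def inspect_checkpoint(prefix):
    """List files sharing a checkpoint prefix, with size and an LFS-pointer flag."""
    report = []
    for path in sorted(glob.glob(glob.escape(prefix) + "*")):
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        report.append({
            "path": path,
            "size_bytes": os.path.getsize(path),
            "looks_like_lfs_pointer": head == LFS_POINTER_PREFIX,
        })
    return report


if __name__ == "__main__":
    # Replace with the checkpoint prefix printed by "[NetworkVPCore] Loading checkpoint file: ...".
    for entry in inspect_checkpoint("checkpoints/network_00000000"):
        print(entry)
```

An empty list means nothing matches the prefix, and files of only a few hundred bytes or with the pointer flag set suggest the checkpoint was never fully downloaded. If the files look healthy, the cause is more likely elsewhere.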

