Description
Hi,
I'm trying to train a new policy. Unfortunately, whether I train the teacher or the student, the program aborts while creating the environment.
Take "mlp.py" as an example: the crash happens on line 95 of "dclaw_multiobjs.py", at "self.gym.begin_aggregate(env_ptr, max_agg_bodies, max_agg_shapes, True)" (a rough sketch of how that aggregation block is set up follows the log below). The specific error is as follows:
Importing module 'gym_38' (/workspace/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /workspace/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
mlp.py:17: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path=dexenv.PROJECT_ROOT.joinpath('conf').as_posix(), config_name="debug_dclaw")
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
PyTorch version 2.1.0+cu118
Device count 2
/workspace/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /root/.cache/torch_extensions/py38_cu118 as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/py38_cu118/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
/workspace/isaacgym/python/isaacgym/torch_utils.py:135: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
def get_axis_params(value, axis_idx, x_value=0., dtype=np.float, n_dims=3):
2024-05-05 04:18:06.870 | INFO | dexenv.utils.create_task_env:create_task_env:19 - Creating environment DclawMultiObjs
2024-05-05 04:18:06.871 | WARNING | MODDclawMultiObjs:parse_obj_dataset:168 - Dataset path:/workspace/dexenv/dexenv/assets/miscnet/train
2024-05-05 04:18:06.877 | INFO | MODDclawMultiObjs:__init__:23 - Object urdf root path:/workspace/dexenv/dexenv/assets/miscnet/train.
2024-05-05 04:18:06.877 | INFO | MODDclawMultiObjs:__init__:24 - Number of available objects:150.
2024-05-05 04:18:06.877 | WARNING | dexenv.envs.dclaw_base:__init__:29 - Domain randomization is enabled!
Obs type: full
2024-05-05 04:18:06.879 | INFO | dexenv.envs.base.vec_task:__init__:89 - RL device:cuda
2024-05-05 04:18:06.879 | INFO | dexenv.envs.base.vec_task:__init__:90 - Simulation device:cuda:0
2024-05-05 04:18:06.879 | INFO | dexenv.envs.base.vec_task:__init__:91 - Graphics device:-1
/usr/local/lib/python3.8/dist-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
[Warning] [carb.gym.plugin] useGpuPipeline is set, forcing GPU PhysX
[Warning] [carb.gym.plugin] useGpu is set, forcing single scene (0 subscenes)
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
2024-05-05 04:18:07.985 | INFO | dexenv.envs.dclaw_base:get_dclaw_asset:455 - VHACD:True
/workspace/dexenv/dexenv/envs/dclaw_base.py:461: DeprecationWarning: an integer is required (got type isaacgym._bindings.linux-x86_64.gym_38.DofDriveMode). Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
asset_options.default_dof_drive_mode = gymapi.DOF_MODE_POS
Dclaw asset root:/workspace/dexenv/dexenv/assets/dclaw_4f robot name:dclaw_4f
D-Claw:
Number of bodies: 21
Number of shapes: 105
Number of dofs: 12
2024-05-05 04:18:10.423 | INFO | dexenv.envs.dclaw_base:get_dclaw_asset:481 - Joint names:dict_keys(['four1_jnt', 'four2_jnt', 'four3_jnt', 'one1_jnt', 'one2_jnt', 'one3_jnt', 'three1_jnt', 'three2_jnt', 'three3_jnt', 'two1_jnt', 'two2_jnt', 'two3_jnt'])
Number of fingertips:4 Fingertips:['four_tip_link', 'one_tip_link', 'three_tip_link', 'two_tip_link']
Actuator --- DoF Index
four1_jnt 0
four2_jnt 1
four3_jnt 2
one1_jnt 3
one2_jnt 4
one3_jnt 5
three1_jnt 6
three2_jnt 7
three3_jnt 8
two1_jnt 9
two2_jnt 10
two3_jnt 11
Setting DOF velocity limit to:[6.55172413793103, 7.758620689655173, 7.7586206896551]
Setting DOF effort limit to:2.6
Setting stiffness to:[2.7724128689655174, 3.558619503448275, 3.5586195034482757]
Setting damping to:[0.273946924137931, 0.382384248275862, 0.382384248275862]
2024-05-05 04:18:10.460 | INFO | MODDclawMultiObjs:load_object_asset:207 - Loading object IDs from 0 to 150.
Loading Asset: 100%|███████████████████████████████████| 150/150 [01:18<00:00, 1.92it/s]
free(): invalid pointer
Aborted (core dumped)
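For context, here is a minimal sketch of how the aggregation block is typically structured in IsaacGymEnvs-style environments. The variable names (dclaw_asset, object_asset, dclaw_pose, object_pose, env_lower, env_upper, num_per_row) are illustrative, not the exact ones from dclaw_multiobjs.py. My understanding is that if max_agg_bodies or max_agg_shapes is smaller than what the actors created inside the aggregate actually need, Isaac Gym can corrupt memory and crash later, which is why I suspect this call.

```python
# Illustrative sketch (not the exact dclaw_multiobjs.py code), inside the
# per-environment creation loop. The aggregate capacity has to cover every
# actor created between begin_aggregate and end_aggregate.
max_agg_bodies = (self.gym.get_asset_rigid_body_count(dclaw_asset)
                  + self.gym.get_asset_rigid_body_count(object_asset))
max_agg_shapes = (self.gym.get_asset_rigid_shape_count(dclaw_asset)
                  + self.gym.get_asset_rigid_shape_count(object_asset))

env_ptr = self.gym.create_env(self.sim, env_lower, env_upper, num_per_row)

# This is the call that crashes for me ("free(): invalid pointer").
self.gym.begin_aggregate(env_ptr, max_agg_bodies, max_agg_shapes, True)
self.gym.create_actor(env_ptr, dclaw_asset, dclaw_pose, "dclaw", i, 0)
self.gym.create_actor(env_ptr, object_asset, object_pose, "object", i, 0)
self.gym.end_aggregate(env_ptr)
```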
This error occurs quite randomly: with different numbers of environments it shows up while aggregating different objects. I also tried it on several computers and the same thing happened every time, so I don't think it is a problem with my machine.
When I remove the "begin_aggregate" call, the environment is created without errors, but I then get a new error during training: "RuntimeError: CUDA error: an illegal memory access was encountered". It still happens even after reducing the number of environments to 1000, and using even fewer environments would make training take longer, which I want to avoid. Could removing the aggregation be what causes this problem? A sketch of what I mean by removing it is below.
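Again with illustrative names (and assuming the begin/end calls are removed as a pair), disabling aggregation looks roughly like this:

```python
# With aggregation disabled, the actors are created directly in the env;
# both aggregate calls are commented out together.
# self.gym.begin_aggregate(env_ptr, max_agg_bodies, max_agg_shapes, True)
self.gym.create_actor(env_ptr, dclaw_asset, dclaw_pose, "dclaw", i, 0)
self.gym.create_actor(env_ptr, object_asset, object_pose, "object", i, 0)
# self.gym.end_aggregate(env_ptr)
```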
The computer I'm using has 250 GB of RAM and an RTX A6000 GPU with 48 GB of video memory.
In short, I have been trying to fix the "free(): invalid pointer" crash caused by aggregation, but without success. Does anyone have any idea what might be going on? If more detailed information is needed, please let me know.
Thanks for any suggestions.