Open
Description
I ran into an issue when performing a training run on the GPU. It appears to be caused by the reference and predicted data being stored on different devices, which leads to errors like: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu).
I can work around this by explicitly moving the reference data (energies, forces and coords) to the GPU (https://github.com/SimonBoothroyd/descent/blob/92a139604f4b166a6ab040e5e8e8b8a70fa719d8/descent/targets/energy.py#L110):
    energy_ref = entry["energy"].cuda()
    forces_ref = entry["forces"].reshape(len(energy_ref), -1, 3).cuda()
    coords = (
        entry["coords"]
        .reshape(len(energy_ref), -1, 3)
        .detach()
        .requires_grad_(True)
        .cuda()
    )
However, hard-coding .cuda() breaks CPU-only runs, so something smarter is likely needed that handles both CPU and GPU runs.
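As a rough sketch of what a device-agnostic version could look like, the reference tensors can be moved with `.to(device)` instead of `.cuda()`. Here `move_entry_to_device` is a hypothetical helper (not part of descent), the `entry` dictionary is dummy data that only mimics the keys used above, and `device` is picked from `torch.cuda.is_available()` as an assumption. In practice it would probably be better to take the device from the tensors holding the predicted data.

```python
import torch


def move_entry_to_device(entry, device):
    """Hypothetical helper: return reference data placed on `device`."""
    energy_ref = entry["energy"].to(device)
    n_confs = len(energy_ref)

    forces_ref = entry["forces"].reshape(n_confs, -1, 3).to(device)
    coords = (
        entry["coords"]
        .reshape(n_confs, -1, 3)
        .detach()
        .to(device)  # move first, so the tensor that requires grad is a leaf
        .requires_grad_(True)
    )
    return energy_ref, forces_ref, coords


# Assumption: fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Dummy entry with 2 conformers of 4 atoms each, stored flattened.
entry = {
    "energy": torch.zeros(2),
    "forces": torch.zeros(2 * 4 * 3),
    "coords": torch.rand(2 * 4 * 3),
}
energy_ref, forces_ref, coords = move_entry_to_device(entry, device)
```

Calling `.to(device)` before `.requires_grad_(True)` keeps `coords` a leaf tensor on the target device, whereas calling `.cuda()` last returns a non-leaf copy. Either ordering may work for computing forces with `torch.autograd.grad`, so this is a design preference rather than a required change.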