However, when I pass an even number of GPUs, for example 0,1 or 0,1,2,3, I get the following error:
RuntimeError: Caught RuntimeError in replica 0 on device 6.
Original Traceback (most recent call last):
File "/raid/training_data/motor_insurance/env/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/raid/training_data/motor_insurance/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/raid/training_data/motor_insurance/env/lib/python3.8/site-packages/torchvision/models/detection/generalized_rcnn.py", line 83, in forward
images, targets = self.transform(images, targets)
File "/raid/training_data/motor_insurance/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/raid/training_data/motor_insurance/env/lib/python3.8/site-packages/torchvision/models/detection/transform.py", line 129, in forward
image = self.normalize(image)
File "/raid/training_data/motor_insurance/env/lib/python3.8/site-packages/torchvision/models/detection/transform.py", line 157, in normalize
return (image - mean[:, None, None]) / std[:, None, None]
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 0
I have tried everything, but I think something is wrong with the PyTorch code itself.
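One possible reading of the traceback, offered as an assumption and not a confirmed diagnosis: the `parallel_apply.py` frame suggests the model is wrapped in `nn.DataParallel`, which splits tensor inputs along dimension 0. If the images are passed as a list of `[3, H, W]` tensors, dimension 0 is the channel axis, so each replica may receive only part of the channels. The sketch below uses plain Python to mimic the chunk-size arithmetic (each piece has size ceil(length / pieces), except possibly the last) and the broadcasting check that `normalize` would hit. The helper names `chunk_sizes` and `broadcast_ok` are hypothetical and exist only for illustration.

```python
import math


def chunk_sizes(length, num_chunks):
    """Sizes of the pieces when `length` items are split into at most
    `num_chunks` pieces of size ceil(length / num_chunks) each, with a
    smaller final piece if needed (assumed to mirror torch.chunk)."""
    step = math.ceil(length / num_chunks)
    sizes = []
    remaining = length
    while remaining > 0:
        sizes.append(min(step, remaining))
        remaining -= step
    return sizes


def broadcast_ok(a, b):
    """Two dimension sizes broadcast if they are equal or one of them is 1."""
    return a == b or a == 1 or b == 1


CHANNELS = 3  # RGB image; the per-channel mean/std also has 3 entries

for num_gpus in (1, 2, 3):
    sizes = chunk_sizes(CHANNELS, num_gpus)
    first = sizes[0]  # channels that replica 0 would receive
    status = "broadcasts" if broadcast_ok(first, CHANNELS) else "size mismatch"
    print(f"{num_gpus} GPU(s): channel chunks {sizes}, replica 0 vs mean: {status}")
```

Under these assumptions, two GPUs give replica 0 a 2-channel tensor that cannot be combined with the 3-element mean, which matches the "tensor a (2) ... tensor b (3)" figures in the error. With three GPUs each replica would get a single channel, which broadcasts without an error but is probably not the intended computation. Other GPU counts may fail at a different point, so this only accounts for the message shown above.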
More details here: https://stackoverflow.com/questions/791 ... n-training