I recently switched my data-processing tool from xarray to Polars, and I use pl.DataFrame.to_torch() to create the tensor for training my PyTorch model. The data source is a Parquet file.
To avoid forking child processes, I use torch.multiprocessing.spawn to launch the training process, but the process crashes with the following output:
/home/username/.conda/envs/torchhydro1/bin/python3.11 -X pycache_prefix=/home/username/.cache/JetBrains/IntelliJIdea2024.3/cpython-cache /home/username/.local/share/JetBrains/IntelliJIdea2024.3/python-ce/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --port 29781 --file /home/username/torchhydro/experiments/train_with_era5land_gnn_ddp.py
Console output is saving to: /home/username/torchhydro/experiments/results/train_gnn_ddp.txt
[20:38:51] DEBUG No module named 'forge' signatures.py:43
DEBUG No module named 'forge' signatures.py:43
[20:38:52] DEBUG Using selector: EpollSelector selector_events.py:54
……
DEBUG Using fontManager instance from font_manager.py:1580
/home/username/.cache/matplotlib/fontlist-v390.json
update config file
!!!!!!NOTE!!!!!!!!
-------Please make sure the PRECIPITATION variable is in the 1st location in var_t setting!!---------
If you have POTENTIAL_EVAPOTRANSPIRATION, please set it the 2nd!!!-
!!!!!!NOTE!!!!!!!!
-------Please make sure the STREAMFLOW variable is in the 1st location in var_out setting!!---------
[20:39:04] DEBUG No module named 'forge' signatures.py:43
DEBUG No module named 'forge' signatures.py:43
[20:39:06] DEBUG Using selector: EpollSelector selector_events.py:54
……
DEBUG Using fontManager instance from font_manager.py:1580
/home/username/.cache/matplotlib/fontlist-v390.json
……
Torch is using cuda:0
[2024-12-12 20:48:08,931] torch.distributed.distributed_c10d: [INFO] Using backend config: {'cuda': 'nccl'}
[W CUDAAllocatorConfig.h:30] Warning: expandable_segments not supported on this platform (function operator())
using 8 workers
Pin memory set to True
0%| | 0/22986 [00:00
More details here: [url]https://stackoverflow.com/questions/79275700/why-there-is-unpickling-error-when-using-polars-to-read-data-for-pytorch[/url]