Code:
***** Running training *****
Num examples = 647
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 4
Total optimization steps = 60
Number of trainable parameters = 25,165,824
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
/opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
return fn(*args, **kwargs)
You are not running the flash-attention implementation, expect numerical differences.
/opt/conda/lib/python3.11/site-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
**Error operation not supported at line 383 in file /src/csrc/pythonInterface.cpp**
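The torch.utils.checkpoint warning above seems to be about the use_reentrant argument. As far as I can tell, with transformers 4.44 it can be passed explicitly through gradient_checkpointing_kwargs; below is only a minimal sketch (it assumes the run is driven by transformers.Trainer/TrainingArguments, and output_dir is a placeholder), matching the batch size, accumulation steps and epochs from the log, not my actual script:

Code:
from transformers import TrainingArguments

# Sketch only: pass use_reentrant explicitly so torch.utils.checkpoint
# stops warning about the upcoming default change.
training_args = TrainingArguments(
    output_dir="out",                                    # placeholder path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
# The Trainer already switches the model to use_cache=False on its own,
# as the first message in the log shows.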
Right after that error the kernel dies. Below are the package versions I'm using:
Code:
transformers 4.44.2
torch 2.4.1
torchaudio 2.4.1
torchvision 0.19.1
accelerate 0.34.2
peft 0.12.0
The same code works when I run it in Google Colab, but in JupyterLab it does not.
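The path csrc/pythonInterface.cpp looks like it belongs to bitsandbytes, so I suspect a difference between the two environments (CUDA build, GPU) rather than the code itself. A small check that can be run in both Colab and JupyterLab to compare them; nothing here is specific to my training script, it only assumes bitsandbytes is installed:

Code:
import torch
import transformers, accelerate, peft
import bitsandbytes as bnb  # assumed installed: pythonInterface.cpp comes from this package

# Print the pieces that usually differ between environments.
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda,
      "| CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0),
          "| compute capability:", torch.cuda.get_device_capability(0))
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("peft:", peft.__version__)
print("bitsandbytes:", bnb.__version__)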
More details here: https://stackoverflow.com/questions/789 ... iner-train