What's going on in this AI model fine-tuning? - Цифровое Кемерово


Posted by Anonymous » 14 Apr 2025, 09:34

I'm following the AMD ROCm tutorial documentation. I have already set up Docker and installed all the required dependencies. The documentation also covers training an AI model using AMD's own preconfigured files, which they call "recipes". They say these training recipes are ready to use, and that you can also modify them to fit your needs, for example to match the dataset you have. This is my custom recipe:
```yaml
output_dir: /workspace/notebooks/result/ # /tmp may be deleted by your system. Change it to your preference.

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /workspace/notebooks/modello-preaddestrato/original/tokenizer.model
  max_seq_len: null

# Dataset
dataset:
  _component_: torchtune.datasets.chat_dataset
  source: /workspace/notebooks/datasets/dataset.json
  packed: False # True increases speed
  conversation_column: conversations
  conversation_style: chatml
seed: 42
shuffle: True

# Model Arguments
model:
  _component_: torchtune.models.llama3_1.llama3_1_8b

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /workspace/notebooks/modello-preaddestrato/
  checkpoint_files: [
    model-00001-of-00004.safetensors,
    model-00002-of-00004.safetensors,
    model-00003-of-00004.safetensors,
    model-00004-of-00004.safetensors
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: LLAMA3
resume_from_checkpoint: False

# Fine-tuning arguments
batch_size: 2
epochs: 2

optimizer:
  _component_: torch.optim.AdamW
  lr: 2e-5
  fused: True
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
max_steps_per_epoch: null
clip_grad_norm: null
compile: False # torch.compile the model + loss, True increases speed + decreases memory
optimizer_in_bwd: False # True saves memory. Requires gradient_accumulation_steps=1
gradient_accumulation_steps: 4 # Use to increase effective batch size

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True # True reduces memory
enable_activation_offloading: False # True reduces memory
custom_sharded_layers: ['tok_embeddings', 'output'] # Layers to shard separately (useful for large vocab size models). Lower Memory, but lower speed.

# Reduced precision
dtype: bf16

# Logging
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: ${output_dir}/logs
log_every_n_steps: 1
log_peak_memory_stats: True

# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  # Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  # `torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  # trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
```
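As a rough sanity check of the recipe above, here is a minimal sketch that estimates how much memory a full fine-tune of this kind needs before any activations are counted. The parameter count (about 8.03 billion for Llama 3.1 8B) is an assumption, and so is the choice to keep gradients and both AdamW moment buffers in the same bf16 precision as the weights. `full_finetune_gib` is a hypothetical helper, not part of torchtune.

```python
# Back-of-the-envelope memory estimate for a full fine-tune.
# All figures are approximations; activations and framework overhead are ignored.

PARAMS = 8.03e9        # assumed parameter count of Llama 3.1 8B
BYTES_PER_VALUE = 2    # bf16, matching `dtype: bf16` in the recipe


def full_finetune_gib(params: float, bytes_per_value: int) -> dict:
    """Estimate the size of weights, gradients and AdamW state in GiB."""
    gib = 1024 ** 3
    weights = params * bytes_per_value / gib
    grads = weights              # one gradient value per parameter
    adamw_state = 2 * weights    # AdamW keeps two moment estimates per parameter
    return {
        "weights": weights,
        "grads": grads,
        "adamw_state": adamw_state,
        "total": weights + grads + adamw_state,
    }


for name, size in full_finetune_gib(PARAMS, BYTES_PER_VALUE).items():
    print(f"{name:12s} {size:6.1f} GiB")
```

Under these assumptions the weights alone come to roughly 15 GiB, which is worth comparing with the 16 GB GPU listed at the end of the post. Sharding across several GPUs divides the per-device figure, but only if that many GPUs are actually present.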
I think I get stuck as soon as training starts, because when I run this command:
```bash
tune run --nproc_per_node 2 full_finetune_distributed --config /workspace/notebooks/my_custom_config_distributed.yaml
```
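The command above asks the launcher for two worker processes. One thing that may be worth ruling out is a mismatch between that number and the number of GPUs PyTorch can see inside the container. The sketch below is only a sanity check under that assumption: `visible_gpu_count` and `check_nproc` are hypothetical helpers. It relies on `torch.cuda.device_count()`, which ROCm builds of PyTorch also use to report AMD GPUs.

```python
import importlib.util


def visible_gpu_count() -> int:
    """Return the number of GPUs PyTorch reports, or 0 if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace.
    return torch.cuda.device_count()


def check_nproc(nproc_per_node: int, gpu_count: int) -> str:
    """Compare the requested worker count with the number of visible GPUs."""
    if nproc_per_node > gpu_count:
        return (f"--nproc_per_node {nproc_per_node} exceeds "
                f"the {gpu_count} visible GPU(s)")
    return "ok"


# 2 is the value passed to --nproc_per_node in the command above.
print(check_nproc(2, visible_gpu_count()))
```

Run it inside the same Docker container as the training job, since the container may see a different set of devices than the host.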
The terminal prints the YAML config along with other debug output, such as:
```text
Running with torchrun...
W0411 10:51:13.143000 9895 site-packages/torch/distributed/run.py:766]
W0411 10:51:13.143000 9895 site-packages/torch/distributed/run.py:766] *****************************************
W0411 10:51:13.143000 9895 site-packages/torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0411 10:51:13.143000 9895 site-packages/torch/distributed/run.py:766] *****************************************

INFO:torchtune.utils._logging:Running FullFinetuneRecipeDistributed with resolved config:

batch_size: 2
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /workspace/notebooks/modello-preaddestrato/
  checkpoint_files:
  - model-00001-of-00004.safetensors
  - model-00002-of-00004.safetensors
  - model-00003-of-00004.safetensors
  - model-00004-of-00004.safetensors
  model_type: LLAMA3
  output_dir: /workspace/notebooks/result/
  recipe_checkpoint: null
clip_grad_norm: null
compile: false
custom_sharded_layers:
- tok_embeddings
- output
dataset:
  _component_: torchtune.datasets.chat_dataset
  conversation_column: conversations
  conversation_style: chatml
  packed: false
  source: /workspace/notebooks/datasets/dataset.json
device: cuda
dtype: bf16
enable_activation_checkpointing: true
enable_activation_offloading: false
epochs: 2
gradient_accumulation_steps: 4
log_every_n_steps: 1
log_peak_memory_stats: true
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /workspace/notebooks/result//logs
model:
  _component_: torchtune.models.llama3_1.llama3_1_8b
optimizer:
  _component_: torch.optim.AdamW
  fused: true
  lr: 2.0e-05
optimizer_in_bwd: false
output_dir: /workspace/notebooks/result/
profiler:
  _component_: torchtune.training.setup_torch_profiler
  active_steps: 2
  cpu: true
  cuda: true
  enabled: false
  num_cycles: 1
  output_dir: /workspace/notebooks/result//profiling_outputs
  profile_memory: false
  record_shapes: true
  wait_steps: 5
  warmup_steps: 3
  with_flops: false
  with_stack: false
resume_from_checkpoint: false
seed: 42
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  max_seq_len: null
  path: /workspace/notebooks/modello-preaddestrato/original/tokenizer.model

INFO:torchtune.utils._logging:Running FullFinetuneRecipeDistributed with resolved config:

batch_size: 2
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /workspace/notebooks/modello-preaddestrato/
  checkpoint_files:
  - model-00001-of-00004.safetensors
  - model-00002-of-00004.safetensors
  - model-00003-of-00004.safetensors
  - model-00004-of-00004.safetensors
  model_type: LLAMA3
  output_dir: /workspace/notebooks/result/
  recipe_checkpoint: null
clip_grad_norm: null
compile: false
custom_sharded_layers:
- tok_embeddings
- output
dataset:
  _component_: torchtune.datasets.chat_dataset
  conversation_column: conversations
  conversation_style: chatml
  packed: false
  source: /workspace/notebooks/datasets/dataset.json
device: cuda
dtype: bf16
enable_activation_checkpointing: true
enable_activation_offloading: false
epochs: 2
gradient_accumulation_steps: 4
log_every_n_steps: 1
log_peak_memory_stats: true
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /workspace/notebooks/result//logs
model:
  _component_: torchtune.models.llama3_1.llama3_1_8b
optimizer:
  _component_: torch.optim.AdamW
  fused: true
  lr: 2.0e-05
optimizer_in_bwd: false
output_dir: /workspace/notebooks/result/
profiler:
  _component_: torchtune.training.setup_torch_profiler
  active_steps: 2
  cpu: true
  cuda: true
  enabled: false
  num_cycles: 1
  output_dir: /workspace/notebooks/result//profiling_outputs
  profile_memory: false
  record_shapes: true
  wait_steps: 5
  warmup_steps: 3
  with_flops: false
  with_stack: false
resume_from_checkpoint: false
seed: 42
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  max_seq_len: null
  path: /workspace/notebooks/modello-preaddestrato/original/tokenizer.model
INFO:torchtune.utils._logging:Hint: enable_activation_checkpointing is True, but enable_activation_offloading isn't. Enabling activation offloading should reduce memory further.
DEBUG:torchtune.utils._logging:Setting manual seed to local seed 42. Local seed is seed + rank = 42 + 0
Writing logs to /workspace/notebooks/result/logs/log_1744368681.txt
INFO:torchtune.utils._logging:Distributed training is enabled. Instantiating model and loading checkpoint on Rank 0 ...
```
The problem is that the terminal has not logged anything for about 4 hours, so the last log line is:
```text
INFO:torchtune.utils._logging:Distributed training is enabled. Instantiating model and loading checkpoint on Rank 0 ...
```
with nothing logged after it. There is literally nothing below that last line, not even the terminal prompt, which makes me think it is still running, but it probably isn't. I don't know whether I am missing other dependencies that need to be installed, or whether it has something to do with ROCm or with environment variables that need to be set in Docker. These are the components I have on the server:
CPU: AMD Ryzen 9 5900XT 16-Core
GPU: AMD Radeon™ RX 7600 XT 16 GB
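Since it is unclear whether the job is still running, one cheap check is whether the launcher process still exists. The sketch below assumes that the number 9895 in the `W0411 ...` warning lines is the process ID of the torchrun launcher, which is an assumption about the log format and not something the post confirms. `is_process_alive` is a hypothetical helper that uses the POSIX convention of sending signal 0.

```python
import os


def is_process_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False     # no process with this PID
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


LAUNCHER_PID = 9895  # assumed to be the torchrun PID seen in the warning lines
print(is_process_alive(LAUNCHER_PID))
```

This has to be run inside the same container as the training job, because PIDs are local to the container. A process that still exists can still be stalled, so this separates "exited" from "still there" but not "slow" from "hung".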

More details here: https://stackoverflow.com/questions/79572489/whats-going-on-in-this-ai-model-fine-tuning
