RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
This error occurs only when using MPS, not on CPU.
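For context, this RuntimeError is what PyTorch raises when `.view()` is called on a tensor whose memory layout is not contiguous; `.reshape()` copies when needed and succeeds. A minimal standalone illustration (not from the training code above, and reproducible even on CPU):

```python
import torch

# .view() requires a contiguous memory layout; a transposed tensor is not.
x = torch.arange(6).reshape(2, 3).t()  # stride is now (1, 3) -> non-contiguous

try:
    x.view(6)  # raises the same "view size is not compatible ..." RuntimeError
except RuntimeError as e:
    print("view failed:", e)

y = x.reshape(6)  # works: reshape silently falls back to a copy
print(y.shape)    # torch.Size([6])
```

On MPS, some operations apparently produce tensors with different strides than on CPU, which is presumably why the same model code only trips this error on that backend.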
import os
import torch
from transformers import (
    VisionEncoderDecoderModel,
    TrainingArguments,
    Trainer,
    default_data_collator,
)

fine_tuned_decoder_path = "/path/fine_tuned_decoder"
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_pretrained_model_name_or_path="google/vit-base-patch16-224-in21k",
    decoder_pretrained_model_name_or_path=fine_tuned_decoder_path,
    tie_encoder_decoder=True,
    cache_dir="/path/datasets/" + "models",  # Directory for caching models
)
os.environ["WANDB_MODE"] = "disabled"
# Set batch size and number of training epochs
BATCH_SIZE = 16
TRAIN_EPOCHS = 5
# Define the output directory for storing training outputs
output_directory = os.path.join("path", "captioning_outputs")
# Check if MPS is available
device = torch.device("mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu")
# Move your model to the correct device
model.to(device)
# Set mixed precision and device handling
fp16 = False # Disable fp16 entirely
mixed_precision = None # Disable mixed precision (default)
# Training Arguments
training_args = TrainingArguments(
    output_dir=output_directory,
    per_device_train_batch_size=BATCH_SIZE,
    do_train=True,
    num_train_epochs=TRAIN_EPOCHS,
    overwrite_output_dir=True,
    use_cpu=False,                 # Ensure you're not using CPU
    dataloader_pin_memory=False,
    fp16=fp16,                     # Disable fp16 if using MPS
    bf16=False,                    # Disable bf16 if using MPS
    optim="adamw_torch",           # AdamW Torch optimizer (more stable with mixed precision)
    gradient_checkpointing=False,  # Disable gradient checkpointing if necessary
    logging_dir=os.path.join(output_directory, 'logs'),
    report_to="none",              # Disable reporting
)
# Use the Trainer with the model on the correct device
trainer = Trainer(
    model=model,                          # Model to train
    args=training_args,                   # Training arguments
    train_dataset=train_dataset,          # Training dataset
    processing_class=feature_extractor,   # Feature extractor (image processor)
    data_collator=default_data_collator,  # Data collator
)
# Start the training process
trainer.train()
I tried setting use_cpu=True and everything works fine, but not with MPS.
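Since tie_encoder_decoder=True ties weights (which can leave some parameters as non-contiguous views), one thing I considered trying is forcing every parameter and buffer into a contiguous layout before moving the model to MPS. This is only a guess at a mitigation, not confirmed to fix the error:

```python
import torch

# Hypothetical workaround sketch: make all parameters/buffers contiguous.
# Not confirmed to resolve the MPS view/stride error above.
def make_contiguous(model: torch.nn.Module) -> torch.nn.Module:
    for p in model.parameters():
        p.data = p.data.contiguous()
    for b in model.buffers():
        b.data = b.data.contiguous()
    return model

# Demo on a small layer with an artificially non-contiguous weight:
layer = torch.nn.Linear(4, 4)
layer.weight.data = layer.weight.data.t()  # transpose -> non-contiguous
make_contiguous(layer)
print(all(p.is_contiguous() for p in layer.parameters()))  # True
```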