I am trying to train a language model for machine translation between a low-resource language and Portuguese using TensorFlow. Unfortunately, I get the following error:
PS C:\Users\myuser\PycharmProjects\teste> python .\tensorflow_model.py
2024-08-23 21:29:50.839647: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File ".\tensorflow_model.py", line 52, in <module>
    dataset = tf.data.Dataset.from_tensor_slices((src_tensor, tgt_tensor)).shuffle(BUFFER_SIZE)
  File "C:\Users\myuser\PycharmProjects\teste\.venv\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 831, in from_tensor_slices
    return from_tensor_slices_op._from_tensor_slices(tensors, name)
  File "C:\Users\myuser\PycharmProjects\teste\.venv\lib\site-packages\tensorflow\python\data\ops\from_tensor_slices_op.py", line 25, in _from_tensor_slices
    return _TensorSliceDataset(tensors, name=name)
  File "C:\Users\myuser\PycharmProjects\teste\.venv\lib\site-packages\tensorflow\python\data\ops\from_tensor_slices_op.py", line 45, in __init__
    batch_dim.assert_is_compatible_with(
  File "C:\Users\myuser\PycharmProjects\teste\.venv\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 300, in assert_is_compatible_with
    raise ValueError("Dimensions %s and %s are not compatible" %
ValueError: Dimensions 21 and 22 are not compatible
How can I fix this error? Here is my code:
import tensorflow as tf
import numpy as np
import re
import os

# Clean data
def preprocess_sentence(sentence):
    sentence = sentence.lower().strip()
    sentence = re.sub(r"([?.!,¿])", r" \1 ", sentence)
    sentence = re.sub(r'[" "]+', " ", sentence)
    sentence = re.sub(r"[^a-zA-Z?.!,¿]+", " ", sentence)
    sentence = sentence.strip()
    sentence = ' ' + sentence + ' '
    return sentence

# Function to load data
def load_data(file_path_src, file_path_tgt):
    src_sentences = open(file_path_src, 'r', encoding='utf-8').read().strip().split('\n')
    tgt_sentences = open(file_path_tgt, 'r', encoding='utf-8').read().strip().split('\n')
    src_sentences = [preprocess_sentence(sentence) for sentence in src_sentences]
    tgt_sentences = [preprocess_sentence(sentence) for sentence in tgt_sentences]
    return src_sentences, tgt_sentences

# Load data
src_sentences, tgt_sentences = load_data('src_language.txt', 'portuguese.txt')

# Tokenization
src_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
tgt_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
src_tokenizer.fit_on_texts(src_sentences)
tgt_tokenizer.fit_on_texts(tgt_sentences)
src_tensor = src_tokenizer.texts_to_sequences(src_sentences)
tgt_tensor = tgt_tokenizer.texts_to_sequences(tgt_sentences)
src_tensor = tf.keras.preprocessing.sequence.pad_sequences(src_tensor, padding='post')
tgt_tensor = tf.keras.preprocessing.sequence.pad_sequences(tgt_tensor, padding='post')
BUFFER_SIZE = len(src_tensor)

# Creating the Dataset
dataset = tf.data.Dataset.from_tensor_slices((src_tensor, tgt_tensor)).shuffle(BUFFER_SIZE)
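
A likely cause, judging from the traceback: the check that fails is `batch_dim.assert_is_compatible_with`, which compares the first dimension (the number of rows) of every array passed to `from_tensor_slices`. If so, 21 and 22 are sentence counts rather than sequence lengths, and one of the two files contains one more line than the other, possibly because of a blank or extra line somewhere in the middle. The sketch below is one way to check this before building the dataset. The helper names `count_report` and `require_aligned` and the toy lists are made up for illustration; they are not part of TensorFlow or of the script above.

```python
def count_report(src_lines, tgt_lines):
    """Summarise how two supposedly parallel lists of lines compare."""
    return {
        "src_count": len(src_lines),
        "tgt_count": len(tgt_lines),
        "aligned": len(src_lines) == len(tgt_lines),
        # 1-based numbers of blank lines, a common reason for the counts to drift
        "src_blank": [i for i, s in enumerate(src_lines, 1) if not s.strip()],
        "tgt_blank": [i for i, s in enumerate(tgt_lines, 1) if not s.strip()],
    }


def require_aligned(src_lines, tgt_lines):
    """Fail early, with a readable message, if the two corpora differ in length."""
    report = count_report(src_lines, tgt_lines)
    if not report["aligned"]:
        raise ValueError(
            f"source has {report['src_count']} lines, "
            f"target has {report['tgt_count']}; "
            f"blank source lines: {report['src_blank']}, "
            f"blank target lines: {report['tgt_blank']}"
        )
    return report


# Toy data standing in for the contents of the two files (hypothetical).
src_demo = ["a b c", "d e", "f"]
tgt_demo = ["x y", "", "z w", "v"]

demo_report = count_report(src_demo, tgt_demo)
print(demo_report)
```

In the script above this would be called as `require_aligned(src_sentences, tgt_sentences)` right after `load_data(...)`. Simply truncating the longer list would silence the error but could pair sentences with the wrong translations, so it is probably safer to fix the files themselves. Different padded lengths (the second dimension of `src_tensor` and `tgt_tensor`) are not a problem for `from_tensor_slices`; only the number of rows has to match.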
More details here: https://stackoverflow.com/questions/78911175/a-language-model-for-machine-translation-between-a-low-resource-language-and-por
Thread: A language model for machine translation between a low-resource language and Portuguese using TensorFlow (Python programs)
Posted by Anonymous, timestamp 1731410894