How can I create TFRecords with a dictionary of multiple 2D tensors and read them back without parsing errors? - Цифровое Кемерово


Posted by Anonymous » 24 Jan 2025, 17:36

I have looked at other posts, e.g. "Tensorflow TFRecord: Can't parse serialized example", to try to resolve failures when parsing serialised data, but I cannot work out where I am misunderstanding the conversion steps when creating TFRecords and reading them back.
Here is a "toy" code sample that reflects each of the type conversions I am trying to use. The task variable determines which process to run.
import numpy as np
import tensorflow as tf
import random
import os

# options are "gen", "consume" or "both"
task = "both"

def createSerialExample(shard):
    sh_f = tf.constant(shard["feats"])
    sh_f = tf.cast(sh_f, tf.uint8)
    sh_l = tf.constant(shard["labels"])
    sh_l = tf.cast(sh_l, tf.uint8)

    # Convert dataset tensor element to serialised bytes
    featsByte = tf.io.serialize_tensor(sh_f)
    labelsByte = tf.io.serialize_tensor(sh_l)

    # Build feature from byte lists
    featsFeature = tf.train.Feature(bytes_list=tf.train.BytesList(value=[featsByte.numpy()]))
    labelsFeature = tf.train.Feature(bytes_list=tf.train.BytesList(value=[labelsByte.numpy()]))

    # Build the feature map
    featureMap = { "feats": featsFeature, "labels": labelsFeature }

    # Build a collection of features defined by the feature map, followed by building an example
    # from the features and serialising this example
    example = tf.train.Example(features=tf.train.Features(feature=featureMap))
    #print("example=", example)

    serialisedExample = example.SerializeToString()

    return serialisedExample
# end createSerialExample

#------------------------ Generation --------------------------------------------
if task == "gen" or task == "both":
    vdata = { "feats":[], "labels":[] }

    # Create random data in 2 x TFRecords with 2 x shards each of 4x4 and 2x2 data features
    cnt = 0
    for _ in range(2):
        for _ in range(2):
            feat4x4 = [[random.randint(1, 10) for _ in range(4)] for _ in range(4)]
            lbl2x2 = [[random.randint(1, 10) for _ in range(2)] for _ in range(2)]

            vdata["feats"].append(feat4x4)
            vdata["labels"].append(lbl2x2)

        path = os.path.join("datasets//", f"{cnt:06d}.tfrec")
        dataset = tf.data.Dataset.from_tensor_slices(vdata) # Create a dataset of shards
        # Write the shards into the TFRecord
        with tf.io.TFRecordWriter(path) as writer:
            for shard in dataset:
                serialisedExample = createSerialExample(shard=shard)
                writer.write(serialisedExample)
        # Clear old data
        vdata = { "feats":[], "labels":[] }
        cnt = cnt + 1

    print("Generation done...")

#------------------------ Consumption --------------------------------------------
# Map function for: train_dataset = tf.data.TFRecordDataset(ds_train_files).map(loadDataset) below
def loadDataset(ds):
    featuresSchema = {
        "feats": tf.io.FixedLenFeature([4,4], tf.string),
        "labels": tf.io.FixedLenFeature([2,2], tf.string)
    }

    # Extract the dict from the serialised data
    parsed_ds = tf.io.parse_single_example(ds, featuresSchema)
    print("parsed_ds=", parsed_ds)

    # Get the tensors from the parsed dict
    X = tf.io.parse_tensor(parsed_ds["feats"], tf.dtypes.uint8) # input training data
    X.set_shape([4,4])
    X = tf.cast(X, tf.float32)/255.0

    Y = tf.io.parse_tensor(parsed_ds["labels"], tf.dtypes.uint8) # output target data
    Y.set_shape([2,2])
    Y = tf.cast(Y, tf.float32)/255.0

    return X, Y
#end loadDataset

if task == "consume" or task == "both":
    ds_train_files = tf.data.Dataset.list_files("datasets\\*", seed=42)

    # Load the datasets from the list of filenames
    train_dataset = tf.data.TFRecordDataset(ds_train_files).map(loadDataset).batch(1)

    # Train the model
    tf.random.set_seed(42) # Make the results reproducible with consistent random weight matrix
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=[4,4], dtype=tf.float32),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(4, activation="sigmoid"), # 2x2 output prediction
        tf.keras.layers.Reshape((2,2))
    ])

    model.compile(loss="mse", optimizer="sgd")

    history = model.fit(train_dataset, epochs=5)

    print("Consumption done...")

Error received:
2025-01-24 15:52:13.753674: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at example_parsing_ops.cc:98 : INVALID_ARGUMENT: Key: feats. Can't parse serialized Example.
.
.
.
Key: feats. Can't parse serialized Example.
[[{{node ParseSingleExample/ParseExample/ParseExampleV2}}]]
[[IteratorGetNext]] [Op:__inference_one_step_on_iterator_397]
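
One likely reading of this error, offered as a sketch rather than a confirmed diagnosis: tf.io.serialize_tensor produces a single scalar byte string per tensor, so each key in the Example holds one bytes value, whereas FixedLenFeature([4,4], tf.string) asks the parser for 16 strings under "feats". The round trip below assumes TensorFlow 2.x in eager mode; make_example and parse_example are hypothetical helper names that mirror createSerialExample and loadDataset, and the schema declares each key with shape [] so the 4x4 and 2x2 shapes are only restored after tf.io.parse_tensor.

```python
import numpy as np
import tensorflow as tf


def make_example(feats, labels):
    """Serialise two uint8 arrays the same way createSerialExample does."""
    feature_map = {}
    for key, array in (("feats", feats), ("labels", labels)):
        # serialize_tensor yields ONE scalar byte string per tensor.
        payload = tf.io.serialize_tensor(tf.constant(array, dtype=tf.uint8)).numpy()
        feature_map[key] = tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[payload])
        )
    example = tf.train.Example(features=tf.train.Features(feature=feature_map))
    return example.SerializeToString()


def parse_example(serialised):
    """Hypothetical counterpart of loadDataset with scalar-string features."""
    # Shape [] declares a single string per key, matching the one byte
    # string stored by make_example (assumption: this is the mismatch).
    schema = {
        "feats": tf.io.FixedLenFeature([], tf.string),
        "labels": tf.io.FixedLenFeature([], tf.string),
    }
    parsed = tf.io.parse_single_example(serialised, schema)
    x = tf.io.parse_tensor(parsed["feats"], out_type=tf.uint8)
    y = tf.io.parse_tensor(parsed["labels"], out_type=tf.uint8)
    # Restore the static shapes that parse_tensor cannot infer in graph mode.
    x = tf.ensure_shape(x, [4, 4])
    y = tf.ensure_shape(y, [2, 2])
    return x, y


feats = np.arange(16, dtype=np.uint8).reshape(4, 4)
labels = np.arange(4, dtype=np.uint8).reshape(2, 2)
record = make_example(feats, labels)

# Decode the protobuf directly to count how many bytes values each key holds.
decoded = tf.train.Example.FromString(record)
n_feats_values = len(decoded.features.feature["feats"].bytes_list.value)

x, y = parse_example(record)
```

If this reading is right, the equivalent change in loadDataset would be to declare both FixedLenFeature entries with shape [] and leave the later set_shape calls to carry the 4x4 and 2x2 dimensions.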

More details here: https://stackoverflow.com/questions/79384698/how-can-i-create-tfrecords-with-a-dictionary-of-multiple-2d-tensors-and-read-the
