Filter check: batch = [b for b in batch if isinstance(b, dict) and "input_cat" in b]
Because this filter removes every item, collate_fn finds len(batch) == 0 and returns {"skip_batch": True}. The batch received by collate_fn is a list of 16 empty dictionaries.
[code]
class IterableSoccerDataset(IterableDataset):
    def __init__(self, sequences: List[List[Dict]], idx: FeatureIndexer, block_size: int, min_len: int = 2):
        super().__init__()
        self.sequences = sequences
        self.idx = idx
        self.block_size = block_size
        self.min_len = min_len
        self.pos_end_cat = np.array([idx.id_for("event_type", idx.POS_END) if col == "event_type" else 0
                                     for col in ALL_CAT], dtype=np.int64)
        self.pos_end_cont = np.zeros(len(ALL_CONT), dtype=np.float32)
        print(f"IterableSoccerDataset initialized with {len(sequences)} sequences.")

    def __iter__(self) -> Iterator[Dict[str, torch.Tensor]]:
        rng = np.random.default_rng()
        for seq in self.sequences:
            if len(seq) < self.min_len:
                continue
            # encode
            cat, cont = [], []
            for ev in seq:
                c, f = self.idx.encode(pd.Series(ev))
                cat.append(c)
                cont.append(f)
            cat.append(self.pos_end_cat)
            cont.append(self.pos_end_cont)
            cat = np.stack(cat)    # (L+1,C)
            cont = np.stack(cont)  # (L+1,F)
            L = len(cat)           # includes POS_END
            # decide window boundaries
            if L

def collate_fn(batch):
    batch = [b for b in batch
             if isinstance(b, dict) and "input_cat" in b]
    if len(batch) == 0:
        return {"skip_batch": True}
    # ... rest of code
[/code]
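To pin down exactly what the filter rejects, one option is to log the type and keys of every incoming item before filtering. The following is only a minimal sketch: describe_item and debug_collate_fn are hypothetical names introduced here, and the {"kept": ...} return value is a placeholder for the real collation logic, which the post does not show.

```python
from typing import Any, Dict, List


def describe_item(item: Any) -> str:
    """Return the type name of an item, plus its keys if it is a dict."""
    if isinstance(item, dict):
        return f"{type(item).__name__} keys={list(item.keys())}"
    return type(item).__name__


def debug_collate_fn(batch: List[Any]) -> Dict[str, Any]:
    """Apply the same filter as collate_fn above, but report what was dropped."""
    kept = [b for b in batch if isinstance(b, dict) and "input_cat" in b]
    dropped = [describe_item(b) for b in batch
               if not (isinstance(b, dict) and "input_cat" in b)]
    if dropped:
        # Show one representative so the log stays short.
        print(f"dropped {len(dropped)}/{len(batch)} items, e.g. {dropped[0]}")
    if len(kept) == 0:
        return {"skip_batch": True}
    return {"kept": kept}  # placeholder, not the real collation


if __name__ == "__main__":
    # The symptom described in the question: 16 empty dictionaries.
    print(debug_collate_fn([{} for _ in range(16)]))
```

If the log shows plain dict objects with no keys at all, the items were most likely emptied somewhere between the dataset and collate_fn rather than being yielded in a wrong shape.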
I have tried:
[list]
[*]Successfully yields - confirmed via prints that the __iter__ method does yield dictionaries with the key "input_cat" and others, containing tensors.
[*]collate_fn receives items - confirmed via prints that collate_fn receives a list (batch) with the correct number of items (equal to batch_size).
[*]Filtering checks - the specific filter isinstance(b, dict) and "input_cat" in b evaluates to False for every item received by collate_fn in that first batch (as they are all just empty dictionaries).
[*]num_workers - I suspected this might be related to multiprocessing (dataloader_num_workers > 0), potentially due to serialization/deserialization issues between workers and the main process. However, setting dataloader_num_workers=0 made no difference.
[/list]
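One further check that might narrow this down is to pull items straight from the dataset iterator and hand them to collate_fn with nothing in between. If that works, whatever sits between the dataset and collate_fn is altering the items. The sketch below is illustrative only: fake_dataset stands in for iter(IterableSoccerDataset(...)), the "input_cont" key is an assumed example of the "other" keys, and strip_unknown_keys with its allowed-key set is a hypothetical intermediate layer, not something taken from the original code.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Set


def fake_dataset() -> Iterator[Dict[str, Any]]:
    """Stand-in for the real dataset iterator; yields well-formed dicts."""
    for i in range(100):
        yield {"input_cat": [i], "input_cont": [float(i)]}  # keys are assumptions


def collate_fn(batch: List[Any]) -> Dict[str, Any]:
    """Same filter as in the question; the return value is a placeholder."""
    batch = [b for b in batch if isinstance(b, dict) and "input_cat" in b]
    if len(batch) == 0:
        return {"skip_batch": True}
    return {"n_items": len(batch)}


def strip_unknown_keys(fn: Callable[[List[Any]], Dict[str, Any]],
                       allowed: Set[str]) -> Callable[[List[Any]], Dict[str, Any]]:
    """Hypothetical wrapper: drop every key not in `allowed`, then call fn."""
    def wrapped(batch: List[Dict[str, Any]]) -> Dict[str, Any]:
        cleaned = [{k: v for k, v in b.items() if k in allowed} for b in batch]
        return fn(cleaned)
    return wrapped


def first_batch(items: Iterable[Any], batch_size: int) -> List[Any]:
    """Take the first `batch_size` items, as a loader would for batch one."""
    return list(islice(items, batch_size))


if __name__ == "__main__":
    # Direct path: dataset items go straight into collate_fn.
    print(collate_fn(first_batch(fake_dataset(), 16)))
    # Wrapped path: an intermediate layer removes keys it does not know.
    wrapped = strip_unknown_keys(collate_fn, allowed={"input_ids", "labels"})
    print(wrapped(first_batch(fake_dataset(), 16)))
```

In this sketch the direct path collates all 16 items, while the wrapped path receives 16 empty dictionaries and returns the skip marker, which matches the symptom regardless of the number of workers.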
What could cause items that appear correctly structured just before being yielded by the IterableDataset to consistently fail the isinstance(b, dict) and "input_cat" in b check when they arrive as a list in the collate_fn, especially on the very first batch? I am at a loss for what to do.
To clarify, the print statement in IterableSoccerDataset
Thank you very much!
More details here: https://stackoverflow.com/questions/795 ... -batch-des