Filter check: batch = [b for b in batch if isinstance(b, dict) and "input_cat" in b]
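The filter's behavior can be reproduced in isolation on a list of empty dicts (the batch contents here are illustrative, matching what collate_fn reportedly receives):

```python
# A minimal check of the filter predicate: on a list of empty dicts,
# every item is rejected, leaving an empty batch.
batch = [{} for _ in range(16)]
filtered = [b for b in batch if isinstance(b, dict) and "input_cat" in b]
print(len(filtered))  # 0, so collate_fn would return {"skip_batch": True}
```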
Because this filter removes every item, collate_fn sees len(batch) == 0 and returns {"skip_batch": True}. The batch received by collate_fn is a list of 16 empty dictionaries.

class IterableSoccerDataset(IterableDataset):
    def __init__(self, sequences: List[List[Dict]], idx: FeatureIndexer, block_size: int, min_len: int = 2):
        super().__init__()
        self.sequences = sequences
        self.idx = idx
        self.block_size = block_size
        self.min_len = min_len
        self.pos_end_cat = np.array([idx.id_for("event_type", idx.POS_END) if col == "event_type" else 0
                                     for col in ALL_CAT], dtype=np.int64)
        self.pos_end_cont = np.zeros(len(ALL_CONT), dtype=np.float32)
        print(f"IterableSoccerDataset initialized with {len(sequences)} sequences.")
    def __iter__(self) -> Iterator[Dict[str, torch.Tensor]]:
        rng = np.random.default_rng()
        for seq in self.sequences:
            if len(seq) < self.min_len:
                continue
            # encode
            cat, cont = [], []
            for ev in seq:
                c, f = self.idx.encode(pd.Series(ev))
                cat.append(c)
                cont.append(f)
            cat.append(self.pos_end_cat)
            cont.append(self.pos_end_cont)
            cat = np.stack(cat)    # (L+1, C)
            cont = np.stack(cont)  # (L+1, F)
            L = len(cat)           # includes POS_END
            # decide window boundaries
            if L  # (the rest of this line onward was cut off in the post, likely by the "<" being parsed as HTML)

def collate_fn(batch):
    batch = [b for b in batch
             if isinstance(b, dict) and "input_cat" in b]
    if len(batch) == 0:
        return {"skip_batch": True}
    # ... rest of code
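As a sanity check that the DataLoader plus custom collate_fn plumbing itself preserves dict items, the pipeline can be exercised with a tiny stand-in dataset (ToyIterable and its fields are made up for illustration; they stand in for IterableSoccerDataset, whose FeatureIndexer and sequence data are not available here):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class ToyIterable(IterableDataset):
    # Yields dicts shaped like the real dataset's items, but with dummy tensors.
    def __iter__(self):
        for i in range(4):
            yield {"input_cat": torch.tensor([i]), "input_cont": torch.zeros(2)}

def collate(batch):
    # Same filter as the original collate_fn, plus a simple stacking step.
    batch = [b for b in batch if isinstance(b, dict) and "input_cat" in b]
    if len(batch) == 0:
        return {"skip_batch": True}
    return {k: torch.stack([b[k] for b in batch]) for k in batch[0]}

loader = DataLoader(ToyIterable(), batch_size=2, collate_fn=collate, num_workers=0)
for out in loader:
    print(out["input_cat"].shape)  # torch.Size([2, 1]) for each of the 2 batches
```

If this minimal version works but the real one does not, the problem lies in what __iter__ actually yields (or in how the items survive the trip to collate_fn), not in the DataLoader machinery.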
I have tried:
- Successfully yields - confirmed via prints that the __iter__ method does yield dictionaries with the key "input_cat" (and others), containing tensors.
- collate_fn receives items - confirmed via prints that collate_fn receives a list (batch) with the correct number of items (equal to batch_size).
- Filtering checks - the specific filter isinstance(b, dict) and "input_cat" in b evaluates to False for every item received by collate_fn in that first batch (as they are all just empty dictionaries).
- num_workers - I suspected this might be related to multiprocessing (dataloader_num_workers > 0), potentially due to serialization/deserialization issues between workers and the main process. However, setting dataloader_num_workers=0 made no difference.
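One way to localize where the keys disappear is to instrument collate_fn before the filter runs; the logging below is an illustrative addition (debug_collate is a hypothetical name), and torch.utils.data.get_worker_info() reports which worker process, if any, the call runs in:

```python
import torch
from torch.utils.data import get_worker_info

def debug_collate(batch):
    # collate_fn runs inside the worker process when num_workers > 0,
    # so get_worker_info() is meaningful here (None means the main process).
    info = get_worker_info()
    worker = info.id if info is not None else "main"
    for i, b in enumerate(batch):
        keys = list(b.keys()) if isinstance(b, dict) else None
        print(f"[worker {worker}] item {i}: type={type(b).__name__}, keys={keys}")
    batch = [b for b in batch if isinstance(b, dict) and "input_cat" in b]
    if len(batch) == 0:
        return {"skip_batch": True}
    return {k: torch.stack([b[k] for b in batch]) for k in batch[0]}
```

If the prints show empty dicts already arriving here, the items were emptied before collation, which points at the yield path rather than at the filter itself.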
Many thanks!
More details here: https://stackoverflow.com/questions/795 ... -batch-des