Code:
import pyarrow as pa
import pyarrow.csv  # makes pa.csv available

file = "./in.csv"
arrowFile = "./out.arrow"

# Stream the CSV batch by batch into an Arrow IPC file.
with pa.OSFile(arrowFile, 'wb') as arrow:
    with pa.csv.open_csv(file) as reader:
        with pa.RecordBatchFileWriter(arrow, reader.schema) as writer:
            for batch in reader:
                writer.write_batch(batch)
Code:
# Same conversion, but with string columns dictionary-encoded automatically.
convert_options = pa.csv.ConvertOptions(auto_dict_encode=True)
with pa.OSFile(arrowFile, 'wb') as arrow:
    with pa.csv.open_csv(file, convert_options=convert_options) as reader:
        with pa.RecordBatchFileWriter(arrow, reader.schema) as writer:
            for batch in reader:
                writer.write_batch(batch)
Code:
  File "pyarrow/ipc.pxi", line 507, in pyarrow.lib._CRecordBatchWriter.write_batch
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Dictionary replacement detected when writing IPC file format. Arrow IPC files only support a single non-delta dictionary for a given field across all batches.
More details here: https://stackoverflow.com/questions/792 ... y-encoding