Code:
import polars as pl

for year in range(2016, 2019):
    # Lazily scan the CSV, derive year/month columns, and keep only this year's rows.
    year_df = (pl
        .scan_csv('some.csv', infer_schema_length=100000, null_values=['\\N'], cache=True)
        .with_columns(
            pl.col('origin_datetime').dt.year().alias('year'),
            pl.col('origin_datetime').dt.month().alias('month')
        )
        .filter(pl.col('year') == year)
        .cache()
    )
    for month in range(1, 13):
        # Materialize one month at a time and write it to its own Parquet file.
        (year_df
            .filter(pl.col('month') == month)
            .collect(streaming=True)
            .write_parquet(f'/{year}_{month}.parquet')
        )
More details here: https://stackoverflow.com/questions/771 ... to-parquet