Code:
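from pathlib import Path

import polars as pl

# Input directory with the NDJSON dumps; outDir is actually the output file path.
inDir = Path(r"E:\Personal Projects\tmp\tarFiles\result2")
outDir = Path(r"C:\Users\Akira\Documents\out_polars.ndjson")

# Explicit schema so the scan does not have to infer column types.
schema = {
    "name": pl.String,
    "dateModified": pl.String,
    "identifier": pl.UInt64,
    "url": pl.String,
    "html": pl.String,
}

# Lazily scan every matching file, then keep the latest dateModified per distinct html value.
lf = pl.scan_ndjson(inDir / "*wiktionary*.ndjson", schema=schema)
lf = lf.group_by(["html"]).agg(pl.max("dateModified").alias("dateModified"))

# Stream the result to disk instead of collecting it in memory.
lf.sink_ndjson(outDir, maintain_order=False, engine="streaming")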
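To make the intent of the query above concrete, here is a minimal sketch in plain Python (standard library only, not Polars) of what the group-by is meant to compute: for each distinct `html` value, the largest `dateModified` string. The function name `latest_per_html` and the sample records are hypothetical and only for illustration. It assumes no null values and that `dateModified` uses one consistent ISO 8601 format, so that string order matches chronological order.

```python
import json


def latest_per_html(lines):
    """Return {html: largest dateModified string} for an iterable of NDJSON lines."""
    latest = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        html, modified = rec["html"], rec["dateModified"]
        # Plain string comparison, like a max over a String column.
        if html not in latest or modified > latest[html]:
            latest[html] = modified
    return latest


# Hypothetical toy records, not taken from the real dump.
sample = [
    '{"name": "a", "dateModified": "2024-01-01T00:00:00Z", "identifier": 1, "url": "u1", "html": "<p>x</p>"}',
    '{"name": "a", "dateModified": "2024-03-01T00:00:00Z", "identifier": 1, "url": "u1", "html": "<p>x</p>"}',
    '{"name": "b", "dateModified": "2024-02-01T00:00:00Z", "identifier": 2, "url": "u2", "html": "<p>y</p>"}',
]

result = latest_per_html(sample)
```

Note that this sketch, like the query, has to hold every distinct `html` string as a key, so memory use grows with the number and size of distinct pages.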
Code:
The Kernel crashed while executing code in the current cell or a previous cell.
Please review the code in the cell(s) to identify a possible cause of the failure.
Click [url=https://aka.ms/vscodeJupyterKernelCrash]here[/url] for more info.
View Jupyter [url=command:jupyter.viewOutput]log[/url] for further details.
More details here: https://stackoverflow.com/questions/798 ... roupby-max