Code:
import duckdb
from pathlib import Path
inDir = r"E:\Personal Projects\tmp\tarFiles\result2"
outDir = r"C:\Users\Akira\Documents\out_duckdb2.ndjson"
inDir = Path(inDir)
outDir = Path(outDir)
con = duckdb.connect()
result = con.sql(f"""
SET threads=10;
SET memory_limit='10GB';
SET preserve_insertion_order=false;
COPY(SELECT
html,
dateModified,
ROW_NUMBER() OVER (PARTITION BY html ORDER BY dateModified DESC) AS rn
FROM read_ndjson('{inDir / "*wiktionary*.ndjson"}'))
TO "{outDir}"
""")
Code:
---------------------------------------------------------------------------
OutOfMemoryException Traceback (most recent call last)
Cell In[3], line 10
7 outDir = Path(outDir)
9 con = duckdb.connect()
---> 10 result = con.sql(f"""
11 SET threads=10;
12 SET memory_limit='10GB';
13 SET preserve_insertion_order=false;
14 COPY(SELECT
15 html,
16 dateModified,
17 ROW_NUMBER() OVER (PARTITION BY html ORDER BY dateModified DESC) AS rn
18 FROM read_ndjson('{inDir / "*wiktionary*.ndjson"}'))
19 TO "{outDir}"
20 """)
OutOfMemoryException: Out of Memory Error: could not allocate block of size 256.0 KiB (9.3 GiB/9.3 GiB used)
Possible solutions:
* Reducing the number of threads (SET threads=X)
* Disabling insertion-order preservation (SET preserve_insertion_order=false)
* Increasing the memory limit (SET memory_limit='...GB')
See also https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads
Could you explain how exactly I should tune these parameters for my workload?
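For reference, the three suggestions in the error message can be applied as separate statements before the COPY runs, which makes it easier to try one combination at a time. The sketch below is only a starting point, not a verified fix: `build_settings` and `run_with_settings` are hypothetical helper names, the values `threads=4` and `memory_limit='8GB'` are illustrative, and the `temp_directory` setting (a DuckDB option for where it may spill data to disk) is an assumption that goes beyond what the error message lists.

```python
from pathlib import Path
from typing import List, Optional


def build_settings(threads: int, memory_limit: str,
                   temp_dir: Optional[Path] = None) -> List[str]:
    """Build the SET statements named in the error message (hypothetical helper)."""
    statements = [
        f"SET threads={int(threads)};",          # fewer threads, fewer concurrent buffers
        f"SET memory_limit='{memory_limit}';",   # cap on DuckDB's own memory use
        "SET preserve_insertion_order=false;",   # already set in the original query
    ]
    if temp_dir is not None:
        # Assumption: an explicit spill directory may help; the post does not confirm this.
        statements.append(f"SET temp_directory='{temp_dir.as_posix()}';")
    return statements


def run_with_settings(query: str, settings: List[str]) -> None:
    """Apply the settings one by one, then run the query on a fresh connection."""
    import duckdb  # imported here so build_settings can be used without DuckDB installed

    con = duckdb.connect()
    try:
        for statement in settings:
            con.execute(statement)
        con.execute(query)
    finally:
        con.close()


# Illustrative values only; they are not recommendations for this dataset.
settings = build_settings(threads=4, memory_limit="8GB", temp_dir=Path("duckdb_tmp"))
for statement in settings:
    print(statement)
```

Passing the original `COPY (...) TO ...` statement as `query` to `run_with_settings` would run it under these settings; whether any particular combination avoids the OutOfMemoryException for this data is something only a test run can show.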
More details here: https://stackoverflow.com/questions/798 ... roupby-max