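import duckdb
from pathlib import Path

# Input folder with the *enwiktionary*.ndjson dumps
inDir = Path(r"E:\Personal Projects\tmp\result")
# Despite its name, outDir is the path of a single output file
outDir = Path(r"C:\Users\Akira\Documents\enwiktionary.ndjson")

con = duckdb.connect()  # in-memory database
con.execute("SET threads=5")
con.execute("SET memory_limit='12.5GB'")
con.execute("SET preserve_insertion_order=false")

# For every url, keep the html of the row with the latest dateModified.
# The file name is a single-quoted SQL string literal, and FORMAT json is
# spelled out instead of relying on the .ndjson extension.
result = con.sql(f"""
COPY (
    SELECT
        arg_max(html, dateModified) AS html
    FROM read_ndjson('{inDir / "*enwiktionary*.ndjson"}')
    GROUP BY url
)
TO '{outDir}' (FORMAT json)
""")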
---------------------------------------------------------------------------
OutOfMemoryException Traceback (most recent call last)
Cell In[5], line 16
12 con.execute("SET memory_limit='12.5GB'")
13 con.execute("SET preserve_insertion_order=false")
---> 16 result = con.sql(f"""
17 COPY(
18 SELECT
19 arg_max(html, dateModified) as html
20 FROM read_ndjson('{inDir / "*enwiktionary*.ndjson"}')
21 GROUP BY url
22 )
23 TO "{outDir}"
24 """)
OutOfMemoryException: Out of Memory Error: failed to allocate data of size 16.0 MiB (11.6 GiB/11.6 GiB used)
Possible solutions:
* Reducing the number of threads (SET threads=X)
* Disabling insertion-order preservation (SET preserve_insertion_order=false)
* Increasing the memory limit (SET memory_limit='...GB')
See also https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads
- Which computations does DuckDB have to keep in memory, i.e. which ones cannot be offloaded (spilled) to disk?
- How can I set the `memory_limit` parameter precisely for the dataset I have?
More details here: https://stackoverflow.com/questions/798 ... parameters