I want to build a memory-efficient preview of spreadsheet files uploaded by users. I'm testing with pytest-memray:
================================================================================================================ MEMRAY REPORT ================================================================================================================
Allocation results for tests/test_load.py::test_load_polars_xlsx at the high watermark
📦 Total memory allocated: 473.5MiB
📏 Total allocations: 21
📊 Histogram of allocation sizes: |▁█▆ |
🥇 Biggest allocating functions:
- load_sheet_eager:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/fastexcel/__init__.py:424 -> 473.4MiB
- _call_with_frames_removed::488 -> 20.4KiB
- read_excel:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/fastexcel/__init__.py:514 -> 13.9KiB
- _compile_bytecode::784 -> 9.6KiB
- inner:/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/typing.py:429 -> 9.0KiB
Allocation results for tests/test_load.py::test_load_polars_partial_xlsx at the high watermark
📦 Total memory allocated: 473.4MiB
📏 Total allocations: 7
📊 Histogram of allocation sizes: |█ |
🥇 Biggest allocating functions:
- load_sheet_eager:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/fastexcel/__init__.py:424 -> 473.4MiB
- read_excel:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/fastexcel/__init__.py:514 -> 13.9KiB
- _read_spreadsheet:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/polars/io/spreadsheet/functions.py:684 -> 536.0B
Allocation results for tests/test_load.py::test_load_pandas_xlsx at the high watermark
📦 Total memory allocated: 426.2MiB
📏 Total allocations: 871
📊 Histogram of allocation sizes: | █▂ |
🥇 Biggest allocating functions:
- feed:/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/xml/etree/ElementTree.py:1291 -> 245.1MiB
- parse_cell:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/openpyxl/worksheet/_reader.py:244 -> 63.0MiB
- maybe_infer_to_datetimelike:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/pandas/core/dtypes/cast.py:1198 -> 39.3MiB
- _rows_to_cols:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/pandas/io/parsers/python_parser.py:1066 -> 38.3MiB
- _infer_types:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/pandas/io/parsers/base_parser.py:720 -> 15.3MiB
Allocation results for tests/test_load.py::test_load_polars at the high watermark
📦 Total memory allocated: 172.5MiB
📏 Total allocations: 92
📊 Histogram of allocation sizes: |█ |
🥇 Biggest allocating functions:
- collect:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/polars/lazyframe/frame.py:2332 -> 96.0B
Allocation results for tests/test_load.py::test_load_pandas at the high watermark
📦 Total memory allocated: 24.6MiB
📏 Total allocations: 25
📊 Histogram of allocation sizes: |█ ▁|
🥇 Biggest allocating functions:
- read:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/pandas/io/parsers/c_parser_wrapper.py:234 -> 18.6MiB
- __init__:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/pandas/io/parsers/c_parser_wrapper.py:93 -> 6.0MiB
- get_handle:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/pandas/io/common.py:873 -> 4.0KiB
- _clean_options:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/pandas/io/parsers/readers.py:1688 -> 1.5KiB
- read_csv:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/pandas/io/parsers/readers.py:1009 -> 1.5KiB
Allocation results for tests/test_load.py::test_load_polars_partial_buffer at the high watermark
📦 Total memory allocated: 31.5MiB
📏 Total allocations: 16
📊 Histogram of allocation sizes: |▁█ |
🥇 Biggest allocating functions:
- _check_empty:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/polars/io/_utils.py:282 -> 1.3KiB
- read_csv:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/polars/io/csv/functions.py:572 -> 768.0B
- read_csv:/home/monopoly/workspace/toys/sheetz/.venv/lib/python3.13/site-packages/polars/io/csv/functions.py:549 -> 728.0B
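Pandas beats Polars on memory usage in these runs: 24.6MiB vs 172.5MiB at the high watermark for test_load_pandas vs test_load_polars, and 426.2MiB vs 473.5MiB for the xlsx tests. Am I doing something wrong? Trying to take advantage of LazyFrames with pl.scan_csv doesn't help: the partial-buffer test still peaks at 31.5MiB, more than the full pandas read.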
The partial-buffer test and its imports:
[code]
import io

import pytest
import pandas as pd
import polars as pl


def test_load_polars_partial_buffer(path: str):
    # Lazily scan the CSV, keep the first 20 rows, and write them to an in-memory buffer.
    with io.BytesIO() as buffer:
        pl.scan_csv(path).limit(20).sink_csv(buffer)
        # Re-read the small buffer eagerly to build the preview frame.
        df = pl.read_csv(buffer.getvalue())
    assert len(df) == 20
[/code]
rows.csv contains about 30k rows, and Bankdataset.xlsx contains 1 million rows.
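Since the preview only needs the first 20 rows, a dependency-free baseline may be useful to compare both libraries against. The following is a minimal sketch using only the standard library, assuming a plain UTF-8 CSV with a header row. `preview_csv` and `make_sample_csv` are hypothetical helper names, not part of my test suite, and xlsx is not covered.

```python
import csv
import itertools
import os
import tempfile


def preview_csv(path, n_rows=20):
    """Return the header and at most n_rows data rows (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        # The reader is lazy: only the lines actually consumed are parsed.
        header = next(reader, [])
        rows = list(itertools.islice(reader, n_rows))
    return header, rows


def make_sample_csv(n_rows=1000):
    """Write a throwaway CSV for the demo and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "value"])
        writer.writerows([i, i * 2] for i in range(n_rows))
    return path


if __name__ == "__main__":
    sample = make_sample_csv()
    header, rows = preview_csv(sample)
    print(header, len(rows))
    os.remove(sample)
```

Running the same memray test against a function like this should show how much of the 24.6MiB and 31.5MiB figures comes from the libraries rather than from the file itself; I have not measured it.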