Code:
import pandas as pd
import os
# Create a sample DataFrame with daily frequency
data = {
    "timestamp": pd.date_range(start="2023-01-01", periods=1000, freq="D"),
    "value": range(1000),  # must match the 1000 timestamps; range(100) raises ValueError
}
df = pd.DataFrame(data)
# Add year and month columns (used as partition keys)
df["year"] = df["timestamp"].dt.year
df["month"] = df["timestamp"].dt.month
# Use the join method to expand the DataFrame (Cartesian product with a multiplier)
multiplier = pd.DataFrame({"replica": range(100)}) # Create a multiplier DataFrame
expanded_df = df.join(multiplier, how="cross") # Cartesian product using cross join
# Define the output directory
output_dir = "output_parquet"
# Save the expanded DataFrame to Parquet, partitioned by year and month
expanded_df.to_parquet(
    output_dir,
    partition_cols=["year", "month"],  # Specify the partition columns
)
More details here: https://stackoverflow.com/questions/791 ... nto-pandas