Log-Pearson3 distribution fitting using scipy and lmoments3 in Python

Posted by Anonymous » 12 Dec 2024, 21:14

I am trying to fit a Log-Pearson3 distribution to my streamflow data, but I cannot find a way to do it. Any advice would be greatly appreciated.
Here is the problem:
Both the scipy and lmoments3 packages provide Pearson3, but neither provides a Log-Pearson3 distribution. scipy fits distributions by maximum likelihood estimation (MLE), while lmoments3 fits them by the method of L-moments. As noted, though, only the Pearson3 distribution appears in their lists of statistical distributions.
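To make the difference between estimation methods concrete, here is a minimal sketch that fits Pearson3 to log-transformed flows in two ways: scipy's MLE and a plain sample-moment fit. The moment fit is a stand-in chosen for illustration, not the L-moment method that lmoments3 uses. The synthetic flows, the seed, and the helper name `fit_p3_moments` are all assumptions.

```python
import numpy as np
import scipy.stats as st

# Synthetic annual maxima (assumption): log-flows drawn from a known Pearson3.
rng = np.random.default_rng(42)
true_dist = st.pearson3(0.5, loc=6.0, scale=0.8)
log_flows = true_dist.rvs(size=50, random_state=rng)
flows = np.exp(log_flows)


def fit_p3_moments(y):
    """Hypothetical helper: Pearson3 parameters from sample moments of y."""
    skew = st.skew(y, bias=False)      # bias-corrected sample skewness
    loc = np.mean(y)                   # scipy's pearson3 loc is the mean
    scale = np.std(y, ddof=1)          # scipy's pearson3 scale is the std dev
    return skew, loc, scale


y = np.log(flows)
params_mle = st.pearson3.fit(y)        # (skew, loc, scale) by maximum likelihood
params_mom = fit_p3_moments(y)

for label, params in [("MLE", params_mle), ("moments", params_mom)]:
    # Quantile in log space, then back-transform to flow units
    q100 = np.exp(st.pearson3(*params).ppf(0.99))
    print(f"{label:8s} skew={params[0]:.3f} loc={params[1]:.3f} "
          f"scale={params[2]:.3f} Q100={q100:.1f}")
```

The two parameter sets will generally differ, and so will the back-transformed 100-year flow, even though both fits use the same log-transformed sample.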
I compute annual maximum flow (AMF) data, which gives a time series of, say, 50 years of river flow. I then use the scipy and lmoments3 packages to fit distributions. I assumed that taking the logarithm of my AMF, fitting Pearson3, and taking the antilogarithm at the end would be equivalent to fitting Log-Pearson3, but that does not seem to be the case. There are differences in how the parameters are estimated for Pearson3 and Log-Pearson3,
and I cannot find a suitable guide online.
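For reference, the log-transform route described above matches the usual definition of Log-Pearson3: a flow $X$ is Log-Pearson3 when its logarithm is Pearson3. The symbols below are introduced here for illustration: $\mu$, $\sigma$, $\gamma$ are the mean, standard deviation, and skew of the log-flows, and $K_p(\gamma)$ is the standardized Pearson3 quantile at non-exceedance probability $p$.

```latex
Y = \ln X \sim \mathrm{P3}(\mu, \sigma, \gamma)
\quad\Longrightarrow\quad
F_X(x) = F_Y(\ln x), \qquad
x_p = \exp\!\bigl(\mu + \sigma\, K_p(\gamma)\bigr)
```

Under that definition the back-transform of a quantile is exact. Differences between tools would therefore more plausibly come from the estimation method (MLE, moments, L-moments) or from whether moments are taken in log space or real space than from the transform itself. The reported parameters also depend on the logarithm base (natural versus base 10).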
Any thoughts on this?
Below is the code I am using:
import os
import numpy as np
import pandas as pd
import scipy.stats as st
import lmoments3 as lm
from lmoments3 import distr, stats

# Define the folder path and get the list of CSV files
folder_path = '.../daily_ts/'
# folder_path = '.../all_sites/'

csv_files = sorted([f for f in os.listdir(folder_path) if f.endswith('.csv')])

# Iterate over each file and process the data
for i, file in enumerate(csv_files[5:6]):
    station_code = file.split('_')[0]
    # Read the data
    df = pd.read_csv(os.path.join(folder_path, file), skiprows=27, names=['Date', 'Flow (ML)', 'Bureau QCode'])

    # Process the data
    df = df.dropna(subset=['Flow (ML)'])  # Ensure 'Flow (ML)' has no NaN values
    df['Date'] = pd.to_datetime(df['Date'])  # Ensure the 'Date' column is in datetime format
    df['Year'] = df['Date'].dt.year  # Extract the year from the 'Date' column and add it as a new column
    df_max_flow = df.loc[df.groupby('Year')['Flow (ML)'].idxmax()]  # Max daily per year
    df_max_flow_sorted = df_max_flow.sort_values(by='Flow (ML)', ascending=False)  # Sort by 'Flow (ML)' in descending order
    df_max_flow_sorted['Rank'] = range(1, len(df_max_flow_sorted) + 1)
    # Empirical return period: ARI = (N + 1) / rank
    df_max_flow_sorted['ARI_Empir'] = (df_max_flow_sorted['Rank'].iloc[-1] + 1) / df_max_flow_sorted['Rank']
    df_max_flow_sorted['AEP_Empir'] = 1 / df_max_flow_sorted['ARI_Empir']
    data = df_max_flow_sorted['Flow (ML)'].values

    # Fit Log-Pearson3 using the Maximum Likelihood Estimation method (MLE):
    # Pearson3 is fitted to the natural log of the flows
    dist_name = 'pearson3'
    # Use getattr to look up the distribution object by name
    scipy_dist_fit = getattr(st, dist_name)
    # Calculate natural log of the data (assumes strictly positive flows)
    log_data = np.log(data)
    param = scipy_dist_fit.fit(log_data)

    # Applying the Kolmogorov-Smirnov test
    ks_stat, p_val = st.kstest(log_data, dist_name, args=param)

    print(f"parameters: {param}")
    print(f"ks_stat, p_val: {ks_stat, p_val}")

    ARI_dict = {}
    AEP_dict = {}
    data_dict = {}
    logdata_dict = {}

    # Test the results of ARI and AEP according to the fitted distribution;
    # scipy returns the parameters as (shape, loc, scale)
    loc = param[1]
    scale = param[2]
    shape = param[0]
    # Freeze the fitted distribution
    fitted_dist = scipy_dist_fit(shape, loc=loc, scale=scale)

    # Calculate the return period and AEP based on fitted_dist and add relevant columns to the table
    AEP = 1 - fitted_dist.cdf(log_data)
    ARI = 1 / AEP

    # Store AEP and ARI in a dictionary
    AEP_dict['AEP_LP3_MLE'] = np.sort(AEP)
    ARI_dict['ARI_LP3_MLE'] = np.sort(ARI)[::-1]
    data_dict['data'] = data
    logdata_dict['log_data'] = log_data

    AEP_df = pd.DataFrame(AEP_dict)
    ARI_df = pd.DataFrame(ARI_dict)
    data_df = pd.DataFrame(data_dict)
    logdata_df = pd.DataFrame(logdata_dict)

    # Reset indices of all DataFrames to ensure they align correctly
    AEP_df = AEP_df.reset_index(drop=True)
    ARI_df = ARI_df.reset_index(drop=True)
    df_max_flow_sorted = df_max_flow_sorted.reset_index(drop=True)
    data_df = data_df.reset_index(drop=True)
    logdata_df = logdata_df.reset_index(drop=True)

    df_fitDist = pd.concat([df_max_flow_sorted, AEP_df, ARI_df], axis=1)
    LP3_FFA = pd.concat([data_df, logdata_df, AEP_df, ARI_df], axis=1)

    # data = fitted_dist.ppf(1 - AEP)  # Inverse of CDF
    # Back-transform the log-space quantile to flow units
    flood_100year = np.exp(fitted_dist.ppf(1 - 0.01))
    # flood_100year = fitted_dist.ppf(1 - 0.01)
    print(f"100-Year Flood = {flood_100year}")
    flood_50year = np.exp(fitted_dist.ppf(1 - 0.02))
    # flood_50year = fitted_dist.ppf(1 - 0.02)
    print(f"50-Year Flood = {flood_50year}")
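
The plotting-position and exceedance steps in the code above can be condensed into a short, self-contained sketch. It uses the same Weibull formula, ARI = (N + 1) / rank, and the same exceedance probability on log-flows. The synthetic flows and the function name `lp3_frequency_table` are assumptions, and the sample-moment fit merely stands in for whichever estimator is preferred.

```python
import numpy as np
import scipy.stats as st


def lp3_frequency_table(flows, log_dist):
    """Hypothetical helper: empirical vs fitted AEP/ARI for annual maxima.

    flows    : 1-D array of strictly positive annual maximum flows
    log_dist : frozen scipy distribution fitted to np.log(flows)
    """
    flows = np.sort(np.asarray(flows, dtype=float))[::-1]   # largest first
    n = flows.size
    rank = np.arange(1, n + 1)
    aep_empirical = rank / (n + 1)                # Weibull plotting position
    aep_fitted = log_dist.sf(np.log(flows))       # sf = 1 - cdf, on log-flows
    return {
        "flow": flows,
        "rank": rank,
        "AEP_empirical": aep_empirical,
        "ARI_empirical": 1.0 / aep_empirical,
        "AEP_fitted": aep_fitted,
        "ARI_fitted": 1.0 / aep_fitted,
    }


# Synthetic annual maxima (assumption): 50 years of lognormal-like flows.
rng = np.random.default_rng(0)
flows = np.exp(rng.normal(loc=6.0, scale=0.7, size=50))

# Sample-moment Pearson3 fit in log space (assumed estimator for this sketch).
y = np.log(flows)
log_dist = st.pearson3(st.skew(y, bias=False), loc=y.mean(), scale=y.std(ddof=1))

table = lp3_frequency_table(flows, log_dist)
for aep in (0.02, 0.01):
    design_flow = np.exp(log_dist.isf(aep))       # isf(aep) = ppf(1 - aep)
    print(f"{round(1 / aep)}-year flood = {design_flow:.1f}")
```

Passing the frozen distribution in as an argument keeps the table independent of the estimator, so an MLE or L-moment fit of the log-flows could be swapped in without touching the helper.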

More details here: [url]https://stackoverflow.com/questions/79209572/log-pearson3-distribution-fitting-using-scipy-and-lmoments3-in-python[/url]