Manually aggregate tree predictions for LightGBM's LGBMRegressor to match the model's predict method

Post by Anonymous » 15 Jan 2025, 18:56

I'm using LightGBM for a regression task and trying to manually reproduce the prediction process of LGBMRegressor to understand how the model works internally (so that, once it works, I will not be calculating the median instead of the mean). However, my manual aggregation of the tree predictions does not match the output of the predict method. Here is what I've tried so far:

[code]import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Generate some random regression data
np.random.seed(42)
X = np.random.rand(100, 5)
y = 4 * X[:, 0] - 2 * X[:, 1] + np.random.rand(100) * 0.1

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the LGBMRegressor
model = lgb.LGBMRegressor(n_estimators=10, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Make a prediction using the model's predict method
X_single = X_test[0].reshape(1, -1)  # Take a single sample for demonstration
y_pred_regular = model.predict(X_single)
print(f"Model's predict method output: {y_pred_regular[0]}")

# Manual aggregation of tree predictions
booster = model.booster_

# Initialize manual prediction
manual_prediction = 0.0

# Get the base value (initial score from the first iteration)
if booster.__getattribute__("dump_model")()["objective"] == "regression":
    base_value = model.booster_.dump_model()["average_output"]
    manual_prediction += base_value

# Retrieve tree contributions for a single test sample
for tree_index in range(model.n_estimators):
    # Get raw score for the current tree
    current_raw = booster.predict(X_single, num_iteration=tree_index + 1, raw_score=True)[0]

    if tree_index == 0:
        # For the first tree, use the direct raw score
        manual_prediction -= base_value
        manual_prediction += current_raw
    else:
        # For subsequent trees, consider the difference with the previous raw score
        previous_raw = booster.predict(X_single, num_iteration=tree_index, raw_score=True)[0]
        manual_prediction += model.learning_rate * (current_raw - previous_raw)

# Print the manually aggregated prediction
print(f"Manual aggregation prediction: {manual_prediction:.20f}")
print(f"Difference: {y_pred_regular[0] - manual_prediction}")
[/code]

Model's predict method output: 1.5239200230002279
Manual aggregation prediction: 1.25415029164310753984
Difference: 0.26976973135712035
Can you help?

More details here: https://stackoverflow.com/questions/79358903/manually-aggregate-tree-predictions-for-lightgbmregressor-to-match-the-models-p
