How to increase the accuracy of a random forest classifier? - Цифровое Кемерово

Post by Anonymous » 17 Jul 2025, 15:49

I have a random forest classifier. Its accuracy is about 61%. I want to increase it, but nothing I have tried so far improves it significantly. The code is shown below:

```python
# importing time module to record the running time of the program
import time
begin_time = time.process_time()

# importing modules
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

# logistic regression as a baseline, random forest as the main classifier
logistic_regression = LogisticRegression()
forest_classifier = RandomForestClassifier(max_depth=4, random_state=0)

# reading in accelerometer data
time_train = pd.read_csv("https://courses.edx.org/assets/courseware/v1/b98039c3648763aae4f153a6ed32f38b/asset-v1:HarvardX+PH526x+3T2022+type@asset+block/train_time_series.csv", index_col=0)
labels_train = pd.read_csv("https://courses.edx.org/assets/courseware/v1/d64e74647423e525bbeb13f2884e9cfa/asset-v1:HarvardX+PH526x+3T2022+type@asset+block/train_labels.csv", index_col=0)
time_test = pd.read_csv("https://courses.edx.org/assets/courseware/v1/1ca4f3d4976f07b8c4ecf99cf8f7bdbc/asset-v1:HarvardX+PH526x+3T2022+type@asset+block/test_time_series.csv", index_col=0)
labels_test = pd.read_csv("https://courses.edx.org/assets/courseware/v1/72d5933c310cf5eac3fa3f28b26d9c39/asset-v1:HarvardX+PH526x+3T2022+type@asset+block/test_labels.csv", index_col=0)

# extracting the x, y, z columns (every 10th row, starting at row 3)
x, y, z = time_train.iloc[3::10][['x', 'y', 'z']].T.values
labels_train[['x', 'y', 'z']] = np.stack([x, y, z], axis=1)

# doing the same with the test dataframe (every 10th row, starting at row 9)
x1, y1, z1 = time_test.iloc[9::10][['x', 'y', 'z']].T.values
labels_test[['x', 'y', 'z']] = np.stack([x1, y1, z1], axis=1)
labels_test.head(50)

# plotting the samples on a 3D graph (%matplotlib is a Jupyter magic)
%matplotlib notebook
from mpl_toolkits.mplot3d import Axes3D  # only needed for the '3d' projection on older Matplotlib
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.scatter(x, y, z, c=y)  # 3D scatter plot, coloured by the y values

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")

# now splitting the dataframe into train (80%) and test data (20%) with random_state=1
X = labels_train[['x', 'y', 'z']]
y = labels_train['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=1)

# now choosing the best classifier. The code is based on Case Study 7, Part 2.
# cross_val_score passes in an estimator already fitted on the training folds,
# so the scorers only predict; refitting here would score the model on its own training data.
def correlation(estimator, X, y):
    predictions = estimator.predict(X)
    return r2_score(y, predictions)

def accuracy(estimator, X, y):
    predictions = estimator.predict(X)
    return accuracy_score(y, predictions)

regression_outcome = labels_train['label']
classification_outcome = labels_train['label']
covariates = labels_train[['x', 'y', 'z']]

logistic_regression_scores = cross_val_score(logistic_regression, covariates, classification_outcome, cv=10, scoring=accuracy)
forest_classification_scores = cross_val_score(forest_classifier, covariates, classification_outcome, cv=10, scoring=accuracy)

# new figure, so the comparison is not drawn over the 3D plot
plt.figure()
plt.gca().set_aspect('equal', 'box')
plt.scatter(logistic_regression_scores, forest_classification_scores)
plt.plot((0, 1), (0, 1), 'k-')  # diagonal: points above it are folds where the forest scores higher

plt.xlim(0, 1)
plt.ylim(0, 1)
plt.xlabel("Logistic Regression Score")
plt.ylabel("Forest Classification Score")

plt.show()

np.mean(forest_classification_scores)

# tuning the Random Forest. The idea is taken from Katarina Pavlović - Predicting the type of physical activity from tri-axial smartphone accelerometer data
from sklearn.model_selection import RandomizedSearchCV

# the number of trees in our random forest
estimators = list(range(100, 1001, 10))

# Number of features to consider at every split
# ('auto' meant 'sqrt' for classifiers; it was deprecated in scikit-learn 1.1 and removed in 1.3)
max_features = ['sqrt']

# Maximum number of levels in tree
max_depth = list(range(3, 31)) + [None]
print(max_depth)

# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]

# Minimum number of samples required at each leaf
min_samples_leaf = [1, 2, 3]

# Method of selecting samples for training each tree
bootstrap = [True, False]

random_grid = {'n_estimators': estimators, 'max_features': max_features,
               'max_depth': max_depth, 'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf, 'bootstrap': bootstrap}

# Find the best parameters for the Random Forest Classifier (for a better fit)
# and the score obtained with those parameters
rf_random = RandomizedSearchCV(estimator=forest_classifier, param_distributions=random_grid, n_iter=100, cv=3, verbose=2, random_state=1)
rf_random.fit(covariates, classification_outcome)
Best_params = rf_random.best_params_
print(Best_params)
print(rf_random.best_score_)

forest_classifier = RandomForestClassifier(n_estimators=300, min_samples_split=10, min_samples_leaf=3, max_features='sqrt', max_depth=20, bootstrap=True)

# Calculate the accuracy of the classifier on the test set created in Section B1.2
forest_classifier.fit(X_train, y_train)
forest_predictions = forest_classifier.predict(X_test)
accuracy_score(y_test, forest_predictions)
```
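For reference, a scorer passed to `cross_val_score` is called as `scorer(estimator, X, y)` with an estimator that has already been fitted on the training folds. Calling `fit` again inside the scorer (the `estimator.fit(X, y).predict(X)` pattern) therefore scores the model on the very rows it was just trained on. The minimal sketch below illustrates the difference, assuming synthetic data with purely random labels rather than the accelerometer data; `refit_accuracy` and `heldout_accuracy` are hypothetical helper names.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): labels are pure noise, so no model can
# genuinely beat chance (~50%) on rows it has not seen.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = rng.integers(0, 2, size=400)

def refit_accuracy(estimator, X_fold, y_fold):
    # Anti-pattern: refits on the held-out fold, then scores on that same fold.
    return accuracy_score(y_fold, estimator.fit(X_fold, y_fold).predict(X_fold))

def heldout_accuracy(estimator, X_fold, y_fold):
    # cross_val_score has already fitted `estimator` on the training folds.
    return accuracy_score(y_fold, estimator.predict(X_fold))

tree = DecisionTreeClassifier(random_state=0)
refit_scores = cross_val_score(tree, X, y, cv=5, scoring=refit_accuracy)
heldout_scores = cross_val_score(tree, X, y, cv=5, scoring=heldout_accuracy)

print("refit-on-fold accuracy:", refit_scores.mean())
print("held-out accuracy:     ", heldout_scores.mean())
```

With noise labels, the refitting scorer reports perfect accuracy because an unpruned tree memorises any fold, while the held-out scorer stays near chance level.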
I tried RandomizedSearchCV, but it does not help much. To be more specific: the data come from a database, and no more data on this topic is available. Here is the code:
```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

model = GradientBoostingClassifier(n_estimators=100, max_depth=5)

# fit the model with the training data
model.fit(X_train, y_train)

# predict the target on the train dataset
predict_train = model.predict(X_train)
print('\nTarget on train data', predict_train)

# Accuracy score on the train dataset (this overstates performance on unseen data)
accuracy_train = accuracy_score(y_train, predict_train)
print('\naccuracy_score on train dataset : ', accuracy_train)
```
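The gradient boosting code above reports accuracy on the same rows the model was trained on, which tends to overstate how well it generalises. A minimal sketch comparing training and held-out accuracy, assuming synthetic data from `make_classification` (3 features, 4 classes, 20% of labels reassigned at random) in place of the accelerometer data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, noisy stand-in for the accelerometer features (assumption).
X, y = make_classification(n_samples=1000, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=1)

model = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Score on the rows used for fitting and on rows the model never saw.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

A large gap between the two numbers is a sign of overfitting; only the held-out figure is comparable with the test-set accuracy of the random forest.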

Can you suggest anything? Are there perhaps other methods I could try?
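Regarding the RandomizedSearchCV attempt: a minimal sketch of tuning on the training split only and then scoring the refitted `best_estimator_` once on the held-out split, instead of re-typing the best parameters by hand. The synthetic data and the parameter ranges are assumptions for illustration, not recommendations for the accelerometer data.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data (assumption) standing in for the x, y, z features.
X, y = make_classification(n_samples=600, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=1)

# Illustrative search space: scipy distributions and plain lists can be mixed.
param_distributions = {
    "n_estimators": randint(50, 201),
    "max_depth": [3, 5, 10, 20, None],
    "min_samples_leaf": randint(1, 11),
    "max_features": ["sqrt", "log2", None],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10, cv=3, scoring="accuracy", random_state=1,
)
search.fit(X_train, y_train)  # tune on the training split only

# With refit=True (the default) the best model is refitted on all of X_train.
best_model = search.best_estimator_
held_out_acc = accuracy_score(y_test, best_model.predict(X_test))
print("best params:", search.best_params_)
print("CV accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(held_out_acc, 3))
```

Keeping the test split out of the search means the final held-out number is not biased by the tuning itself.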

More details here: https://stackoverflow.com/questions/75857433/how-to-increase-the-accuracy-of-random-forest-classifier
