My dataset is a two-column list with about 20 thousand rows.
Column 1: FromNodeId
Column 2: ToNodeId
Each row pairs an author with one of their co-authors. Given an author, we want to predict who they will collaborate with.
This is my code:
# The following code seems to work fine locally; will probably turn this one in.
import numpy as np  # needed for np.mean in the accuracy calculation below
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import warnings
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import plot_tree
from sklearn.metrics import accuracy_score
warnings.filterwarnings('ignore')
df = pd.read_csv(r"CA-GrQc.txt", sep="\t", header=None, skiprows=4, usecols=[0,1])
df.info()
X = df.iloc[:,0:1].values #features
y = df.iloc[:,1].values #target variable
# Check for and handle categorical variables
# (the resulting `x` is never used below; the model is trained on `X`)
label_encoder = LabelEncoder()
x_categorical = df.select_dtypes(include=['object']).apply(label_encoder.fit_transform)
x_numerical = df.select_dtypes(exclude=['object']).values
x = pd.concat([pd.DataFrame(x_numerical), x_categorical], axis=1).values
del df
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42)
# Fitting Random Forest Classifier to the dataset
rf_classifier = RandomForestClassifier(n_estimators = 100, random_state = 42, max_depth=10)
# Fit the classifier on the training data
rf_classifier.fit(X_train, y_train)
y_pred = rf_classifier.predict(X_test)
# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
r2 = r2_score(y_test, y_pred)
print(f'R-squared: {r2}')
# Calculate the absolute errors
errors = abs(y_pred - y_test)
# Calculate mean absolute percentage error (MAPE)
mape = 100 * (errors / y_test)
# Report 100 - mean(MAPE) as the accuracy figure
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')
Output:
Mean Squared Error: 104365042.85075915
R-squared: -0.831195262553108
Accuracy: -398.14 %.
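The three figures above are regression-style metrics computed on predicted node IDs. For comparison, here is a minimal sketch of scoring predicted labels by exact match with `accuracy_score`, next to `mean_squared_error` on the same values. The four-element arrays are made up for illustration and are not results from CA-GrQc.txt.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Hypothetical node IDs: three of the four predictions match exactly.
y_true = np.array([937, 5233, 8579, 10310])
y_pred = np.array([937, 5233, 8579, 17038])

# Exact-match accuracy treats the IDs as labels.
acc = accuracy_score(y_true, y_pred)

# MSE treats the IDs as quantities, so the size of the numeric gap
# between two IDs drives the score even though IDs have no ordering.
mse = mean_squared_error(y_true, y_pred)

print(f"accuracy = {acc:.2f}")
print(f"mse = {mse:.1f}")
```

In this toy case a single wrong ID leaves exact-match accuracy at 3 out of 4, while the MSE is determined entirely by how far apart the two ID numbers happen to be.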
First lines of the input file:
# Directed graph (each unordered pair of nodes is saved once): CA-GrQc.txt
# Collaboration network of Arxiv General Relativity category (there is an edge if authors coauthored at least one paper)
# Nodes: 5242 Edges: 28980
# FromNodeId ToNodeId
3466 937
3466 5233
3466 8579
3466 10310
3466 15931
3466 17038
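Each row here is an edge between two authors rather than a feature and a label, so the same file can also be read as a graph. The sketch below is only an illustration of that reading and is not the code above: it builds an undirected adjacency map from the six sample rows and ranks candidate co-authors by the number of shared co-authors (a common-neighbours heuristic). The names `build_adjacency` and `rank_candidates`, and the three extra toy edges involving node 99999, are hypothetical.

```python
from collections import defaultdict

# The six sample rows shown above (FromNodeId, ToNodeId).
SAMPLE_EDGES = [
    (3466, 937), (3466, 5233), (3466, 8579),
    (3466, 10310), (3466, 15931), (3466, 17038),
]

# Hypothetical extra edges, added only so the ranking has something to find.
TOY_EDGES = [(937, 5233), (937, 99999), (5233, 99999)]


def build_adjacency(edges):
    """Return {node: set(neighbours)}, treating every edge as undirected."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def rank_candidates(adj, author, top_k=5):
    """Rank authors who are not yet co-authors by shared co-author count."""
    scores = defaultdict(int)
    for coauthor in adj[author]:
        for candidate in adj[coauthor]:
            if candidate != author and candidate not in adj[author]:
                scores[candidate] += 1
    # Highest score first; ties broken by node ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


adj = build_adjacency(SAMPLE_EDGES + TOY_EDGES)
print(len(adj[3466]))
print(rank_candidates(adj, 3466))
```

With the toy edges, node 99999 shares two co-authors (937 and 5233) with author 3466, so it is the only candidate returned.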
More details here: https://stackoverflow.com/questions/787 ... g-very-low