Root-mean-square deviation of a poly1d fit from the measured values is unexpectedly high?


Posted by Anonymous

I have the following code for predicting Ethereum stock prices:

Code:

# !pip install yfinance

import numpy
import yfinance

STOCK_TICKER = "ETH-USD"
START_DATE = "2016-01-01"
END_DATE = "2021-01-01"

# Download daily ETH-USD price data for the chosen date range.
dataframe = yfinance.download(tickers=STOCK_TICKER,
                              start=START_DATE,
                              end=END_DATE)
print(dataframe[:5])
print(dataframe.shape)

# 1 if the next day's volume is higher than today's, else 0.
dataframe["Volume Decrease/Increase"] = numpy.where(
    dataframe["Volume"].shift(-1) > dataframe["Volume"], 1, 0)

# 1 if the next day's close is higher than today's, else 0.
dataframe["Buy or Sell"] = numpy.where(
    dataframe["Close"].shift(-1) > dataframe["Close"], 1, 0)

# Daily percentage change of the closing price.
dataframe["Returns"] = dataframe["Close"].pct_change()
print('Dataframe: ', dataframe[:5])

# Drop rows containing NaN (the first row has no "Returns" value).
dataframe = dataframe.dropna()
print('Dataframe:', dataframe[:5])

# Feature: opening price; target: adjusted closing price.
X = dataframe["Open"].values
print('Open values: ', X[:5])

y = dataframe["Adj Close"].values
print('Close values: ', y[:5])

from sklearn.model_selection import train_test_split

# Random 80/20 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=1/5)

import matplotlib.pyplot as pyplot

# Training points (blue) with their x positions marked along y = 0 (red).
pyplot.plot(X_train, y_train, "bo")
pyplot.plot(X_train, numpy.zeros_like(X_train), "r+")
pyplot.show()

# Same plot for the test set.
pyplot.plot(X_test, y_test, "bo")
pyplot.plot(X_test, numpy.zeros_like(X_test), "r+")
pyplot.show()

from sklearn.linear_model import LinearRegression

# Ordinary least-squares fit of the close on the open.
model = LinearRegression()
model.fit(X_train.reshape(-1, 1), y_train)

predictions = model.predict(X_test.reshape(-1, 1))
print('Linear prediction:', predictions[:5])

pyplot.plot(predictions)
pyplot.show()

COEFFICIENT = 1  # degree of the fitted polynomial

from sklearn.metrics import mean_squared_error

# print('X_train: ', X_train[:5])
# print('y_train: ', y_train[:5])
# print('X_train.shape():', X_train.shape)

# numpy.polyfit expects a 1-D x array, so flatten the training data.
X_train = numpy.array(X_train).reshape(-1)
# print('X_train: ', X_train[:5])
# print('X_train.shape():', X_train.shape)

y_train = numpy.array(y_train).reshape(-1)
# print('y_train: ', y_train[:5])
# print('y_train.shape():', y_train.shape)

model_1d_polynomial = numpy.poly1d(numpy.polyfit(X_train, y_train,
                                                 COEFFICIENT))

model_1d_predictions = model_1d_polynomial(X_test)

print('Polynomial 1d prediction:', model_1d_predictions[:5])
print('Y-values:', y_test[:5])
print('Y-values, mean:', y_test.mean())
print('Y-values, standard deviation:', numpy.sqrt(y_test.var()))
print('Predictions 1d-model y-values mean:', model_1d_predictions.mean())
print('Polynomial 1d prediction, standard deviation:',
      numpy.sqrt(model_1d_predictions.var()))

# Root-mean-square error of the predictions on the test set.
print('Polynomial 1d prediction, mean squared error deviation from y-values:',
      numpy.sqrt(mean_squared_error(y_test, model_1d_predictions)))

COEFFICIENT_2D = 2  # polynomial degree

# Same procedure with a quadratic fit.
model_2d_polynomial = numpy.poly1d(numpy.polyfit(X_train,
                                                 y_train,
                                                 COEFFICIENT_2D))

model_2d_predictions = model_2d_polynomial(X_test)

print('Polynomial 2d prediction', model_2d_predictions[:5])
print('Y-values:', y_test[:5])
print('Y-values, mean:', y_test.mean())
print('Y-values, standard deviation:', numpy.sqrt(y_test.var()))

print('Predictions 2d-model y-values mean:', model_2d_predictions.mean())
print('Polynomial 2d prediction, standard deviation:',
      numpy.sqrt(model_2d_predictions.var()))

print('Polynomial 2d prediction, mean squared error deviation from y-values:',
      numpy.sqrt(mean_squared_error(y_test, model_2d_predictions)))

COEFFICIENT_9D = 9  # polynomial degree

# Same procedure with a degree-9 fit.
model_9d_polynomial = numpy.poly1d(numpy.polyfit(X_train,
                                                 y_train,
                                                 COEFFICIENT_9D))

model_9d_predictions = model_9d_polynomial(X_test)

print('Polynomial 9d prediction', model_9d_predictions[:5])
print('Y-values:', y_test[:5])
print('Y-values, mean:', y_test.mean())
print('Y-values, standard deviation:', numpy.sqrt(y_test.var()))

print('Predictions 9d-model y-values mean:', model_9d_predictions.mean())
print('Polynomial 9d prediction, standard deviation :',
      numpy.sqrt(model_9d_predictions.var()))

pyplot.plot(model_9d_predictions)
pyplot.show()

print('Polynomial 9d prediction, mean squared error deviation from y-values:',
      numpy.sqrt(mean_squared_error(y_test, model_9d_predictions)))
And I get the following output:

Code:

Price                       Adj Close       Close  ...        Open      Volume
Ticker                        ETH-USD     ETH-USD  ...     ETH-USD     ETH-USD
Date                                               ...
2017-11-09 00:00:00+00:00  320.884003  320.884003  ...  308.644989   893249984
2017-11-10 00:00:00+00:00  299.252991  299.252991  ...  320.670990   885985984
2017-11-11 00:00:00+00:00  314.681000  314.681000  ...  298.585999   842300992
2017-11-12 00:00:00+00:00  307.907990  307.907990  ...  314.690002  1613479936
2017-11-13 00:00:00+00:00  316.716003  316.716003  ...  307.024994  1041889984

[5 rows x 6 columns]
(1149, 6)
Dataframe:  Price                       Adj Close       Close  ... Buy or Sell   Returns
Ticker                        ETH-USD     ETH-USD  ...
Date                                               ...
2017-11-09 00:00:00+00:00  320.884003  320.884003  ...           0       NaN
2017-11-10 00:00:00+00:00  299.252991  299.252991  ...           1 -0.067411
2017-11-11 00:00:00+00:00  314.681000  314.681000  ...           0  0.051555
2017-11-12 00:00:00+00:00  307.907990  307.907990  ...           1 -0.021523
2017-11-13 00:00:00+00:00  316.716003  316.716003  ...           1  0.028606

[5 rows x 9 columns]
Dataframe: Price                       Adj Close       Close  ... Buy or Sell   Returns
Ticker                        ETH-USD     ETH-USD  ...
Date                                               ...
2017-11-10 00:00:00+00:00  299.252991  299.252991  ...           1 -0.067411
2017-11-11 00:00:00+00:00  314.681000  314.681000  ...           0  0.051555
2017-11-12 00:00:00+00:00  307.907990  307.907990  ...           1 -0.021523
2017-11-13 00:00:00+00:00  316.716003  316.716003  ...           1  0.028606
2017-11-14 00:00:00+00:00  337.631012  337.631012  ...           0  0.066037

[5 rows x 9 columns]
Open values:  [[320.67098999]
 [298.58599854]
 [314.69000244]
 [307.0249939 ]
 [316.76300049]]
Close values:  [[299.25299072]
 [314.68099976]
 [307.9079895 ]
 [316.71600342]
 [337.63101196]]
Linear prediction: [[1063.41281545]
 [ 283.62464302]
 [ 491.11383606]
 [ 171.44437377]
 [ 236.82333007]]
Polynomial 1d prediction: [[1063.41281545]
 [ 283.62464302]
 [ 491.11383606]
 [ 171.44437377]
 [ 236.82333007]]
Y-values: [[1056.0300293 ]
 [ 288.04598999]
 [ 499.64199829]
 [ 179.87220764]
 [ 265.40612793]]
Y-values, mean: 321.5382971722147
Y-values, standard deviation: 227.31418611687224
Predictions 1d-model y-values mean: 324.2891475549503
Polynomial 1d prediction, standard deviation: 230.8754488392883
Polynomial 1d prediction, mean squared error deviation from y-values: 21.81828723995999
Polynomial 2d prediction [[1054.52831851]
 [ 284.98404354]
 [ 494.98426677]
 [ 169.86469522]
 [ 237.09149459]]
Y-values: [[1056.0300293 ]
 [ 288.04598999]
 [ 499.64199829]
 [ 179.87220764]
 [ 265.40612793]]
Y-values, mean: 321.5382971722147
Y-values, standard deviation: 227.31418611687224
Predictions 2d-model y-values mean: 324.08912377022153
Polynomial 2d prediction, standard deviation: 231.06080845612234
Polynomial 2d prediction, mean squared error deviation from y-values: 22.126271098973927
Polynomial 9d prediction [[1077.04611029]
 [ 283.61433033]
 [ 491.67316406]
 [ 170.04975312]
 [ 235.53584864]]
Y-values: [[1056.0300293 ]
 [ 288.04598999]
 [ 499.64199829]
 [ 179.87220764]
 [ 265.40612793]]
Y-values, mean: 321.5382971722147
Y-values, standard deviation: 227.31418611687224
Predictions 9d-model y-values mean: 324.1833043915265
Polynomial 9d prediction, standard deviation : 230.70330227179178
Polynomial 9d prediction, mean squared error deviation from y-values: 23.262691546867423

Process finished with exit code 0
So the mean of the predicted values is consistently about 2.5 to 2.8 higher than the mean of the measured values, and the root-mean-square deviation from the measured values is always large (about 22 to 23). I did not expect this; I would expect that deviation to be below about 1.
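One way to sanity-check the expectation of an RMSE below 1 is to compare it with how far the close typically moves away from the open. The sketch below uses synthetic data, not the ETH-USD download: it assumes a hypothetical opening price drawn uniformly between 100 and 1000 and a close that differs from the open by a random move with a 5% standard deviation (both figures are illustrative assumptions, as are all variable names). Under those assumptions, even a correctly specified straight-line fit ends up with an RMSE close to the spread of that daily move, because that part of the close cannot be predicted from the open.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the close equals the open times (1 + a random daily move).
n = 5000
open_price = rng.uniform(100.0, 1000.0, size=n)
daily_move = rng.normal(0.0, 0.05, size=n)  # assumed 5 % standard deviation
close_price = open_price * (1.0 + daily_move)

# Hold out the last 20 % (the samples are independent, so no shuffling is needed).
split = int(0.8 * n)
x_train, x_test = open_price[:split], open_price[split:]
y_train, y_test = close_price[:split], close_price[split:]

# Degree-1 least-squares fit, as in the question.
line = np.poly1d(np.polyfit(x_train, y_train, 1))
pred = line(x_test)

rmse = np.sqrt(np.mean((y_test - pred) ** 2))
# RMSE of the trivial prediction "close = open", i.e. the size of the daily move.
noise_rmse = np.sqrt(np.mean((y_test - x_test) ** 2))

print("RMSE of the fit:       ", rmse)
print("RMSE of close vs. open:", noise_rmse)
print("RMSE / mean price:     ", rmse / y_test.mean())
```

For comparison, the RMSE of about 22 in the output above is roughly 7% of the mean test price of about 321, so whether it counts as high depends on how much the price usually moves within a day.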
I am using Python 3.11 as the interpreter and NumPy 1.26.3, on a Windows 11 laptop with DataSpell.
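The "Linear prediction" and "Polynomial 1d prediction" values in the output are identical, which is what one would expect: a degree-1 `numpy.polyfit` and an ordinary least-squares line solve the same problem. A minimal sketch on made-up numbers, using `numpy.linalg.lstsq` as a stand-in for `LinearRegression` (an assumption made here only to keep the example NumPy-only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up 1-D data with a roughly linear relationship.
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=200)

# Degree-1 polynomial fit: coefficients come back as [slope, intercept].
slope_pf, intercept_pf = np.polyfit(x, y, 1)

# Ordinary least squares on the design matrix with columns [x, 1].
design = np.column_stack([x, np.ones_like(x)])
(slope_ls, intercept_ls), *_ = np.linalg.lstsq(design, y, rcond=None)

same_fit = np.allclose([slope_pf, intercept_pf], [slope_ls, intercept_ls])
print("Same coefficients:", same_fit)
```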
Thanks in advance,

More details here: https://stackoverflow.com/questions/791 ... t-unexpect