Preventing overfitting in a dual natural logarithm model (Python)

Posted by Anonymous » 22 Nov 2024, 06:35

I have a function

Code:

y = a * ln(p*(t + 32)) - b * ln(q*(t + 30))

where:

- y and t are the measured values
- a, b, p and q are the fitted parameters
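One property of this form may matter for the fit: since ln(p*(t + 32)) = ln(p) + ln(t + 32), the parameters p and q enter only through the single constant a*ln(p) - b*ln(q), so the data cannot pin down all four parameters at once. The sketch below checks this numerically. The parameter values are arbitrary illustrations, not fitted results.

```python
import numpy as np

def model(t, a, b, p, q):
    """The dual-log model from the post."""
    return a * np.log(p * (t + 32)) - b * np.log(q * (t + 30))

t = np.linspace(5, 145, 20)

# Arbitrary illustrative values (assumptions, not taken from any fit).
a, b = 2.0, 1.5
p1, q1 = 3.0, 0.7

# Pick a different p, then solve for the q that keeps
# c = a*ln(p) - b*ln(q) unchanged.
c = a * np.log(p1) - b * np.log(q1)
p2 = 10.0
q2 = np.exp((a * np.log(p2) - c) / b)

y1 = model(t, a, b, p1, q1)
y2 = model(t, a, b, p2, q2)

# Two different (p, q) pairs describe exactly the same curve.
same_curve = np.allclose(y1, y2)
print(f"q1={q1:.3f}, q2={q2:.3f}, identical curves: {same_curve}")
```

If this reading is right, the model is equivalent to y = a*ln(t + 32) - b*ln(t + 30) + c with three free parameters, which is linear in a, b and c.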

Generally, curve_fit from SciPy does the trick. However, for certain datasets the fit shows "kick-ups" at low t values:
[img]https://i.sstatic.net/WxmDfG2w.png[/img]

for which the data are:

Code:

7.083,3.63E-12
14.291,3.49E-12
21.494,3.48E-12
28.709,3.59E-12
35.915,3.79E-12
43.02,3.60E-12
50.226,3.80E-12
57.431,3.84E-12
64.639,3.76E-12
71.846,3.79E-12
79.056,3.84E-12
86.262,3.71E-12
93.467,4.05E-12
100.578,3.93E-12
107.783,4.12E-12
114.992,4.01E-12
122.2,4.12E-12
129.403,4.30E-12
136.606,4.12E-12
143.812,4.28E-12

Here is another:
[img]https://i.sstatic.net/V0HCDMBt.png[/img]

for which the data are:

Code:

5.98,6.44E-14
13.183,6.54E-14
20.385,4.21E-14
27.592,5.39E-14
34.801,2.16E-14
41.907,4.79E-14
49.112,5.81E-14
56.323,6.23E-14
63.526,6.82E-14
70.73,5.83E-14
77.944,4.31E-14
85.152,7.92E-14
92.258,8.07E-14
99.464,8.27E-14
106.672,7.12E-14
113.878,9.25E-14
121.091,1.04E-13
128.299,8.98E-14
135.508,8.40E-14
142.713,8.83E-14

How can I prevent this overfitting?
The script is below:

Code:

import numpy as np
from scipy.optimize import curve_fit


class NaturalLogFitter():
    def __init__(self):
        # Fixed time offsets inside the two logarithms
        self.neg_lneq_time = 30
        self.pos_lneq_time = 32

    def scale_y(self, y: np.ndarray):
        # Rescale y by its order of magnitude so the optimizer works with O(1) values
        y_oom = np.floor(np.log10(np.abs(y.max())))
        self.y_scf = 10 ** y_oom
        y_scaled = y * 10 ** -y_oom
        return y_scaled

    def natural_log_func(self, t, a, b, p, q):
        return a * np.log(p*(t + self.pos_lneq_time)) - b * np.log(q*(t + self.neg_lneq_time))

    def set_data(self, t: np.ndarray, y: np.ndarray):
        self.t = t
        self.y = self.scale_y(y)

    def fit(self):
        return curve_fit(self.natural_log_func, self.t, self.y, maxfev=100000)

    def get_intercept(self, popt):
        # Model value at t = 0, converted back to the original y units
        return self.natural_log_func(0, *popt) * self.y_scf

    def get_intercept_error(self, pcov):
        return 1 # placeholder for now

    def get_intercept_and_error(self, t: np.ndarray, y: np.ndarray):
        self.set_data(t, y)
        popt, pcov    = self.fit()
        intercept     = self.get_intercept(popt)
        intercept_err = self.get_intercept_error(pcov)

        return intercept, intercept_err

    def get_curve(self, t: np.ndarray, y: np.ndarray):
        self.set_data(t, y)
        popt, _ = self.fit()

        fitted_t = np.linspace(0, t.max(), 100)
        fitted_y = self.natural_log_func(fitted_t, *popt) * self.y_scf

        return fitted_t, fitted_y


if __name__ == "__main__":
    import matplotlib.pyplot as plt

    # Seed NumPy's RNG (the fit below uses measured data, not random data)
    np.random.seed(0)

    # Initialize the fitter
    fitter = NaturalLogFitter()

    # Get the intercept and error
    t = np.array([5.98, 13.183, 20.385, 27.592, 34.801, 41.907, 49.112, 56.323, 63.526, 70.73, 77.944, 85.152, 92.258, 99.464, 106.672, 113.878, 121.091, 128.299, 135.508, 142.713])
    y = np.array([6.44E-14, 6.54E-14, 4.21E-14, 5.39E-14, 2.16E-14, 4.79E-14, 5.81E-14, 6.23E-14, 6.82E-14, 5.83E-14, 4.31E-14, 7.92E-14, 8.07E-14, 8.27E-14, 7.12E-14, 9.25E-14, 1.04E-13, 8.98E-14, 8.40E-14, 8.83E-14])
    intercept, intercept_err = fitter.get_intercept_and_error(t, y)
    print(f"Intercept: {intercept}, Intercept Error:  {intercept_err}")

    # Get the fitted curve
    fitted_t, fitted_y = fitter.get_curve(t, y)

    # Plot the data and the fit
    plt.scatter(t, y, label='Data')
    plt.plot(fitted_t, fitted_y, label='Fit', color='red')
    plt.xlabel('Time')
    plt.ylabel('Value')
    plt.legend()
    plt.show()
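For comparison, here is a minimal sketch of the same model written as y = a*ln(t + 32) - b*ln(t + 30) + c, where c stands for a*ln(p) - b*ln(q). That form is linear in a, b and c, so ordinary linear least squares applies and the intercept uncertainty (the placeholder in get_intercept_error above) can be propagated from the parameter covariance. The names fit_linear and intercept_and_error are hypothetical helpers, not part of the script, and the error estimate assumes independent noise of equal variance on every point.

```python
import numpy as np

def fit_linear(t, y):
    """Least-squares fit of y = a*ln(t+32) - b*ln(t+30) + c.

    Returns beta = [a, b, c] and its covariance matrix.
    """
    X = np.column_stack([np.log(t + 32), -np.log(t + 30), np.ones_like(t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof            # residual variance estimate
    X_pinv = np.linalg.pinv(X)
    cov = sigma2 * (X_pinv @ X_pinv.T)      # equals sigma2 * inv(X.T @ X)
    return beta, cov

def intercept_and_error(beta, cov):
    """Model value at t = 0 and its 1-sigma uncertainty."""
    g = np.array([np.log(32.0), -np.log(30.0), 1.0])  # d(intercept)/d(beta)
    return g @ beta, np.sqrt(g @ cov @ g)

# Second dataset from the post
t = np.array([5.98, 13.183, 20.385, 27.592, 34.801, 41.907, 49.112, 56.323,
              63.526, 70.73, 77.944, 85.152, 92.258, 99.464, 106.672, 113.878,
              121.091, 128.299, 135.508, 142.713])
y = np.array([6.44E-14, 6.54E-14, 4.21E-14, 5.39E-14, 2.16E-14, 4.79E-14,
              5.81E-14, 6.23E-14, 6.82E-14, 5.83E-14, 4.31E-14, 7.92E-14,
              8.07E-14, 8.27E-14, 7.12E-14, 9.25E-14, 1.04E-13, 8.98E-14,
              8.40E-14, 8.83E-14])

beta, cov = fit_linear(t, y)
intercept, err = intercept_and_error(beta, cov)
print(f"Intercept: {intercept:.3e} +/- {err:.3e}")
```

This removes the redundant parameter but keeps the same family of curves, so on its own it is not expected to suppress the behaviour at low t. A large intercept uncertainty would, however, be one way to flag a fit whose extrapolation to t = 0 is poorly constrained.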

To illustrate the point further, here is another fit:
[img]https://i.sstatic.net/EDEdu0IZ.png[/img]

with the data:

Code:

t = np.array([5.303, 12.508, 19.711, 26.916, 34.13, 41.335, 48.44, 55.645, 62.847, 70.051, 77.258, 84.46, 91.67, 98.877, 105.98, 113.286, 120.391, 127.498, 134.704, 141.909])
y = np.array([1459.0265113713006, 1461.3176024628895, 1470.7175396409543, 1467.8009514214775, 1466.5423320874656, 1473.6950214047804, 1467.676141431994, 1464.8844871395088, 1460.1777269623358, 1466.415876623664, 1466.2427296537421, 1467.0116731759495, 1468.5874962643757, 1461.1129950726006, 1470.1431311995577, 1469.7089696334676, 1469.1248745384707, 1469.0477801972727, 1472.1239998394688, 1465.2701939696003])

in which the "kick-up" is unreasonably large: although the fitted curve slopes downward, its intercept lies above every actual data point.
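One possible contributor, offered as an assumption rather than a diagnosis: ln(t + 32) and ln(t + 30) are almost perfectly correlated over this range of t, so a and b can grow large together and act mainly through the small difference ln(t + 32) - ln(t + 30), which is largest at low t. The sketch below measures that correlation and then applies a ridge (Tikhonov) penalty on a and b. Ridge regression is a generic regularisation technique swapped in here for illustration; it is not something the original script does. The function ridge_fit and the strength lam=1.0 are hypothetical and would need tuning, for example by cross-validation.

```python
import numpy as np

def ridge_fit(t, y, lam):
    """Fit y = a*ln(t+32) - b*ln(t+30) + c with penalty lam*(a**2 + b**2).

    lam is a hypothetical tuning parameter; the offset c is not penalised.
    Returns beta = [a, b, c].
    """
    X = np.column_stack([np.log(t + 32), -np.log(t + 30), np.ones_like(t)])
    # Tikhonov regularisation via two extra rows that pull a and b toward 0.
    penalty = np.sqrt(lam) * np.array([[1.0, 0.0, 0.0],
                                       [0.0, 1.0, 0.0]])
    X_aug = np.vstack([X, penalty])
    y_aug = np.concatenate([y, np.zeros(2)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

# Third dataset from the post
t = np.array([5.303, 12.508, 19.711, 26.916, 34.13, 41.335, 48.44, 55.645,
              62.847, 70.051, 77.258, 84.46, 91.67, 98.877, 105.98, 113.286,
              120.391, 127.498, 134.704, 141.909])
y = np.array([1459.0265113713006, 1461.3176024628895, 1470.7175396409543,
              1467.8009514214775, 1466.5423320874656, 1473.6950214047804,
              1467.676141431994, 1464.8844871395088, 1460.1777269623358,
              1466.415876623664, 1466.2427296537421, 1467.0116731759495,
              1468.5874962643757, 1461.1129950726006, 1470.1431311995577,
              1469.7089696334676, 1469.1248745384707, 1469.0477801972727,
              1472.1239998394688, 1465.2701939696003])

# How similar are the two log terms over the measured range?
corr = np.corrcoef(np.log(t + 32), np.log(t + 30))[0, 1]
print(f"correlation of the two log terms: {corr:.6f}")

beta_free = ridge_fit(t, y, lam=0.0)    # plain least squares
beta_ridge = ridge_fit(t, y, lam=1.0)   # lam=1.0 is an arbitrary choice

at_zero = np.array([np.log(32.0), -np.log(30.0), 1.0])
print("unpenalised a, b:", beta_free[:2], "intercept:", at_zero @ beta_free)
print("penalised   a, b:", beta_ridge[:2], "intercept:", at_zero @ beta_ridge)
```

The penalty trades some goodness of fit for smaller a and b. Whether the resulting curve is physically sensible depends on knowledge about the measurement that the post does not give.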

More details here: https://stackoverflow.com/questions/79213279/preventing-overfitting-in-a-dual-natural-logarithm-model-python
