My neural network for MNIST digit recognition learns for one epoch and then stops

Posted by Anonymous » 25 Jan 2026, 19:19

I am writing a neural network for MNIST digit recognition. I thought I was done, but when I run the training program, the accuracy stays the same after every epoch. I use MSE as the cost function and tanh(x) as the activation function. The learning rate is currently set to 0.1.
Here is the accuracy for the first few epochs; the first value is the pre-training accuracy:
8.35
9.8
9.8
9.8
9.8
9.8
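
A constant 9.8 is what a classifier that always predicts the digit 0 would score if accuracy is a percentage measured on the standard 10,000-image MNIST test set, where 980 images are zeros. The sketch below only checks that arithmetic. The evaluation set and the helper name `constant_prediction_accuracy` are assumptions, since the post does not show how accuracy is computed.

```python
# Class counts of the standard MNIST test set (10,000 images), digits 0 to 9.
MNIST_TEST_COUNTS = [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]

def constant_prediction_accuracy(counts, predicted_class):
    """Accuracy in percent of a model that always outputs predicted_class."""
    return 100.0 * counts[predicted_class] / sum(counts)

# Print the accuracy a constant predictor would reach for each digit.
for digit in range(10):
    print(digit, round(constant_prediction_accuracy(MNIST_TEST_COUNTS, digit), 2))
```

If the reported number matches one of these values exactly, it may be worth printing the distribution of predicted classes to see whether the network has collapsed to a single output.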
My functions are as follows:
tanh(z): takes a vector as input and returns the vector of activations.
tanhDerivative(z): takes a vector as input and returns the vector of derivatives.
feedforward(input, stop): computes the output. The stop argument lets it halt before the last layer, so the activations of any layer can be computed.
feedforward2(input, stop): computes the output, but the returned values have no activation function applied.
MSE(input, DesertOutput): computes the MSE.
transformer(label): takes a label, e.g. 0, and returns the target vector, e.g. [[1],[-1],[-1],[-1],[-1],[-1],[-1],[-1],[-1],[-1]]
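
For reference, the helpers described above might look like the following minimal sketch. The snake_case names, the `num_classes` parameter, and the bodies are assumptions for illustration, not the code from the post. `transformer` follows the description: label 0 maps to a column vector with 1 in position 0 and -1 elsewhere.

```python
import numpy as np

def tanh(z):
    """Element-wise tanh activation."""
    return np.tanh(z)

def tanh_derivative(z):
    """Element-wise derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

def transformer(label, num_classes=10):
    """Encode an integer label as a column vector of -1 with a single 1."""
    target = -np.ones((num_classes, 1))
    target[int(label), 0] = 1.0
    return target

print(transformer(0).ravel())
```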
I suspect the error lies in the following two functions:

```python
#The actual training. Backpropagation is going to be another algorithm
def train(self, dataLocation, learnRate, batchSize=100):
    self.bias_updates = self.bias_templates #This will contain the updates required for the biases
    self.weight_updates = self.weight_templates #This will contain the updates required for the weights
    file = np.loadtxt(dataLocation, delimiter=",", dtype="float128")
    count = 0
    print("Starting training")
    for row in file:
        count += 1
        data = []
        for item in row:
            data.append([item/255]) #This takes an array like [1,2,3,4] and makes it [[1],[2],[3],[4]] which is necessary for this program
        desired = self.Transformer(data.pop(0)) #Look at transformer method to understand
        self.backpropagation(data, desired)
        if count % 100 == 0:
            for n in range(0, len(self.bias_updates)):
                self.bias_updates[n] *= learnRate
                self.weight_updates[n] *= learnRate
                self.biases[n] -= self.bias_updates[n]
                self.weights[n] -= self.weight_updates[n]
            self.bias_updates = self.bias_templates
            self.weight_updates = self.weight_templates
            print(count//100)

#The heart of the training algorithm. BACKPROPAGATION
#NOTE: Learning rate is only applied in train()
def backpropagation(self, input, desiredOutput):
    SigmoidLastLayerActivations = np.array(self.feedforward(input, self.size-1))
    LastLayerActivation = np.array(self.feedforward2(input, self.size-1))
    δ = 2 * (SigmoidLastLayerActivations-desiredOutput) * self.tanhDerivative(LastLayerActivation) #This first value of the delta is just the standard ∂C/∂z(L)
    self.bias_updates[-1] += δ #The update for the biases in the last layer
    self.weight_updates[-1] += np.matmul(δ, np.transpose(self.feedforward(input, self.size-2))) #The update for the weights in the last layer

    for i in range(len(self.weight_templates)-2, -1, -1):
        requiredWeights = np.transpose(np.array(self.weights[i+1])) #This is the required weight matrix from the formulas
        LayerActivations = np.array(self.feedforward2(input, i+1)) #This is the z thing from the formulas
        SigmoidLayerActivations = np.transpose(np.array(self.feedforward(input, i))) #Look at formula

        δ = np.matmul(requiredWeights, δ) * self.tanhDerivative(LayerActivations)
        self.bias_updates[i] += δ
        otherVariable = np.matmul(δ, SigmoidLayerActivations) #This is the variable that contains the update required for the weight updates instead of the
        self.weight_updates[i] += otherVariable
```
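
One thing worth checking in train(): `self.bias_updates = self.bias_templates` binds a second name to the same list rather than copying it, so every `+=` and `*=` on the updates also changes the templates, and the later reset restores nothing. The sketch below reproduces the pattern with a stand-in list of zero arrays (an assumption, since the post does not show how the templates are built) and contrasts it with allocating fresh zero arrays. Whether this alone explains the stalled accuracy depends on the rest of the class.

```python
import numpy as np

# Stand-in for self.bias_templates (assumed: a list of zero arrays).
aliased_templates = [np.zeros((2, 1)), np.zeros((3, 1))]

# Pattern from train(): plain assignment, no copy is made.
updates = aliased_templates
updates[0] += 1.0            # accumulate a "gradient" in place
updates[0] *= 0.1            # scale by the learning rate in place
updates = aliased_templates  # intended reset; rebinding the same list changes nothing
print(aliased_templates[0].ravel())  # no longer all zeros

# Alternative: allocate fresh zero arrays for every batch.
clean_templates = [np.zeros((2, 1)), np.zeros((3, 1))]
fresh_updates = [np.zeros_like(t) for t in clean_templates]
fresh_updates[0] += 1.0
print(clean_templates[0].ravel())    # still all zeros
```

Note that a shallow list copy would not be enough here, because the NumPy arrays inside the list would still be shared; `np.zeros_like` creates new arrays.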

Apologies if this looks convoluted. Any help on why the accuracy always converges to 9.8 would be much appreciated. If any other function is needed to find the error, please ask.
Some variable names contain "Sigmoid" because I originally used the sigmoid activation function and then switched to tanh to see whether it made a difference.
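
Another detail that may be relevant: in train() every item of the row, including the label in column 0, is divided by 255 and wrapped in a list before `data.pop(0)` is called, so `Transformer` receives something like `[5/255]` rather than `5`. How `Transformer` handles such a value is not shown in the post. Below is a minimal sketch of separating the label before scaling; the helper name `split_row` is hypothetical, and a label-first CSV row is assumed, as the code implies.

```python
import numpy as np

def split_row(row):
    """Split one CSV row (label first, then pixels) into label and scaled pixels."""
    label = int(row[0])                        # label is left unscaled
    pixels = (row[1:] / 255.0).reshape(-1, 1)  # column vector with values in [0, 1]
    return label, pixels

row = np.array([5.0, 0.0, 255.0, 51.0])

# Pattern from train(): the label is scaled and wrapped like a pixel.
data = [[item / 255] for item in row]
popped = data.pop(0)
print(popped)  # a one-element list holding label / 255, not the label itself

# Separating the label first keeps it usable as a class index.
label, pixels = split_row(row)
print(label, pixels.ravel())
```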

Read more here: https://stackoverflow.com/questions/79875657/my-neural-network-for-mnist-digit-recognition-learns-for-one-epoch-and-then-stop