I apologize in advance if the following description is not detailed enough.
Suppose I have trained a simple ResNet-like model whose main blocks are built as follows:
Code:
self.layer1 = self._make_layer(self.inplanes, outplanes[0], num_blocks=3, stride=1)
self.layer2 = self._make_layer(self.inplanes, outplanes[1], num_blocks=3, stride=2)
self.layer3 = self._make_layer(self.inplanes, outplanes[2], num_blocks=3, stride=2)
self.layer4 = self._make_layer(inplanes=outplanes[2], planes=outplanes[3], num_blocks=3, stride=2)
Now I would like to reuse those trained weights to fine-tune a smaller version of the same network, with only one block per layer:

Code:
self.layer1 = self._make_layer(self.inplanes, outplanes[0], num_blocks=1, stride=1)
self.layer2 = self._make_layer(self.inplanes, outplanes[1], num_blocks=1, stride=2)
self.layer3 = self._make_layer(self.inplanes, outplanes[2], num_blocks=1, stride=2)
self.layer4 = self._make_layer(inplanes=outplanes[2], planes=outplanes[3], num_blocks=1, stride=2)
Is there any way to implement this, or is it simply not mathematically or theoretically sound? Conceptually, I think it is somewhat similar to knowledge distillation.
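One practical approach (not the only one, and separate from proper distillation) is to build the 1-block model and copy over only the parameters whose names and shapes match, since block 0 of every layer ('layer1.0.*', 'layer2.0.*', ...) exists in both configurations. Below is a minimal sketch under that assumption; the function name transfer_matching_weights and the variables big_model / small_model are hypothetical placeholders for your own models, not part of the original question.

Code:
import torch
import torch.nn as nn


def transfer_matching_weights(src: nn.Module, dst: nn.Module) -> int:
    """Copy every tensor from `src` into `dst` whose name and shape match,
    and return how many tensors were copied.

    With the _make_layer naming above, block 0 of each layer exists in both
    the 3-block and the 1-block model, so those tensors transfer directly;
    the extra blocks of the big model are simply ignored.
    """
    src_state = src.state_dict()
    dst_state = dst.state_dict()

    matched = {
        name: tensor
        for name, tensor in src_state.items()
        if name in dst_state and dst_state[name].shape == tensor.shape
    }
    dst_state.update(matched)
    dst.load_state_dict(dst_state)
    return len(matched)


# Hypothetical usage: big_model (num_blocks=3) and small_model (num_blocks=1)
# are assumed to be instances of your network class, built elsewhere.
# copied = transfer_matching_weights(big_model, small_model)
# print(f"copied {copied} tensors into the small model")

After the transfer, the small model still needs fine-tuning on your data; if you want something closer to real knowledge distillation, you would additionally train the small model to match the big model's outputs (soft targets) rather than only copying weights.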
In fact, I was also wondering whether I could modify the kernel sizes in a similar way.
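Changing the kernel size is trickier, because the weight shapes no longer match. One heuristic people use is to resize the spatial dimensions of a pretrained convolution weight by interpolation and then fine-tune. The sketch below assumes that approach; resize_conv_weight is a hypothetical helper, and there is no guarantee the resized filter behaves like the original one, it only gives a warm start.

Code:
import torch
import torch.nn.functional as F


def resize_conv_weight(weight: torch.Tensor, new_size: int) -> torch.Tensor:
    """Heuristically resize a conv weight of shape (out_ch, in_ch, k, k)
    to (out_ch, in_ch, new_size, new_size) by bilinear interpolation
    over the two spatial dimensions.
    """
    # F.interpolate treats dims 0 and 1 as batch/channel and resizes the rest.
    return F.interpolate(
        weight,
        size=(new_size, new_size),
        mode="bilinear",
        align_corners=False,
    )


# Example: shrink a 7x7 filter bank to 3x3 (random tensor stands in for a
# pretrained weight here).
w7 = torch.randn(64, 3, 7, 7)
w3 = resize_conv_weight(w7, 3)
print(w3.shape)  # torch.Size([64, 3, 3, 3])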
Source: https://stackoverflow.com/questions/781 ... ine-tuning