Anonymous
Why is my 8-bit quantized model slower than the 16-bit one?
Message
Anonymous » 17 Oct 2024, 01:14
I quantized my TensorFlow neural network model to both 8-bit and 16-bit precision to improve performance, expecting the 8-bit version to be faster because of its lower memory and compute requirements. However, I noticed that the 8-bit quantized model is actually slower than the 16-bit model at inference time.
Here are the details for both models:
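For context, the two models came out of TensorFlow Lite post-training quantization, roughly along the lines of the sketch below; the saved-model path, the calibration_data generator and the exact converter flags are illustrative assumptions, not copied from my conversion script. The int8 path needs a representative dataset for calibration, while the float16 path only changes how the weights are stored.
Code:
import numpy as np
import tensorflow as tf

# 8-bit: full-integer post-training quantization (uint8 input/output).
# "saved_model_dir" and calibration_data are placeholders.
def representative_dataset():
    for sample in calibration_data[:100]:
        yield [sample.reshape(1, -1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
with open("8bit_model.tflite", "wb") as f:
    f.write(converter.convert())

# 16-bit: float16 quantization (weights stored as float16, I/O stays float32).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
with open("16bit_model.tflite", "wb") as f:
    f.write(converter.convert())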
8-bit:
Code:
Tensor serving_default_input_2:0 - dtype: , shape: [1 9]
Tensor FeedforwardNN/dense_3/MatMul;FeedforwardNN/dense_3/BiasAdd - dtype: , shape: [4]
Tensor FeedforwardNN/batch_normalization_2/batchnorm/mul_1;FeedforwardNN/batch_normalization_2/batchnorm/add_1;FeedforwardNN/dense_3/MatMul;FeedforwardNN/dense_3/BiasAdd - dtype: , shape: [ 4 20]
Tensor FeedforwardNN/dense_2/MatMul;FeedforwardNN/dense_2/BiasAdd - dtype: , shape: [20]
Tensor FeedforwardNN/batch_normalization_1/batchnorm/mul_1;FeedforwardNN/batch_normalization_1/batchnorm/add_1;FeedforwardNN/dense_2/MatMul;FeedforwardNN/dense_2/BiasAdd - dtype: , shape: [20 32]
Tensor FeedforwardNN/dense_1/MatMul;FeedforwardNN/dense_1/BiasAdd - dtype: , shape: [32]
Tensor FeedforwardNN/batch_normalization/batchnorm/mul_1;FeedforwardNN/batch_normalization/batchnorm/add_1;FeedforwardNN/dense_1/MatMul;FeedforwardNN/dense_1/BiasAdd - dtype: , shape: [32 32]
Tensor FeedforwardNN/dense/BiasAdd/ReadVariableOp - dtype: , shape: [32]
Tensor FeedforwardNN/dense/MatMul - dtype: , shape: [32 9]
Tensor tfl.quantize - dtype: , shape: [1 9]
Tensor FeedforwardNN/dense/MatMul;FeedforwardNN/dense/BiasAdd - dtype: , shape: [ 1 32]
Tensor FeedforwardNN/dense/leaky_re_lu/LeakyRelu - dtype: , shape: [ 1 32]
Tensor FeedforwardNN/batch_normalization/batchnorm/mul_1;FeedforwardNN/batch_normalization/batchnorm/add_1;FeedforwardNN/dense_1/MatMul;FeedforwardNN/dense_1/BiasAdd1 - dtype: , shape: [ 1 32]
Tensor FeedforwardNN/dense_1/leaky_re_lu_1/LeakyRelu - dtype: , shape: [ 1 32]
Tensor FeedforwardNN/batch_normalization_1/batchnorm/mul_1;FeedforwardNN/batch_normalization_1/batchnorm/add_1;FeedforwardNN/dense_2/MatMul;FeedforwardNN/dense_2/BiasAdd1 - dtype: , shape: [ 1 20]
Tensor FeedforwardNN/dense_2/leaky_re_lu_2/LeakyRelu - dtype: , shape: [ 1 20]
Tensor FeedforwardNN/batch_normalization_2/batchnorm/mul_1;FeedforwardNN/batch_normalization_2/batchnorm/add_1;FeedforwardNN/dense_3/MatMul;FeedforwardNN/dense_3/BiasAdd1 - dtype: , shape: [1 4]
Tensor StatefulPartitionedCall:01 - dtype: , shape: [1 4]
Tensor StatefulPartitionedCall:0 - dtype: , shape: [1 4]
16-bit:
Code:
Tensor serving_default_input_2:0 - dtype: , shape: [1 9]
Tensor FeedforwardNN/dense_1/MatMul;FeedforwardNN/dense_1/BiasAdd - dtype: , shape: [32]
Tensor FeedforwardNN/batch_normalization/batchnorm/mul_1;FeedforwardNN/batch_normalization/batchnorm/add_1;FeedforwardNN/dense_1/MatMul;FeedforwardNN/dense_1/BiasAdd - dtype: , shape: [32 32]
Tensor FeedforwardNN/dense_2/MatMul;FeedforwardNN/dense_2/BiasAdd - dtype: , shape: [20]
Tensor FeedforwardNN/batch_normalization_1/batchnorm/mul_1;FeedforwardNN/batch_normalization_1/batchnorm/add_1;FeedforwardNN/dense_2/MatMul;FeedforwardNN/dense_2/BiasAdd - dtype: , shape: [20 32]
Tensor FeedforwardNN/dense_3/MatMul;FeedforwardNN/dense_3/BiasAdd - dtype: , shape: [4]
Tensor FeedforwardNN/batch_normalization_2/batchnorm/mul_1;FeedforwardNN/batch_normalization_2/batchnorm/add_1;FeedforwardNN/dense_3/MatMul;FeedforwardNN/dense_3/BiasAdd - dtype: , shape: [ 4 20]
Tensor FeedforwardNN/dense/BiasAdd/ReadVariableOp - dtype: , shape: [32]
Tensor FeedforwardNN/dense/MatMul - dtype: , shape: [32 9]
Tensor FeedforwardNN/dense_1/MatMul;FeedforwardNN/dense_1/BiasAdd1 - dtype: , shape: [32]
Tensor FeedforwardNN/batch_normalization/batchnorm/mul_1;FeedforwardNN/batch_normalization/batchnorm/add_1;FeedforwardNN/dense_1/MatMul;FeedforwardNN/dense_1/BiasAdd1 - dtype: , shape: [32 32]
Tensor FeedforwardNN/dense_2/MatMul;FeedforwardNN/dense_2/BiasAdd1 - dtype: , shape: [20]
Tensor FeedforwardNN/batch_normalization_1/batchnorm/mul_1;FeedforwardNN/batch_normalization_1/batchnorm/add_1;FeedforwardNN/dense_2/MatMul;FeedforwardNN/dense_2/BiasAdd1 - dtype: , shape: [20 32]
Tensor FeedforwardNN/dense_3/MatMul;FeedforwardNN/dense_3/BiasAdd1 - dtype: , shape: [4]
Tensor FeedforwardNN/batch_normalization_2/batchnorm/mul_1;FeedforwardNN/batch_normalization_2/batchnorm/add_1;FeedforwardNN/dense_3/MatMul;FeedforwardNN/dense_3/BiasAdd1 - dtype: , shape: [ 4 20]
Tensor FeedforwardNN/dense/BiasAdd/ReadVariableOp1 - dtype: , shape: [32]
Tensor FeedforwardNN/dense/MatMul1 - dtype: , shape: [32 9]
Tensor FeedforwardNN/dense/MatMul;FeedforwardNN/dense/BiasAdd - dtype: , shape: [ 1 32]
Tensor FeedforwardNN/dense/leaky_re_lu/LeakyRelu - dtype: , shape: [ 1 32]
Tensor FeedforwardNN/batch_normalization/batchnorm/mul_1;FeedforwardNN/batch_normalization/batchnorm/add_1;FeedforwardNN/dense_1/MatMul;FeedforwardNN/dense_1/BiasAdd2 - dtype: , shape: [ 1 32]
Tensor FeedforwardNN/dense_1/leaky_re_lu_1/LeakyRelu - dtype: , shape: [ 1 32]
Tensor FeedforwardNN/batch_normalization_1/batchnorm/mul_1;FeedforwardNN/batch_normalization_1/batchnorm/add_1;FeedforwardNN/dense_2/MatMul;FeedforwardNN/dense_2/BiasAdd2 - dtype: , shape: [ 1 20]
Tensor FeedforwardNN/dense_2/leaky_re_lu_2/LeakyRelu - dtype: , shape: [ 1 20]
Tensor FeedforwardNN/batch_normalization_2/batchnorm/mul_1;FeedforwardNN/batch_normalization_2/batchnorm/add_1;FeedforwardNN/dense_3/MatMul;FeedforwardNN/dense_3/BiasAdd2 - dtype: , shape: [1 4]
Tensor StatefulPartitionedCall:0 - dtype: , shape: [1 4]
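Both listings above were presumably produced by looping over the interpreter's tensor details, something like the sketch below (the exact print format is an approximation; the dtype values did not survive posting):
Code:
# Print name, dtype and shape of every tensor in a .tflite model.
interpreter = tf.lite.Interpreter(model_path="8bit_model.tflite")
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    print(f"Tensor {detail['name']} - dtype: {detail['dtype']}, shape: {detail['shape']}")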
This is my benchmark:
This code snippet takes 3.8 seconds:
Code:
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="8bit_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def evaluate_quantized_model8bit(X_test):
    predicted_labels = []
    for i in range(len(X_test)):
        # The fully quantized model expects uint8 input, one sample per invoke
        input_data = X_test[i].reshape(1, -1).astype(np.uint8)
        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        predicted_label = np.argmax(output_data)
        predicted_labels.append(predicted_label)
    return predicted_labels

predicted_labelsQ = evaluate_quantized_model8bit(test_data)
And this one takes 1.1 s:
Code:
interpreter = tf.lite.Interpreter(model_path="16bit_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def evaluate_quantized_model16bit(X_test):
    predicted_labels = []
    for i in range(len(X_test)):
        # The float16-quantized model still takes float32 input
        input_data = X_test[i].reshape(1, -1).astype(np.float32)
        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        predicted_label = np.argmax(output_data)
        predicted_labels.append(predicted_label)
    return predicted_labels

predicted_labelsQ = evaluate_quantized_model16bit(test_data)
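To rule out Python-side overhead in the comparison, timing just interpreter.invoke() would look roughly like the sketch below; time_invoke, the warm-up count and the run count are illustrative choices, not part of my original benchmark:
Code:
import time

def time_invoke(model_path, make_input, n_warmup=10, n_runs=1000):
    """Average time of interpreter.invoke() alone for a single-sample input."""
    interp = tf.lite.Interpreter(model_path=model_path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    sample = make_input()
    for _ in range(n_warmup):  # warm-up runs, not timed
        interp.set_tensor(inp['index'], sample)
        interp.invoke()
    start = time.perf_counter()
    for _ in range(n_runs):
        interp.set_tensor(inp['index'], sample)
        interp.invoke()
    return (time.perf_counter() - start) / n_runs

t8 = time_invoke("8bit_model.tflite", lambda: test_data[0].reshape(1, -1).astype(np.uint8))
t16 = time_invoke("16bit_model.tflite", lambda: test_data[0].reshape(1, -1).astype(np.float32))
print(f"8-bit: {t8 * 1e6:.1f} us/invoke, 16-bit: {t16 * 1e6:.1f} us/invoke")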
Why is this happening?
More details here:
https://stackoverflow.com/questions/79095486/why-is-my-8-bit-quantized-model-slower-than-my-16-bit-model