I created a venv and installed TensorFlow:
pip install tensorflow[and-cuda]==2.17
$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2025-01-17 14:38:10.291333: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-17 14:38:10.311168: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-17 14:38:10.317884: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-17 14:38:10.332355: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-17 14:38:11.343134: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1737121093.017637 32829 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1737121093.072040 32829 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1737121093.072721 32829 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-01-17 14:38:13.072881: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2432] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 5.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
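The last warning above mentions compute capability 5.0. As a rough sketch, the capability that TensorFlow reports for each GPU can be printed with tf.config.experimental.get_device_details. The helper name gpu_details is made up for illustration, and the import guard only exists so the snippet also runs on a machine without TensorFlow:

```python
def gpu_details():
    """Return one dict per visible GPU; an empty list if TensorFlow or a GPU is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # TensorFlow is not installed in this environment

    details = []
    for gpu in tf.config.list_physical_devices("GPU"):
        # get_device_details returns a dict; keys may be absent, hence .get()
        info = tf.config.experimental.get_device_details(gpu)
        details.append({
            "name": gpu.name,
            "device_name": info.get("device_name"),
            "compute_capability": info.get("compute_capability"),
        })
    return details


if __name__ == "__main__":
    for entry in gpu_details():
        print(entry)
```

This only reports what TensorFlow sees; it does not say whether the installed binary ships kernels for that capability.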
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
Cell In[15], line 1
----> 1 model_test = EfficientCapsNet(model_name, mode='test', verbose=True, custom_path=custom_path)
3 model_test.load_graph_weights() # load graph weights (bin folder)
File ~/Documents/Thesis/Efficient-CapsNet-Thesis/models/model.py:147, in EfficientCapsNet.__init__(self, model_name, mode, config_path, custom_path, verbose)
145 self.model_path_new_train = os.path.join(self.config['saved_model_dir'], f"efficient_capsnet{self.model_name}_new_train.h5")
146 self.tb_path = os.path.join(self.config['tb_log_save_dir'], f"efficient_capsnet_{self.model_name}")
--> 147 self.load_graph()
File ~/Documents/Thesis/Efficient-CapsNet-Thesis/models/model.py:152, in EfficientCapsNet.load_graph(self)
150 def load_graph(self):
151 if self.model_name == 'MNIST':
--> 152 self.model = efficient_capsnet_graph_mnist.build_graph(self.config['MNIST_INPUT_SHAPE'], self.mode, self.verbose)
153 elif self.model_name == 'SMALLNORB':
154 self.model = efficient_capsnet_graph_smallnorb.build_graph(self.config['SMALLNORB_INPUT_SHAPE'], self.mode, self.verbose)
File ~/Documents/Thesis/Efficient-CapsNet-Thesis/models/efficient_capsnet_graph_mnist.py:84, in build_graph(input_shape, mode, verbose)
81 y_true = tf.keras.layers.Input(shape=(10,))
82 noise = tf.keras.layers.Input(shape=(10, 16))
---> 84 efficient_capsnet = efficient_capsnet_graph(input_shape)
86 if verbose:
87 efficient_capsnet.summary()
File ~/Documents/Thesis/Efficient-CapsNet-Thesis/models/efficient_capsnet_graph_mnist.py:32, in efficient_capsnet_graph(input_shape)
22 """
23 Efficient-CapsNet graph architecture.
24
(...)
28 network input shape
29 """
30 inputs = tf.keras.Input(input_shape)
---> 32 x = tf.keras.layers.Conv2D(32,5,activation="relu", padding='valid', kernel_initializer='he_normal')(inputs)
33 x = tf.keras.layers.BatchNormalization()(x)
34 x = tf.keras.layers.Conv2D(64,3, activation='relu', padding='valid', kernel_initializer='he_normal')(x)
File ~/Documents/Thesis/thesis_venv/lib/python3.12/site-packages/keras/src/utils/traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs)
119 filtered_tb = _process_traceback_frames(e.__traceback__)
120 # To get the full stack trace, call:
121 # `keras.config.disable_traceback_filtering()`
--> 122 raise e.with_traceback(filtered_tb) from None
123 finally:
124 del filtered_tb
File ~/Documents/Thesis/thesis_venv/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py:136, in convert_to_tensor(x, dtype, sparse)
131 if dtype == "bool" or is_int_dtype(dtype):
132 # TensorFlow conversion is stricter than other backends, it does not
133 # allow ints for bools or floats for ints. We convert without dtype
134 # and cast instead.
135 x = tf.convert_to_tensor(x)
--> 136 return tf.cast(x, dtype)
137 return tf.convert_to_tensor(x, dtype=dtype)
138 elif dtype is not None and not x.dtype == dtype:
InternalError: {{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Cast] name:
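The traceback ends in a plain tf.cast placed on GPU:0, so the failure can probably be isolated from the model. A minimal sketch, assuming the same venv; try_cast is a hypothetical helper, and comparing "/CPU:0" with "/GPU:0" is only a way to see whether the GPU kernel launch itself is what fails:

```python
def try_cast(device):
    """Run a tiny int -> float cast on `device`.

    Returns True on success, False if TensorFlow raises, None if TensorFlow is absent.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None  # nothing to test without TensorFlow

    try:
        with tf.device(device):
            x = tf.constant([1, 2, 3])
            y = tf.cast(x, tf.float32)  # same op type as in the traceback
        return y.numpy().tolist() == [1.0, 2.0, 3.0]
    except (tf.errors.OpError, RuntimeError, ValueError) as err:
        # tf.errors.InternalError is a subclass of tf.errors.OpError
        print(f"cast on {device} failed: {type(err).__name__}: {err}")
        return False


if __name__ == "__main__":
    for dev in ("/CPU:0", "/GPU:0"):
        print(dev, try_cast(dev))
```

If the CPU cast succeeds and the GPU cast fails, the problem sits in the CUDA setup rather than in the Efficient-CapsNet code.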
When I ran nvcc --version, I got no output; instead it suggested installing the CUDA toolkit (apt install nvidia-cuda-toolkit). I did that. Now I get the following output:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
Just in case, here is the output of nvidia-smi:
$ nvidia-smi
Fri Jan 17 14:54:11 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro M1000M                  Off |   00000000:01:00.0 Off |                  N/A |
| N/A   38C    P8             N/A /  200W |       7MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2270      G   /usr/lib/xorg/Xorg                              2MiB |
+-----------------------------------------------------------------------------------------+
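The "CUDA Version" in the nvidia-smi header is the highest CUDA version the driver supports, while nvcc reports the installed toolkit release. A small sketch for comparing the two from the pasted outputs; the function names are invented for illustration, and the check says nothing about which CUDA libraries TensorFlow actually loads at runtime:

```python
import re


def nvcc_release(nvcc_output):
    """Extract the toolkit release, e.g. (12, 0), from `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_cuda_version(smi_output):
    """Extract the driver's maximum supported CUDA version from `nvidia-smi` output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_supported_by_driver(nvcc_output, smi_output):
    """True if the toolkit release is not newer than what the driver supports.

    Returns None when either version cannot be parsed.
    """
    toolkit = nvcc_release(nvcc_output)
    driver = driver_cuda_version(smi_output)
    if toolkit is None or driver is None:
        return None
    return toolkit <= driver  # tuple comparison: major first, then minor


if __name__ == "__main__":
    nvcc_text = "Cuda compilation tools, release 12.0, V12.0.140"
    smi_text = "| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |"
    print(toolkit_supported_by_driver(nvcc_text, smi_text))
```

In practice the two strings would come from running the commands (for example via subprocess.run) rather than being pasted in.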
I looked through other questions but couldn't find one where the asker used Python, installed TensorFlow the same way I did, and got the same error.
I'm completely lost and would appreciate any help!
Update 1: installed CUDA 12.3
I followed this question/answer: https://askubuntu.com/a/1288405/2105112 to install CUDA 12.3. I believe it worked:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0
More details here: https://stackoverflow.com/questions/793 ... cuda-error