I have a templated struct and I want to speed up compilation by separating its declaration from its implementation, using explicit template instantiation (I'm using Kokkos 4.1 with the CUDA backend, GCC 11.3 and CUDA 11.8).
I have a struct that looks like this:
Code:
template struct Orbit {
    MetricsAndFields fields;
    Parameters params;

    Orbit() = default;
    Orbit(MetricsAndFields fields, Parameters params)
        : fields(fields), params(params) {}

    __host__ __device__ T eom_denominator(const Particle& p);
};
Code:
template struct Orbit;
//
#ifdef ENABLE_GPU
template struct Orbit;
#endif
Code:
template
__host__ __device__ T Orbit::eom_denominator(const Particle& p) {
    // Implementation
}
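To make the setup concrete, here is a stripped-down, host-only sketch of the same split. It is not my real code: Orbit and Particle are reduced to a single template parameter, the scale member and the formula in the body are made up, and ORBIT_HD is a hypothetical macro that expands to the CUDA qualifiers only under nvcc so the snippet also builds with plain GCC.

```cpp
// ORBIT_HD is a hypothetical helper macro (not from Kokkos or CUDA):
// it adds the CUDA qualifiers only when nvcc is the compiler.
#if defined(__CUDACC__)
#define ORBIT_HD __host__ __device__
#else
#define ORBIT_HD
#endif

// ---- would live in the header ----

// Simplified stand-in for the real Particle type (assumption).
template <typename T>
struct Particle {
    T x;
    T v;
};

// Simplified stand-in for the real Orbit struct (assumption).
template <typename T>
struct Orbit {
    T scale;
    ORBIT_HD T eom_denominator(const Particle<T>& p);  // declared only
};

// Explicit instantiation declaration: tells every includer not to
// instantiate Orbit<double> itself.
extern template struct Orbit<double>;

// ---- would live in the single source file ----

// Out-of-line definition of the member function (placeholder formula).
template <typename T>
ORBIT_HD T Orbit<T>::eom_denominator(const Particle<T>& p) {
    return scale * (T(1) + p.v * p.v);
}

// Explicit instantiation definition: the one place where the code for
// Orbit<double> is actually generated.
template struct Orbit<double>;
```

In the real project the first part sits in the header and the last two pieces in one source file; they are shown in a single listing here only so that it compiles standalone.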
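Code:
ptxas fatal : Unresolved extern function '_ZN5OrbitIL9OrbitType1ELb0E8MyReaderIdN6Kokkos9CudaSpaceEEdS3_E15eom_denominatorERK8ParticleIdE'
(demangled: Orbit<(OrbitType)1, false, MyReader<double, Kokkos::CudaSpace>, double, Kokkos::CudaSpace>::eom_denominator(Particle<double> const&))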
When I drop the __device__ qualifier from my function, it compiles just fine (with a warning that calling a host function from device code is not allowed). So the problem comes from the __device__ qualifier.
Code:
__noinline__
Is there any way around that? Or am I forced to have slow compilation times?
Source: https://stackoverflow.com/questions/781 ... eclaration