Please look at my code below. How can I parallelize the phi and theta loops? They take up most of the running time, and they must stay inside the i, j loops; the nu loop cannot be removed, it must always be present. If you have any ideas on how to speed this code up by at least a factor of 2, I would be glad to hear them. I haven't tried parallelizing on CUDA yet, because I'm trying to solve the problem without it; if all else fails, I will probably resort to it.
[code]
for (int nu = 0; nu < 20; nu++) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            double matrixPtr = matrix[i][j].first;
            double matrixPtrSecond = matrix[i][j].second;
            double localReS1 = 0.0;
            double localImS1 = 0.0;
            for (double theta = 0; theta < 2 * M_PI; theta += theta_plus) {
                double sinTheta = sin(theta);          // does not depend on phi
                for (double phi = 0; phi < M_PI; phi += theta_plus) {
                    double angle = matrixPtr * cos(phi) * sinTheta
                                 + matrixPtrSecond * sin(phi) * sinTheta;
                    localReS1 += cos(angle);
                    localImS1 += sin(angle);
                }
            }
            #pragma omp critical
            {
                ReS += localReS1;
                ImS += localImS1;
            }
        }
    }
}
[/code]
I already have an OpenMP parallel region open, and, based on some earlier measurements, I want to decide whether to execute a function in parallel or serially.
The function itself contains omp for loops and omp single, which creates barriers.
There is...
I am trying to introduce OpenMP into a piece of code to speed up a loop. The loop processes some JSON inputs and generates JSON outputs. The JSON output contains almost exclusively doubles or vectors of doubles, and we have some established expected results...