Is atomic<int> faster than atomic<size_t> for a counter?

Posted by Anonymous » 11 Nov 2025, 10:58

I have a small bounded queue whose size definitely fits in an int. So I want to use atomic<int> instead of atomic<size_t> for the index/counter, on the assumption that int, being smaller, should be faster.
However, my current benchmark shows that the two run at the same speed when used as a counter (with std::memory_order_relaxed), and I am not sure whether that is due to a poor benchmark. It is also running on a virtual machine, so the current results are not the most reliable.
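To make the comparison concrete, here is a minimal sketch of the kind of shared relaxed counter in question. The helper name count_relaxed is hypothetical and is not taken from the benchmark. As far as I know, on x86-64 a fetch_add on either width is a single lock-prefixed read-modify-write, so the cost is likely dominated by cache-line ownership rather than operand size, which would be consistent with the two types measuring the same.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Both specializations are expected to be lock-free on mainstream 64-bit
// targets; compilation fails here if that assumption does not hold.
static_assert(std::atomic<int>::is_always_lock_free);
static_assert(std::atomic<std::size_t>::is_always_lock_free);

// Hypothetical helper (not from the original benchmark): nthreads threads
// each add 1 to one shared counter per_thread times with relaxed ordering.
// Returns the final counter value.
template <typename T>
T count_relaxed(int nthreads, int per_thread) {
    std::atomic<T> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

Calling count_relaxed<int>(4, 10000) and count_relaxed<std::size_t>(4, 10000) exercises the same code path for both widths, so any timing difference between them comes from the type alone.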
=== System Information ===
OS: Linux 6.8.0-1043-gcp (#46~22.04.1-Ubuntu SMP Wed Oct 22 19:00:03 UTC 2025)
CPU:
Architecture: x86_64
Logical CPUs: 8
Node name: Check
Machine: x86_64
g++ 13.3

atomic<int> vs atomic<size_t> micro-benchmark
OPS_PER_BENCH=50000000
int shared relaxed threads=1 ops=50000000 time=0.3271s ops/s=152878203.38 ns/op=6.54
size_t shared relaxed threads=1 ops=50000000 time=0.3222s ops/s=155182395.55 ns/op=6.44
int shared seq_cst threads=1 ops=50000000 time=0.3209s ops/s=155793958.46 ns/op=6.42
size_t shared seq_cst threads=1 ops=50000000 time=0.3210s ops/s=155787302.95 ns/op=6.42
int shared CAS threads=1 ops=50000000 time=0.5689s ops/s=87883570.52 ns/op=11.38
size_t shared CAS threads=1 ops=50000000 time=0.5593s ops/s=89389628.93 ns/op=11.19
int per-thread relaxed threads=1 ops=50000000 time=0.3255s ops/s=153615305.49 ns/op=6.51
size_t per-thread relaxed threads=1 ops=50000000 time=0.3205s ops/s=155990264.51 ns/op=6.41
------------------------------------------------------------
int shared relaxed threads=2 ops=50000000 time=1.5077s ops/s=33162809.24 ns/op=30.15
size_t shared relaxed threads=2 ops=50000000 time=1.4249s ops/s=35091189.36 ns/op=28.50
int shared seq_cst threads=2 ops=50000000 time=1.5050s ops/s=33222696.53 ns/op=30.10
size_t shared seq_cst threads=2 ops=50000000 time=1.8041s ops/s=27714312.85 ns/op=36.08
int shared CAS threads=2 ops=50000000 time=2.7661s ops/s=18075669.44 ns/op=55.32
size_t shared CAS threads=2 ops=50000000 time=3.2245s ops/s=15506267.21 ns/op=64.49
int per-thread relaxed threads=2 ops=50000000 time=1.3883s ops/s=36016218.37 ns/op=27.77
size_t per-thread relaxed threads=2 ops=50000000 time=1.3574s ops/s=36835752.33 ns/op=27.15
------------------------------------------------------------
int shared relaxed threads=3 ops=50000000 time=2.3341s ops/s=21421597.47 ns/op=46.68
size_t shared relaxed threads=3 ops=50000000 time=1.8018s ops/s=27750189.59 ns/op=36.04
int shared seq_cst threads=3 ops=50000000 time=2.2140s ops/s=22583354.75 ns/op=44.28
size_t shared seq_cst threads=3 ops=50000000 time=3.3697s ops/s=14838202.43 ns/op=67.39
int shared CAS threads=3 ops=50000000 time=4.1281s ops/s=12112147.93 ns/op=82.56
size_t shared CAS threads=3 ops=50000000 time=5.2940s ops/s=9444581.91 ns/op=105.88
int per-thread relaxed threads=3 ops=50000000 time=2.2324s ops/s=22397277.11 ns/op=44.65
size_t per-thread relaxed threads=3 ops=50000000 time=2.3911s ops/s=20911079.76 ns/op=47.82
------------------------------------------------------------
int shared relaxed threads=4 ops=50000000 time=2.5003s ops/s=19997924.19 ns/op=50.01
size_t shared relaxed threads=4 ops=50000000 time=2.2927s ops/s=21808223.78 ns/op=45.85
int shared seq_cst threads=4 ops=50000000 time=2.4988s ops/s=20009473.73 ns/op=49.98
size_t shared seq_cst threads=4 ops=50000000 time=3.5007s ops/s=14282674.37 ns/op=70.01
int shared CAS threads=4 ops=50000000 time=4.6832s ops/s=10676373.71 ns/op=93.66
size_t shared CAS threads=4 ops=50000000 time=14.6833s ops/s=3405230.01 ns/op=293.67
int per-thread relaxed threads=4 ops=50000000 time=10.0129s ops/s=4993537.05 ns/op=200.26
size_t per-thread relaxed threads=4 ops=50000000 time=10.0845s ops/s=4958110.80 ns/op=201.69
------------------------------------------------------------
int shared relaxed threads=5 ops=50000000 time=11.6976s ops/s=4274378.64 ns/op=233.95
size_t shared relaxed threads=5 ops=50000000 time=11.1007s ops/s=4504231.02 ns/op=222.01
int shared seq_cst threads=5 ops=50000000 time=11.6111s ops/s=4306222.53 ns/op=232.22
size_t shared seq_cst threads=5 ops=50000000 time=18.5938s ops/s=2689067.57 ns/op=371.88
int shared CAS threads=5 ops=50000000 time=24.1049s ops/s=2074265.29 ns/op=482.10
size_t shared CAS threads=5 ops=50000000 time=31.7937s ops/s=1572638.73 ns/op=635.87
int per-thread relaxed threads=5 ops=50000000 time=9.3022s ops/s=5375057.17 ns/op=186.04
size_t per-thread relaxed threads=5 ops=50000000 time=12.7956s ops/s=3907601.15 ns/op=255.91
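In the numbers above, the per-thread rows are about as slow as the shared rows once there are 2 or more threads, which is surprising for counters that are not supposed to be contended. One possible contributor worth ruling out is false sharing, where separate counters land on the same cache line. The layout of the per-thread counters is not visible in the code excerpt, so this is only a guess. A minimal sketch of padded per-thread counters, assuming a 64-byte cache line (kCacheLine, PaddedCounter and sum_per_thread are hypothetical names):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size: 64 bytes is typical for x86-64, not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Each counter is aligned (and therefore padded) to a full cache line, so
// two threads never write to the same line.
template <typename T>
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<T> value{0};
};

static_assert(sizeof(PaddedCounter<int>) == kCacheLine);
static_assert(sizeof(PaddedCounter<std::size_t>) == kCacheLine);

// Every thread increments only its own counter; the sum is read at the end.
template <typename T>
T sum_per_thread(int nthreads, int per_thread) {
    std::vector<PaddedCounter<T>> counters(nthreads);
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&counters, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    T total = 0;
    for (auto& c : counters) {
        total += c.value.load(std::memory_order_relaxed);
    }
    return total;
}
```

If the per-thread timings drop sharply with this layout, the earlier numbers were measuring cache-line traffic rather than the width of the integer.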

Is there a better way to benchmark their performance?
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <thread>

using steady_clock_t = std::chrono::steady_clock;

static constexpr uint64_t OPS_PER_BENCH = 50'000'000ULL;
static constexpr int MAX_THREADS = 8;

// One measurement: which variant ran, with how many threads, and for how long.
struct Result {
    std::string name;
    int threads;
    double seconds;
    uint64_t ops;
};

// Times a single invocation of f and records it under the given name.
template <typename F>
Result run_bench(const std::string& name, int nthreads, F&& f) {
    auto start = steady_clock_t::now();
    f();
    auto end = steady_clock_t::now();
    double secs = std::chrono::duration<double>(end - start).count();
    return Result{name, nthreads, secs, OPS_PER_BENCH};
}

void print_result(const Result& r) {
    double ops_per_sec = r.ops / r.seconds;
    double ns_per_op = (r.seconds * 1e9) / r.ops;
    std::cout
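One way the measurement itself might be made less noisy, especially on a VM, is to release all threads at the same moment so that thread creation is not timed, and to repeat each configuration and report the median instead of a single run. The following is a sketch under those assumptions, not a replacement for the benchmark above. The names timed_run and median_seconds are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Runs work(thread_index) on nthreads threads. The threads wait on a start
// flag so the timed region begins only once all of them exist. Returns the
// elapsed seconds from the release of the flag until the last join.
template <typename F>
double timed_run(int nthreads, F work) {
    std::atomic<int> ready{0};
    std::atomic<bool> go{false};
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            ready.fetch_add(1, std::memory_order_release);
            while (!go.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            work(t);
        });
    }
    while (ready.load(std::memory_order_acquire) != nthreads) {
        std::this_thread::yield();
    }
    auto start = std::chrono::steady_clock::now();
    go.store(true, std::memory_order_release);
    for (auto& w : workers) {
        w.join();
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Repeats the run `trials` times (trials >= 1) and returns the middle sample
// after sorting (the upper of the two middle samples when trials is even).
template <typename F>
double median_seconds(int trials, int nthreads, F work) {
    std::vector<double> samples;
    for (int i = 0; i < trials; ++i) {
        samples.push_back(timed_run(nthreads, work));
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Pinning threads to cores and running on bare metal would probably matter more than either change, but both are outside what portable C++ can express.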

Read more here: https://stackoverflow.com/questions/798 ... or-counter
