Is this a KGDB-related bug?
Anonymous
When the serial port detects a specific key, it triggers kgdb_breakpoint, and the master CPU0 enters kdb_main_loop to process user commands in a loop:

kgdb_breakpoint -> int3 -> do_int3 -> notify_die -> atomic_notifier_call_chain -> __atomic_notifier_call_chain -> notifier_call_chain -> kgdb_notify -> __kgdb_notify -> kgdb_handle_exception -> kgdb_cpu_enter (kgdb_roundup_cpus sends an IPI, delivered as an NMI, to the other slave CPUs) -> kdb_stub -> kdb_main_loop

Slave CPUs (CPU1, CPU2, CPU3, ...), using CPU1 as an example: CPU1 is holding the run queue lock (rq->lock) of the master CPU0, because of load_balance or for some other reason, at the moment it responds to the NMI sent by master CPU0 through kgdb_roundup_cpus. It then enters the following stack:

nmi_handle -> kgdb_nmicallback -> kgdb_cpu_enter (slave CPU1 loops, touching the watchdog, and waits for master CPU0 to exit)

The above is the state before the KDB command "go" is executed.
When the user executes the KDB command "go", a deadlock occurs on master CPU0:

kgdb_cpu_enter -> kgdboc_post_exp_handler -> queue_work_on -> __queue_work -> insert_work -> wake_up_process -> try_to_wake_up -> _raw_spin_lock (tries to acquire the rq->lock of master CPU0, but at this time that lock is held by CPU1)

CPU0 spins on a lock whose owner, CPU1, is itself waiting for CPU0 to exit, so neither CPU can make progress. Therefore, when master CPU0 exits, it should not wake up the worker on system_wq if the rq->lock of CPU0 is locked.
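To make the lock-ordering argument concrete, here is a minimal user-space sketch in C. It is only a model, not kernel code: struct model_rq and every model_* function are hypothetical names invented for this illustration, and the "CPUs" are plain integers rather than real processors. It shows that an exit path which checks the lock first skips the wakeup instead of spinning while the parked CPU1 still owns the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU run queue and its rq->lock. */
struct model_rq {
    atomic_bool locked;
    int owner_cpu;              /* -1 while the lock is free */
};

static void model_rq_init(struct model_rq *rq)
{
    atomic_init(&rq->locked, false);
    rq->owner_cpu = -1;
}

/* Try to take the lock once; never spins. */
static bool model_rq_trylock(struct model_rq *rq, int cpu)
{
    bool expected = false;

    if (atomic_compare_exchange_strong(&rq->locked, &expected, true)) {
        rq->owner_cpu = cpu;
        return true;
    }
    return false;
}

static void model_rq_unlock(struct model_rq *rq, int cpu)
{
    assert(rq->owner_cpu == cpu);   /* only the owner may unlock */
    rq->owner_cpu = -1;
    atomic_store(&rq->locked, false);
}

static bool model_rq_is_locked(struct model_rq *rq)
{
    return atomic_load(&rq->locked);
}

/*
 * Models the exit path on the master CPU.
 * Returns true if the wakeup ran, false if it was skipped.
 * An unconditional spinning lock here would never return while
 * the owner is parked waiting for this CPU to exit.
 */
static bool model_post_exp_handler(struct model_rq *rq, int cpu)
{
    if (model_rq_is_locked(rq))
        return false;               /* skip the wakeup instead of spinning */
    if (!model_rq_trylock(rq, cpu))
        return false;
    /* ... the worker wakeup would happen here ... */
    model_rq_unlock(rq, cpu);
    return true;
}
```

With the lock taken by CPU1, model_post_exp_handler(&rq, 0) returns false; once CPU1 releases it, the same call returns true. Note that in this model a skipped wakeup is simply dropped, which is also what the check amounts to for the real handler.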
Example:

BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [] ? __schedule+0x16d/0xac0
 [] ? schedule+0x3c/0x90
 [] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [] ? mutex_unlock+0xe/0x10
 [] ? ep_scan_ready_list+0x1db/0x1e0
 [] ? schedule_hrtimeout_range+0x13/0x20
 [] ? ep_poll+0x27a/0x3b0
 [] ? wake_up_q+0x70/0x70
 [] ? SyS_epoll_wait+0xb8/0xd0
 [] ? entry_SYSCALL_64_fastpath+0x12/0x75
CPU: 0 PID: 10450 Comm: namex Tainted: G O 4.4.65 #1
Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
 0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
 ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
 000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
Call Trace:
 [] dump_stack+0x85/0xc2
 [] spin_dump+0x7f/0x100
 [] do_raw_spin_lock+0xa0/0x150
 [] _raw_spin_lock+0x15/0x20
 [] try_to_wake_up+0x176/0x3d0
 [] wake_up_process+0x15/0x20
 [] insert_work+0x81/0xc0
 [] __queue_work+0x135/0x390
 [] queue_work_on+0x46/0x90
 [] kgdboc_post_exp_handler+0x48/0x70
 [] kgdb_cpu_enter+0x598/0x610
 [] kgdb_handle_exception+0xf2/0x1f0
 [] __kgdb_notify+0x71/0xd0
 [] kgdb_notify+0x35/0x70
 [] notifier_call_chain+0x4a/0x70
 [] notify_die+0x3d/0x50
 [] do_int3+0x89/0x120
 [] int3+0x44/0x80

The following modification solves the problem for me, but I want to know whether this is a bug and how it should be fixed properly.
diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb164..945318ef1 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,9 @@
 #include
 #include
 #include
+#include
+
+#include "../kernel/sched/sched.h"
 
 #define MAX_CONFIG_LEN 40
 
@@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
 		dbg_restore_graphics = 0;
 		con_debug_leave();
 	}
-	kgdboc_restore_input();
+	if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
+		kgdboc_restore_input();
 }
 
 static struct kgdb_io kgdboc_io_ops = {
--
Source: https://stackoverflow.com/questions/780 ... elated-bug