A .NET case study: analyzing a CPU spike in a pharmaceutical company's business system

One: Background

1. The story

Some time ago, a friend came to me and said that their program's CPU usage had spiked, and asked me to help find out what was going on. The best way to tackle this kind of problem is to capture a dump and send it over. The tool I recommend is procdump, which can automate the capture.

Two: WinDbg analysis

1. Is the CPU really high?

As usual, the way to answer this question is the !tp command.


 0:044> !tp
 logStart: 1
 logSize: 200
 CPU utilization: 88%
 Worker Thread: Total: 8 Running: 4 Idle: 4 MaxLimit: 1023 MinLimit: 4
 Work Request in Queue: 0
 -----------------------------------------------
 Number of Timers: 2
 -----------------------------------------------
 Completion Port Thread:Total: 2 Free: 2 MaxFree: 8 CurrentLimit: 2 MaxLimit: 1000 MinLimit: 4

 

The output shows that CPU utilization has indeed reached 88%. Next, check how powerful the machine's CPU is, which you can do with !cpuid.


 0:044> !cpuid
 CP F/M/S Manufacturer MHz
  0 6,94,3 GenuineIntel 3192
  1 6,94,3 GenuineIntel 3192
  2 6,94,3 GenuineIntel 3192
  3 6,94,3 GenuineIntel 3192

 

Judging from the output, the machine has only 4 cores, which is a bit weak. For a highly profitable pharmaceutical company, that is rather stingy.

2. Why does the CPU explode?

Many things can drive CPU usage up, and there is no standard answer; you have to track down the cause yourself. First, look at the number of threads in the program with the !t command.


 0:044> !t
 ThreadCount: 451
 UnstartedThread: 0
 BackgroundThread: 443
 PendingThread: 0
 DeadThread: 1
 Hosted Runtime: no
                                                                              Lock
  DBG ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
    0 1 22b8 04CE8728 26020 Preemptive 18E5C92C:18E5E4DC 04c86c20 -00001 STA
    3 2 17c8 04B25768 2b220 Preemptive 18CAF3A0:18CB1374 04c86c20 -00001 MTA (Finalizer)
    4 4 238c 04C0CDD8 202b020 Preemptive 18E45D88:18E464DC 04c86c20 -00001 MTA
    5 5 230c 0A6C37A0 202b020 Preemptive 18DAC318:18DAC47C 04c86c20 -00001 MTA
    6 6 23a0 0A70E620 202b220 Preemptive 00000000:00000000 04c86c20 -00001 MTA
    ...

 

The output shows 451 threads, 443 of which are background threads. The earlier !tp output reported only 8 thread pool worker threads, which means that 400+ threads in this program were created directly with new Thread. That is suspicious: why use raw threads instead of the thread pool?
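The same subtraction can be done from inside a live process. The sketch below is only an illustration, not something taken from the dump: the class and method names are made up, and Process.Threads counts OS threads (including native ones), so the result is a rough estimate rather than the managed count that !t reports.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class ThreadCensus
{
    // Threads that do not belong to the thread pool must have been started
    // some other way, typically with `new Thread(...)`.
    public static int EstimateDedicatedThreads(int totalThreads, int poolThreads)
        => Math.Max(0, totalThreads - poolThreads);

    static void Main()
    {
        // OS-level thread count of the current process (includes native threads).
        int total = Process.GetCurrentProcess().Threads.Count;
        // Thread pool threads that currently exist (available since .NET Core 3.0).
        int pool = ThreadPool.ThreadCount;
        Console.WriteLine($"total={total}, pool={pool}, other={EstimateDedicatedThreads(total, pool)}");

        // Numbers from the dump: 451 threads, 8 worker + 2 completion port threads.
        Console.WriteLine(EstimateDedicatedThreads(451, 8 + 2));
    }
}
```

Counting the 2 completion port threads as pool threads as well still leaves more than 400 threads unaccounted for.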

The next step is to run ~*e !clrstack to see what each thread is doing at this moment. With this many threads, the command takes a long time to finish.


 0:044> ~*e !clrstack
 ...
 OS Thread ID: 0x220c (18)
 Child SP IP Call Site
 184CF614 77dd19dc [HelperMethodFrame: 184cf614] System.Threading.Thread.SleepInternal(Int32)
 184CF680 141975f4 System.Threading.Thread.Sleep(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 357]
 184CF694 165055b9 xxx.ActionThread`1[[xxx]].Loop()
 184CF878 74467741 System.Threading.Thread+StartHelper.Callback(System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 42]
 184CF888 7446fca1 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @  183]
 184CF8C0 74466742 System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
 184CFA14 74cbc29f [DebuggerU2MCatchHandlerFrame: 184cfa14]
 ...

 

None of the thread stacks showed any obvious business method. Most of them were parked in Thread.SleepInternal, which left me puzzled.

3. A sudden insight clears up the confusion

The CPU does not spike for no reason; some threads must be driving it up. Yet most of the threads in this program are sitting in Thread.SleepInternal, and it is hard to explain how sleeping threads could push the CPU that high.

Still, the problem has to be solved. With no other lead, the only option is to dig into Thread.SleepInternal itself. First, use Ctrl+F on the output to count how many threads are stuck in SleepInternal.

Almost all of the threads are sleeping, whereas normally only a handful would be. Next, pick one of these threads and look at how its business method sleeps by decompiling it.

In the decompiled Loop method I found many Sleep(1) calls. Seeing this, I immediately thought of a CPU spike caused by high-frequency context switching.
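The friend's actual source is not reproduced here, so the following is only a guess at the general shape of such a loop. Loop, IsTerminated and the Sleep(1) call come from the dump; the class name ActionThreadSketch, the queue, the handler and the Post and Terminate methods are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for xxx.ActionThread<T>. Only Loop, IsTerminated and
// the Sleep(1) call are taken from the dump; everything else is assumed.
class ActionThreadSketch<T>
{
    private readonly ConcurrentQueue<T> _queue = new ConcurrentQueue<T>();
    private readonly Action<T> _handler;
    private readonly Thread _thread;
    private volatile bool _terminated;

    public ActionThreadSketch(Action<T> handler)
    {
        _handler = handler;
        // One dedicated thread per instance, created with new Thread.
        _thread = new Thread(Loop) { IsBackground = true };
        _thread.Start();
    }

    public bool IsTerminated => _terminated;
    public void Post(T item) => _queue.Enqueue(item);
    public void Terminate() { _terminated = true; _thread.Join(); }

    private void Loop()
    {
        while (!IsTerminated)
        {
            if (_queue.TryDequeue(out T item))
            {
                _handler(item);
                continue;
            }
            // Nothing to do: nap briefly, then poll again. The thread keeps
            // waking up even when the queue stays empty.
            Thread.Sleep(1);
        }
    }
}

static class ActionThreadDemo
{
    // Posts ten items to one worker and returns how many were handled.
    public static int RunOnce()
    {
        int handled = 0;
        var worker = new ActionThreadSketch<int>(_ => Interlocked.Increment(ref handled));
        for (int i = 0; i < 10; i++) worker.Post(i);
        SpinWait.SpinUntil(() => Volatile.Read(ref handled) == 10, 2000);
        worker.Terminate();
        return handled;
    }

    static void Main() => Console.WriteLine(RunOnce());
}
```

One such worker is cheap. The concern in this case is several hundred of them polling at the same time.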

Which Sleep call is this thread actually stopped at? To find out, disassemble the Loop method at the thread's return address with !U.


 0:047> !clrstack
 OS Thread ID: 0xad8 (47)
 Child SP IP Call Site
 20B5F434 77dd19dc [HelperMethodFrame: 20b5f434] System.Threading.Thread.SleepInternal(Int32)
 20B5F4A0 141975f4 System.Threading.Thread.Sleep(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 357]
 20B5F4B4 1f123c71 xxx.ActionThread`1[[xxx]].Loop()
 20B5F698 74467741 System.Threading.Thread+StartHelper.Callback(System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 42]
 20B5F6A8 1baab7da System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @  183]
 20B5F6E0 74466742 System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
 20B5F834 74cbc29f [DebuggerU2MCatchHandlerFrame: 20b5f834]
 0:047> !U /d 1f123c71
 Normal JIT generated code
 xxx.ActionThread`1[xxx].Loop()
 ilAddr is 0A324040 pImport is 08AD6468
 Begin 1F123C10, size abd
 1f123c10 55 push ebp
 1f123c11 8bec mov ebp,esp
 1f123c13 57 push edi
 1f123c14 56 push esi
 1f123c15 81ecd4010000 sub esp,1D4h
 1f123c1b c5f877 vzeroupper
 1f123c1e c5d857e4 vxorps xmm4,xmm4,xmm4
 1f123c22 c5fa7fa524feffff vmovdqu xmmword ptr [ebp-1DCh],xmm4
 1f123c2a c5fa7fa534feffff vmovdqu xmmword ptr [ebp-1CCh],xmm4
 1f123c32 b850feffff mov eax,0FFFFFE50h
 1f123c37 c5fa7f6405f4 vmovdqu xmmword ptr [ebp+eax-0Ch],xmm4
 1f123c3d c5fa7f640504 vmovdqu xmmword ptr [ebp+eax+4],xmm4
 1f123c43 c5fa7f640514 vmovdqu xmmword ptr [ebp+eax+14h],xmm4
 1f123c49 83c030 add eax,30h
 ...
 1f123c5a e84115cc55 call coreclr!JIT_DbgIsJustMyCode (74de51a0)
 1f123c5f 90 nop
 1f123c60 90 nop
 1f123c61 e9300a0000 jmp xxx.ActionThread.Loop+0xa86 (1f124696)
 1f123c66 90 nop
 1f123c67 b901000000 mov ecx,1
 1f123c6c e87f54eaea call 09fc90f0 (System.Threading.Thread.Sleep(Int32), mdToken: 06002D01)
 >>> 1f123c71 90 nop
 ...

 

The >>> marker shows that the thread has just returned from a Sleep call, and the mov ecx,1 right before the call confirms that the argument is 1. Together with the decompiled code, this confirms that many threads are idling in a while (!base.IsTerminated) loop around Sleep(1). A few threads calling Sleep(1) would probably be harmless, but more than 400 threads doing it at once is too much: the resulting high-frequency context switching drives the CPU up.

Internally, Sleep(1) involves the processor's wait list, its ready queues, and the _KTIMER kernel timer object. The Windows source code is not public, so the exact details stay hidden, but you can use the !pcr command to inspect the per-processor data the kernel keeps for each CPU.


 lkd> !pcr 0
 KPCR for Processor 0 at fffff8058023c000:
     Major 1 Minor 1
 NtTib.ExceptionList: fffff80589089fb0
 NtTib.StackBase: fffff80589088000
 NtTib.StackLimit: 000000137e1fa158
 NtTib.SubSystemTib: fffff8058023c000
 NtTib.Version: 000000008023c180
 NtTib.UserPointer: fffff8058023c870
 NtTib.SelfTib: 000000137dfe0000

 SelfPcr: 0000000000000000
 Prcb: fffff8058023c180
 Irql: 0000000000000000
                  ...

 CurrentThread: ffff910c66906080
 NextThread: 0000000000000000
 IdleThread: fffff80583d27a00

 DpcQueue:

 lkd> dt nt!_KPRCB fffff8058023c180
    +0x008 CurrentThread : 0xffff910c`66906080 _KTHREAD
    +0x010 NextThread: (null)
    +0x018 IdleThread: 0xfffff805`83d27a00 _KTHREAD
    ...
    +0x7c00 WaitListHead: _LIST_ENTRY [0xffff910c`5ec30158 - 0xffff910c`628b1158]
    +0x7c80 DispatcherReadyListHead: [32] _LIST_ENTRY [0xfffff805`80243e00 - 0xfffff805`80243e00]

 

The [32] above is an array of 32 ready queues, one per priority level, holding threads that are ready to run. WaitListHead is the list of threads that are currently waiting.

Based on this analysis, I gave my friend two suggestions:

  • Reduce the number of threads that call Thread.Sleep(1).
  • Increase the sleep interval, for example from 1 ms to 50 ms; the longer the better.
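To get a feel for what these two knobs do, here is a small experiment that counts how often a group of idle threads wakes up for a given sleep interval. It is a rough sketch rather than a benchmark: the names SleepIntervalDemo and CountWakeups are made up, the thread count of 8 is arbitrary, and the absolute numbers depend on the operating system and its timer resolution.

```csharp
using System;
using System.Threading;

static class SleepIntervalDemo
{
    // Starts `threadCount` idle threads that loop on Thread.Sleep(sleepMs) for
    // roughly `durationMs` and returns the total number of wake-ups.
    public static long CountWakeups(int threadCount, int sleepMs, int durationMs)
    {
        long wakeups = 0;
        int stop = 0;
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                while (Volatile.Read(ref stop) == 0)
                {
                    Thread.Sleep(sleepMs);
                    Interlocked.Increment(ref wakeups);
                }
            }) { IsBackground = true };
            threads[i].Start();
        }

        Thread.Sleep(durationMs);       // let the workers poll for a while
        Volatile.Write(ref stop, 1);    // then ask them to exit
        foreach (var t in threads) t.Join();
        return Interlocked.Read(ref wakeups);
    }

    static void Main()
    {
        long fast = CountWakeups(8, 1, 1000);
        long slow = CountWakeups(8, 50, 1000);
        Console.WriteLine($"Sleep(1): {fast} wake-ups, Sleep(50): {slow} wake-ups");
    }
}
```

The total grows with the number of threads and shrinks as the interval gets longer, which is why both suggestions reduce the scheduling work.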

Three: Summary

This CPU spike is an interesting one. It was not caused by any business method, but by high-frequency context switching from a large number of threads calling Sleep(1). I am leaving this write-up here so that others can avoid the same pitfall.

This article is from the internet and does not represent the position of 1024programmer. Please indicate the source when reprinting: https://www.1024programmer.com/remember-a-net-analysis-of-cpu-explosion-in-a-pharmaceutical-companys-business-system/

author: admin
