
Analyzing a Windows 10 memory compression module crash


One: Background

1. The story

While analyzing all kinds of .NET program failures for friends free of charge, I also receive many other kinds of dumps: Windows crashes, C++ crashes, Mono crashes, you name it. Because my background knowledge in these areas is comparatively thin, the analysis does not always go smoothly. Today I will talk about a Windows kernel crash dump that a friend sent me a few days ago, asking me to take a look. With the dump in hand, let's analyze it with WinDbg.

Two: WinDbg analysis

1. Where to start

When a crash on Windows is caused by an exception, the system records the exception details (an exception record plus the register context, which together make up the EXCEPTION_POINTERS structure), and interpreting them is essential for analyzing the problem. The abbreviated output of the !analyze -v command is as follows:


 2: kd> !analyze -v
 *******************************************************************************
 *                                                                             *
 *                        Bugcheck Analysis                                    *
 *                                                                             *
 *******************************************************************************

 UNEXPECTED_STORE_EXCEPTION (154)
 The store component caught an unexpected exception.
 Arguments:
 Arg1: ffffb402b9851000, Pointer to the store context or data manager
 Arg2: ffffe607bc53df30, Exception information
 Arg3: 0000000000000002, Reserved
 Arg4: 0000000000000000, Reserved
 ...
 EXCEPTION_RECORD: ffffe607bc53eeb8 -- (.exr 0xffffe607bc53eeb8)
 ExceptionAddress: fffff80025b04bd0 (nt!RtlDecompressBufferXpressLz+0x0000000000000050)
    ExceptionCode: c0000006 (In-page I/O error)
   ExceptionFlags: 00000000
 NumberParameters: 3
    Parameter[0]: 0000000000000000
    Parameter[1]: 0000023f30ee99f0
    Parameter[2]: 00000000c0000185
 Inpage operation failed at 0000023f30ee99f0, due to I/O error 00000000c0000185

 EXCEPTION_PARAMETER1: 0000000000000000

 EXCEPTION_PARAMETER2: 0000023f30ee99f0

 CONTEXT: ffffe607bc53e6f0 -- (.cxr 0xffffe607bc53e6f0)
 rax=fffff80025b04b80 rbx=ffff9d808d7fa000 rcx=ffff9d808d7fa000
 rdx=ffff9d808d7fa000 rsi=0000000000000002 rdi=0000023f30ee99f0
 rip=fffff80025b04bd0 rsp=ffffe607bc53f0f8 rbp=0000023f30eea2fe
  r8=0000023f30ee99f0 r9=0000000000000964 r10=ffff9d808d7faea0
 r11=0000023f30eea354 r12=ffffe607bc53f368 r13=ffffb402d84d8000
 r14=ffff9d808d7fb000 r15=0000000000000000
 iopl=0 nv up ei pl zr na po nc
 cs=0010 ss=0000 ds=002b es=002b fs=0053 gs=002b efl=00050246
 nt!RtlDecompressBufferXpressLz+0x50:
 fffff800`25b04bd0 418b08 mov ecx,dword ptr [r8] ds:002b:0000023f`30ee99f0=????????
 Resetting default scope
 ...

 

From the output above, this is an in-page I/O error: while the system was bringing in the physical page that backs address 0000023f30ee99f0, the I/O failed with status 0xc0000185, so the exception was raised. The faulting instruction mov ecx,dword ptr [r8] shows the same thing: the debugger displays the memory at ds:002b:0000023f30ee99f0 as ????????, meaning it cannot be read.
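The three exception parameters follow the documented layout for an in-page error: Parameter[0] is a read/write flag (0 means read), Parameter[1] is the virtual address being accessed, and Parameter[2] is the underlying NTSTATUS that made the page-in fail. A minimal Python sketch that decodes the values from this dump; the small status-name table is an assumption covering only the two codes seen here, not a complete NTSTATUS list.

```python
from dataclasses import dataclass

# Only the NTSTATUS values that appear in this dump; not an exhaustive table.
NTSTATUS_NAMES = {
    0xC0000006: "STATUS_IN_PAGE_ERROR",
    0xC0000185: "STATUS_IO_DEVICE_ERROR",
}

# Meaning of Parameter[0] for access-violation / in-page-error records.
ACCESS_KINDS = {0: "read", 1: "write", 8: "execute (DEP)"}


@dataclass
class InPageError:
    access: str
    address: int
    io_status: int

    def describe(self) -> str:
        status = NTSTATUS_NAMES.get(self.io_status, "unknown status")
        return (f"{self.access} of {self.address:#018x} failed: "
                f"{self.io_status:#010x} ({status})")


def decode_in_page_error(code: int, params: list[int]) -> InPageError:
    """Decode the three ExceptionInformation entries of an in-page error."""
    if code != 0xC0000006 or len(params) != 3:
        raise ValueError("not an in-page error record")
    access = ACCESS_KINDS.get(params[0], f"unknown ({params[0]})")
    return InPageError(access, params[1], params[2])


# Values copied from the .exr output above.
err = decode_in_page_error(0xC0000006, [0x0, 0x0000023F30EE99F0, 0xC0000185])
print(err.describe())
```

Running it prints a one-line summary equivalent to WinDbg's "Inpage operation failed at ... due to I/O error ..." line.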

If you want to confirm this, you can use !vtop and !pte to look up the corresponding physical address and page frame number, but neither can be resolved:


 2: kd> !vtop 0 000000006d34aca0
 Amd64VtoP: Virt 000000006d34aca0, pagedir 00000003d81fb002
 Amd64VtoP: PML4E 00000003d81fb002
 Amd64VtoP: PML4E read error 0x8000FFFF
 Virtual address 6d34aca0 translation fails, error 0x8000FFFF.

 2: kd> !pte 000000006d34aca0
                                            VA 000000006d34aca0
 PXE at FFFF86432190C000 PPE at FFFF864321800008 PDE at FFFF864300001B48 PTE at FFFF860000369A50
 contains 0000000000000000
 contains 0000000000000000
 not valid
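
On x64 with 4-level paging, a virtual address splits into four 9-bit table indexes plus a 12-bit page offset, and Windows maps its page tables through a self-referencing PML4 entry, which is why !pte can still print PXE/PPE/PDE/PTE addresses even though every entry it reads is zero. A minimal Python sketch that reproduces those four addresses; it assumes the PTE base FFFF860000000000, which is inferred from the !pte output above (the base is randomized per boot on recent Windows 10 builds, so it will differ on other machines).

```python
MASK64 = (1 << 64) - 1


def split_va(va: int) -> dict[str, int]:
    """Split a 48-bit x64 virtual address into its paging indexes."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


def paging_entry_addresses(va: int, pte_base: int) -> dict[str, int]:
    """Compute where the PXE/PPE/PDE/PTE for `va` live in the self-map.

    `pte_base` is an assumption read off the dump; it is not a constant.
    """
    self_index = (pte_base >> 39) & 0x1FF  # self-referencing PML4 slot
    pde_base = pte_base + (self_index << 30)
    ppe_base = pde_base + (self_index << 21)
    pxe_base = ppe_base + (self_index << 12)
    va &= 0xFFFFFFFFFFFF  # keep only the 48 translated bits
    return {
        "PXE": (pxe_base + ((va >> 39) << 3)) & MASK64,
        "PPE": (ppe_base + ((va >> 30) << 3)) & MASK64,
        "PDE": (pde_base + ((va >> 21) << 3)) & MASK64,
        "PTE": (pte_base + ((va >> 12) << 3)) & MASK64,
    }


print(split_va(0x6D34ACA0))
for name, addr in paging_entry_addresses(0x6D34ACA0, 0xFFFF860000000000).items():
    print(f"{name} at {addr:016X}")
```

With that base, the computed addresses match the "PXE at ... PTE at ..." line above; since the PXE and PPE both contain 0, the walk stops at the top levels and no physical page can be reported.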

 

2. Examine the thread stack at the time of the exception

With this preliminary information in hand, let's look at the register context and the thread stack at the moment the exception occurred. The output is as follows:


 2: kd> .cxr 0xffffe607bc53e6f0; k
 rax=fffff80025b04b80 rbx=ffff9d808d7fa000 rcx=ffff9d808d7fa000
 rdx=ffff9d808d7fa000 rsi=0000000000000002 rdi=0000023f30ee99f0
 rip=fffff80025b04bd0 rsp=ffffe607bc53f0f8 rbp=0000023f30eea2fe
  r8=0000023f30ee99f0 r9=0000000000000964 r10=ffff9d808d7faea0
 r11=0000023f30eea354 r12=ffffe607bc53f368 r13=ffffb402d84d8000
 r14=ffff9d808d7fb000 r15=0000000000000000
 iopl=0 nv up ei pl zr na po nc
 cs=0010 ss=0000 ds=002b es=002b fs=0053 gs=002b efl=00050246
 nt!RtlDecompressBufferXpressLz+0x50:
 fffff800`25b04bd0 418b08 mov ecx,dword ptr [r8] ds:002b:0000023f`30ee99f0=????????
   *** Stack trace for last set context - .thread/.cxr resets it
  # Child-SP RetAddr Call Site
 00 ffffe607`bc53f0f8 fffff800`25a5bc10 nt!RtlDecompressBufferXpressLz+0x50
 01 ffffe607`bc53f110 fffff800`25a5bb14 nt!RtlDecompressBufferEx+0x60
 02 ffffe607`bc53f160 fffff800`25a5b9a1 nt!ST_STORE::StDmSinglePageCopy+0x150
 03 ffffe607`bc53f220 fffff800`25b56ff0 nt!ST_STORE::StDmSinglePageTransfer+0xa5
 04 ffffe607`bc53f270 fffff800`25b57904 nt!ST_STORE::StDmpSinglePageRetrieve+0x180
 05 ffffe607`bc53f310 fffff800`25b57aed nt!ST_STORE::StDmPageRetrieve+0xc8
 06 ffffe607`bc53f3c0 fffff800`25a5c071 nt!SMKM_STORE::SmStDirectReadIssue+0x85
 07 ffffe607`bc53f440 fffff800`25aad478 nt!SMKM_STORE::SmStDirectReadCallout+0x21
 08 ffffe607`bc53f470 fffff800`25a5cb57 nt!KeExpandKernelStackAndCalloutInternal+0x78
 09 ffffe607`bc53f4e0 fffff800`25a5713c nt!SMKM_STORE::SmStDirectRead+0xc7
 0a ffffe607`bc53f5b0 fffff800`25a56b70 nt!SMKM_STORE::SmStWorkItemQueue+0x1ac
 0b ffffe607`bc53f600 fffff800`25b58727 nt!SMKM_STORE_MGR::SmIoCtxQueueWork+0xc0
 0c ffffe607`bc53f690 fffff800`25b2b94b nt!SMKM_STORE_MGR::SmPageRead+0x167
 0d ffffe607`bc53f700 fffff800`25ad1020 nt!SmPageRead+0x33
 0e ffffe607`bc53f750 fffff800`25ad023d nt!MiIssueHardFaultIo+0x10c
 0f ffffe607`bc53f7a0 fffff800`25a6e818 nt!MiIssueHardFault+0x29d
 10 ffffe607`bc53f860 fffff800`25c0b6d8 nt!MmAccessFault+0x468
 11 ffffe607`bc53fa00 00007ff8`c3089fa2 nt!KiPageFault+0x358
 12 00000067`4ca7f270 00000000`00000000 0x00007ff8`c3089fa2
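
The register context can be cross-checked with a little arithmetic. Under the Windows x64 calling convention the first four integer arguments arrive in rcx, rdx, r8 and r9, but at offset +0x50 into the routine some of them (rdx here) have clearly been reused, and RtlDecompressBufferXpressLz is undocumented, so treat the following as an inference rather than a fact. The sketch checks that r11 - r8 equals r9 and that r14 - rcx is exactly one 4 KB page, which fits the reading that r8 points at a 0x964-byte compressed input and rcx at a one-page output buffer.

```python
# Register values copied from the .cxr output above.
regs = {
    "rcx": 0xFFFF9D808D7FA000,
    "r8": 0x0000023F30EE99F0,
    "r9": 0x0000000000000964,
    "r11": 0x0000023F30EEA354,
    "r14": 0xFFFF9D808D7FB000,
}

PAGE_SIZE = 0x1000  # 4 KB x64 page

# Hypothesis (not confirmed by symbols): r8..r11 bounds the compressed input
# and rcx..r14 bounds the one-page output buffer.
input_len = regs["r11"] - regs["r8"]
output_len = regs["r14"] - regs["rcx"]

print(f"input span  = {input_len:#x} bytes, r9 = {regs['r9']:#x}")
print(f"output span = {output_len:#x} bytes (page size {PAGE_SIZE:#x})")
print(f"input is {input_len / PAGE_SIZE:.0%} of a page")
```

If the hypothesis holds, the unreadable address 0000023f30ee99f0 is the compressed copy of a page that the decompressor was trying to read.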

 

Judging from the call stack above, the chain starts in user mode (0x00007ff8c3089fa2). The code there touched memory whose physical page was not resident, which raised a page fault handled by nt!KiPageFault, the handler registered at vector 0xe of the IDT. The IDT listing below comes from a local kernel debugging session (lkd>), so its addresses differ from those in the dump:


 lkd> !idt

 Dumping IDT: fffff8050ce87000

 00: fffff80506206400 nt!KiDivideErrorFault
 ...
 0e: fffff80506209980 nt!KiPageFault

 

Inside the page fault handler, MiIssueHardFaultIo issues the I/O needed to bring the page back in. The request is handed to the store manager's page-read path (SmPageRead and the SMKM_STORE frames above it), which fetches the stored copy of the page and finally ends up in RtlDecompressBufferXpressLz.
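Reading a `k` listing bottom-up gives the order in which the calls happened: the highest frame number is the outermost caller and frame 00 is the innermost callee. A minimal Python sketch that parses lines in the "# Child-SP RetAddr Call Site" format shown above and prints the chain from the page fault down to the decompressor; it assumes the plain `k` column layout and does not handle the argument columns of `kb`/`kv`.

```python
import re

# Matches lines like "0e ffffe607`bc53f750 fffff800`25ad023d nt!MiIssueHardFaultIo+0x10c"
FRAME_RE = re.compile(
    r"^\s*(?P<num>[0-9a-f]{2})\s+(?P<sp>[0-9a-f`]+)\s+(?P<ret>[0-9a-f`]+)\s+(?P<site>\S+)",
    re.IGNORECASE,
)


def parse_k(text: str) -> list[str]:
    """Return call sites ordered from innermost (frame 00) to outermost."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m["num"], 16), m["site"]))
    return [site for _, site in sorted(frames)]


sample = """\
 # Child-SP RetAddr Call Site
00 ffffe607`bc53f0f8 fffff800`25a5bc10 nt!RtlDecompressBufferXpressLz+0x50
0d ffffe607`bc53f700 fffff800`25ad1020 nt!SmPageRead+0x33
0e ffffe607`bc53f750 fffff800`25ad023d nt!MiIssueHardFaultIo+0x10c
11 ffffe607`bc53fa00 00007ff8`c3089fa2 nt!KiPageFault+0x358
"""

# Print the outermost caller first, i.e. the order in which the calls happened.
print(" -> ".join(reversed(parse_k(sample))))
```

The sample keeps only four of the frames above for brevity; pasting the full listing works the same way.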

If you look closely, you will notice the keyword Decompress in these frames: the page is being decompressed. Why would a page that is being brought back into memory need decompressing? That is the key to this problem.

3. Why does the page need to be decompressed?

To answer that question, we need the details of the faulting thread. Use .thread to switch to the faulting thread's context, then inspect it with !thread:


 2: kd> .thread
 Implicit thread is now ffffb402`be04a080

 2: kd> !thread ffffb402`be04a080
 THREAD ffffb402be04a080 Cid 0594.2228 Teb: 000000674c5b8000 Win32Thread: 0000000000000000 RUNNING on processor 2
 Not impersonating
 GetUlongFromAddress: unable to read from fffff8002641152c
 Owning Process ffffb402b8d58080 Image: 
 Attached Process ffffb402b984a040 Image: MemCompression
 fffff78000000000: Unable to get shared data
 Wait Start TickCount 649763
 Context Switch Count 9 IdealProcessor: 0
 ReadMemory error: Cannot get nt!KeMaximumIncrement value.
 UserTime 00:00:00.000
 KernelTime 00:00:00.000
 Win32 Start Address 0x00007ff8c808afb0
 Stack Init ffffe607bc53fb90 Current ffffe607bc53e800
 Base ffffe607bc540000 Limit ffffe607bc539000 Call 0000000000000000
 Priority 8 BasePriority 7 PriorityDecrement 0 IoPriority 2 PagePriority 2
 Child-SP RetAddr: Args to Child: Call Site
 ffffe607`bc53de78 fffff800`25d9856e : 00000000`00000154 ffffb402`b9851000 ffffe607`bc53df30 00000000`00000002 : nt!KeBugCheckEx
 ffffe607`bc53de80 fffff800`25c189db : ffffb402`b9851000 ffffe607`bc53df30 00000000`00000002 ffffe607`bc53dfe0 : nt!SMKM_STORE<SM_TRAITS>::SmStUnhandledExceptionFilter+0x7e
 ffffe607`bc53ded0 fffff800`25bcfb1f : fffff800`00000002 fffff800`258d905c ffffe607`bc539000 ffffe607`bc540000 : nt!`SMKM_STORE::SmStDirectReadIssue'::`1'::filt$0+0x22
 ffffe607`bc53df00 fffff800`25c062ff : fffff800`258d905c ffffe607`bc53e4e0 fffff800`25bcfa80 00000000`00000000 : nt!_C_specific_handler+0x9f
 ...

 

From the output above, the faulting thread is attached to another process, ffffb402b984a040, whose image name is MemCompression. Judging by the name, it is responsible for compression and decompression, so the decompression logic in the stack should be related to it. A web search turns up an article that explains this very well: https://www.howtogeek.com/319933/what-is-memory-compression-in-windows-10/

The gist: memory compression is a new feature in Windows 10. Instead of writing memory pages out to pagefile.sys as traditional paging does, it compresses them and keeps more pages in RAM, which performs better. The downside is that compressing and decompressing pages costs some CPU time.
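To make the idea concrete, here is a toy Python model of a compressed page store: a page evicted from the working set is compressed and kept in memory, and a later fault on that page decompresses it back. It is only an illustration: zlib stands in for the Xpress algorithm seen in the stack above, and the class and method names are hypothetical, not Windows APIs. The `fail_reads` flag simulates the situation in this dump, where reading the compressed data itself fails.

```python
import zlib

PAGE_SIZE = 4096


class ToyCompressedStore:
    """Hypothetical model of a compressed page store (not a Windows API)."""

    def __init__(self) -> None:
        self._pages: dict[int, bytes] = {}
        self.fail_reads = False  # simulate an I/O error on the stored data

    def evict(self, page_number: int, data: bytes) -> int:
        """Compress a page into the store; return its compressed size."""
        if len(data) != PAGE_SIZE:
            raise ValueError("pages must be exactly PAGE_SIZE bytes")
        blob = zlib.compress(data)
        self._pages[page_number] = blob
        return len(blob)

    def fault_in(self, page_number: int) -> bytes:
        """Decompress a page back, as a page-fault handler would."""
        if self.fail_reads:
            # In the real crash, reading the compressed bytes hit an in-page
            # I/O error and the kernel bugchecked with 0x154.
            raise OSError("in-page I/O error while reading compressed data")
        return zlib.decompress(self._pages.pop(page_number))


store = ToyCompressedStore()
page = (b"hello, memory compression! " * 200)[:PAGE_SIZE]
size = store.evict(7, page)
print(f"page 7 stored in {size} of {PAGE_SIZE} bytes")
assert store.fault_in(7) == page
```

The trade-off described above is visible even in the toy: every eviction and every fault pays for a compress or decompress call, in exchange for keeping the page in RAM rather than on disk.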

You can also observe this feature on your own Windows 10 machine.

4. Problem Solving

The fix is simple: take a page from car dealership service centers, where a part that can be replaced is never repaired. Ask your friend to turn off memory compression so that the system no longer goes through the RtlDecompressBufferXpressLz path; in theory, the problem then goes away.
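On Windows 10, memory compression is controlled by the MMAgent PowerShell cmdlets: Get-MMAgent reports whether it is enabled (the MemoryCompression field), and Disable-MMAgent -MemoryCompression turns it off. This needs an elevated prompt, and a restart is typically required before the change takes effect. A small Python wrapper, as a sketch; the helper names are hypothetical, and it only builds and runs the PowerShell command lines, refusing to run on non-Windows systems.

```python
import platform
import subprocess


def mmagent_command(action: str) -> list[str]:
    """Build a PowerShell command line for the MMAgent cmdlets."""
    cmdlets = {
        "status": "Get-MMAgent",
        "disable": "Disable-MMAgent -MemoryCompression",
        "enable": "Enable-MMAgent -MemoryCompression",
    }
    if action not in cmdlets:
        raise ValueError(f"unknown action: {action!r}")
    return ["powershell", "-NoProfile", "-Command", cmdlets[action]]


def run_mmagent(action: str) -> str:
    """Run the command on Windows (disable/enable need an elevated shell)."""
    if platform.system() != "Windows":
        raise RuntimeError("MMAgent cmdlets are only available on Windows")
    result = subprocess.run(mmagent_command(action), capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(mmagent_command("disable")))
```

After rebooting, `run_mmagent("status")` should show MemoryCompression as False. Enable-MMAgent -MemoryCompression reverses the change if needed.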

According to my friend, the machine has not crashed in the several days since memory compression was turned off.

Three: Summary

Kernel-mode analysis is much harder than user-mode analysis. It requires a fairly deep understanding of the operating system and the CPU, and I still have a long way to go...

author: admin
