A Windows 10 memory compression module crash analysis
One: Background
1. Storytelling
While analyzing faults in .NET programs for friends free of charge, I also receive all sorts of other dumps: Windows crashes, C++ crashes, Mono crashes, you name it. Because my grounding in those areas is relatively weak, the analysis does not always go smoothly. Today I will talk about a Windows crash kernel dump. A friend gave it to me a few days ago and asked me to take a look, so with the dump in hand I opened it in WinDbg.
Two: WinDbg analysis
1. Where to start
Whenever something crashes on the Windows platform, the operating system maintains an EXCEPTION_POINTERS structure, and interpreting it is essential to analyzing the problem. Abbreviated output of the !analyze -v command follows:
2: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
UNEXPECTED_STORE_EXCEPTION (154)
The store component caught an unexpected exception.
Arguments:
Arg1: ffffb402b9851000, Pointer to the store context or data manager
Arg2: ffffe607bc53df30, Exception information
Arg3: 0000000000000002, Reserved
Arg4: 0000000000000000, Reserved
...
EXCEPTION_RECORD: ffffe607bc53eeb8 -- (.exr 0xffffe607bc53eeb8)
ExceptionAddress: fffff80025b04bd0 (nt!RtlDecompressBufferXpressLz+0x0000000000000050)
ExceptionCode: c0000006 (In-page I/O error)
ExceptionFlags: 00000000
NumberParameters: 3
Parameter[0]: 0000000000000000
Parameter[1]: 0000023f30ee99f0
Parameter[2]: 00000000c0000185
Inpage operation failed at 0000023f30ee99f0, due to I/O error 00000000c0000185
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 0000023f30ee99f0
CONTEXT: ffffe607bc53e6f0 -- (.cxr 0xffffe607bc53e6f0)
rax=fffff80025b04b80 rbx=ffff9d808d7fa000 rcx=ffff9d808d7fa000
rdx=ffff9d808d7fa000 rsi=0000000000000002 rdi=0000023f30ee99f0
rip=fffff80025b04bd0 rsp=ffffe607bc53f0f8 rbp=0000023f30eea2fe
r8=0000023f30ee99f0 r9=0000000000000964 r10=ffff9d808d7faea0
r11=0000023f30eea354 r12=ffffe607bc53f368 r13=ffffb402d84d8000
r14=ffff9d808d7fb000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0000 ds=002b es=002b fs=0053 gs=002b efl=00050246
nt!RtlDecompressBufferXpressLz+0x50:
fffff800`25b04bd0 418b08 mov ecx,dword ptr [r8] ds:002b:0000023f`30ee99f0=????????
Resetting default scope
...
From the output above, the crash happened because paging in the physical page that backs address 0000023f30ee99f0 failed with an I/O error. The faulting instruction, mov ecx,dword ptr [r8] ds:002b:0000023f`30ee99f0=????????, shows the same thing: the memory at that address cannot be read.
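The exception record can also be read mechanically. For an in-page error (c0000006), the documented parameter layout is a read/write flag, the faulting virtual address, and the underlying NTSTATUS. The sketch below decodes the three values from the output above. The function name and the small status table are illustrative assumptions, not part of any Windows API, and the table covers only the two codes seen in this dump.

```python
# Minimal sketch: decode the parameters of an in-page error exception record.
# NTSTATUS_NAMES only lists the two codes that appear in this dump.
NTSTATUS_NAMES = {
    0xC0000006: "STATUS_IN_PAGE_ERROR",
    0xC0000185: "STATUS_IO_DEVICE_ERROR",
}

# Meaning of Parameter[0] for access-violation / in-page-error records.
ACCESS_KINDS = {0: "read", 1: "write", 8: "execute (DEP)"}


def decode_in_page_error(code, params):
    """Return a readable summary of an in-page error exception record."""
    if code != 0xC0000006 or len(params) != 3:
        raise ValueError("not an in-page error record with 3 parameters")
    access, address, status = params
    return {
        "exception": NTSTATUS_NAMES[code],
        "access": ACCESS_KINDS.get(access, "unknown"),
        "address": f"{address:016x}",
        "io_status": NTSTATUS_NAMES.get(status, f"{status:08x}"),
    }


# Values copied from the .exr output above.
summary = decode_in_page_error(
    0xC0000006,
    [0x0000000000000000, 0x0000023F30EE99F0, 0x00000000C0000185],
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Read this way, the record says that a read of 0000023f30ee99f0 failed because the underlying I/O returned a device error.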
If you are not convinced, you can use !vtop and !pte to look up the corresponding physical address and page frame number; neither can be resolved.
2: kd> !vtop 0 000000006d34aca0
Amd64VtoP: Virt 000000006d34aca0, pagedir 00000003d81fb002
Amd64VtoP: PML4E 00000003d81fb002
Amd64VtoP: PML4E read error 0x8000FFFF
Virtual address 6d34aca0 translation fails, error 0x8000FFFF.
2: kd> !pte 000000006d34aca0
VA 000000006d34aca0
PXE at FFFF86432190C000 PPE at FFFF864321800008 PDE at FFFF864300001B48 PTE at FFFF860000369A50
contains 0000000000000000
contains 0000000000000000
not valid
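The four addresses that !pte prints follow from the x64 page-table self-map: each entry is 8 bytes, and the entry for a page sits at a fixed base plus the virtual page number times 8. The sketch below reproduces them for the address queried above. PTE_BASE is an assumption inferred from this dump's output (recent Windows 10 builds randomize it), and the helper names are hypothetical.

```python
# Minimal sketch: reproduce the addresses printed by !pte using the x64
# page-table self-map. PTE_BASE is inferred from this dump's output
# ("PTE at FFFF860000369A50") and will differ on other machines or boots.
PTE_BASE = 0xFFFF860000000000
VA_MASK = (1 << 48) - 1  # only the low 48 bits take part in translation


def entry_address(va, pte_base=PTE_BASE):
    """Address of the 8-byte page-table entry that maps `va`."""
    return pte_base + ((va & VA_MASK) >> 12) * 8


def walk(va):
    """Return the PXE, PPE, PDE and PTE addresses for `va`."""
    pte = entry_address(va)
    pde = entry_address(pte)  # the PDE is the entry that maps the PTE itself
    ppe = entry_address(pde)
    pxe = entry_address(ppe)
    return {"PXE": pxe, "PPE": ppe, "PDE": pde, "PTE": pte}


levels = walk(0x000000006D34ACA0)
for name, addr in levels.items():
    print(f"{name} at {addr:016X}")
```

The computed values match the !pte line above, which confirms that the debugger looked in the right place and simply found empty (all-zero) entries.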
2. Insight into the thread stack before the exception
With this preliminary information in hand, let's look at the register context and the thread stack at the time of the exception. The output is as follows:
2: kd> .cxr 0xffffe607bc53e6f0; k
rax=fffff80025b04b80 rbx=ffff9d808d7fa000 rcx=ffff9d808d7fa000
rdx=ffff9d808d7fa000 rsi=0000000000000002 rdi=0000023f30ee99f0
rip=fffff80025b04bd0 rsp=ffffe607bc53f0f8 rbp=0000023f30eea2fe
r8=0000023f30ee99f0 r9=0000000000000964 r10=ffff9d808d7faea0
r11=0000023f30eea354 r12=ffffe607bc53f368 r13=ffffb402d84d8000
r14=ffff9d808d7fb000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0000 ds=002b es=002b fs=0053 gs=002b efl=00050246
nt!RtlDecompressBufferXpressLz+0x50:
fffff800`25b04bd0 418b08 mov ecx,dword ptr [r8] ds:002b:0000023f`30ee99f0=????????
*** Stack trace for last set context - .thread/.cxr resets it
# Child-SP RetAddr Call Site
00 ffffe607`bc53f0f8 fffff800`25a5bc10 nt!RtlDecompressBufferXpressLz+0x50
01 ffffe607`bc53f110 fffff800`25a5bb14 nt!RtlDecompressBufferEx+0x60
02 ffffe607`bc53f160 fffff800`25a5b9a1 nt!ST_STORE::StDmSinglePageCopy+0x150
03 ffffe607`bc53f220 fffff800`25b56ff0 nt!ST_STORE::StDmSinglePageTransfer+0xa5
04 ffffe607`bc53f270 fffff800`25b57904 nt!ST_STORE::StDmpSinglePageRetrieve+0x180
05 ffffe607`bc53f310 fffff800`25b57aed nt!ST_STORE::StDmPageRetrieve+0xc8
06 ffffe607`bc53f3c0 fffff800`25a5c071 nt!SMKM_STORE::SmStDirectReadIssue+0x85
07 ffffe607`bc53f440 fffff800`25aad478 nt!SMKM_STORE::SmStDirectReadCallout+0x21
08 ffffe607`bc53f470 fffff800`25a5cb57 nt!KeExpandKernelStackAndCalloutInternal+0x78
09 ffffe607`bc53f4e0 fffff800`25a5713c nt!SMKM_STORE::SmStDirectRead+0xc7
0a ffffe607`bc53f5b0 fffff800`25a56b70 nt!SMKM_STORE::SmStWorkItemQueue+0x1ac
0b ffffe607`bc53f600 fffff800`25b58727 nt!SMKM_STORE_MGR::SmIoCtxQueueWork+0xc0
0c ffffe607`bc53f690 fffff800`25b2b94b nt!SMKM_STORE_MGR::SmPageRead+0x167
0d ffffe607`bc53f700 fffff800`25ad1020 nt!SmPageRead+0x33
0e ffffe607`bc53f750 fffff800`25ad023d nt!MiIssueHardFaultIo+0x10c
0f ffffe607`bc53f7a0 fffff800`25a6e818 nt!MiIssueHardFault+0x29d
10 ffffe607`bc53f860 fffff800`25c0b6d8 nt!MmAccessFault+0x468
11 ffffe607`bc53fa00 00007ff8`c3089fa2 nt!KiPageFault+0x358
12 00000067`4ca7f270 00000000`00000000 0x00007ff8`c3089fa2
Judging from the call stack above, the access originated in user mode (0x00007ff8c3089fa2), which was presumably reading the user-mode address 0000023f30ee99f0. Because the corresponding physical page was not in memory, the CPU raised a page fault, which is handled by nt!KiPageFault, vector 0xe in the IDT. The output is as follows:
lkd> !idt
Dumping IDT: fffff8050ce87000
00: fffff80506206400 nt!KiDivideErrorFault
...
0e: fffff80506209980 nt!KiPageFault
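Whether an address belongs to user mode or kernel mode can be told from the address alone. The sketch below assumes the usual x64 canonical split with 48-bit virtual addresses; the constant and function names are hypothetical, chosen for illustration.

```python
# Minimal sketch: classify a 64-bit virtual address as user or kernel space.
# Assumes 48-bit canonical addressing, the common configuration on x64.
USER_MAX = 0x00007FFFFFFFFFFF
KERNEL_MIN = 0xFFFF800000000000


def address_space(va):
    """Return 'user', 'kernel' or 'non-canonical' for a virtual address."""
    if va <= USER_MAX:
        return "user"
    if va >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"


# Addresses taken from the stack trace and the exception record.
for va in (0x00007FF8C3089FA2, 0x0000023F30EE99F0, 0xFFFFF80025B04BD0):
    print(f"{va:016x}: {address_space(va)}")
```

The return address at the bottom of the stack and the faulting data address both fall in the user half, while the faulting instruction in nt!RtlDecompressBufferXpressLz is in the kernel half.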
The page fault handler triggers the I/O operation MiIssueHardFaultIo to bring the page back in from the page file. Next comes the page-read logic SmPageRead, and finally RtlDecompressBufferXpressLz.
If you look carefully, you will spot the keyword Decompress. Yes, this is decompression. Why would a page that is being brought in need to be decompressed? That is our point of attack.
3. Why is the page decompressed
To answer this question, look at the details of the faulting thread. Use .thread to display the implicit thread, which here is the faulting thread, and then !thread to inspect it.
2: kd> .thread
Implicit thread is now ffffb402`be04a080
2: kd> !thread ffffb402`be04a080
THREAD ffffb402be04a080 Cid 0594.2228 Teb: 000000674c5b8000 Win32Thread: 0000000000000000 RUNNING on processor 2
Not impersonating
GetUlongFromAddress: unable to read from fffff8002641152c
Owning Process ffffb402b8d58080 Image:
Attached Process ffffb402b984a040 Image: MemCompression
fffff78000000000: Unable to get shared data
Wait Start TickCount 649763
Context Switch Count 9 IdealProcessor: 0
ReadMemory error: Cannot get nt!KeMaximumIncrement value.
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x00007ff8c808afb0
Stack Init ffffe607bc53fb90 Current ffffe607bc53e800
Base ffffe607bc540000 Limit ffffe607bc539000 Call 0000000000000000
Priority 8 BasePriority 7 PriorityDecrement 0 IoPriority 2 PagePriority 2
Child-SP RetAddr: Args to Child: Call Site
ffffe607`bc53de78 fffff800`25d9856e : 00000000`00000154 ffffb402`b9851000 ffffe607`bc53df30 00000000`00000002 : nt!KeBugCheckEx
ffffe607`bc530 fffff800`25c189db : ffffb402`b9851000 ffffe607`bc53df30 ffffe607 0002 ffffe607`bc53dfe0 : nt!SMKM_STORE<SM_TRAITS>::SmStUnhandledExceptionFilter+0x7e
ffffe607`bc53ded0 fffff800`25bcfb1f : fffff800`00000002 fffff800`258d905c ffffe607`bc539000 ffffe607`bc540000 : nt!`SMKM_STORE::SmStDirectReadIssue'::`1'::filt$0+0x22
ffffe607`bc53df00 fffff800`25c062ff : fffff800`258d905c ffffe607`bc53e4e0 fffff800`25bcfa80 00000000`00000000 : nt!__C_specific_handler+0x9f
...
From the output above, the faulting thread is also attached to another process, ffffb402b984a040, whose image is MemCompression. Judging from the name, the compression and decompression logic should be related to it. A quick search online turns up an article that explains it well: https://www.howtogeek.com/319933/what-is-memory-compression-in-windows-10/
The gist: this is a new feature in Windows 10. It compresses memory pages so that more of them fit in RAM, which performs better than the traditional approach of swapping them out to pagefile.sys. The downside is that compressing and decompressing costs some CPU time.
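The trade-off can be illustrated with a small sketch. It is not how Windows does it: the kernel uses the Xpress algorithm, while this example swaps in zlib (DEFLATE) from the Python standard library purely to show that a 4 KiB page can shrink and must be decompressed before it is readable again. The function and variable names are hypothetical.

```python
import os
import zlib

PAGE_SIZE = 4096  # size of a normal x64 memory page in bytes


def roundtrip(page):
    """Compress one page with zlib, then decompress it again.

    Returns (compressed_size, restored_bytes). zlib stands in for the
    Xpress algorithm that the Windows store manager actually uses.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one page of data")
    packed = zlib.compress(page)
    restored = zlib.decompress(packed)
    return len(packed), restored


# A repetitive page compresses well; a random page barely compresses at all.
text_like = (b"memory compression " * 300)[:PAGE_SIZE]
random_like = os.urandom(PAGE_SIZE)

for name, page in (("text-like", text_like), ("random", random_like)):
    size, restored = roundtrip(page)
    print(f"{name}: {PAGE_SIZE} -> {size} bytes, intact={restored == page}")
```

Every read of a compressed page has to go through the decompression step, which is the path the crashing thread was on when the read of the compressed data failed.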
You can also take a peek on your own Windows 10 machine; Task Manager's Memory page, for example, shows an "In use (Compressed)" figure.
4. Problem Solving
The fix is simple. Take a page from how car dealerships (4S shops) handle problems: if a part can be swapped out, don't bother repairing it. I asked my friend to turn off memory compression (one way is the PowerShell cmdlet Disable-MMAgent -MemoryCompression, run as administrator) so that the system no longer goes through the RtlDecompressBufferXpressLz logic. In theory the problem should then disappear.
After turning it off, my friend reports that the machine has not crashed in the past few days.
Three: Summary
Analyzing kernel mode is much harder than analyzing user mode. It requires a fairly deep understanding of the operating system and the CPU, and there is still a long way to go...