1. Problem statement
Functions are a central concept in the C language. Used well, they let a program draw on the system library and be organized into independent modules that are easy to maintain and modify. Functions are not unique to C: methods, procedures, and similar constructs in other languages are essentially functions, which makes the topic an important one in teaching. Function calls are usually explained by drawing simplified stack diagrams, but students have no intuitive picture of the stack and find the diagrams hard to understand in depth. The teaching effect is therefore often unsatisfactory, which in turn limits students' understanding and application of modular programming.
2. Solution
After the necessary background, such as the stack and assembly language, has been introduced in the course “Principles of Microcomputer”, C program code is disassembled in a high-level language development environment. By analyzing the assembly code, students can follow the stack changes during a function call, see in practice how high-level language constructs map onto low-level ones, and understand the essence of function calls. This article analyzes and explains the function call process by disassembling part of the code of a 32-bit C program under Visual C++ 6.0.
3. Function call process
The function call process consists of several steps: parameter passing, address jump, local variable allocation and initialization, function body execution, and result return [1].
3.1. Parameter passing and function jumping
Parameters are passed from the actual parameters (arguments) to the formal parameters. In the underlying implementation, the actual parameters are pushed onto the stack according to the calling convention. Once the parameters have been passed, the program jumps to the subroutine through the CALL instruction.
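As a minimal sketch of what the two kinds of argument in a call of the form Function(i, &j) mean at the C level: the function body and the helper call_demo below are hypothetical and chosen only for illustration; they are not the program analyzed in this article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee. 'a' receives a copy of the caller's value,
   'pb' receives the address of the caller's variable. */
int Function(int a, int *pb)
{
    assert(pb != NULL);
    a = a + 1;        /* changes only the callee's own copy */
    *pb = *pb + a;    /* changes the caller's variable through its address */
    return a;
}

/* Hypothetical caller that mirrors the call shape Function(i, &j)
   and reports what i and j hold after the call. */
int call_demo(int *i_after, int *j_after)
{
    int i = 1;
    int j = 2;
    int r = Function(i, &j);

    *i_after = i;     /* i is unchanged: it was passed by value */
    *j_after = j;     /* j was modified through the pushed address */
    return r;
}
```

Because only a copy of i is pushed, the callee cannot change i itself, whereas the pushed address of j gives it access to the caller's variable.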
3.2. Local variable allocation and assignment
The “{” of the function is the point at which space for local variables is allocated. At the assembly level, local variables occupy a contiguous area of the stack that extends from the base address in the EBP register toward lower addresses, and they are accessed through EBP-relative addressing. Because the stack grows from high addresses to low addresses, the local variable defined first in the function has the largest address, each variable defined later has a smaller address, and variables defined next to each other occupy adjacent addresses [2]. Global data is stored in a separate data area, not next to the local variables, and by the principle of program locality, neighboring data tends to be cached together. For the same operation, local variables may therefore be more efficient operands than global variables. In addition, allocating and releasing local variables only requires moving the stack pointer ESP, so it is very cheap.
3.3. Addressing the function parameters
The parameters are stored on the high-address side of the EBP base address and are likewise accessed through EBP-relative addressing.
3.4. Executing statements in the function body
The statements that implement the function's specific task are translated into a sequence of assembly instructions.
3.5. Return value
The return statement passes the return value to the calling function. At the assembly level, the return value is handed back in the EAX register, with the EDX register also used when the value does not fit in 32 bits.
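On 32-bit x86, an integer result of up to 32 bits comes back in EAX, and a 64-bit integer result is conventionally split across the EDX:EAX pair (high half in EDX, low half in EAX). The sketch below only models that split in portable C; the RegisterPair type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers that carry a return value. */
typedef struct {
    uint32_t eax;   /* low 32 bits */
    uint32_t edx;   /* high 32 bits, used only for 64-bit results */
} RegisterPair;

/* Rebuild the 64-bit value the caller would read from EDX:EAX. */
uint64_t join_return_value(RegisterPair r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}

/* Split a 64-bit return value the way the callee would leave it. */
RegisterPair split_return_value(uint64_t value)
{
    RegisterPair r;
    r.eax = (uint32_t)(value & 0xFFFFFFFFu);
    r.edx = (uint32_t)(value >> 32);
    assert(join_return_value(r) == value);   /* the round trip loses nothing */
    return r;
}
```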
3.6. Return to the calling function
The “}” of the function marks the end of the function body. When it is reached, the local variables and the saved register values are removed from the stack, and the return address that the earlier CALL instruction pushed onto the stack is popped into the instruction pointer register EIP, which returns control to the calling function.
3.7. Stack balance
Stack balance refers to popping the parameters that were pushed onto the stack before the function call, and restoring the stack to its state before the call [3]. Since the parameter is useless data after the function call is complete, it needs to be moved off the stack.
C source code contains no explicit stack balancing. At the assembly level, however, the calling convention determines whether the calling function or the called function balances the stack.
The typical layout of the stack during a C function call is shown in Figure 1 [4]:
The parameters are pushed onto the stack by the calling function, and the CALL instruction pushes the return address onto the stack. After entering the called function, the original value of EBP is saved, space for local variables is allocated, and the original values of the registers are saved. Inside the function, local variables are accessed as “EBP-displacement” and parameters as “EBP+displacement” [5].
Each function call establishes a stack frame on the stack, and the frame is released when the function returns. Because the stack is a limited resource, too many nested calls (as in deep recursion) may cause a stack overflow error.
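The one-frame-per-call behavior can be observed from C itself. The following sketch is only an illustration (the function name and its parameters are hypothetical): every active call has its own local variable, so its address always differs from that of the caller's local.

```c
#include <assert.h>
#include <stddef.h>

/* Recurse 'depth' more levels. Each active call owns its own 'local',
   so its address must differ from the address of the caller's 'local'.
   Returns how many calls found their local in a new location. */
int count_new_frames(int depth, const int *caller_local)
{
    int local = depth;   /* lives in this call's own stack frame */
    int is_new = (caller_local != NULL && caller_local != &local) ? 1 : 0;

    assert(depth >= 0);
    if (depth == 0) {
        return is_new;
    }
    return is_new + count_new_frames(depth - 1, &local);
}
```

Calling count_new_frames(5, NULL) makes five nested calls below the outermost one, each with its own frame. A very large depth would eventually exhaust the stack, which is the overflow case described above.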
4. Disassembly code analysis
The following disassembles the code related to the function call in Visual C++ 6.0 Debug mode and uses the assembly code to bring out the key points of the call. The complete C program is shown in Figure 2:
The disassembly of the statement Function(i,&j) is shown in Figure 3:
First, the local variables i and j of the main function are located (their positions in the stack are EBP-8 and EBP-4) and pushed onto the stack. The default calling convention of the Visual C/C++ compiler for C programs is _cdecl [6]. Under this convention, parameters are pushed from right to left and a “_” prefix is added to the function name. The address of j is therefore pushed first, followed by the value of i. The function is then called with the CALL instruction, whose operand shows that the “_” prefix has been added to the name of Function after compilation. When CALL executes, the return address is automatically pushed onto the stack, and control transfers to the function definition, where execution of the function begins.
The disassembly of the “{” of Function is shown in Figure 4:
Inside the function, when the “{” is reached, the local space is allocated and every byte of it is initialized to 0CCH. A local variable that is not initialized in its definition therefore starts out with a value made of 0CCH bytes. An int variable occupies four bytes, so its initial value is -858993460 (0CCCCCCCCH). Two consecutive 0CCH bytes correspond to the Chinese character “烫” (“hot”), so an uninitialized variable displayed as characters appears as “烫烫…”. An uninitialized pointer variable points to the memory at address 0CCCCCCCCH. This makes uninitialized variables easy to spot in Debug mode.
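The decimal value quoted above can be checked with a few lines of C. This is a minimal sketch that reproduces the byte pattern with memset rather than relying on the compiler's Debug-mode fill; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill a 4-byte object with 0xCC bytes, as the Debug-mode prologue does
   for the local area, and read it back as a signed 32-bit integer. */
int32_t uninitialized_int_pattern(void)
{
    int32_t value;
    assert(sizeof value == 4);
    memset(&value, 0xCC, sizeof value);
    return value;
}

/* The same four bytes read as an unsigned number, i.e. the address
   that an uninitialized 32-bit pointer would appear to hold. */
uint32_t uninitialized_pointer_pattern(void)
{
    uint32_t bits;
    memset(&bits, 0xCC, sizeof bits);
    return bits;
}
```

Read as a signed integer the pattern is 0xCCCCCCCC - 2^32 = -858993460, which is the value a debugger shows for an uninitialized int.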
The basic storage unit of the stack is four bytes, and data smaller than four bytes is allocated space with four-byte alignment. The char variable ch is therefore given four bytes even though the data itself needs only one. When space is allocated for the character array array, each character occupies one byte, and the total is rounded up to a multiple of four bytes. The total local variable space is thus 40H+4+4×2+4=50H. The address of the local variable ch is EBP-4, the addresses of a and b are EBP-8 and EBP-0CH respectively, and the address of array is EBP-10H. The disassembly of the statements between the opening and closing braces of the function is shown in Figure 5:
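The frame-size arithmetic can be retraced as follows. This sketch makes two assumptions that are not stated explicitly in the text: the 40H term is taken to be extra space reserved by the Debug build, and the array is taken to need 4 bytes (for example “abc” plus its terminator). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round a size up to the next multiple of 4, the stack slot size
   in 32-bit code. */
size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Recompute 40H + 4 + 4*2 + 4 from the sizes of the individual locals. */
size_t local_area_size(void)
{
    size_t total = 0x40;            /* assumed Debug-mode reserve (the 40H term) */

    total += align4(sizeof(char));  /* ch: 1 byte, padded to 4 */
    total += 2 * align4(4);         /* a and b: 4 bytes each (32-bit int) */
    total += align4(4);             /* array: assumed 4 bytes, already aligned */

    assert(total % 4 == 0);         /* the area stays 4-byte aligned */
    return total;
}
```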
If a variable has an initial value, the disassembly contains a MOV instruction that assigns it. For variables without an initial value, every byte remains 0CCH. Character arrays are slightly more complicated. The string constant “abc” is stored in the global data area; when the array is initialized from it, the global data is copied into the local array array on the stack. Because the registers are 32 bits wide, at most four characters can be copied at a time, so the statement that initializes the array may disassemble into one or more instructions. The contents of the array are accessed through addressing of the form [EBP + offset of the array + offset of the element], so initializing a local array takes time but accessing it is efficient.
Within the function, local variables are accessed as [EBP - displacement] and parameters as [EBP + displacement]. The return value of the function is placed in the EAX register for the calling function to read.
It can be seen that, at the assembly level, a function does not keep its local variables inside itself; space for them is allocated on the stack only while the function is being called. Therefore, it is wrong to return the address of a local variable, because that space has been released once the function has returned.
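To make the consequence concrete: copying a local variable's value out of the function is safe, whereas handing out its address is not, because the address refers to a frame that no longer exists after the return. The sketch below shows two safe patterns and keeps the unsafe one in a comment only; all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Safe: the value is copied out (in EAX on 32-bit x86) before the
   frame that holds 'result' is released. */
int square_by_value(int x)
{
    int result = x * x;
    return result;
}

/* Safe: the caller owns the storage behind 'out', so it outlives
   the callee's stack frame. */
void square_into(int x, int *out)
{
    assert(out != NULL);
    *out = x * x;
}

/* Unsafe pattern, shown only as a comment:
 *
 *     int *square_dangling(int x) { int result = x * x; return &result; }
 *
 * The returned pointer refers to stack space that is reused by the
 * next call, so reading through it is undefined behavior.
 */
```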
When the “}” of the function is reached, the operations performed are shown in Figure 6:
The original values of the registers EDI, ESI, and EBX are restored, EBP is copied back into ESP, and the original value of EBP is popped. ESP now points to the return address of the function. The RET instruction then pops the return address into the EIP register, returning to the calling function. At this point, only the parameters that were pushed for the call remain on the stack and have not yet been cleaned up.
The stack balance statement in the calling function is shown in Figure 7:
Under the _cdecl convention, stack balancing is done by the calling function. Based on the number of parameters pushed (2) and their size (4 bytes each), the calling function uses the instruction add ESP, 8 to remove all the parameters from the stack. The stack is now restored to its state before the call, and one complete function call is finished.
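The whole sequence can be replayed on a small simulated stack. The following is a sketch under stated assumptions, not a model of the actual compiler output: the Machine type, the helper names, and the return address value are hypothetical; the saving of EBX, ESI, and EDI and the 0CCH fill are left out for brevity; and the local area is fixed at 40H bytes.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical model of the stack and the two stack registers. */
typedef struct {
    uint32_t mem[STACK_WORDS]; /* simulated stack, one 4-byte slot per element */
    uint32_t esp;              /* byte offset of the stack top; grows downward */
    uint32_t ebp;              /* byte offset of the current frame base */
} Machine;

static void push(Machine *m, uint32_t v)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

static uint32_t pop(Machine *m)
{
    uint32_t v;
    assert(m->esp + 4 <= STACK_WORDS * 4);
    v = m->mem[m->esp / 4];
    m->esp += 4;
    return v;
}

/* Read the 4-byte slot at [EBP + disp]. */
static uint32_t at_ebp(const Machine *m, int32_t disp)
{
    return m->mem[(uint32_t)((int32_t)m->ebp + disp) / 4];
}

/* Simulate one _cdecl call with two 4-byte arguments. Stores what the
   callee sees at [EBP+8] and [EBP+0CH], and returns 1 if EIP, ESP,
   and EBP all end up where the caller expects them. */
int simulate_cdecl_call(uint32_t arg1, uint32_t arg2,
                        uint32_t *seen1, uint32_t *seen2)
{
    Machine m = { {0}, STACK_WORDS * 4, STACK_WORDS * 4 };
    const uint32_t esp0 = m.esp;
    const uint32_t ebp0 = m.ebp;
    const uint32_t return_address = 0x00401000u; /* made-up value */
    uint32_t eip;

    push(&m, arg2);            /* caller: push arguments right to left */
    push(&m, arg1);
    push(&m, return_address);  /* CALL pushes the return address */

    push(&m, m.ebp);           /* callee prologue: push ebp */
    m.ebp = m.esp;             /*                  mov ebp, esp */
    m.esp -= 0x40;             /*                  sub esp, 40h (local area) */

    *seen1 = at_ebp(&m, 8);    /* first parameter at [EBP+8] */
    *seen2 = at_ebp(&m, 12);   /* second parameter at [EBP+0CH] */

    m.esp = m.ebp;             /* callee epilogue: mov esp, ebp */
    m.ebp = pop(&m);           /*                  pop ebp */
    eip = pop(&m);             /* RET pops the return address into EIP */

    m.esp += 8;                /* caller: add esp, 8 balances the stack */

    return eip == return_address && m.esp == esp0 && m.ebp == ebp0;
}
```

Without the final add esp, 8 step, ESP would remain 8 bytes below its starting value, which is exactly the imbalance that the calling function has to correct under _cdecl.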