Reversing Malicious Code
Goal is to understand common malware characteristics at a code level
May include potential branches of execution with code analysis
Overview of the code lifecycle
Source code is translated into object code by a compiler
Object code is then combined with libraries and an executable file is created
To run the file, the operating system reads various information from the executable file, allocates memory, and loads required libraries into memory
Control is transferred to the code to execute
This final stage is where we examine the code with a debugger
Note: Libraries may also be loaded during the program's execution
Ghidra
Developed by NSA
Its decompiler produces a C representation of the code to speed up analysis
Includes support for writing Java and Python scripts to automate analysis
Help is accessed via F1 key
Ghidra v10 includes a debugger
Create a new project
File --> New Project
Choose the project type and click Finish
Drag and drop the specimen into the project window
Accept defaults in the Imports window and click OK
Launch the code browser and begin the auto-analysis
Make sure to enable the WindowsPE x86 Propagate External Parameters option
Finally click the Analyze button and wait for Ghidra to finish
Once auto analysis is completed an Auto Analysis summary will show any warnings or issues encountered during the process
A common warning is that the file does not contain debug information
This is common and not an issue
Before proceeding, save the project and take a snapshot
Ghidra Overview
The main window is the Listing View, which presents the target program's code and data
It will initially bring you to the beginning of the file in the Listing View --> notice the MZ string
If you scroll down from there you can examine the program's header
The Program Tree window is in the top left and shows the different sections and headers
Section names are typically:
To jump to the .text section, double click the .text node
FUN in Ghidra
In Ghidra the FUN_ prefix generically refers to a function, while the numeric value refers to the address where the function is loaded into memory
The original name of the function is normally lost during compilation
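As a small illustration of the naming scheme, the sketch below recovers the address from a default label. The helper name function_address is made up for this example and is not part of Ghidra's API.

```python
def function_address(label: str) -> int:
    """Return the address encoded in a default label such as FUN_00401007."""
    prefix = "FUN_"
    if not label.startswith(prefix):
        raise ValueError("not a default function label: " + label)
    # The remainder of the label is the load address written in hex.
    return int(label[len(prefix):], 16)

print(hex(function_address("FUN_00401007")))  # prints 0x401007
```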
Execution occurs linearly one instruction after the next
On the far left you will have a 32-bit address such as 00401007 (hex)
This address represents the location of code in memory after the program is loaded, not the address of a location on disk, i.e. within a file hex editor
On the right there are x86 assembly instructions
Note: This is the beginning of the .text section, not the beginning of the program; the program begins at the entry point
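To make the memory-versus-disk distinction concrete, here is a minimal sketch of how a loaded address could be translated back to a file offset. The image base, section address, and raw offset below are assumed values for illustration; the real ones come from the specimen's PE headers.

```python
def va_to_file_offset(va, image_base, section_rva, section_raw_offset):
    """Translate a loaded virtual address into an offset within the file on disk."""
    rva = va - image_base                       # address relative to the image base
    return rva - section_rva + section_raw_offset

# Assumed header values, not taken from any real specimen.
IMAGE_BASE = 0x00400000
TEXT_RVA = 0x1000         # where .text is mapped, relative to the image base
TEXT_RAW_OFFSET = 0x400   # where .text starts inside the file

print(hex(va_to_file_offset(0x00401007, IMAGE_BASE, TEXT_RVA, TEXT_RAW_OFFSET)))  # prints 0x407
```

With these assumed values, the instruction shown at 00401007 in the Listing View would sit at offset 0x407 in a hex editor.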
Function graph view provides a visual perspective on code
Click on the function you want, e.g. FUN_00401007
Browse to the Window --> Function Graph menu item
Helpful for visualizing loops and complex conditionals within a function, but the Listing view is more compact and easier for some people to navigate
The color of the arrows symbolizes code flow
If the code block ends in a conditional jump, green arrows indicate the path where execution will continue if the condition is met
If the condition is not met a red arrow will show where execution continues
If the arrow is blue the code ends in an unconditional jump
View imports to review a program's external dependencies
The import address table (IAT) helps direct code analysis
You can view imports in the Symbol Tree window, but we will access this information via Window --> Symbol References
Filter symbols by "Imported" to focus on dependencies
Look for API call patterns associated with malware behavior
We can examine imports to identify potential functionality associated with common malware characteristics
Learn more about an API call at microsoft.com
Types of API Calls:
An A suffix means the function supports ANSI (8-bit characters)
A W suffix (wide) refers to a two-byte character representation (UTF-16)
An Ex suffix (extended) is used when MSFT updates a function and the new function is not compatible with the old one
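The suffix conventions can be sketched as a small helper that splits an import name into its parts. This is a heuristic for illustration only (describe_api is a made-up name), and it will misread any API whose base name happens to end in a capital A or W.

```python
def describe_api(name: str) -> dict:
    """Split a Windows API name into base name, Ex marker, and A/W encoding (heuristic)."""
    encoding = None
    if name.endswith("A"):
        encoding, name = "ANSI", name[:-1]
    elif name.endswith("W"):
        encoding, name = "Wide (UTF-16)", name[:-1]
    extended = name.endswith("Ex")
    if extended:
        name = name[:-2]
    return {"base": name, "extended": extended, "encoding": encoding}

for api in ("RegOpenKeyExA", "GetTempFileNameW", "CreateMutexA"):
    print(api, describe_api(api))
```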
Instructions reference registers, immediate values and memory
Instructions have two components: operation and operand
Instructions can have 0-3 operands
An Operand can be:
Consider MOV EAX, 0x6453
EAX is the destination (first)
0x6453 is the source (second)
You are setting EAX to the value 0x6453
Operands may be implied
Intel processor uses registers to track the state of computation as instructions are executed
Registers are on chip memory locations
Instructions act on registers and memory locations
A CPU has a series of registers
We monitor registers to track arguments, variables, and function return values
The x86 architecture uses the following general purpose registers to hold code and data
Special use registers hold flags and track program execution
EIP points to the next instruction to execute
EFLAGS bits represent the outcome of computations and control CPU operations
Segment registers include:
32-bit registers can also be accessed as 16-bit and 8-bit registers
On 32-bit architectures, registers can be accessed by their default dword size
To access a register's lower 16 bits, the leading E is omitted from the name, e.g. EAX becomes AX
The naming scheme for EAX, EBX, ECX, and EDX is as follows:
E<letter>X --> dword, 32-bit value of the register
<letter>X --> lower word, 16-bit value of the register
<letter>H --> high byte (8 bits) of the <letter>X value of the register
<letter>L --> low byte (8 bits) of the <letter>X value of the register
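The naming scheme can be checked with a short sketch that masks a 32-bit value into its AX, AH, and AL views. The function name and the sample value are chosen only for illustration.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into the AX, AH, and AL views."""
    eax &= 0xFFFFFFFF        # keep only the 32-bit dword
    ax = eax & 0xFFFF        # lower word (16 bits)
    ah = (ax >> 8) & 0xFF    # high byte of AX
    al = ax & 0xFF           # low byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}

views = sub_registers(0x12345678)
print({name: hex(value) for name, value in views.items()})
# prints {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
```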
The lengths of a word, dword, and qword are 16, 32, and 64 bits respectively
A word in assembly is the natural size for a unit of data
A 16-bit processor has 16-bit words
Many tools consider a word to be 16 bits regardless of processor size
Additional common data sizes:
The operand for one push instruction is a pointer to a string
A pointer is a variable that holds a memory address (it points to a memory location)
When the address that the pointer points to is accessed it is called dereferencing, because the pointer references another location in memory
Pointers are more efficient: rather than copying a data structure around in memory, it is cheaper to copy the value of a pointer (4 bytes on 32-bit systems)
A PUSH instruction before a CALL often represents an argument passed to the function specified by the CALL
Memory can be accessed directly by many assembly instructions
Example: MOV EAX, [0x410230]
Brackets mean fetch data at the specified address (dereference)
This is direct addressing because we are dereferencing an immediate value
The result is that 4 bytes of data at 0x410230 will be moved to
EAX
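The dereference described above can be modelled with a byte array standing in for process memory. The base address and the stored bytes are invented for the example; only the 0x410230 address comes from the notes.

```python
import struct

BASE = 0x410000                  # hypothetical address where this block of memory starts
memory = bytearray(0x1000)
# Store the bytes 78 56 34 12 at 0x410230 (x86 is little-endian).
memory[0x230:0x234] = bytes([0x78, 0x56, 0x34, 0x12])

def read_dword(address: int) -> int:
    """Dereference: fetch the 4 bytes stored at `address` as a little-endian dword."""
    return struct.unpack_from("<I", memory, address - BASE)[0]

eax = read_dword(0x410230)       # models loading EAX from the dword at 0x410230
print(hex(eax))                  # prints 0x12345678
```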
Some tools like IDA omit brackets for direct addresses (IDA: dword_410230)
Memory may also be addressed indirectly, by reference
The address may be calculated or held in a register
This is called an Effective Address and it enables us to work efficiently with data structures
Format: Base + (Index * Scale) + Displacement
Indirect Referencing: address of the destination is calculated or it resides in a register. The calculated address is called the effective address (EA)
Even when the address sits in a register, this differs from an operand where the register itself is the destination
In indirect memory addressing the register holds the address of the destination
A large advantage of indirect memory addressing is the ability to work efficiently with data structures
You can increment the value of a single register to step through the fields of a data structure, or through the same field in an array of data structures
If a scale is used, an index register must also be used
Examples of indirectly addressing memory
[EAX]: Access dynamically allocated memory (base)
[EBP + 0x10]: Access data on the stack (base + displacement)
[EAX + EBX * 8]: Access an array of 8-byte structures (base + index * scale)
[EAX + EBX + 0xC]: Access fields of a two-dimensional array of structures (base + index + displacement)
Indirect memory addressing may pose challenges for static code analysis because registers are not populated until runtime
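The four addressing forms above can be worked through with a small calculator for Base + (Index * Scale) + Displacement. The register values are hypothetical, since at static-analysis time the real ones are unknown.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute Base + (Index * Scale) + Displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF

# Hypothetical runtime register values, for illustration only.
EAX, EBX, EBP = 0x00500000, 3, 0x0012FF70

print(hex(effective_address(base=EAX)))                               # [EAX]             -> 0x500000
print(hex(effective_address(base=EBP, displacement=0x10)))            # [EBP + 0x10]      -> 0x12ff80
print(hex(effective_address(base=EAX, index=EBX, scale=8)))           # [EAX + EBX * 8]   -> 0x500018
print(hex(effective_address(base=EAX, index=EBX, displacement=0xC)))  # [EAX + EBX + 0xC] -> 0x50000f
```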
Strings are an example of a data structure
Data structures group simple variables into more complex types
Examples of data structures include: strings, linked lists, sockets, and file handles
When reversing, determine the type of data structure by its usage
Data structures enable us to group bytes and advance our understanding of the code
Code vs Data
Context determines the answer
RegOpenKeyExA example
The API call takes a symbolic constant as an argument, which appears in the listing as a hex value, e.g. PUSH 0x80000001
During compilation the symbolic constant is replaced by its hex representation
Right click the hex value, choose Set Equate, and then choose HKEY_CURRENT_USER to change it back to the symbolic constant
This brings clarity to the code
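What Set Equate does can be pictured as a reverse lookup from the hex value to its symbolic name. The table below covers only four of the predefined registry root keys, and equate_name is a made-up helper, not a Ghidra function.

```python
# Predefined registry root keys as defined in the Windows SDK headers.
HKEY_EQUATES = {
    0x80000000: "HKEY_CLASSES_ROOT",
    0x80000001: "HKEY_CURRENT_USER",
    0x80000002: "HKEY_LOCAL_MACHINE",
    0x80000003: "HKEY_USERS",
}

def equate_name(value: int) -> str:
    """Return the symbolic constant for a pushed value, or the hex value if unknown."""
    return HKEY_EQUATES.get(value, hex(value))

print(equate_name(0x80000001))  # prints HKEY_CURRENT_USER
```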
Branch instructions direct code execution to another location
The flow of execution, i.e. control flow, is sequential until a branching instruction is reached
Then EIP is updated and execution is transferred to another location in memory
The code under review contains two types of jumps
Jumps are an example of a branching instruction
Unconditional jumps always perform a jump
JMP, CALL, RET
Conditional jumps only jump if a condition is met:
Jcc, LOOP
A conditional jump represents a decision point
Conditional jumps require that we review multiple instructions
To evaluate whether a condition is true, arithmetic and Boolean instructions are used
sub ecx, 8 will test if ECX is equal to 8
and eax, eax will test if EAX is equal to zero
If the result is zero then the ZF bit is set in the flags register
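The two tests above can be simulated to see when ZF ends up set. This sketch models only the zero flag and ignores the other EFLAGS bits. Note that SUB also stores its result in the destination register.

```python
MASK32 = 0xFFFFFFFF

def zf_after_sub(dest: int, src: int):
    """Model SUB dest, src: return (new dest value, ZF)."""
    result = (dest - src) & MASK32
    return result, int(result == 0)

def zf_after_and(dest: int, src: int):
    """Model AND dest, src: return (new dest value, ZF)."""
    result = dest & src & MASK32
    return result, int(result == 0)

print(zf_after_sub(8, 8))   # sub ecx, 8 with ECX = 8   -> (0, 1), ZF set
print(zf_after_sub(5, 8))   # sub ecx, 8 with ECX = 5   -> (4294967293, 0), ZF clear
print(zf_after_and(0, 0))   # and eax, eax with EAX = 0 -> (0, 1), ZF set
print(zf_after_and(7, 7))   # and eax, eax with EAX = 7 -> (7, 0), ZF clear
```

A following conditional jump such as JZ would then branch only in the cases where ZF is 1.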
Jumps
A Jcc instruction will be performed if a jump condition is met
Form: Jcc
Comments
Use the ; key to add a comment
Can add EOL, Pre, Post, or other types of comments
HTTP Command and Control
These APIs enable HTTP C2
To view the API calls
The code references variables, which hold code or data not known at compile time
Local variables are relevant only to the current function and are not saved after it returns
Local variables are stored on the stack relative to ESP and EBP
Global variables are accessible from all functions, e.g. DAT_00403374
Static variables can only be used from within the function that allocates them, but unlike local variables their storage is not marked for reuse when the function exits
Viewing Function Call Trees
Window --> Function Call Tree
View the outgoing calls on the right side
View is ideal for determining which functions are called from the current function
Once you determine what the current function is being used for, make sure to Right Click --> Edit Label and give it a meaningful name
GetTempFileNameW
Creates a file name for a temp file
Can explore other function references to find new IOCs
Look for a PUSH to lpPrefixString_XXXXXX
MSFT documentation states that up to the first three characters of this string are used as the temp file name prefix
To assist Ghidra:
Functions
A function is a group of instructions that performs a specific task (read, write files, send network data, log keystrokes)
Three Basic Components
Calling a function involves a jump to another memory location
After the function is done execution continues at the instruction after the original function call
Calling a function involves two control transfers
Function format:
return = function(arg0, arg1)
Specific events occur when calling a function
Specific events occur when returning from a function
Within a function, the prologue and epilogue perform setup and cleanup activities
Most functions contain a standard prologue and epilogue
The prologue occurs at the start of the function
Function epilogue occurs at the end of the function
The stack is a section in memory used to store saved registers, local variables and function parameters
The stack is LIFO (last in, first out)
PUSH adds an element and POP removes one
ESP points to the top of the stack (the next item to be popped) and changes with instructions like PUSH, POP, CALL, LEAVE, and RET
EBP, a.k.a. the frame pointer, serves as an unchanging reference
EBP - value = local variable (registers may also be used)
EBP + value = parameter
When EBP is set up in the function prologue in this manner, code that references EBP minus some value, e.g. [EBP - 8], is accessing a local variable
When it is EBP plus some value, e.g. [EBP + 8], it is referencing a parameter that was passed in
When cleaning up the stack compilers use some tricks
Compilers may POP off a value, e.g. POP EDX, which has the result of adding four to ESP
It is also very common to see a value added to ESP, the use of RET (which can also pop values off the stack), and the LEAVE instruction
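The stack behaviour described above can be traced with a toy model. The class name, the initial ESP, the return address, and the argument values are all invented for the example; it assumes a standard EBP-based prologue and a caller that cleans up its own arguments.

```python
class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0x0012FFFC):   # hypothetical initial ESP
        self.mem = {}
        self.esp = top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4                  # PUSH subtracts 4 from ESP...
        self.mem[self.esp] = value     # ...then stores the value there

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4                  # POP adds 4 to ESP
        return value

s = Stack32()
s.push(0x22)              # caller: PUSH second argument
s.push(0x11)              # caller: PUSH first argument
s.push(0x00401050)        # CALL pushes the return address (made-up value)
s.push(s.ebp)             # prologue: PUSH EBP
s.ebp = s.esp             # prologue: MOV EBP, ESP
s.esp -= 8                # prologue: SUB ESP, 8 (room for two locals)
s.mem[s.ebp - 4] = 0xAA   # store a local variable at [EBP - 4]

first_arg = s.mem[s.ebp + 8]     # [EBP + 8]   -> first parameter
second_arg = s.mem[s.ebp + 0xC]  # [EBP + 0xC] -> second parameter
local = s.mem[s.ebp - 4]         # [EBP - 4]   -> local variable

s.esp = s.ebp             # LEAVE, part 1: MOV ESP, EBP
s.ebp = s.pop()           # LEAVE, part 2: POP EBP
return_address = s.pop()  # RET pops the return address
s.esp += 8                # caller cleanup: ADD ESP, 8 removes both arguments

print(hex(first_arg), hex(second_arg), hex(local), hex(s.esp))
# prints 0x11 0x22 0xaa 0x12fffc
```

After the epilogue and cleanup, ESP is back at its starting value, which is the balance a disassembler relies on when tracking stack depth.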
Functions are called according to calling conventions
The convention describes how data is passed into and out of functions
The implementation of the convention may vary by compiler
The cdecl convention (most common) has these characteristics
The stdcall convention has the following characteristics
Additional calling conventions include fastcall and thiscall
fastcall
Arguments are stored in registers
Any extra arguments are placed on the stack
The callee cleans up arguments on the stack
thiscall
Used in C++ code (member functions)
This convention includes a reference to the "this" pointer
For MSFT compilers, ECX holds the "this" pointer and the callee cleans up the arguments on the stack
For GNU compilers the "this" pointer is pushed onto the stack last and the caller cleans up
Reviewing strings reveals filenames and directories of interest
To locate a reference to a string, right click on it and choose to show references
Loops in malware
Used to encrypt and decrypt network traffic --> loop over each character in the string to send
Attempt to connect to C2 server --> loop over a list of servers
Perform a port scan --> try to connect to ports 1-65535
Log keystrokes --> Check state for each key code 0...92
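As an illustration of the first use case, the sketch below loops over each byte of a string and transforms it. Single-byte XOR is an assumption picked because it is simple; the notes do not say which cipher the specimen uses, and the key and message are made up.

```python
def xor_loop(data: bytes, key: int) -> bytes:
    """Encode or decode by XORing every byte with a one-byte key."""
    out = bytearray()
    for byte in data:            # one iteration per character, like a loop body in assembly
        out.append(byte ^ key)
    return bytes(out)

encoded = xor_loop(b"GET /tasks", 0x5A)   # hypothetical C2 request and key
decoded = xor_loop(encoded, 0x5A)         # applying the same loop again restores the text
print(decoded)                            # prints b'GET /tasks'
```

In a disassembly, this pattern tends to show up as a counter or pointer that is incremented each iteration, followed by a conditional jump back to the top of the block.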
Similar to Jcc, the cc in LOOPcc represents the condition code that must be met for the loop instruction to branch to the address specified
The conditions are:
Reviewing imports to direct our code analysis
The import table lists functions used to access the resource section
The resource (.rsrc) section is often used to store information like icons, dialog boxes, and version information
However, malware may hide executables here
Malware that drops files is called a dropper
CreateMutexA
CreateMutexA --> creates or opens a mutex object
Malware authors often use a mutex to avoid re-infecting a machine
Keylogging
GetKeyState and GetAsyncKeyState --> Determine if a particular key is pressed
GetWindowText --> Retrieves text from a window's title bar
OpenClipboard, GetClipboardData, and CloseClipboard --> Opens the clipboard for access, gathers data, and then closes the clipboard
By combining GetWindowText with GetKeyState and GetAsyncKeyState, an attacker could learn what keys are pressed and what the application context is
GetAsyncKeyState determines if a key is currently up or down, or if it was pressed since the last call to the API
64 Bit Malware
The vast majority of malware is 32-bit
We will see more 64-bit malware in the future as 64-bit systems become the standard
Two types of 64 bit malware have been common
Analyze 32-bit malware on 64-bit OS with caution
32-bit code running on a 64-bit operating system runs in the WOW64 subsystem
32-bit executables load 32-bit DLLs
32-bit DLLs are located in %SystemRoot%\Syswow64
32-bit processes reference Software hive registry values in Wow6432Node using registry redirection
Some executables run subtly differently under WoW64 than on a native 32-bit OS
64-Bit Assembly Differences
All general purpose registers are expanded to 64 bits
EAX --> RAX
There are eight new general purpose registers
R8 --> R15
Special use registers are extended and renamed
EIP --> RIP
RSP, not RBP, is often used to access parameters and variables
Calling convention resembles fastcall (parameters via registers)
There is a new addressing mode (RIP + displacement)
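The RIP + displacement mode can be sketched as follows. The key detail is that RIP already points at the next instruction when the address is computed, so the instruction length matters. The addresses, length, and displacements below are invented for the example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def rip_relative_target(instr_address: int, instr_length: int, displacement: int) -> int:
    """Resolve a RIP-relative operand to an absolute address."""
    rip = instr_address + instr_length   # RIP holds the address of the next instruction
    return (rip + displacement) & MASK64

# Hypothetical 7-byte instruction at 0x140001000 with displacement 0x2000.
print(hex(rip_relative_target(0x140001000, 7, 0x2000)))  # prints 0x140003007
# Displacements are signed, so data may also sit before the instruction.
print(hex(rip_relative_target(0x140001000, 7, -0x10)))   # prints 0x140000ff7
```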