ETW: NT KERNEL LOGGER AND CALL STACKS

|READ_TIME: 13 MIN

In this post, we are going to explore how to obtain process call stacks to later analyze if they are using evasion techniques such as Direct or Indirect Syscalls. To achieve this, we will build a sensor that collects this telemetry via ETW. If you are not familiar with how ETW works, I recommend taking a look at my previous post where I explain the basics.

Which provider did I use?

In the Windows ecosystem, hundreds of ETW providers coexist, each reporting a multitude of events, which can be overwhelming when trying to locate a specific behavior.

To detect and analyze direct and indirect system calls (syscalls), I have selected three fundamental options that present different approaches and capabilities:

  • Microsoft-Windows-Threat-Intelligence: This provider has established itself as the standard in modern EDRs. Its design allows for collecting telemetry on known malicious behaviors, such as remote memory manipulation or the use of apc_queuing. Its main limitation is that its access is restricted. It requires the consuming process to have PPL (Protected Process Light) level privileges and an Antimalware certificate, which complicates its use.

  • Microsoft-Windows-Kernel-Audit-API-Calls: This is a specialized option that focuses on monitoring some of the kernel API calls. It is very useful for auditing the execution flow of certain functions but offers less visibility into system events. Its scope is more granular and specific than that of other providers.

  • NT Kernel Logger: Unlike the previous ones, the NT Kernel Logger is not a common provider, but is categorized as a System Logger. It is the quintessential instrumentation tool for obtaining a global view of the system (processes, threads, disk and network I/O). However, its configuration is different and presents greater complexity.

Although all three allow for capturing the call stack, I have decided to use the NT Kernel Logger.

I am aware that it is not the ideal option for a production environment due to the massive amount of events it generates and its impact on the CPU. However, since this is a research project and a personal lab, it offers me total visibility of the system. In addition, one of my goals is to learn how to optimize code and manage large volumes of data, so facing this challenge seems like the perfect opportunity for me.

Later I intend to also integrate Microsoft-Windows-Threat-Intelligence and I will make another entry when that happens.


Activating NT Kernel Logger

Configuring the NT Kernel Logger is not as straightforward as subscribing to a normal provider. Being a special session (System Trace Session), I had to struggle with several configuration parameters of the Windows APIs until I got what I was looking for.

Choosing the events

The first thing was to define which events I wanted to monitor. For a system logger, the available events are defined within the EVENT_TRACE_PROPERTIES structure, specifically in the EnableFlags field. They can also be consulted in the Windows documentation under the MSNT_SystemTrace class.

After reviewing the options, I identified that for my use case the minimum indispensable was to use EVENT_TRACE_FLAG_SYSTEMCALL. This group of events is generated by the PerfInfo class and encompasses two types:

  • SysCallEnter: Fires exactly when a thread invokes a kernel function. It gives us the address of the syscall instruction that has been executed.

  • SysCallExit: Is generated when the operation ends in the kernel and returns to user mode. It mostly serves to measure execution times.

From our perspective, the most interesting one is the first: SysCallEnter.

Other events that may be useful to us apart from EVENT_TRACE_FLAG_SYSTEMCALL are Process (EVENT_TRACE_FLAG_PROCESS), thread (EVENT_TRACE_FLAG_THREAD), and image load (EVENT_TRACE_FLAG_IMAGE_LOAD) events.

Activation and buffer management

Once the events have been identified, the next thing is to activate the provider. For this we use the StartTraceW function, filling the EVENT_TRACE_PROPERTIES structure. In general, the parameters are standard for any ETW provider, with these exceptions:

  1. The Wnode.Guid field must contain the value SystemTraceControlGuid.

  2. Given the large volume of data, the buffers must have adequate values to avoid loss of events (events lost). My recommendations after several tests are:

    • BufferSize: Controls the size of each buffer in KB. A value of 64 KB is the sweet spot between granularity and performance.

    • MinimumBuffers: Minimum number of buffers reserved at the start. A value of 20 is usually enough to absorb the initial peak.

    • MaximumBuffers: Maximum memory limit. I have configured it to 128 to give some margin.

    • FlushTimer: In real-time mode, setting it to 1 second ensures smooth telemetry.

  3. As we have mentioned before, the EnableFlags parameter must contain the value EVENT_TRACE_FLAG_SYSTEMCALL.

Alignment

As stated in the Remarks section of the official documentation for EVENT_TRACE_PROPERTIES, the WNODE structure must be correctly aligned since the session and log file names must live in the same memory block as the structure itself, and Windows accesses these using offsets.

Implementation example

To make it a bit clearer how all that translates to code, I have created the following example:

rust
pub fn start_nt_kernel_logger() -> Result<CONTROLTRACE_HANDLE, u32> {
    // Define the reserved GUID for the NT Kernel Logger
    // SystemTraceControlGuid: {9e814aad-3204-11d2-9a82-006008a86939}
    let system_trace_control_guid = GUID::from_values(
        0x9e814aad, 0x3204, 0x11d2, [0x9a, 0x82, 0x00, 0x60, 0x08, 0xa8, 0x69, 0x39],
    );

    // Prepare the session name in UTF-16 format (required by APIs ending in 'W')
    // The NT Kernel Logger strictly requires to be called "NT Kernel Logger"
    let logger_name = "NT Kernel Logger";
    let mut name_wide: Vec<u16> = OsStr::new(logger_name).encode_wide().collect();
    name_wide.push(0); // Null-terminator

    let name_len_bytes = name_wide.len() * std::mem::size_of::<u16>();
    let struct_size = std::mem::size_of::<EVENT_TRACE_PROPERTIES>();
    
    // Calculate the total size of the contiguous buffer:
    // Structure size + Session name size + (Optional) Log name size
    let total_size = struct_size + name_len_bytes;

    // Reserve aligned memory initialized to zeros
    let mut buffer: Vec<u8> = vec![0; total_size];
    
    // Get a mutable pointer to the structure within our buffer
    let props_ptr = buffer.as_mut_ptr() as *mut EVENT_TRACE_PROPERTIES;

    unsafe {
        // --- WNODE_HEADER CONFIGURATION ---
        (*props_ptr).Wnode.BufferSize = total_size as u32;
        (*props_ptr).Wnode.Flags = WNODE_FLAG_TRACED_GUID;
        (*props_ptr).Wnode.ClientContext = 1; // 1 = QPC (Query Performance Counter)
        (*props_ptr).Wnode.Guid = system_trace_control_guid;

        // --- ETW SESSION CONFIGURATION ---
        (*props_ptr).LogFileMode = EVENT_TRACE_REAL_TIME_MODE;
        
        // Buffer adjustment
        (*props_ptr).BufferSize = 64;           // 64 KB per buffer
        (*props_ptr).MinimumBuffers = 20;       // Minimum pre-allocated buffers
        (*props_ptr).MaximumBuffers = 128;      // Maximum buffers limit
        (*props_ptr).FlushTimer = 1;            // Force flush every 1 second
        
        // Here we indicate which events we want
        (*props_ptr).EnableFlags = EVENT_TRACE_FLAG_SYSTEMCALL;
        
        // Tell Windows where the string with the session name starts
        (*props_ptr).LoggerNameOffset = struct_size as u32;
        
        // Copy the UTF-16 string bytes right at the end of the structure
        ptr::copy_nonoverlapping(
            name_wide.as_ptr(),
            buffer.as_mut_ptr().add(struct_size) as *mut u16,
            name_wide.len(),
        );

        // --- SESSION START ---
        let mut handle = CONTROLTRACE_HANDLE { Value: 0 };
        
        // Call the Windows API.
        // We pass the handle it should return, the name, and our prepared buffer.
        let status = StartTraceW(&mut handle, name_wide.as_ptr(), props_ptr);
        ...
    }
}

Activating "Stack Walking"

Once we have the provider active, the next step is to ask it to send us the call stacks associated with the events we have requested. To do this, we must call the TraceSetInformation function right after having called StartTraceW, as indicated in the documentation.

The TraceSetInformation function asks us for several parameters, including:

  • TraceId: Handle returned as the first parameter to StartTraceW.
  • InformationClass: The identifier of what we want to activate or query. For our use case, it is TraceStackTracingInfo or value 3.
  • TraceInformation: The identifier of the event for which we want it to send us the call stack.

For the last parameter TraceInformation, we must fill the CLASSIC_EVENT_ID structure, which looks like this:

cpp
typedef struct _CLASSIC_EVENT_ID {
  GUID  EventGuid;
  UCHAR Type;
  UCHAR Reserved[7];
} CLASSIC_EVENT_ID, *PCLASSIC_EVENT_ID;

According to the documentation, you must go to NT Kernel Logger constants, look for the class and look for the event. In our case, it is the same SysCallEnter event from PerfInfo, as we identified at the beginning of the blog.

With all this in mind, the code for this part would look something like this:

rust
// Define the necessary constants for SysCallEnter
// PerfInfo GUID: {ce1dbfb4-137e-4530-91bb-3d304035e14e} 
const PERF_INFO_GUID: GUID = GUID::from_values( 
	0xce1dbfb4, 
	0x137e, 
	0x4530, 
	[0x91, 0xbb, 0x3d, 0x30, 0x40, 0x35, 0xe1, 0x4e] 
); 
const EVENT_TYPE_SYSCALL_ENTER: u8 = 0x33;

pub fn start_nt_kernel_logger() -> Result<CONTROLTRACE_HANDLE, u32> {
	...
	
	// --- SESSION START ---
	let mut handle = CONTROLTRACE_HANDLE { Value: 0 };
	
	// Call the Windows API.
	// We pass the handle it should return, the name, and our prepared buffer.
	let status = StartTraceW(&mut handle, name_wide.as_ptr(), props_ptr);
	
	if status == 0 {
		// If the session started successfully, we configure Stack Walking
		// We want the kernel to capture the stack every time a SysCallEnter (0x33) occurs.
		let mut stack_id_info = CLASSIC_EVENT_ID {
			EventGuid: PERF_INFO_GUID,
			Type: EVENT_TYPE_SYSCALL_ENTER,
			Reserved: [0; 7],
		};
	
		unsafe {
			let set_info_status = TraceSetInformation(
				handle,
				TraceStackTracingInfo, // Value 3
				&stack_id_info as *const _ as *const c_void,
				std::mem::size_of::<CLASSIC_EVENT_ID>() as u32,
			);
			...
		}
	}
}

Consuming events from NT Kernel Logger

Once we have our provider activated and configured, it's time to start consuming events. To do this, we must first open the ETW session we just configured, which we achieve with the OpenTraceW function.

This function receives as a parameter a structure of type EVENT_TRACE_LOGFILEW. The most relevant fields of this structure that we must configure are the following:

  • LoggerName: The name of the session we want to connect to. In our case, it must be exactly "NT Kernel Logger".

  • ProcessTraceMode: Defines how we want to receive the data. For our real-time analysis, we will use the PROCESS_TRACE_MODE_REAL_TIME flag. It is also recommended to add PROCESS_TRACE_MODE_EVENT_RECORD to work with the modern event format. Additionally, for this specific use case, it is recommended to also use PROCESS_TRACE_MODE_RAW_TIMESTAMP; later we will see why.

  • EventRecordCallback: Is the pointer to our "callback" function. Every time the kernel generates an event (a syscall, a new process, etc.), Windows will call this function passing it a pointer to EVENT_RECORD.

  • Context: An optional pointer to any of our own data structures that we want to retrieve within the callback. It is extremely useful for passing data or internal states of our application without using global variables.

Once we have opened the session with OpenTraceW, it returns a handle that we must pass to the next function: ProcessTrace. ProcessTrace is a blocking function. It will "hijack" the current thread while the session is active, and therefore it is common to launch it in a separate thread. While the thread is hijacked, events will continue to arrive at our callback.

The Rust implementation of this part would be as follows:

rust
// Define the Callback that will process each event
unsafe extern "system" fn record_callback(record: *mut EVENT_RECORD) {
    let event_guid = (*record).EventHeader.ProviderId;
    let event_type = (*record).EventHeader.EventDescriptor.Opcode;

    // Here we would filter by PERF_INFO_GUID and type 0x51 (SysCallEnter)
    if event_guid == PERF_INFO_GUID && event_type == 0x51 {
        // Syscall analysis logic
        ...
    }
}

pub fn consume_trace(handle: CONTROLTRACE_HANDLE) {
    let mut log_file: EVENT_TRACE_LOGFILEW = unsafe { std::mem::zeroed() };
    
    let mut session_name: Vec<u16> = OsStr::new("NT Kernel Logger")
        .encode_wide()
        .collect();
    session_name.push(0);

    log_file.LoggerName = PWSTR(session_name.as_mut_ptr());
    log_file.ProcessTraceMode = PROCESS_TRACE_MODE_REAL_TIME 
		| PROCESS_TRACE_MODE_EVENT_RECORD 
		| PROCESS_TRACE_MODE_RAW_TIMESTAMP;
    log_file.Anonymous1.EventRecordCallback = Some(record_callback);

    unsafe {
        // Open the session
        let trace_handle = OpenTraceW(&mut log_file);
        // Create a new thread
        let handle = std::thread::spawn(move || {
	        if trace_handle.Value != !0 {
	            // Start processing 
	            let _ = ProcessTrace(&[trace_handle], None, None);
	        }
	    });
	    ...
    }
}

Call stack events

I mentioned before that we should add the PROCESS_TRACE_MODE_RAW_TIMESTAMP flag for convenience in the future, and the fact is that System Logger providers like NT Kernel Logger work a bit differently from the rest. In a conventional provider (meaning those modern providers that use an XML manifest), the call stack is included in the same event that reaches the consumer. However, in a System Logger, the kernel generates an independent additional event that we will later have to correlate with the original event.

The problem lies in that correlation, since call stack events come by default with the timestamp in raw (CPU cycles), while normal events usually come formatted in system time. This makes it impossible to compare the exact time both events were generated and, therefore, correlation becomes impossible. By using PROCESS_TRACE_MODE_RAW_TIMESTAMP, we force all events to use the same base clock (QPC - Query Performance Counter).

With the same format in both events (the one from SysCallEnter and the one from the call stack), we can now link them using the PID, the TID, and the exact timestamp of the event.

As good practice dictates, it is advisable to do this outside the record_callback function. If we start parsing call stacks and searching in correlation dictionaries within the callback itself, we will block the consumption of new events. If the system is under high load, we will fill the ETW buffers and Windows will start discarding events (dropped events). The ideal is to extract the data quickly, copy it to our own structure, and send it to another thread to do the heavy lifting.

The Rust code for this part (using an mpsc channel and the Context field) would be as follows:

New consume_trace() function

rust
pub fn consume_trace(handle: CONTROLTRACE_HANDLE) {
    // Create the communication channel between the ETW thread and our processor
    let (tx, rx) = mpsc::channel::<ParsedEvent>();
    let mut log_file: EVENT_TRACE_LOGFILEW = unsafe { std::mem::zeroed() };
    let mut session_name: Vec<u16> = OsStr::new("NT Kernel Logger")
        .encode_wide()
        .collect();
    session_name.push(0);
    
    log_file.LoggerName = PWSTR(session_name.as_mut_ptr());
    log_file.ProcessTraceMode = PROCESS_TRACE_MODE_REAL_TIME 
        | PROCESS_TRACE_MODE_EVENT_RECORD 
        | PROCESS_TRACE_MODE_RAW_TIMESTAMP;
    log_file.Anonymous1.EventRecordCallback = Some(record_callback);
    
    // Pass the Sender pointer to Context to be able to use it in the callback
    log_file.Context = &tx as *const _ as *mut std::ffi::c_void;

    unsafe {
        let trace_handle = OpenTraceW(&mut log_file);
        
        // Thread 1: Hijacked by ProcessTrace pushing events to the channel
        let _etw_thread = thread::spawn(move || {
            if trace_handle.Value != !0 {
                let _ = ProcessTrace(&[trace_handle], None, None);
            }
        });

        // Thread 2: The current thread (or a new one) acting as Correlator
        let mut pending_syscalls: HashMap<(u32, i64), ParsedEvent> = HashMap::new();

        // Consume the events 
        for event in rx {
            if event.provider_id == PERF_INFO_GUID {
                // Store the syscall waiting for its stack
                pending_syscalls.insert((event.tid, event.timestamp), event);
            } else if event.provider_id == STACK_WALK_GUID {
                // Check if we have a previous syscall with the same TID and Timestamp
                if let Some(syscall) = pending_syscalls.remove(&(event.tid, event.timestamp)) {
                    // If we find it, we would process the memory addresses of the stack
                    ...
                }
            }
        }
    }
}

New record_callback() function

rust
// Define a safe structure to move between threads.
#[derive(Debug)]
pub struct ParsedEvent {
    pub provider_id: GUID,
    pub opcode: u8,
    pub pid: u32,
    pub tid: u32,
    pub timestamp: i64, // Raw timestamp
    ...
}

unsafe extern "system" fn record_callback(record: *mut EVENT_RECORD) {
    let header = &(*record).EventHeader;
    let event_guid = header.ProviderId;
    let event_type = header.EventDescriptor.Opcode;
    
    // Extract the channel Sender that we passed through the Context field of EVENT_TRACE_LOGFILEW
    let sender_ptr = (*record).UserContext as *mut Sender<ParsedEvent>;
    if sender_ptr.is_null() { return; }
    let sender = &*sender_ptr;

    // PerfInfo (Syscalls) or StackWalkGuid (Call Stacks)
    // 51 in decimal is 0x33 in Hex (SysCallEnter)
    if (event_guid == PERF_INFO_GUID && event_type == 0x33) || event_guid == STACK_WALK_GUID {
        
        let parsed = ParsedEvent {
            provider_id: event_guid,
            opcode: event_type,
            pid: header.ProcessId,
            tid: header.ThreadId,
            timestamp: header.TimeStamp.QuadPart,
        };

        // Send the event to the processor thread asynchronously. 
        let _ = sender.send(parsed);
    }
}