Qiling Framework: API Call Logging To JSON
Hey there! If you're diving into the world of emulation with the Qiling Framework, you've probably realized how powerful it is. And if you're like me, you also want to grab all the juicy details about the API calls your emulated program is making. You're in the right place! This article focuses on how to capture those API calls and their parameters and store them in a neat, easy-to-use JSON format. Let's make our lives a whole lot easier by avoiding the hassle of parsing through logs manually!
The Challenge: Extracting API Call Information
When you're emulating a binary, you often end up with a flood of information in your logs. Sure, you see the API calls like RegOpenKeyW, lstrlenW, and RegSetValueExW, along with their parameters. But manually sifting through the logs to extract all that information and structure it is a time-consuming task, and we don't want to reinvent the wheel by building a custom parser if a better solution is available. Qiling provides us with a fantastic emulation environment, and we should leverage its features to make the process smoother. The goal here is to get a structured output, preferably in JSON, so we can analyze the behavior of the emulated program systematically.
Understanding the Problem
The example logs you provided illustrate the core of the problem: a series of API calls with their arguments. While humans can read this and understand what's happening, computers need a structured format to make sense of it all. Here's why simply printing to the console isn't ideal:
- Manual Parsing: You'd have to write a script to read the logs, identify the API calls, and extract each parameter. That's a lot of work. Besides the effort, you'll also be tied to a specific format, and if the log format changes, your script breaks. This approach is not scalable, and it's prone to errors.
- Lack of Structure: The raw logs are just plain text. There's no way to easily group related data, search for specific patterns, or perform advanced analysis.
- Scalability Issues: As the number of API calls increases, the manual analysis becomes even more tedious.
The Need for a Structured Solution
A structured format, like JSON, provides the following advantages:
- Machine-Readability: JSON is easy for computers to parse. You can use libraries in almost any programming language to load and process the data.
- Data Organization: You can organize the data in a hierarchical manner, making it easier to understand the relationships between different API calls and their arguments.
- Analysis and Automation: Structured data is perfect for analysis. You can write scripts to search, filter, and analyze the data to understand the program's behavior.
- Reusability: The structured data can be easily used in other tools and processes, such as generating reports, creating visualizations, or integrating with other security analysis tools.
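To make the analysis point concrete, here is a minimal sketch of querying logged calls once they are in JSON. The record shape (function, params, address) matches the one used later in this article. The sample values, the parameter names such as hKey and lpSubKey, and the helper calls_using are made up for illustration and are not real emulation output.

```python
import json

# Hypothetical sample records; real ones would come from an emulation run.
sample_calls = [
    {"function": "RegOpenKeyW",
     "params": {"hKey": "0x80000001", "lpSubKey": "Demo"},
     "address": "0x10001000"},
    {"function": "lstrlenW",
     "params": {"lpString": "hello"},
     "address": "0x10002000"},
    {"function": "RegSetValueExW",
     "params": {"hKey": "0x80000001", "lpValueName": "Run"},
     "address": "0x10003000"},
]

# Round-trip through JSON text to mimic loading a log written by another tool.
calls = json.loads(json.dumps(sample_calls))


def calls_using(calls, param_name, value):
    """Return every call whose parameter `param_name` equals `value`."""
    return [c for c in calls if c["params"].get(param_name) == value]


# Find every call that touched the same registry handle.
registry_calls = calls_using(calls, "hKey", "0x80000001")
print([c["function"] for c in registry_calls])
```

Doing the same against raw log text would need a format-specific parser; with JSON the query is a one-line list comprehension.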
The Solution: Leveraging Qiling's Capabilities
Qiling Framework already provides what we need to interact with the emulated environment. Calling report.generate_report(qil) is not the most direct route to per-call data, so we will take a different one: intercept the API calls ourselves and log the relevant information. Qiling lets us hook API functions, read their arguments, and record them in a custom JSON structure. Let's dig in and see how to modify your code to get the JSON output you want. The steps are setting up the emulation environment, hooking API calls, extracting their parameters, and finally formatting the data as JSON for easy access.
Code Modification: Hooking and Logging
Here's an approach to modify your Python code to intercept and log the API calls and their parameters into a JSON format. We will use ql.os.set_api() with QL_INTERCEPT.ENTER to register a callback on each API we want to trace. Qiling passes the callback the parameters it has already resolved, so there is no need to read registers or the stack by hand. This approach offers flexibility and control over the data captured and the output format. Check the callback signature against the Qiling version you have installed.

import os
import argparse
import json

from qiling import Qiling
from qiling.const import QL_INTERCEPT

api_calls = []  # Store API call data, in call order

# APIs to trace, taken from the example logs. Extend this list as needed.
TRACED_APIS = ["RegOpenKeyW", "lstrlenW", "RegSetValueExW"]


def make_callback(function_name):
    """Build a callback that logs calls to the named API."""

    def my_callback(ql, address, params):
        """Callback function to log API calls.

        Args:
            ql: Qiling object.
            address: The address of the function.
            params: A dictionary of the parameters.
        """
        call_info = {
            "function": function_name,
            "params": {},
            "address": hex(address),
        }
        # Convert each parameter to a JSON-friendly value; customize as needed.
        for param_name, param_value in params.items():
            try:
                if isinstance(param_value, int):
                    call_info["params"][param_name] = hex(param_value)  # Integers and pointers as hex
                elif isinstance(param_value, str):
                    call_info["params"][param_name] = param_value  # Strings are already JSON-safe
                else:
                    call_info["params"][param_name] = str(param_value)  # Other types
            except Exception as e:
                call_info["params"][param_name] = "Error decoding param: " + str(e)  # Catch conversion errors
        api_calls.append(call_info)
        print(json.dumps(call_info, indent=4))  # Print each call in JSON format

    return my_callback


def main(argv, rootfs):
    ql = Qiling(argv, rootfs)
    # Hook each traced API on entry, binding its name into the callback.
    for name in TRACED_APIS:
        ql.os.set_api(name, make_callback(name), QL_INTERCEPT.ENTER)
    ql.run()
    # Output all API calls to a JSON file after emulation
    with open("api_calls.json", "w") as f:
        json.dump(api_calls, f, indent=4)
    print("[+] Successfully emulated the binary.")
    print("[+] API calls saved to api_calls.json")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='DARTS Emulator')
    parser.add_argument('input_file', help="Input binary file.")
    parser.add_argument('-r', '--rootfs', help="Qiling RootFS path")
    args = vars(parser.parse_args())
    qiling_rootfs = args['rootfs']
    if not qiling_rootfs:
        qiling_rootfs = os.path.join(os.getcwd(), "examples", "rootfs", "x86_windows")
    bin_file = args['input_file']
    print(f">> ROOTFS : {qiling_rootfs}")
    print(f">> Binary : {bin_file}")
    main([bin_file], qiling_rootfs)

Code Explanation
- api_calls = []: This is a list that stores all the API call information in a structured way, in the order the calls occur.
- TRACED_APIS: The names of the APIs to hook. The three entries are the calls from the example logs; add or remove names to suit your binary.
- make_callback(function_name): The callback receives only the Qiling object, the function address, and the parameters, not the API name. Wrapping my_callback in a closure binds the name at registration time, which is more reliable than trying to recover it from the pc register.
- my_callback(ql, address, params): This function is called every time one of the hooked Windows API functions is entered. This is where the magic happens!
- Parameter Extraction: The params argument is a dictionary that includes the names and values of the function's arguments. We iterate over this dictionary and convert each value. You may encounter different types, such as strings, integers, and memory addresses. The code handles some basic cases: integers (including pointers) become hex strings, strings are kept as they are, and anything else is passed through str().
- JSON Output: Each API call's information is appended to the api_calls list, then converted to JSON using json.dumps() and printed to the console. This shows the call details as they occur during emulation.
- main(argv, rootfs): This function sets up the Qiling environment and registers our callback. Note the use of ql.os.set_api(name, make_callback(name), QL_INTERCEPT.ENTER). This tells Qiling to run the callback when the named API is entered, before the emulated implementation executes. QL_INTERCEPT.EXIT runs the callback after the API returns instead, in which case the callback takes an extra retval argument. The first argument to Qiling is the argv list, which is why main is called with [bin_file].
- JSON file output: At the end of the emulation process, the data stored in the api_calls list is written to a JSON file named api_calls.json. This provides a complete record of the hooked API calls.
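Because the logging callback is an ordinary Python function, it can be exercised without running an emulation, which is a quick way to check the record shape. The sketch below is a stand-alone variant, not the script above: make_recorder, record_call, the sink argument, and the sample parameter values are all hypothetical, and passing None for ql only works because this variant never touches the Qiling object.

```python
import json


def make_recorder(function_name, sink):
    """Return an on-enter style callback that appends records to `sink`."""

    def record_call(ql, address, params):
        # Integers (including pointers) become hex strings; everything else
        # is stringified so the record is always JSON-serializable.
        sink.append({
            "function": function_name,
            "params": {
                name: hex(value) if isinstance(value, int) else str(value)
                for name, value in params.items()
            },
            "address": hex(address),
        })

    return record_call


records = []
callback = make_recorder("lstrlenW", records)

# Simulate one call by hand, with made-up address and parameter values.
callback(None, 0x10002000, {"lpString": "hello"})
print(json.dumps(records, indent=4))
```

Passing the sink list in explicitly, rather than relying on a module-level global, also makes it easy to keep separate logs for separate emulation runs.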
Running the Code
- Save the Code: Save the modified code as a Python file (e.g., qiling_api_logger.py).
- Prepare the RootFS and Binary: Make sure you have a root file system compatible with your target binary. Also, ensure that the binary you want to emulate is available.
- Run the Script: Execute the script from the command line, providing the binary file and, if needed, the rootfs path:
  python qiling_api_logger.py <path_to_binary> -r <path_to_rootfs>
  Replace <path_to_binary> with the path to the binary you want to emulate and <path_to_rootfs> with the path to the root file system. If you don't specify the rootfs path, the script falls back to examples/rootfs/x86_windows under the current working directory.
- Check the output: The script will print the captured API call information in JSON format to the console during emulation and write all the calls to api_calls.json. Now, you will have a JSON file containing all the API calls, including the arguments and the address.
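A quick sanity check on the result is to count how often each API appears. The following is a minimal sketch that assumes api_calls.json holds a list of records with a "function" key, as written by the script in this article; the helper name summarize is hypothetical.

```python
import json
import os
from collections import Counter


def summarize(path):
    """Return a Counter mapping each API name to its number of logged calls."""
    with open(path, encoding="utf-8") as f:
        calls = json.load(f)
    return Counter(call["function"] for call in calls)


# Only print a summary if an emulation run has already produced the file.
if os.path.exists("api_calls.json"):
    for name, count in summarize("api_calls.json").most_common():
        print(f"{count:6d}  {name}")
```

An empty or unexpectedly short summary usually means the hooks never fired, which is worth checking before moving on to deeper analysis.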
Customization and Enhancements
The example provides a solid starting point for capturing API calls in JSON format. Here are a few ways you can customize it and enhance its functionality:
- Filtering API Calls: Hook only the API calls you care about rather than as many as possible. This reduces the amount of data captured and makes the analysis easier. You could use regular expressions or a list of function names to filter the calls. This involves changing which functions the script hooks.
- Parameter Decoding: Improve the parameter decoding logic. The example provides only basic handling. You can add more specific decoding for different data types (e.g., strings, pointers, structures). The ql.mem.string() method can be useful for reading strings from memory.
- Contextual Information: Include additional contextual information, such as the module name, thread ID, or any other relevant data that helps with analysis.
- Error Handling: Add more robust error handling to deal with potential issues, like invalid memory addresses or unexpected parameter types. This will prevent your script from crashing during emulation.
- Output Formatting: Customize the JSON output format to match your specific needs. You can choose to include more or fewer details, use different indentation levels, or organize the data in a different structure.
- Performance Optimization: For large binaries, consider optimizing the hooking and logging process to minimize performance impact. You might want to implement a more selective hooking strategy or use asynchronous logging to avoid blocking the emulation process.
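Several of the ideas above can be combined in a small logger object. The sketch below filters calls by a regular expression on the API name and writes each record immediately as one JSON object per line (the JSON Lines layout), instead of building one large list and dumping it at the end. This is a swapped-in output format, not the single JSON array used by the script in this article, and JsonlCallLogger and its methods are hypothetical names. It is plain Python with no Qiling dependency, so a hook callback would simply call logger.log(...).

```python
import io
import json
import re


class JsonlCallLogger:
    """Write matching API call records to a stream, one JSON object per line."""

    def __init__(self, stream, name_pattern=".*"):
        self.stream = stream
        self.name_filter = re.compile(name_pattern)
        self.written = 0

    def log(self, function_name, address, params):
        """Write one record if the name matches; return True if written."""
        if not self.name_filter.fullmatch(function_name):
            return False
        record = {
            "function": function_name,
            "address": hex(address),
            "params": {
                name: hex(value) if isinstance(value, int) else str(value)
                for name, value in params.items()
            },
        }
        # One line per record keeps memory use flat and preserves everything
        # logged so far if the emulation stops unexpectedly.
        self.stream.write(json.dumps(record) + "\n")
        self.written += 1
        return True


buffer = io.StringIO()  # stand-in for an open file object
logger = JsonlCallLogger(buffer, name_pattern=r"Reg\w+W")

# Made-up calls: only the registry API matches the filter.
logger.log("RegOpenKeyW", 0x10001000, {"hKey": 0x80000001})
logger.log("lstrlenW", 0x10002000, {"lpString": "hello"})
print(buffer.getvalue(), end="")
```

Reading the log back is a matter of calling json.loads() on each line, so the analysis side stays as simple as with a single JSON array.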
Conclusion
By following these steps, you can capture API calls and their parameters in a structured JSON format, which greatly simplifies analyzing the behavior of your emulated binaries. The key is to leverage the hooking features the Qiling Framework already provides and tailor the logging to your needs. With the calls in JSON, it becomes easier to understand what the binary does, spot suspicious behavior, and feed the results into further security analysis. Adjust the code as needed, and don't hesitate to experiment with different approaches to find the one that suits you best.
External Link:
For more in-depth information about the Qiling Framework, I recommend checking out the official Qiling Framework Documentation. It's a fantastic resource for learning about all the framework's capabilities.