.. _here: https://github.com/Cisco-Talos/pyrebox/issues
.. _config_examples: https://github.com/Cisco-Talos/pyrebox/tree/master/mw_monitor/config_examples
.. _readme: https://github.com/Cisco-Talos/pyrebox/tree/master/mw_monitor/third_party/deviare2_db/README.rst
.. _script: https://github.com/Cisco-Talos/pyrebox/tree/master/mw_monitor/third_party/msdn_parser/zynamics_msdn_crawler.py
.. _populate_db.py: https://github.com/Cisco-Talos/pyrebox/tree/master/mw_monitor/populate_db.py
.. _ida_scripts: https://github.com/Cisco-Talos/pyrebox/tree/master/mw_monitor/ida_scripts
.. _readthedocs.io: https://pyrebox.readthedocs.io/en/latest/
.. _questions: https://github.com/Cisco-Talos/pyrebox/issues?utf8=%E2%9C%93&q=is%3Aissue%20label%3Aquestion%20

Malware Monitor
===============

Malware monitor is a set of PyREBox scripts for automatically extracting useful information during malware
analysis. It also aims to help the analyst in the first phase of the analysis by providing insights into how a
given malware sample deploys its main payload (e.g., unpacking, process injection, process
hollowing, file dropping, file downloading). In addition, it collects various types of information that
can be imported into IDA to enrich the IDB database. Malware monitor consists of several
modules that can be enabled, disabled, and configured by editing a JSON file. Each module produces
several logs in different formats.

The **api tracer** module traces Windows API function calls and automatically extracts
their input and output parameters. An IDA Python script can import and visualize this
information in IDA.

The **dumper** module dumps the memory of a process during its execution.
The user can configure the moment at which the memory dump is triggered.

The **coverage** module collects an execution trace that can be used to colorize basic blocks in IDA.
This feature shows the user which code paths were executed and which were not.

Finally, the **memory monitor** module (referred to as *interproc* in the scripts) monitors different
memory-related operations and events, as well as process interaction events such as the creation of new
processes, memory injection into existing processes, and so on.
This last module is orthogonal to the other three. Since it monitors process creation and opening,
it makes it possible to monitor not only the initial process, but also every process related to it.
For example, if *api tracer* is turned on and the *memory monitor* detects that the first process
creates a second process, *api tracer* will start monitoring this new process and will generate an
API call trace for it as well.
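
The process-following behavior described above can be pictured as a set of monitored processes that grows as events are observed. The sketch below is illustrative only: the ``MonitoredSet`` class and the event names ``create_process`` and ``write_process_memory`` are hypothetical and do not correspond to the actual interproc code.

.. code-block:: python

   from typing import Set


   class MonitoredSet:
       """Hypothetical tracker of the processes that malware monitor follows."""

       # Hypothetical event names; the real interproc module may model this differently.
       FOLLOW_EVENTS = {"create_process", "write_process_memory"}

       def __init__(self, initial_pid: int) -> None:
           self.pids: Set[int] = {initial_pid}

       def on_event(self, kind: str, source_pid: int, target_pid: int) -> None:
           # Follow the target only if the event comes from an already monitored process.
           if kind in self.FOLLOW_EVENTS and source_pid in self.pids:
               self.pids.add(target_pid)

       def is_monitored(self, pid: int) -> bool:
           return pid in self.pids


   monitor = MonitoredSet(initial_pid=1000)
   monitor.on_event("create_process", 1000, 2000)        # child of the sample: followed
   monitor.on_event("write_process_memory", 2000, 3000)  # injection by the child: followed
   monitor.on_event("create_process", 4000, 5000)        # unrelated process: ignored
   print(sorted(monitor.pids))  # [1000, 2000, 3000]

In this model, any module that consults ``is_monitored`` automatically covers every process reached transitively from the initial sample.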

Configuration files
-------------------

Malware monitor has two different configuration files:

mw_monitor.conf
***************

Each of the four modules generates several log files, whose names can be configured in this
configuration file. The file must be accessible from the directory where PyREBox is started; a common choice
is to place it in the same folder as the pyrebox.conf file.

This file configures the paths and file names of the logs generated by the different
modules. It also sets the file name of the results *bundle*, a .tar.gz file
containing all the collected results. You can find a self-explanatory configuration file
under the config_examples_ directory.
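
Because the results bundle is a regular .tar.gz archive, Python's standard ``tarfile`` module is enough to inspect it. A minimal sketch; ``list_bundle`` is a hypothetical helper, and ``results.tar.gz`` is a placeholder for the bundle name configured in mw_monitor.conf.

.. code-block:: python

   import tarfile
   from typing import List


   def list_bundle(path: str) -> List[str]:
       """Return the names of the regular files stored in a results bundle (.tar.gz)."""
       with tarfile.open(path, "r:gz") as bundle:
           return [member.name for member in bundle.getmembers() if member.isfile()]


   # Placeholder path: use the bundle name configured in mw_monitor.conf.
   # for name in list_bundle("results.tar.gz"):
   #     print(name)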

interproc
^^^^^^^^^

- **bin_log_name.** This file is a binary log (serialized data) of the data collected during memory operation monitoring.
- **text_log_name.** This file is a text log of all the events related to memory monitoring captured during the execution. 
- **basic_stats_name.** This file is a structured text summary of the data collected.

dumper
^^^^^^

- **path.** The path where the dumps are written: the memory of the process, its loaded DLLs, and the rest of the VAD regions that do not overlap the main process memory or any DLL.

coverage
^^^^^^^^

- **cov_log_name.** This file is a binary log of the instruction trace collected. This log can be imported into IDA with a corresponding script.
- **cov_text_name.** This file is a text log that summarizes the instruction trace collected. Each line in this log represents a transition from one VAD region to a different VAD region, and includes both the origin and destination addresses.
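
The transition summary written to **cov_text_name** can be illustrated with a short sketch: given the VAD regions of a process and the sequence of executed addresses, a line is emitted whenever execution moves from one region to another. The region list, the trace, and both helper functions are made up for illustration; this is not the coverage module's actual code.

.. code-block:: python

   from typing import List, Optional, Tuple

   Region = Tuple[int, int]  # (start, end) of a VAD region, end exclusive


   def find_region(regions: List[Region], addr: int) -> Optional[int]:
       """Return the index of the region containing addr, or None if none does."""
       for index, (start, end) in enumerate(regions):
           if start <= addr < end:
               return index
       return None


   def region_transitions(regions: List[Region], trace: List[int]) -> List[Tuple[int, int]]:
       """Return (origin, destination) pairs where execution changes VAD region.

       Addresses outside every known region are treated as one "unknown" region.
       """
       transitions = []
       prev_addr: Optional[int] = None
       prev_region: Optional[int] = None
       for addr in trace:
           region = find_region(regions, addr)
           if prev_addr is not None and region != prev_region:
               transitions.append((prev_addr, addr))
           prev_addr, prev_region = addr, region
       return transitions


   # Made-up example: the main image and one DLL.
   regions = [(0x00400000, 0x00401000), (0x77000000, 0x77100000)]
   trace = [0x00400010, 0x00400020, 0x77000100, 0x77000200, 0x00400030]
   for origin, dest in region_transitions(regions, trace):
       print(f"{origin:#010x} -> {dest:#010x}")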

api_tracer
^^^^^^^^^^

- **text_log_name.** This file is a text log containing the recorded API calls, with or without their parameters (depending on the configuration).
- **bin_log_name.** This file is a binary log containing the same information as the text log. This file can be imported into IDA with a corresponding script.


mw_monitor_run.json
*******************

This JSON file turns each of the modules on or off separately, under the *modules* section.
It also configures the parameters of each module. Malware monitor also provides
sample execution automation, which can be configured in this JSON file.
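
To make the structure concrete, here is a minimal sketch that parses a hypothetical mw_monitor_run.json and lists the enabled modules. The exact layout (a ``modules`` object mapping each module name to a boolean, plus one object per module) is an assumption based on the description in this section; refer to the files under config_examples_ for the real format. ``enabled_modules`` is a hypothetical helper.

.. code-block:: python

   import json
   from typing import Any, Dict, List

   # Hypothetical layout for illustration only; the authoritative format is in
   # the files under config_examples/.
   EXAMPLE_RUN_JSON = r"""
   {
       "general": {
           "files_bundle": "/home/analyst/samples/sample.zip",
           "files_path": "C:\\Users\\analyst\\Desktop\\sample",
           "main_executable": "sample.exe",
           "api_database": "/home/analyst/db/windows_api.db"
       },
       "modules": {
           "interproc": true,
           "dumper": false,
           "coverage": false,
           "api_tracer": true
       },
       "api_tracer": {
           "bin_log": true,
           "text_log": true,
           "light_mode": false
       }
   }
   """


   def enabled_modules(config: Dict[str, Any]) -> List[str]:
       """Return the names of the modules switched on in the "modules" section."""
       return [name for name, enabled in config.get("modules", {}).items() if enabled]


   config = json.loads(EXAMPLE_RUN_JSON)
   print(enabled_modules(config))  # ['interproc', 'api_tracer']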

general
^^^^^^^

- **files_bundle.** The path, in the host system, of a zip file containing several files (typically an .exe file and its .dll dependencies). The files inside this zip container will be copied into the guest system, under the *files_path* path.
- **files_path.** The directory in the guest system where the files will be copied at startup.
- **main_executable.** The name of the file to execute, among the ones in the zip file **files_bundle.**
- **api_database.** The path to the API database generated with Deviare and the MSDN crawler.
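
Since **main_executable** must name one of the files inside **files_bundle**, it can be worth checking this on the host before starting a run. A minimal sketch using the standard ``zipfile`` module; ``check_bundle`` is a hypothetical helper, not part of malware monitor, and it assumes the executable is stored at the top level of the zip.

.. code-block:: python

   import zipfile


   def check_bundle(files_bundle: str, main_executable: str) -> None:
       """Raise ValueError if main_executable is not one of the files in the zip."""
       with zipfile.ZipFile(files_bundle) as bundle:
           names = bundle.namelist()
       if main_executable not in names:
           raise ValueError(
               f"{main_executable!r} is not in {files_bundle!r}; found {names}"
           )


   # Placeholder values: use the files_bundle and main_executable from your JSON file.
   # check_bundle("/path/to/sample.zip", "sample.exe")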

interproc
^^^^^^^^^

- **basic_stats.** Boolean value that turns on/off the *basic_stats* report generation, which contains a summary of the observed memory operations.
- **bin_log.** Boolean value that turns on/off the binary log generation of this module.
- **text_log.** Boolean value that turns on/off the text log generation of this module. This log contains a trace of all the memory operations monitored.

dumper
^^^^^^

- **dump_on_exit.** Boolean that determines if the process memory should be dumped when it exits.
- **dump_at.** Configures when to dump the process memory. It accepts three formats: an address, a symbol, or a symbol followed by an address. With an address, the process memory is dumped when the control flow reaches that address in the context of the process. With a symbol, the process memory is dumped when the control flow reaches that symbol (generally, a specific API function). With a symbol followed by an address, the process memory is dumped when the process calls that API function from the given address.

Examples of the three formats, in the same order:

.. code-block:: none

   0x00400000
   user32.dll!CharNextW
   user32.dll!CharNextW!0x00400000
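
The three formats can be told apart by the number of ``!``-separated fields. The following sketch shows one way to parse them; ``parse_dump_at`` and ``DumpTrigger`` are hypothetical names, and malware monitor's own parser may differ in details such as whether it accepts decimal addresses.

.. code-block:: python

   from typing import NamedTuple, Optional


   class DumpTrigger(NamedTuple):
       address: Optional[int]   # target address (form 1) or calling address (form 3)
       module: Optional[str]    # e.g. "user32.dll"
       function: Optional[str]  # e.g. "CharNextW"


   def parse_dump_at(value: str) -> DumpTrigger:
       """Hypothetical parser for the three dump_at formats."""
       parts = value.split("!")
       if len(parts) == 1:
           # Form 1: a bare hexadecimal address, e.g. "0x00400000".
           return DumpTrigger(int(parts[0], 16), None, None)
       if len(parts) == 2:
           # Form 2: module!function.
           return DumpTrigger(None, parts[0], parts[1])
       if len(parts) == 3:
           # Form 3: module!function!calling_address.
           return DumpTrigger(int(parts[2], 16), parts[0], parts[1])
       raise ValueError(f"unrecognized dump_at value: {value!r}")


   print(parse_dump_at("user32.dll!CharNextW!0x00400000"))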

coverage
^^^^^^^^

- **procs.** A list of strings with the names of the processes that should be traced to generate a coverage file. If a null value or an empty list is specified, all the monitored processes (the initial one and any related process) will be recorded.
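
The **procs** rule amounts to a simple predicate. A minimal sketch; ``should_trace`` is a hypothetical helper, and exact, case-sensitive name matching is an assumption, since this section does not specify how process names are compared.

.. code-block:: python

   from typing import Optional, Sequence


   def should_trace(proc_name: str, procs: Optional[Sequence[str]]) -> bool:
       """Decide whether a monitored process is recorded, given the procs option.

       None (JSON null) or an empty list means every monitored process is recorded.
       Exact name matching is an assumption of this sketch.
       """
       if not procs:
           return True
       return proc_name in procs


   print(should_trace("sample.exe", None))            # True
   print(should_trace("notepad.exe", ["sample.exe"]))  # False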

api_tracer
^^^^^^^^^^

- **bin_log.** Boolean value that turns on/off the generation of the binary log.
- **text_log.** Boolean value that turns on/off the generation of the text log.
- **light_mode.** Boolean value that turns on/off light mode. In light mode, function call arguments are not dereferenced, resulting in slightly faster execution of the guest system.
- **exclude_apis.** A list of API functions to exclude from being logged.
- **exclude_modules.** A list of module names to exclude from tracing. Any call to a function in one of these modules will not be logged.
- **exclude_origin_modules.** A list of module names to exclude from tracing. Any call originating from one of these modules will not be logged.
- **include_apis.** A list of API functions to include in the trace, even if their module appears in an exclusion list. This finer-grained option overrides any exclusion rule.
- **procs.** A list of strings with the names of the processes that should be traced to generate an API call trace. If a null value or an empty list is specified, all the monitored processes (the initial one and any related process) will be recorded.
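
The precedence between these lists can be summarized as a single predicate. A minimal sketch of the documented rules, in which **include_apis** overrides every exclusion; ``should_log`` is a hypothetical helper, and exact string matching of API and module names is an assumption.

.. code-block:: python

   from typing import Dict, List


   def should_log(api: str, module: str, origin_module: str,
                  cfg: Dict[str, List[str]]) -> bool:
       """Return True if a call to api (in module), made from origin_module, is logged."""
       # include_apis is the finest-grained rule and overrides every exclusion.
       if api in cfg.get("include_apis", []):
           return True
       if api in cfg.get("exclude_apis", []):
           return False
       if module in cfg.get("exclude_modules", []):
           return False
       if origin_module in cfg.get("exclude_origin_modules", []):
           return False
       return True


   cfg = {
       "exclude_apis": ["Sleep"],
       "exclude_modules": ["ntdll.dll"],
       "exclude_origin_modules": ["kernel32.dll"],
       "include_apis": ["NtCreateFile"],
   }
   print(should_log("NtCreateFile", "ntdll.dll", "sample.exe", cfg))  # True
   print(should_log("NtClose", "ntdll.dll", "sample.exe", cfg))       # False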


IDA Python scripts
------------------

We provide IDA Python scripts under the ida_scripts_ directory. There are two main scripts:

- **mw_monitor_coverage.py**. Reads the coverage binary log and colorizes the basic blocks that have been executed.
- **mw_monitor_ida_functions_rename.py**. Opens a new tab in IDA that loads the api tracer binary log and displays the traced API calls, along with their origin and destination addresses and their parameters.

To run these scripts, you will need to copy the entire mw_monitor directory to a path that is accessible
from your IDA setup, because these IDA scripts have several dependencies under the mw_monitor/ directory of this project.
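
One way to make those dependencies importable is to extend ``sys.path`` from IDA's Python console before running either script. A minimal sketch; ``add_to_ida_path`` is a hypothetical helper, the example path is a placeholder, and whether the scripts expect the mw_monitor directory itself or its parent on the path is an assumption to verify against the scripts' import statements.

.. code-block:: python

   import os
   import sys


   def add_to_ida_path(directory: str) -> None:
       """Make modules under directory importable from IDA's Python interpreter."""
       directory = os.path.abspath(directory)
       if not os.path.isdir(directory):
           raise FileNotFoundError(directory)
       if directory not in sys.path:
           sys.path.insert(0, directory)


   # Placeholder location: point this at wherever mw_monitor/ was copied
   # (or at its parent, depending on how the scripts import their dependencies).
   # add_to_ida_path(r"C:\tools\pyrebox\mw_monitor")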

API tracer database
-------------------

The API tracer relies on an sqlite database to automatically inspect API parameters. This database can be
generated by combining the Deviare project, the MSDN crawler published by Zynamics, and a custom script
that integrates both data sources into the sqlite database that malware monitor uses.

To generate the database, first clone the Deviare project (https://github.com/nektra/deviare2)
and slightly modify the DbGenerator subproject so that it produces an sqlite database. See the readme_ file for information
about which files must be patched. Then, run the DbGenerator project on a Windows machine of the corresponding
architecture (32 or 64 bit) to generate the initial sqlite database.

This database still lacks information about which parameters are input parameters and which are output parameters. This
information can be obtained from the MSDN. To parse the MSDN, use the provided script_, which is based on the
msdn_crawler script published by Zynamics. This modified script produces an XML file with information about each API
documented in the MSDN.

Finally, the last step involves running the populate_db.py_ script, in order to populate the sqlite database with the
information extracted with the MSDN crawler.
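
To illustrate what this last step does, the sketch below copies parameter directions from an XML file into an sqlite table using Python's standard ``sqlite3`` and ``xml.etree.ElementTree`` modules. Both the XML layout and the ``params`` table schema are invented for this example; the real formats are defined by the crawler script_, the DbGenerator output, and populate_db.py_.

.. code-block:: python

   import sqlite3
   import xml.etree.ElementTree as ET

   # Hypothetical crawler output: one <function> per API, one <param> per argument.
   EXAMPLE_XML = """<functions>
     <function name="ReadFile">
       <param name="hFile" direction="in"/>
       <param name="lpBuffer" direction="out"/>
     </function>
   </functions>"""


   def populate(conn: sqlite3.Connection, xml_text: str) -> None:
       """Copy each parameter's direction from the XML into the params table."""
       root = ET.fromstring(xml_text)
       for func in root.iter("function"):
           for param in func.iter("param"):
               conn.execute(
                   "UPDATE params SET direction = ? WHERE function = ? AND name = ?",
                   (param.get("direction"), func.get("name"), param.get("name")),
               )
       conn.commit()


   # Hypothetical schema standing in for the database produced by DbGenerator.
   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE params (function TEXT, name TEXT, direction TEXT)")
   conn.executemany(
       "INSERT INTO params (function, name) VALUES (?, ?)",
       [("ReadFile", "hFile"), ("ReadFile", "lpBuffer")],
   )
   populate(conn, EXAMPLE_XML)
   rows = conn.execute("SELECT name, direction FROM params ORDER BY name").fetchall()
   print(rows)  # [('hFile', 'in'), ('lpBuffer', 'out')]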

Documentation
-------------

This documentation is also hosted together with the main PyREBox documentation at readthedocs.io_.

Bugs, questions and support
---------------------------

If you think you've found a bug, please report it here_.

Before creating a new issue, please go through the questions_ opened by other users.

This program is provided "AS IS", and no support is guaranteed. That said, to help
us solve your issue, please include as much information as possible to reproduce the bug:

- Operating system used to compile and run PyREBox.
- The specific operating system version and emulation target you are using.
- Shell command / script / task you were trying to run.
- Any information about the error such as error messages, Python (or IPython) stack trace, or QEMU stack trace.
- Any other relevant information.