This is a set of shell scripts and a sample feature vector collection application to help automate the training and testing of dynamic machine learning malware classifiers. Arch Linux is the main testing platform.
- Populate TestSuite/Training and/or TestSuite/Testing with APK files named in the format <M/B><Number>-<Name>.apk, where <M/B> represents the classification of the application (malicious or benign), <Number> represents an optional, arbitrary number to help with identification and sorting of different types of applications, and <Name> is the name or identifier of the application.
  - An example would be B001-FooApp.apk.
- Set the devices and number of emulators in TestSuite/collect-data.sh.
- Run collect-data.sh <Testing/Training> to start profiling applications and collecting feature vectors.
- Feature vectors will be saved to arff/ and the machine learning classifiers will be accordingly trained and tested with arff/weka.sh.
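
The naming format above can be checked before a collection run. The following is a minimal sketch, not part of the repository: parse_apk_name is a hypothetical helper, and it assumes that <Number> is a (possibly empty) run of digits and that the hyphen is present even when the number is omitted.

```shell
# Hypothetical helper, not part of the repository: split an APK file name of
# the form <M/B><Number>-<Name>.apk into label, number, and name.
# Assumptions: <Number> is a (possibly empty) run of digits, and the hyphen
# is present even when the number is omitted.
parse_apk_name() {
    apk_file=${1##*/}                 # drop any leading directory
    case $apk_file in
        M*-?*.apk) apk_label=malicious ;;
        B*-?*.apk) apk_label=benign ;;
        *) printf 'unrecognized name: %s\n' "$apk_file" >&2; return 1 ;;
    esac
    apk_rest=${apk_file%.apk}         # drop the extension
    apk_rest=${apk_rest#?}            # drop the leading M or B
    apk_number=${apk_rest%%-*}        # text before the first hyphen
    apk_name=${apk_rest#*-}           # text after the first hyphen
    case $apk_number in
        *[!0-9]*) printf 'non-numeric number field: %s\n' "$apk_file" >&2; return 1 ;;
    esac
    # Emit the three fields separated by tabs.
    printf '%s\t%s\t%s\n' "$apk_label" "$apk_number" "$apk_name"
}
```

Calling parse_apk_name on each file in TestSuite/Training or TestSuite/Testing would print one tab-separated line per APK and report misnamed files on standard error.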
The feature vector collection application is called Antimalware and is an Eclipse project. See below for a short section on increasing Eclipse's memory if you are trying to load it in Eclipse. The collected data is stored on an SD card on the device.
.
├── Antimalware - The data collection application
│   ├── libs - The modified Weka library
│   └── src
├── arff - Collected feature vectors and classifiers
├── Results
└── TestSuite
    ├── AVDs
    ├── Device-Images
    ├── logs
    ├── Testing - Applications
    └── Training - Applications
Importing the Antimalware Android project into Eclipse is simple. However, Eclipse's memory limits need to be increased to load the Weka library it uses. First, find eclipse.ini on your system.
$ sudo find / -name 'eclipse.ini'
/usr/share/eclipse/eclipse.ini
Then edit it and increase the memory settings:
$ vim /usr/share/eclipse/eclipse.ini
[...]
--launcher.XXMaxPermSize
2048m
[...]
--launcher.defaultAction
[...]
-Xms1024m
-Xmx2028m
[...]
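
After editing, it may help to confirm which memory settings are actually in the file. The following is a small sketch under stated assumptions: show_eclipse_memory is a hypothetical helper (not part of the repository) that takes the path to eclipse.ini as its only argument and assumes the value of --launcher.XXMaxPermSize sits on the line directly after the option, as in the excerpt above.

```shell
# Hypothetical helper, not part of the repository: print the memory-related
# settings found in an eclipse.ini file whose path is given as $1.
show_eclipse_memory() {
    # --launcher.XXMaxPermSize keeps its value on the following line,
    # so read that line and print it together with the option name.
    # The -Xms and -Xmx options carry their value on the same line.
    awk '
        /^--launcher\.XXMaxPermSize$/ { getline value; print $0 "=" value; next }
        /^-Xm[sx]/                     { print }
    ' "$1"
}
```

For example, show_eclipse_memory /usr/share/eclipse/eclipse.ini would list each of the settings on its own line.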