alexander-h-liu / malconv-pytorch
Pytorch implementation of MalConv
License: MIT License
pytorch version: 0.4.0
pandas version: 0.24.1
python3 train.py config/example.yaml 123
-->
Experiment:
example_sd_123
Training Set:
Total 2 files
Malware Count : 1
Goodware Count: 1
Validation Set:
Total 2 files
Malware Count : 1
Goodware Count: 1
Traceback (most recent call last):
File "train.py", line 151, in
history['tr_loss'].append(loss.cpu().data.numpy()[0])
IndexError: too many indices for array
Hi
When I run your training code, the dataloader reads each file name from a file that stores file names and labels, as you instructed. However, it cannot read files whose names mix lowercase and uppercase characters, and I am not sure why. When I renamed such a file to all lowercase or all uppercase, it was read and loaded correctly. But I cannot rename every file, since thousands of them have mixed-case names.
How can I modify your code to read files whose names mix uppercase and lowercase characters?
Thanks
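One possible cause, assumed here rather than confirmed from the code, is that the names in the label file and the names on disk differ only in case, which fails on case-sensitive file systems such as the ones typically used on Linux. A sketch of a workaround is to resolve each listed name against the directory listing, preferring an exact match and falling back to a case-insensitive one. build_name_index and resolve_file_name are hypothetical helpers, not part of the repo.

```python
import os


def build_name_index(data_dir):
    """Index the files in data_dir by exact name and by lowercased name."""
    names = os.listdir(data_dir)
    exact = set(names)
    folded = {}
    for name in names:
        # If two files differ only in case, the first one listed wins.
        folded.setdefault(name.lower(), name)
    return exact, folded


def resolve_file_name(name, exact, folded):
    """Return the on-disk spelling of name, ignoring case only if needed."""
    if name in exact:
        return name
    try:
        return folded[name.lower()]
    except KeyError:
        raise FileNotFoundError(name) from None
```

Building the index once per data directory keeps lookups cheap; the dataset would then open os.path.join(data_dir, resolve_file_name(name, exact, folded)) instead of the raw name from the label file.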
The current yaml.load call relies on deprecated behavior:
train.py:21: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
conf = yaml.load(open(config_path,'r'))
Line 21 in 939cb59
The fix is described at https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation; the line above should be replaced with the following:
conf = yaml.load(open(config_path,'r'), Loader=yaml.FullLoader)
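A slightly safer variant, assuming the config only holds plain mappings, lists, and scalars: yaml.safe_load (equivalent to yaml.load with Loader=yaml.SafeLoader) refuses to construct arbitrary Python objects, and a with block closes the file handle that open(...) in the one-liner leaves open. load_config is a hypothetical helper name.

```python
import yaml


def load_config(config_path):
    """Load a YAML experiment config without the unsafe default Loader."""
    # The with block closes the file once parsing is done.
    with open(config_path, "r") as f:
        # safe_load only builds plain Python objects (dict, list, str, int, ...).
        return yaml.safe_load(f)
```

If the config ever needs custom Python tags, yaml.FullLoader as shown above is the less restrictive option.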
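It might be because of a newer version of PyTorch or something else, but loss.cpu().data.numpy()
no longer returns a one-element array (since PyTorch 0.4 the loss is a 0-dimensional tensor), so indexing it with [0] fails and the code should be changed.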
Before:
Line 151 in 939cb59
Line 183 in 939cb59
Now:
history['tr_loss'].append(loss.cpu().data.numpy())
history['val_loss'].append(loss.cpu().data.numpy())
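For reference, the failure and an alternative fix can be shown without PyTorch. A reduced loss is a 0-dimensional tensor, so .numpy() gives an array of shape () and [0] raises the IndexError from the traceback. Calling .item() instead returns a plain Python float (it also works on CUDA tensors without .cpu()), which keeps history free of 0-d arrays. The sketch uses a NumPy 0-d array as a stand-in for the loss; loss_to_float is a hypothetical helper name.

```python
import numpy as np


def loss_to_float(loss):
    """Return a scalar loss as a plain Python float.

    Hypothetical helper: works for 0-d torch tensors and 0-d NumPy arrays,
    since both expose .item().
    """
    return loss.item()


# A 0-d array like the one loss.cpu().data.numpy() returns on PyTorch >= 0.4.
scalar = np.array(0.6931, dtype=np.float32)
print(scalar.shape)  # → ()

try:
    scalar[0]  # the indexing done at train.py line 151 in the traceback
except IndexError as err:
    print("IndexError:", err)

history = {"tr_loss": [], "val_loss": []}
# In the training loop this would simply be: history['tr_loss'].append(loss.item())
history["tr_loss"].append(loss_to_float(scalar))
print(type(history["tr_loss"][0]).__name__)  # → float
```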
python3 train.py <config_file_path> <random_seed>
Are there any constraints or rules for choosing the random seed?
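Any integer that the seeded libraries accept should work; the tightest common bound is NumPy's legacy np.random.seed, which only accepts values from 0 to 2**32 - 1. Which generators train.py actually seeds is not shown here, so the following is a sketch under that assumption: it parses the two positional arguments with argparse, rejects out-of-range seeds, and seeds Python's random and NumPy. parse_args and set_seed are hypothetical names.

```python
import argparse
import random

import numpy as np

MAX_SEED = 2**32 - 1  # upper bound accepted by np.random.seed


def parse_args(argv=None):
    """Parse <config_file_path> <random_seed> from the command line."""
    parser = argparse.ArgumentParser(description="Train MalConv (sketch).")
    parser.add_argument("config_path", help="path to the YAML config")
    parser.add_argument("seed", type=int, help="random seed in [0, 2**32 - 1]")
    args = parser.parse_args(argv)
    if not 0 <= args.seed <= MAX_SEED:
        parser.error(f"seed must be in [0, {MAX_SEED}], got {args.seed}")
    return args


def set_seed(seed):
    """Seed the Python and NumPy generators."""
    random.seed(seed)
    np.random.seed(seed)
    # A PyTorch script would also call torch.manual_seed(seed) here
    # (and torch.cuda.manual_seed_all(seed) when training on a GPU).
```

Reusing the same seed should reproduce the same shuffling and weight initialization, although results on a GPU may still differ slightly because some cuDNN kernels are nondeterministic.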
Hey everyone,
I ran
(myenv) mnoppel@srv:~/projects/MalConv-Pytorch$ python3 train.py config/example.yaml 123
as explained in the README.md, but got:
Usage: python3 run_exp.py <config file path> <seed>
Any idea what might be wrong?
Kind regards,
Max
The class MalConv in this repo appears to be missing the ReLU activation after the first fully connected layer.
From page 10 of "Malware Detection by Eating a Whole EXE," the authors state that they use the ReLU activation.
Our final MalConv architecture used the common ReLU activation function. We did tests on other activations such as ELU (Clevert, Unterthiner, and Hochreiter 2016), Leaky ReLU (Maas, Hannun, and Ng 2013), and PReLU (He et al. 2015b). While not detrimental, we found no positive impact from their inclusion.
The diagram on Figure 6 does not state that it uses ReLU, but this appears to be an oversight.
Additionally, we know that a nonlinear activation function must be used between the two fully-connected layers because without a nonlinearity, the composition of two linear layers is equivalent to a single linear layer. This is because linear functions are closed under composition.
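The closure argument can be checked numerically. For two affine layers, W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is a single affine layer with weight W2 W1 and bias W2 b1 + b2. The NumPy sketch below uses random weights; the 128-to-128-to-1 shapes are an assumption chosen to resemble MalConv's fully connected layers, not values read from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((128, 128)), rng.standard_normal(128)
W2, b2 = rng.standard_normal((1, 128)), rng.standard_normal(1)
x = rng.standard_normal(128)

# Two stacked linear layers with no activation in between.
two_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping collapsed into one layer: W = W2 W1, b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # → True
```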
The EMBER authors at Endgame, Inc. have implemented MalConv (https://github.com/endgameinc/ember/blob/master/malconv/malconv.py); in their paper (https://arxiv.org/abs/1804.04637), they write:
As a comparative study, we trained MalConv on the raw binaries underlying the dataset. We used the model architecture and training setup as prescribed and verified by the authors, except that we train with a batch size of 100 instead of 256 due to GPU memory constraints.
So we can be confident that the Endgame, Inc. implementation is the network that the MalConv authors intended. The Endgame, Inc. implementation includes a ReLU layer after the first FC layer.
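A minimal PyTorch sketch of the proposed fix, a ReLU between the two fully connected layers. MalConvHead is a hypothetical module covering only the classifier head (not the embedding, gated convolution, or pooling), and the 128-unit widths are assumed; in the repo's MalConv class the equivalent change is applying a ReLU to the output of the first fully connected layer before passing it to the second.

```python
import torch
import torch.nn as nn


class MalConvHead(nn.Module):
    """Hypothetical classifier head: FC -> ReLU -> FC."""

    def __init__(self, in_features=128, hidden=128):
        super().__init__()
        self.fc_1 = nn.Linear(in_features, hidden)
        self.relu = nn.ReLU()
        self.fc_2 = nn.Linear(hidden, 1)

    def forward(self, pooled):
        # pooled: (batch, in_features), e.g. the output of global max pooling.
        x = self.fc_1(pooled)
        x = self.relu(x)  # the nonlinearity reported as missing in this issue
        return self.fc_2(x)  # one raw score per sample (no sigmoid in this sketch)


head = MalConvHead()
scores = head(torch.randn(4, 128))
print(scores.shape)  # → torch.Size([4, 1])
```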
Thank you for the work.
One simple thing that is missing is a requirements.txt listing all the necessary Python packages. Based on my experience running your package, and on a simple grep "import" -r *, here is what is required:
pyyaml
torch
numpy
pandas