Comments (10)
Hello,
Which features are you using? Are you passing the file to MalConv as a list of bytes?
from secml_malware.
Hello, I am passing the contents of the binary file to End2EndModel.bytes_to_numpy(code, net.get_input_max_length(), 256, False), and the return value is then passed to MalConv.
Can you paste the code that you are using here?
I have tested the notebook several times with several malware samples, and it works.
So there must be something that I am missing here.
import os
import magic
from secml.array import CArray
from secml_malware.models.malconv import MalConv
from secml_malware.models.c_classifier_end2end_malware import CClassifierEnd2EndMalware, End2EndModel

net = MalConv()
net = CClassifierEnd2EndMalware(net)
net.load_pretrained_model()

folder = "secml_malware/data/malware_samples/test_folder"
X = []
y = []
file_names = []
for i, f in enumerate(os.listdir(folder)):
    path = os.path.join(folder, f)
    # microsoft binaries do not have these properties in them
    # if 'petya' not in path:
    #     continue
    # if "PE32" not in magic.from_file(path):
    #     continue
    with open(path, "rb") as file_handle:
        code = file_handle.read()
    x = End2EndModel.bytes_to_numpy(
        code, net.get_input_max_length(), 256, False
    )
    _, confidence = net.predict(CArray(x), True)
    # I am getting a confidence level of [0.5, 0.5] here
    if confidence[0, 1].item() < 0.5:
        continue
    print(f"> Added {f} with confidence {confidence[0,1].item()}")
    X.append(x)
    conf = confidence[1][0].item()
    y.append([1 - conf, conf])
    file_names.append(path)
So, if these files are not PE32 or PE32+, then I understand why MalConv is not predicting.
Can you share one sample with me? If they are not full binaries, it is not going to work.
What is the output of the bytes_to_numpy function? If it does not start with the MZ signature, there is something wrong with the data.
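A quick way to check a file for the MZ signature before feeding it to the model is sketched below. It relies only on the documented PE layout (the two bytes "MZ" at offset 0, and a 4-byte little-endian offset at 0x3C pointing to the "PE\0\0" signature). The helper name looks_like_pe and the synthetic sample are hypothetical, made up for illustration, and are not part of secml_malware.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Heuristic check: DOS 'MZ' magic plus a 'PE\\0\\0' signature at e_lfanew.

    Hypothetical helper for illustration; not part of secml_malware.
    """
    # A DOS header is 64 bytes long and starts with the ASCII bytes "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Synthetic 68-byte header: "MZ", zero padding, e_lfanew = 0x40, then "PE\0\0".
fake_pe = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"

print(looks_like_pe(fake_pe))
print(looks_like_pe(b"this is plain text, not an executable"))
```

In the loop above, such a check could be applied to code right after file_handle.read(), skipping any file for which it returns False.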
Yes, I think Microsoft says in their documentation that the malware headers are not included. The documentation is here: https://arxiv.org/pdf/1802.10135.pdf You can find the dataset here: https://www.kaggle.com/competitions/malware-classification/data. I am not sure what the MZ signature is, but these are the first 30 values of the bytes_to_numpy output: [[49., 48., 48., 48., 49., 48., 48., 48., 32., 54., 65., 32., 70., 70., 32., 54., 56., 32., 65., 51., 32., 49., 54., 32., 48., 48., 32., 49., 48., 32.]
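Reading those 30 values as ASCII character codes makes the problem visible. The sketch below assumes the values are unshifted byte values, which the False passed as the last argument of bytes_to_numpy suggests; the variable names are made up for illustration.

```python
# The first 30 values reported above, copied verbatim.
first_values = [
    49., 48., 48., 48., 49., 48., 48., 48., 32., 54.,
    65., 32., 70., 70., 32., 54., 56., 32., 65., 51.,
    32., 49., 54., 32., 48., 48., 32., 49., 48., 32.,
]

# Interpret each value as one byte and decode the result as ASCII text.
decoded = bytes(int(v) for v in first_values).decode("ascii")

print(repr(decoded))             # '10001000 6A FF 68 A3 16 00 10 '
print(decoded.startswith("MZ"))  # False
```

The decoded string is the text of a hex dump (an address column followed by two-digit hex values), so the model is receiving the characters of the dump rather than the bytes of an executable.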
So it is not applicable: that is not a binary, as the data paragraph of the classification challenge clearly states.
For each file, the raw data contains the hexadecimal representation of the file’s binary content, without the header (to ensure sterility). The data does include the binary files, just without the headers.
Yeah, so it is not a real binary.
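For illustration, one line of such a hexadecimal dump could be turned back into raw bytes roughly as follows. This is a sketch under assumptions: the first column is taken to be an address, and any token that is not a two-digit hex value is skipped as a placeholder. The function name hexdump_line_to_bytes is hypothetical and not part of secml_malware.

```python
import string


def hexdump_line_to_bytes(line: str) -> bytes:
    """Convert one 'ADDRESS XX XX XX ...' text line into raw bytes.

    Hypothetical helper; assumes the first token is an address column.
    """
    tokens = line.split()
    out = bytearray()
    for tok in tokens[1:]:  # drop the assumed address column
        # Keep only well-formed two-digit hex tokens; skip anything else.
        if len(tok) == 2 and all(c in string.hexdigits for c in tok):
            out.append(int(tok, 16))
    return bytes(out)


raw = hexdump_line_to_bytes("10001000 6A FF 68 A3 16 00 10")
print(raw.hex(" "))        # 6a ff 68 a3 16 00 10
print(raw[:2] == b"MZ")    # False
```

Even after this conversion the recovered bytes do not begin with the MZ signature, since the header was removed from the dataset, which matches the conclusion above that these samples are not complete binaries.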
Never mind, that makes sense. Thank you!