@ForeverZH0204 We hope to release the dataset, but we are waiting for the paper to be reviewed before releasing it. I will post here when it's out.
from malware-prediction-rnn.
Thanks a lot!
I have also read the related paper. I have several questions, listed below, for whenever you are available:
1. How should I understand the phrase 'a particular time into file execution'?
If I have a record 20 s long and sample it at 5-second intervals, what is the value of 'a particular time into file execution'? I suppose it to be 4, in other words the actual amount of data used in the model. Is that right?
2. Confusion about Figure 7, Table X and Table XI
As I read the README.md and the code, Figure 7 and Table X come from a setting that trains on the whole training set and then tests on a test set with one or more features omitted? And Table XI comes from omitting features from the whole dataset, and then explores the difference in the total processes feature's impact score?
If so, the conclusion that
'The impact score increases relative to others as more features are omitted, this may indicate that total processes are combined with other inputs to create discriminating features, though the input is not highly impactful alone.'
is really hard for me to accept. I hope I have misunderstood something.
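The arithmetic behind question 1 can be sketched in Python. This is a minimal illustration, not code from the repository: `snapshot_count` is a hypothetical helper, and it assumes one snapshot at the end of each full sampling interval, optionally plus one at t = 0 (which changes the count from 4 to 5).

```python
def snapshot_count(elapsed_s: float, interval_s: float, include_t0: bool = False) -> int:
    """Number of snapshots collected after `elapsed_s` seconds of execution.

    Hypothetical helper. Assumes snapshots at interval_s, 2*interval_s, ...,
    plus one at t = 0 if include_t0 is True. Use whole-second values: floor
    division on non-integer floats can be off by one due to rounding.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    n = int(elapsed_s // interval_s)
    return n + 1 if include_t0 else n


# The same 20 s into execution gives different amounts of data.
for interval in (1, 2, 5):
    print(f"20 s at {interval} s intervals -> {snapshot_count(20, interval)} snapshots")
```

The loop prints 20, 10 and 4 snapshots for the same 20 s of execution, so the real time into execution and the amount of data fed to the model are separate quantities.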
Thanks for your reply!
Have a good day.
Hi @ForeverZH0204 - in answer to your questions:
- By "a particular time into file execution" we mean the real time since the sample started executing. We are arguing that more snapshots (i.e. more data) have a higher correlation with accuracy than the real time since the file began executing.
- You are right: Fig 7 and Table X look at omission of data in the test set, and Table XI looks at omission during both training and testing. We are looking at the impact of all the features, but in the discussion of the total processes feature we argue that its average impact score grows as more features are omitted (the impact score is the fall in accuracy divided by the number of features omitted). For some features, the impact does not really change whether the feature is omitted alone, together with one other feature, or together with two other features. This implies that for these features the impact of their omission is not really affected by co-omission of other features. Because the impact score of "total processes" increases with the number of features omitted at the same time, we believe this indicates that total processes is combined with other features in the RNN to give distinguishing representations of malicious and benign samples. In Table XI the omission of total processes causes one of the biggest falls in accuracy, so we think it is a useful feature for the model, but that its usefulness is realised when it is combined with other data. We could train further models with different combinations of inputs to test this (but we have not yet done so for this paper).
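A minimal sketch of the impact-score bookkeeping described above, assuming the score is the fall in accuracy divided by the number of features omitted, and that a feature's average score at set size k is the mean over every omission set of size k that contains it. `accuracy_without` is a hypothetical callback standing in for evaluating the model with the given features omitted (for the Table XI setting that would mean retraining); this is not the repository's code.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, FrozenSet, List, Sequence


def impact_score(baseline_acc: float, omitted_acc: float, n_omitted: int) -> float:
    """Fall in accuracy divided by the number of features omitted."""
    return (baseline_acc - omitted_acc) / n_omitted


def mean_impact_by_set_size(
    features: Sequence[str],
    baseline_acc: float,
    accuracy_without: Callable[[FrozenSet[str]], float],  # hypothetical callback
    max_size: int = 3,
) -> Dict[str, Dict[int, float]]:
    """For each feature, average its impact score over all omission sets of
    size k (k = 1..max_size) that contain it. Assumed averaging scheme."""
    max_size = min(max_size, len(features))  # cannot omit more than exist
    scores: Dict[str, Dict[int, float]] = {f: {} for f in features}
    for k in range(1, max_size + 1):
        per_feature: Dict[str, List[float]] = {f: [] for f in features}
        for subset in combinations(features, k):
            omitted = frozenset(subset)
            s = impact_score(baseline_acc, accuracy_without(omitted), k)
            # The whole-set score is attributed to every feature in the set.
            for f in subset:
                per_feature[f].append(s)
        for f in features:
            scores[f][k] = mean(per_feature[f])
    return scores
```

Comparing `scores[f][1]`, `scores[f][2]` and `scores[f][3]` for a single feature is the kind of comparison the answer describes for total processes.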
Thank you for your questions; I hope that has made it a little clearer. I will work on a presentation of the work that explains these points more clearly.
Thanks for your reply!
But regarding question 2, if we want to explore the relationship between the difference and a certain variable, I think we need to keep the other conditions unchanged.
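The controlled comparison suggested here can be sketched by holding a co-omitted set S fixed and measuring only the extra fall in accuracy from also omitting one feature, then comparing that with the feature's fall when omitted alone. As before, `accuracy_without` is a hypothetical callback, and `marginal_fall` and `interaction` are illustrative names, not functions from the repository.

```python
from typing import Callable, FrozenSet

# Hypothetical callback: accuracy of the model with the given features omitted.
AccuracyFn = Callable[[FrozenSet[str]], float]


def marginal_fall(accuracy_without: AccuracyFn, feature: str, held_fixed: FrozenSet[str]) -> float:
    """Extra fall in accuracy from also omitting `feature`, with the same
    `held_fixed` set omitted in both runs (the only condition that changes)."""
    if feature in held_fixed:
        raise ValueError("feature must not already be in held_fixed")
    return accuracy_without(held_fixed) - accuracy_without(held_fixed | {feature})


def interaction(accuracy_without: AccuracyFn, feature: str, held_fixed: FrozenSet[str]) -> float:
    """Marginal fall of `feature` given `held_fixed`, minus its fall when omitted
    alone. Zero means omitting `feature` costs the same regardless of `held_fixed`."""
    alone = marginal_fall(accuracy_without, feature, frozenset())
    return marginal_fall(accuracy_without, feature, held_fixed) - alone
```

A nonzero interaction means the cost of omitting the feature depends on which other features are already omitted; repeating the calculation over several choices of S of the same size varies one condition at a time.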
When exactly will the dataset be released for further research?