
Comments (5)

mprhode commented on July 4, 2024

@ForeverZH0204 We hope to release the dataset, but we are waiting for the paper to be reviewed first. I will post here when it's out.

from malware-prediction-rnn.

ForeverRuri commented on July 4, 2024

Thanks a lot!

I also read the related paper. I have several questions, listed here for whenever you are available:

1. How should I understand the phrase 'a particular time into file execution'?

If I have a record 20 s long and sample it at intervals of 5 seconds, what is the value of 'a particular time into file execution'? I assume it is 4, in other words the actual amount of data used in the model. Is that right?

2. Confusion about Figure 7, Table X and Table XI

As I understand from the readme.md and the code, Figure 7 and Table X come from a setting that trains on the whole training set and then tests on a test set with feature(s) omitted? And Table XI comes from omitting features from the whole dataset, and then explores the difference in the impact score of total processes?
If so, the conclusion that the

'impact score increases relative to others as more features are
omitted, this may indicate that total processes are combined
with other inputs to create discriminating features, though the
input is not highly impactful alone.'

is really hard for me to accept.
I hope that I am the one who is mistaken.

Thanks for your reply!
Have a good day.

@mprhode


mprhode commented on July 4, 2024

Hi @ForeverZH0204 - in answer to your questions:

  1. By "a particular time into file execution" we mean the real time elapsed since the sample started executing. We are arguing that the number of snapshots (i.e. the amount of data) correlates more strongly with accuracy than the real time since the file began executing does.

  2. You are right: Fig 7 and Table X look at omission of data in the test set, and Table XI looks at omission during both training and testing. We are looking at the impact of all the features, but in the discussion of the total processes feature we argue that its average impact score grows as more features are omitted (the impact score is the fall in accuracy / number of features omitted). For some features, the impact does not really change whether just that single feature is omitted, that feature + one other feature, or that feature + 2 other features. This implies that for these features the impact of their omission is not really affected by co-omission of other features. Because the impact score of "total processes" increases with the number of features omitted at the same time, we believe this indicates that total processes is combined with other features in the RNN to give distinguishing representations between malicious and benign samples. In Table XI the omission of total processes sees one of the biggest falls in accuracy, so we think it is a useful feature for the model but that its usefulness is realised when it is combined with other data. We can train further models with different combinations of inputs to test this (but we have not yet done so for this paper).
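To make the impact score definition in point 2 concrete, here is a minimal Python sketch. It is not taken from the repository: the function names, the feature names other than total processes, and all accuracy values are hypothetical and chosen only to illustrate the arithmetic (fall in accuracy divided by the number of features omitted, then averaged per omission-group size).

```python
def impact_score(baseline_accuracy, accuracy_with_omission, n_omitted):
    """Fall in accuracy divided by the number of features omitted."""
    if n_omitted <= 0:
        raise ValueError("n_omitted must be a positive integer")
    return (baseline_accuracy - accuracy_with_omission) / n_omitted


def mean_impact_by_group_size(feature, results, baseline_accuracy):
    """Average impact score of `feature`, grouped by how many features
    were omitted together with it.

    `results` maps a frozenset of omitted feature names to the accuracy
    measured with those features omitted (hypothetical structure).
    """
    scores_by_size = {}
    for omitted, accuracy in results.items():
        if feature in omitted:
            score = impact_score(baseline_accuracy, accuracy, len(omitted))
            scores_by_size.setdefault(len(omitted), []).append(score)
    return {
        size: sum(scores) / len(scores)
        for size, scores in sorted(scores_by_size.items())
    }


# Made-up numbers, for illustration only (not results from the paper).
BASELINE = 0.90
RESULTS = {
    frozenset({"total_processes"}): 0.89,
    frozenset({"total_processes", "cpu_user"}): 0.84,
    frozenset({"total_processes", "cpu_user", "memory_use"}): 0.75,
    frozenset({"cpu_user"}): 0.87,
}

if __name__ == "__main__":
    trend = mean_impact_by_group_size("total_processes", RESULTS, BASELINE)
    for size, score in trend.items():
        print(f"{size} feature(s) omitted: mean impact {score:.3f}")
```

With numbers like these, the per-feature impact of total processes rises as more features are omitted alongside it, which is the pattern described above. A feature whose score stayed flat across group sizes would instead suggest that its contribution does not depend on the co-omitted features.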

Thank you for your questions and I hope that has made it a little more clear - I will work on a presentation of the work which explains these points more clearly.


ForeverRuri commented on July 4, 2024

Thanks for your reply!
But for question 2, if we want to explore how that difference relates to a certain variable, I think we need to keep the other conditions unchanged.


vinayakumarr commented on July 4, 2024

When exactly will the dataset be released for further research?

