
Comments (4)

pkumza commented on May 28, 2024

Good question.
In fact, LibRadar can be divided into two parts: a 'clustering' part and an 'instant detection' part.
I downloaded more than 1 million apps and extracted static features from them. I then clustered the features into groups, kept the groups with more than 1000 items, and treated those as libraries. Some volunteers and I tagged some of the items, which is how we got tgst5.dat. You can refer to the LibRadar paper (ICSE 2016) for more details.
I don't think anyone would want to use the 'clustering' code to do this work again, because it took a dozen servers a month to create this data. Also, the code I used is ugly, just patches on top of patches = _ =.
Therefore, I released only the instant detection part on my GitHub as LibRadar. I hope that's enough for users.
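
For readers who want a feel for the clustering step described above, here is a minimal sketch. It is not the code used to build tgst5.dat. It assumes each app has already been reduced to (package path, feature hash) pairs and that "clustering" means grouping identical hashes; the function name, the input shape, and the exact-match grouping are all assumptions made for illustration. Only the "more than 1000 items" cutoff comes from the description above.

```python
from collections import Counter, defaultdict


def find_library_candidates(apps, min_count=1000):
    """Group identical feature hashes across apps and keep the large groups.

    apps: iterable of apps, each an iterable of (package_path, feature_hash)
          pairs. This input shape is hypothetical, not LibRadar's format.
    Returns {feature_hash: (app_count, most_common_package_path)} for every
    hash seen in more than min_count apps.
    """
    counts = Counter()            # feature_hash -> number of apps containing it
    names = defaultdict(Counter)  # feature_hash -> package paths seen with it
    for packages in apps:
        seen = set()
        for path, feature in packages:
            if feature in seen:
                continue          # count each hash at most once per app
            seen.add(feature)
            counts[feature] += 1
            names[feature][path] += 1
    return {
        feature: (n, names[feature].most_common(1)[0][0])
        for feature, n in counts.items()
        if n > min_count
    }
```

The groups that survive the cutoff are only candidates. In the workflow described above, people still tagged them by hand afterwards.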


bichselb commented on May 28, 2024

Ok. Thank you so much for your fast reply!


jevinskie commented on May 28, 2024

I just found your awesome project this week. Separating first party and third party code is exactly what I have been looking for! My previous whitelist approach, as your paper clearly shows, is a losing approach.

I am very interested in the clustering code, however unpolished, since it would allow me and others to extend and maintain the instant detection database. It would be very cool if there were a way to capture the manual part of the tagging under version control so that it can be reused, extended, and updated collaboratively. At least for my use case, needing a dozen servers is not a disqualifying requirement.


pkumza commented on May 28, 2024

@jevinskie Glad to hear that.
The project https://github.com/pkumza/lib-detector is how raw_data is generated. It is unpolished and difficult to use, though.
In the dev branch of https://github.com/pkumza/LibRadar, I used 5 steps to filter and tag raw_data into tgst5.dat.

By the way, the APK files I used to generate the data are getting old, so this approach is losing coverage too. Therefore, I am trying to create a new version of LibRadar that updates the data automatically as I feed new apps into the machine.

