TS-Finder: privacy enhanced web crawler detection model using temporal–spatial access behaviors

I am Rui Chen, the author of TS-Finder and a PhD candidate at Dalian University of Technology. My LinkedIn profile: https://www.linkedin.com/in/rui-chen-aa0a932a4/

This paper was published in The Journal of Supercomputing and can be found at: https://link.springer.com/article/10.1007/s11227-024-06133-6

To address the scarcity of crawler detection solutions, we are fully opening our crawler detection code to the public!

In this repository, the detection model is implemented in PyTorch. We have also implemented the same detection model in Scala on the Spark framework.

We have found that crawler detection models are crucial for websites and other network service providers, which typically already run sophisticated big data frameworks (such as Spark). We therefore rewrote the model in Scala to lower the cost of adopting it for these providers. The code comments are written in Chinese; I will translate them into English when I have more time. Thank you so much!

Abstract

Background: Web crawler detection is critical for preventing unauthorized extraction of valuable information from websites.

Current issues that need to be solved urgently: Current methods rely on heuristics, which makes them time-consuming and unable to detect novel crawlers. They also overlook privacy protection and communication burdens during training, resulting in potential privacy leaks.

Our methods: To address these issues, we propose a federated deep learning crawler detection model that analyzes access behaviors while preserving privacy. First, each client keeps its website data locally, while a central server aggregates only the detection model parameters, so raw user data is never transmitted to or accessed by the server. We then develop an innovative algorithm that constructs access path trees from user logs, effectively extracting temporal and spatial behavior features. Additionally, we propose a novel time series model with fused additive attention, enabling effective web crawler detection while preserving privacy and reducing data transmission.

Finally, comprehensive evaluations on public datasets demonstrate robust privacy protection and effective detection of emerging crawler types.
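The server-side aggregation step in the methods above (clients share only model parameters, never raw logs) can be illustrated with a minimal sketch. It assumes standard FedAvg-style weighted averaging, with each client weighted by its local sample count; this is not confirmed to be TS-Finder's exact aggregation rule. The function `federated_average`, the layer names, and the sample counts are hypothetical, and plain Python lists stand in for PyTorch `state_dict()` tensors so the sketch has no dependencies.

```python
from typing import Dict, List

# Hypothetical parameter format: layer name -> flat list of weights.
# (A PyTorch model would expose tensors via state_dict() instead.)
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """FedAvg-style aggregation (assumed, not TS-Finder's confirmed rule).

    Only parameters reach the server; each client's raw access logs stay local.
    Clients with more local samples receive proportionally more weight.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        acc = [0.0] * len(client_params[0][name])
        for params, size in zip(client_params, client_sizes):
            weight = size / total  # this client's share of all samples
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        averaged[name] = acc
    return averaged


# Usage: two hypothetical website clients holding different amounts of log data.
site_a = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
site_b = {"fc.weight": [3.0, 4.0], "fc.bias": [1.0]}
global_params = federated_average([site_a, site_b], client_sizes=[100, 300])
print(global_params)  # {'fc.weight': [2.5, 3.5], 'fc.bias': [0.75]}
```

Weighting by sample count keeps a small site from pulling the global model as hard as a site with far more traffic, and the server only ever sees the averaged weights.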