
koe_spider's Introduction

Description

  • This project is a tool built with webmagic for crawling content from the koe website.

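Usage

  • Build the project
  • Configure the database
  • Run java -jar <name-of-built-jar>.jar
  • The program starts downloading automatically; the download directory is D:/audio/
  • Visit http://localhost:9090/koe/info to view detailed download information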
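
The last step can also be scripted. The following is a minimal sketch, assuming the application is running locally on port 9090 and that /koe/info returns a text body (the response format is not documented here). The class name KoeInfoClient and its methods are hypothetical and are not part of koe_spider.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of koe_spider.
class KoeInfoClient {

    /** Builds the info URL named in the usage steps for a given host and port. */
    static String infoUrl(String host, int port) {
        return "http://" + host + ":" + port + "/koe/info";
    }

    /** Sends a GET request and returns the response body as text (assumed UTF-8). */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // fail fast if the application is not running
        conn.setReadTimeout(5000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires the crawler application to be running locally.
        System.out.println(fetch(infoUrl("localhost", 9090)));
    }
}
```

Running the main method while the crawler is up prints whatever the endpoint returns; if the application is not running, the connection attempt fails with an IOException.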

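Development environment

  • Language: Java 8

  • Base framework: Spring Boot

  • Persistence framework: JPA

  • IDE: IntelliJ IDEA

  • Dependency management: Maven

  • Database: MySQL 5.5

  • Version control: SVN, Git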

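P.S.

  • webmagic makes extracting web page content quite convenient. It supports several kinds of extraction rules (XPath, CSS selectors, and others), and both of these can be obtained easily in Chrome. It also supports multithreading, timeouts, retries, and persisting extracted content; this project probably uses only a small part of what webmagic offers.
  • Python is still better suited to writing crawlers: the code is concise, no complex runtime environment has to be set up, it uses few system resources, and development is fast.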
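
To illustrate what an XPath extraction rule looks like, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath package rather than webmagic's own API. The sample markup, the rule, and the class name XPathRuleDemo are assumptions made for illustration and are not taken from koe_spider or the koe site. Note that the JDK parser requires well-formed XML, which real HTML pages often are not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of koe_spider.
class XPathRuleDemo {

    // Made-up, well-formed markup standing in for a downloaded page.
    static final String SAMPLE_PAGE =
        "<html><body>"
        + "<a class=\"audio\" href=\"/audio/1.mp3\">Track 1</a>"
        + "<a class=\"audio\" href=\"/audio/2.mp3\">Track 2</a>"
        + "<a class=\"nav\" href=\"/about\">About</a>"
        + "</body></html>";

    /** Returns the text value of every node matched by the XPath rule, in document order. */
    static List<String> extract(String markup, String rule) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                rule, new InputSource(new StringReader(markup)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            // Thrown for an invalid rule or markup that cannot be parsed.
            throw new IllegalArgumentException("Cannot evaluate rule: " + rule, e);
        }
    }

    public static void main(String[] args) {
        // Select the href attribute of every link marked as audio.
        System.out.println(extract(SAMPLE_PAGE, "//a[@class='audio']/@href"));
        // prints [/audio/1.mp3, /audio/2.mp3]
    }
}
```

A rule such as //a[@class='audio']/@href is the kind of expression that Chrome's developer tools can help derive for a selected element.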
