
jpider's Introduction

Jpider

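Purpose: simple scraping of website information, with storage to a database and visualization. Author: Zhu Yingqin (朱应钦)

Prerequisites

Django, Requests, Jsonpath, lxml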

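Usage

On the home page, fill in the fields as prompted. URL and post/get are required; every other field may be left empty. The User-Agent is generated automatically.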

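URL pagination

Write the URL with a {} placeholder and append +(0,226,25). Each page value is substituted into {}: 0 is the initial value, 226 is the final value, and 25 is the increment per page.
For example, to scrape Douban Books:
url = https://book.douban.com/top250?start={}+(0,226,25)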
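The expansion described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `expand_url` and the regular expression are assumptions, and the final value is treated like the `stop` argument of Python's `range` (exclusive), which is a guess that happens to yield the ten Douban pages.

```python
import re

# Assumed template syntax, taken from the README example:
#   <url containing {}>+(start,stop,step)
PAGE_SPEC = re.compile(
    r"^(?P<url>.*\{\}.*)\+\((?P<start>\d+),(?P<stop>\d+),(?P<step>\d+)\)$"
)


def expand_url(template):
    """Return one URL per page; a template without a page spec is returned as-is."""
    template = template.strip()
    match = PAGE_SPEC.match(template)
    if match is None:
        return [template]
    start, stop, step = (int(match[name]) for name in ("start", "stop", "step"))
    # str.replace is used instead of str.format so other braces in the URL are left alone.
    return [match["url"].replace("{}", str(n)) for n in range(start, stop, step)]


urls = expand_url("https://book.douban.com/top250?start={}+(0,226,25)")
for url in urls[:3]:
    print(url)
```

If the tool treats the final value as inclusive, `range(start, stop + 1, step)` would be the corresponding change.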

post/get

Enter post or get.

Referer

For example, when scraping Lagou:
https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=

Data pagination

For example, when scraping Lagou:
{'first':true,'pn':+(0-226-25),'kd':'python'}
(0-226-25): 0 is the initial value, 226 is the final value, and 25 is the increment per page.
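One way to picture the data pagination is to expand the template into one form payload per page. The sketch below is an assumption about how such a template could be interpreted, not the project's implementation: `expand_data` and `parse_pairs` are hypothetical helpers, the final value is treated as exclusive (as in Python's `range`), and every value is kept as a string because that is how form data is sent.

```python
import re

# Assumed range syntax from the README example: +(start-stop-step)
RANGE_SPEC = re.compile(r"\+\((\d+)-(\d+)-(\d+)\)")


def parse_pairs(text):
    """Turn "'a':1,'b':'x'" (with or without braces and quotes) into a dict of strings."""
    pairs = {}
    for item in text.strip().strip("{}").split(","):
        key, _, value = item.partition(":")
        pairs[key.strip().strip("'\"")] = value.strip().strip("'\"")
    return pairs


def expand_data(template):
    """Return one payload dict per page value found in the template."""
    match = RANGE_SPEC.search(template)
    if match is None:
        return [parse_pairs(template)]
    start, stop, step = map(int, match.groups())
    payloads = []
    for n in range(start, stop, step):
        # Replace the +(...) spec with the concrete page value.
        filled = template[:match.start()] + str(n) + template[match.end():]
        payloads.append(parse_pairs(filled))
    return payloads


payloads = expand_data("{'first':true,'pn':+(0-226-25),'kd':'python'}")
print(payloads[0])
print(payloads[1])
```

The same helper also accepts the unquoted form used later in the Lagou example, such as `first:true,pn:+(1-5-1),kd:python`.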

cookies

Same as Referer: copy the value from the request captured in your browser.
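As a rough illustration of how the Referer and cookie fields could be combined into a request, the sketch below parses a cookie string copied from the browser and assembles keyword arguments in the shape accepted by `requests.request`. The helper names and the cookie names and values are made up for the example; the project's own handling may differ.

```python
from http.cookies import SimpleCookie


def parse_cookie_header(raw):
    """Split a browser "Cookie:" header value into a name -> value dict."""
    jar = SimpleCookie()
    jar.load(raw)
    return {name: morsel.value for name, morsel in jar.items()}


def build_request_kwargs(url, method, referer=None, raw_cookie=None):
    """Assemble keyword arguments; optional fields are omitted when empty."""
    headers = {}
    if referer:
        headers["Referer"] = referer
    kwargs = {"method": method.upper(), "url": url, "headers": headers}
    if raw_cookie:
        kwargs["cookies"] = parse_cookie_header(raw_cookie)
    return kwargs


kwargs = build_request_kwargs(
    "https://www.lagou.com/jobs/positionAjax.json",
    "post",
    referer="https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=",
    raw_cookie="session_id=abc123; theme=dark",  # hypothetical cookie string
)
print(kwargs["headers"])
print(kwargs["cookies"])
```

With Requests installed, the result could be sent with `requests.request(**kwargs)`.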

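Entering the XPath stage

After you click Submit, the information-extraction page opens. It generates the base page source, which is useful for debugging. If the response is HTML, enter XPath expressions.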

Douban scraping example
URL = https://book.douban.com/top250?start={}+(0,226,25)
Post/Get = get
Click Submit
Base location: //tr[@class="item"]
Core field: td/div/a/@title
Extended field 0: td/div/a/@href
Extended field 1: td/p/text()
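The relationship between the base location and the field expressions can be sketched like this: the base location selects one node per record, and each field expression is evaluated relative to that node. The project lists lxml as a dependency; to stay dependency-free, this sketch uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so the `@title`, `@href`, and `text()` steps are emulated with `.get()` and `.text`. The sample markup and the `extract_rows` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one page of results.
SAMPLE_PAGE = """
<table>
  <tr class="item">
    <td>
      <div><a href="https://example.com/book/1" title="First Book">First Book</a></div>
      <p>Author A / 2001</p>
    </td>
  </tr>
  <tr class="item">
    <td>
      <div><a href="https://example.com/book/2" title="Second Book">Second Book</a></div>
      <p>Author B / 2002</p>
    </td>
  </tr>
  <tr class="other"><td>ignored</td></tr>
</table>
"""


def extract_rows(page_source):
    root = ET.fromstring(page_source.strip())
    rows = []
    # Base location: //tr[@class="item"]
    for item in root.findall(".//tr[@class='item']"):
        link = item.find("td/div/a")  # shared prefix of td/div/a/@title and @href
        info = item.find("td/p")      # element behind td/p/text()
        rows.append({
            "core": link.get("title") if link is not None else None,
            "extra0": link.get("href") if link is not None else None,
            "extra1": info.text.strip() if info is not None and info.text else None,
        })
    return rows


rows = extract_rows(SAMPLE_PAGE)
for row in rows:
    print(row)
```

With lxml, the expressions from the example can be passed to `.xpath()` unchanged, and real-world HTML that is not well-formed XML can be parsed with `lxml.html`.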

If the response is JSON, enter JSONPath expressions.

Lagou scraping example
URL = https://www.lagou.com/jobs/positionAjax.json?city=%E6%88%90%E9%83%BD&needAddtionalResult=false&isSchoolJob=0
Post/Get = post
Referer = https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=
data = first:true,pn:+(1-5-1),kd:python
Click Submit
Base location: $..result
Core field: companyShortName
Extended field: positionId
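To show what the `$..result` base location does, the sketch below walks a JSON document and collects every value stored under a given key at any depth, then reads the core and extended fields from each record. It is a small standard-library stand-in for recursive descent, not the Jsonpath package the project depends on, and the sample payload is invented rather than a real Lagou response.

```python
import json


def find_all(node, key):
    """Yield every value stored under `key` at any depth, similar to JSONPath `$..key`."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_all(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, key)


# Hypothetical response body with the record list nested a few levels down.
SAMPLE_RESPONSE = json.loads("""
{
  "success": true,
  "content": {
    "positionResult": {
      "result": [
        {"companyShortName": "Acme", "positionId": 101},
        {"companyShortName": "Globex", "positionId": 102}
      ]
    }
  }
}
""")

records = [
    {"core": job.get("companyShortName"), "extra": job.get("positionId")}
    for result in find_all(SAMPLE_RESPONSE, "result")
    for job in result
]
for record in records:
    print(record)
```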

jpider's Issues

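About the source code

Hello, I came across your article on Zhihu and have been reading your code to improve my own web crawler. Could you outline the general order in which the code runs?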