baidu-search-link's Introduction

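Baidu search link decryption & Baidu index lookup

Baidu reports only a total count of indexed pages, without saying which pages are indexed and which are not. It is therefore necessary to extract all the links from Baidu's search results.

Some notes on the development process are recorded here.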

Usage

  1. In src/get-link.py, change domin to your own domain
  2. Run python get-link.py; the results are printed out


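  1. Extract the encrypted links. Inspecting the page shows that every encrypted link sits inside data-tools=\'(.*?)\', so a regular expression is enough to pull them out.

  2. Decrypt the links. Request each encrypted link as a client would. The response either contains a JavaScript snippet that redirects automatically, or gives the target in the Location header. That redirect target is the decrypted link.

  3. Paginate. The page is minified and all line breaks have been removed, which makes matching the next-page link with a regex very difficult. Neither greedy nor non-greedy mode works well; the current workaround is to forcibly replace every </a> with </a>\n, i.e. append a newline. An HTML parser might be an option, but the speed difference needs to be tested first.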
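
The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in src/get-link.py. The function names (extract_encrypted_links, resolve_target, split_anchors) are hypothetical, and two details are assumptions: that the data-tools payload is JSON carrying a "url" field, and that the JavaScript redirect takes the form location.replace("..."). The functions work on strings and header dicts that have already been fetched, so no network access is involved.

```python
import json
import re

# Pattern taken from the notes: the encrypted links sit inside data-tools='...'.
DATA_TOOLS_RE = re.compile(r"data-tools='(.*?)'")
# Assumed shape of the JavaScript redirect page; the real markup may differ.
JS_REDIRECT_RE = re.compile(r"""location\.replace\(["'](.*?)["']\)""")


def extract_encrypted_links(html):
    """Return each data-tools payload, or its "url" field when the payload is JSON."""
    links = []
    for payload in DATA_TOOLS_RE.findall(html):
        try:
            data = json.loads(payload)
        except ValueError:
            # Not JSON: keep the raw payload as-is.
            links.append(payload)
            continue
        if isinstance(data, dict) and "url" in data:
            links.append(data["url"])
        else:
            links.append(payload)
    return links


def resolve_target(headers, body):
    """Pick the decrypted link from a Location header, else from a JS redirect in the body."""
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    match = JS_REDIRECT_RE.search(body)
    return match.group(1) if match else None


def split_anchors(html):
    """Put a newline after every </a> so line-based regexes can find the next-page link."""
    return html.replace("</a>", "</a>\n")
```

Keeping the parsing separate from the HTTP request makes it easy to check against saved pages before running it against Baidu itself.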

TODO

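  1. Expose it as a public API
  2. Fix connections being forcibly dropped after too many requests; the current workaround is to sleep for 3 seconds.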
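
For the second item, one possible direction is to keep the 3-second pause and add a small retry when a connection is dropped. The sketch below is only an illustration under stated assumptions: fetch_politely is a hypothetical helper, fetch stands for whatever function downloads a URL, and a dropped connection is assumed to surface as an OSError (which covers ConnectionResetError). Whether this is enough to avoid being cut off by Baidu is untested.

```python
import time


def fetch_politely(fetch, urls, delay=3.0, retries=2, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds before every attempt but the first.

    An attempt that raises OSError is retried up to `retries` times;
    URLs that still fail are mapped to None.
    """
    results = {}
    first = True
    for url in urls:
        results[url] = None
        for _attempt in range(retries + 1):
            if not first:
                sleep(delay)  # the 3-second pause from the TODO note
            first = False
            try:
                results[url] = fetch(url)
                break
            except OSError:
                # Connection dropped: wait again and retry.
                continue
    return results
```

The sleep parameter is injectable so the behaviour can be checked without actually waiting.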

baidu-search-link's People

Contributors

wenfengand
