Code Monkey home page Code Monkey logo

cnki-spyder-tool's Introduction

cnki-spyder-tool

一个**知网的爬虫工具,给定作者,可以获得该作者的所有文献的题录

范例

使用时需要将文件tool.py放在正确的位置

import sys
sys.path.append(".")
import tool as cnki
from pprint import pprint as fprint


if __name__ == "__main__":
    tmp = cnki.get_doc_url_set("牛丽慧","南京大学") 
    fprint(tmp)
    res_file = open("res.csv","a",encoding="utf-8",newline="")
    for item in tmp:
        cnki.get_doc_bibilo(item,res_file)
    res_file.close()

文档

Help on module tool:

NAME
    tool

DESCRIPTION
    Eng:
    This module is written for scraping bibilograph of literature from cnki.
    It relies on python modules below:
    1. requests
    2. bs4
    you can easily get them using tool like pip.
    
    中文:
    这个模块用来爬取**知网的文献题录。
    它依赖于以下的Python模块:
    1.requests
    2.bs4
    你可以使用像pip这样的工具容易地获取它们。

FUNCTIONS
    get_doc_bibilo(doc_url, res_file)
        this function return bibliograph of the specified doc url by writting into the specified file
        
        Args:
            doc_url: specified doc url
            res_file: target res file
    
    get_doc_url_set(author_name, unit_name)
        This function simulates the behavior that you open the website:
        http://yuanjian.cnki.com.cn/
        and enter the dictionary:
        {"作者":"foo","作者单位":"bar"}
        and click the search button
        as you get the response page, you can get url set of all the page,
        it needs support of function:
        def get_next_page_doc_url_set(author_name,page_num,unit_name):
        to deal with the demond of page turning
        
        Args:
            author_name:  author's name
            unit_name: author's unit name
           
        Returns:
            return set of doc urls
    
    get_next_page_doc_url_set(author_name, page_num, unit_name)
        this function support the function:
        def get_doc_url_set(author_name,unit_name):
        get urls of specified page.
        
        Args:
            author_name: author's name
            page_num: specified page number
            unit_name: author's unit name
            
        Return:
            return set of doc urls

FILE
    /root/workbench/cnki-spyder-tool/tool.py

cnki-spyder-tool's People

Contributors

pkuimyy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.