
py-web-search's Introduction

py-web-search

NOTE: This project is not being maintained anymore.

Join the chat at https://gitter.im/rohithpr/py-web-search

A Python module to fetch and parse results from different search engines.

Warning: Do not make queries rapidly! The servers may block you.

Related project

Use the search-api to get results in JSON format over HTTP requests. (Does not need Python.)

Table of Contents

Search engines supported

Installation

Python3: Install using pip:

    pip install py-web-search

Python2: Not available on PyPI at the moment. You can download this repository and set it up manually.

Usage

Web search

    from pws import Google
    from pws import Bing

    print(Google.search(query='hello world', num=5, start=2, country_code="es"))
    print(Bing.search('hello world', 5, 2))
    
    # Arguments:
    # search(query, num, start, sleep, recent, country_code)
    # query: Required. The keyword that will be searched.
    # num: Default 10. The number of results returned.
    # start: Default 0. The number of top results that are to be ignored.
    # sleep: Default True. If True, the program will wait for a second, when applicable, to avoid overwhelming the servers.
    # recent: Default None. Allowed values: 'h' (hour), 'd' (day), 'w' (week), 'm' (month), 'y' (year). (Buggy)
    # country_code: For local results.

Prints 5 results from the third result onwards (ignores the first 2) in the following format.

    {
        'url': '...',
        'expected_num': 5,
        'received_num' : 5, # There will be a difference in case of insufficient results
        'start': 2,
        'search_engine': 'google',
        'total_results': ...,
        'results':
        [
            {
                'link': '...',
                'link_text': '...',
                'link_info': '...',
                'related_queries': [...],
                'additional_links':
                {
                    linktext: link,
                    ...
                }
            },
            ...
        ]
    }
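
As a minimal sketch of consuming this structure, the helper below walks a response dictionary shaped like the one above. The `sample` data is made up for illustration, and `summarize` is a hypothetical helper, not part of pws.

```python
def summarize(response):
    """Return (link_text, link) pairs and whether fewer results arrived than requested."""
    pairs = [(r.get('link_text', ''), r.get('link', ''))
             for r in response.get('results', [])]
    # 'received_num' is lower than 'expected_num' when results are insufficient.
    short = response.get('received_num', len(pairs)) < response.get('expected_num', len(pairs))
    return pairs, short

# Made-up response in the documented shape (not real search output).
sample = {
    'expected_num': 5,
    'received_num': 2,
    'start': 2,
    'search_engine': 'google',
    'results': [
        {'link': 'https://example.com/a', 'link_text': 'A'},
        {'link': 'https://example.com/b', 'link_text': 'B'},
    ],
}

pairs, short = summarize(sample)
print(pairs, short)
```

Using `.get` with defaults keeps the helper tolerant of keys that a given engine may omit.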

News search

    from pws import Bing
    from pws import Google

    print(Bing.search_news('github', 10, 0, True, 'h'))
    print(Google.search_news('github', 10, 0, True, 'd', "es"))
    
    # Arguments:
    # search_news(query, num, start, sleep, recent, country_code)
    # query: Required. The keyword that will be searched.
    # num: Default 10. The number of results returned.
    # start: Default 0. The number of top results that are to be ignored.
    # sleep: Default True. If True, the program will wait for a second, when applicable, to avoid overwhelming the servers.
    # recent: Default None. Allowed values: 'h' (hour), 'd' (day), 'w' (week), 'm' (month), 'y' (year). (Buggy)
    # country_code: For local results. 

Prints 10 results starting from the first result (start is 0, so none are skipped) in the following format.

    {
        'url': '...',
        'num': 10,
        'start': 0,
        'search_engine': 'bing',
        'results':
        [
            {
                'link': '...',
                'link_text': '...',
                'link_info': '...',
                'source': '...',
                'time': '...',
                'additional_links':{}, # Always empty for Bing.
            },
            ...
        ]
    }

Todo

  • Other search engines
  • Images etc.

Contribution

Feel free to add any features that you think might be useful.

py-web-search's People

Contributors

bubavv, dhondta, gitter-badger, golyo88, rohithpr

py-web-search's Issues

TypeError: __init__() got an unexpected keyword argument 'strict'

    from pws import Google
    from pws import Bing
    print(Google.search(query='hello world', num=5, start=2))
    print(Bing.search('hello world', 5, 2))

When I run this example, I get this error:

    TypeError: __init__() got an unexpected keyword argument 'strict'

on line 3.
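
One possible explanation, which the report does not confirm: the `strict` argument of the standard library's `html.parser.HTMLParser` was removed in Python 3.5, so any older code that still passes it fails with this kind of message. The sketch below reproduces that behaviour with the standard library alone. Whether this is what happens inside pws or one of its dependencies is an assumption.

```python
from html.parser import HTMLParser

def accepts_strict():
    """Return True if this interpreter's HTMLParser still takes a 'strict' argument."""
    try:
        HTMLParser(strict=False)
    except TypeError:
        # Raised on Python 3.5 and later, where 'strict' no longer exists.
        return False
    return True

print(accepts_strict())
```

If this prints `False` and the traceback points into an HTML parsing dependency, upgrading that dependency is a reasonable first thing to try.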

AttributeError: 'NoneType' object has no attribute 'find_all'

This happens when the query is "strange", too long, or syntactically wrong, for example "ASD adasmd asidjkasd" or "All euro2016 matches of the year" (with no space between Euro and 2016). In that case the module can't find related queries. I have handled this by commenting out:

    if related_queries == []:
        related_queries = Bing.scrape_related(soup)

(Ln:113 and Ln:114 of bing.py)
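
An alternative to commenting the call out is to guard against `None` before calling `find_all`. The sketch below is not the code in bing.py: `find_all_or_empty` is a hypothetical helper, and `FakeNode` is a stand-in that only mimics the `find_all` method of a parsed HTML element.

```python
def find_all_or_empty(node, tag):
    """Return node.find_all(tag), or an empty list when the node was not found."""
    if node is None:
        return []
    return node.find_all(tag)

class FakeNode:
    """Stand-in for a parsed element; children are plain tag names."""
    def __init__(self, children):
        self.children = children

    def find_all(self, tag):
        return [child for child in self.children if child == tag]

print(find_all_or_empty(None, 'a'))
print(find_all_or_empty(FakeNode(['a', 'p', 'a']), 'a'))
```

With a guard like this, a page without a related-queries section yields an empty list instead of raising `AttributeError`.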

No module named html.parser

Hello

When I do the import as described in the docs, I get this error:

    No module named html.parser

Could you help me?
Thank you
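
`html.parser` is the Python 3 name of this standard-library module; in Python 2 it is called `HTMLParser`. The report does not state the Python version, so the following is only a sketch of a compatibility import under the assumption that Python 2 is in use. `LinkCollector` is a hypothetical example class, not part of pws.

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module name

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.links.extend(value for name, value in attrs if name == 'href')

collector = LinkCollector()
collector.feed('<p><a href="https://example.com">example</a></p>')
print(collector.links)
```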

No Google Search

When you use this module to search Google, there is no JSON response; it returns only a simple message with the search URL. For example, when you look for 'github' with 10 results, the response of the module is: 'url': 'https://www.google.com/search?q=github&num=10&start=0&tbm=nws#q=github&tbas=0&tbs=sbd:1&tbm=nws&gl=d'

Bug in Google.search()

I am using Python 3.4.3 64-bit on Windows 8.

>>> from pws import Google
>>> print(Google.search(query='hello world', num=5, start=2, country_code="es"))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\py34\lib\encodings\cp874.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 232-234:
 character maps to <undefined>
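
The traceback shows `print` failing because the console encoding (cp874 here) cannot represent some characters in the result. One workaround, sketched below, is to escape unencodable characters before printing. `console_safe` is a hypothetical helper, not part of pws, and the sample string is made up.

```python
import sys

def console_safe(text, encoding=None):
    """Replace characters the target encoding cannot represent with backslash escapes."""
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # 'backslashreplace' turns each unencodable character into \xNN or \uNNNN.
    return text.encode(encoding, errors='backslashreplace').decode(encoding)

# Escape a string for a cp874 console, the encoding named in the traceback.
print(console_safe('hello \u4e16\u754c', encoding='cp874'))
```

In practice this would wrap the value being printed, for example `print(console_safe(str(result)))`.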

No results from Google

I am using Python 3.4.3 on Windows 8 with py-web-search 0.2.4, in Thailand.

>>> a=Google.search('apple', 10, 2,'d','en')
>>> a
{'total_results': 21480000000, 'search_engine': 'google', 'received_num': 0, 'expected_num': 10, 'country_code': None, 'url': 'https://www.google.com/search?q=apple&num=10&start=2', 'start': 2, 'related_queries': ['apple watch', 'apple tv', 'apple watch \ufffd\u04a4\ufffd', 'apple show', 'apple \ufffd\ufffd\ufffd\u047e\ufffd\ufffd', 'appleaa', 'apple itune', 'apple tv \ufffd\u04a4\ufffd'], 'results': []}
>>> a=Google.search_news('apple', 10, 2,'d','en')
>>> a
{'num': 10, 'total_results': 2168000000, 'country_code': None, 'search_engine': 'google', 'url': 'https://www.google.com/search?q=apple&num=10&start=2&tbm=nws#q=apple&tbas=0&tbs=sbd:1&tbm=nws&gl=en', 'start': 2, 'results': []}
>>> a=Google.search_news('apple', 10, 0,'d','en')

There are no results. Is this related to https?

Number of search results

Hello!
Can you add a function that returns only the number of search results for a given phrase, as an int?

For example:
    print(Google.search_count('hello world'))

246 000 000

If you implement it, you will rock. ;)
With regards,
Ivan.
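
Since the documented response already carries a `total_results` field, such a function could be a thin wrapper around an existing search call. The sketch below assumes that field holds the count. `search_count` here is a hypothetical free function, not an existing pws method, and `fake_search` is a stand-in so the example runs without network access.

```python
def search_count(search, query):
    """Return only the reported number of results, using a pws-style search callable."""
    response = search(query, 1, 0)  # request a single result; only the total is needed
    return int(response['total_results'])

def fake_search(query, num=10, start=0):
    """Stand-in returning a made-up response in the documented shape."""
    return {'total_results': 246000000, 'results': []}

count = search_count(fake_search, 'hello world')
print(count)
```

With the real module, the call would presumably look like `search_count(Google.search, 'hello world')`.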

Google Search Limitation

Problem: When many searches are performed consecutively, Google detects that the Python script is a bot and starts responding with alternative pages that require solving a Captcha.

Solution: Once Google starts sending alternative pages, fall back to the Splinter library to perform browser automation with human-like behavior, spacing requests with a random wait timer. This is far slower but lets the script continue to work.

Note: If you are interested, let me know, as I have already implemented this solution. Note that this bypasses Google's control and will certainly only work for a limited time.
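
The random wait timer on its own can be sketched without Splinter. The code below is not the implementation mentioned above; `run_spaced` and its `low`/`high` bounds are hypothetical, and the demo uses stand-ins so that nothing is fetched and nothing actually sleeps.

```python
import random
import time

def run_spaced(queries, search, low=2.0, high=6.0, sleep=time.sleep):
    """Call search(query) for each query, pausing a random low..high seconds between calls."""
    results = []
    for index, query in enumerate(queries):
        if index > 0:
            # No pause before the first request, a random one before each later request.
            sleep(random.uniform(low, high))
        results.append(search(query))
    return results

# Demo: str.upper stands in for a search function, and waits.append records
# the delays instead of sleeping.
waits = []
found = run_spaced(['a', 'b', 'c'], search=str.upper, sleep=waits.append)
print(found, len(waits))
```

Passing `sleep` as a parameter keeps the function easy to test; in real use the default `time.sleep` applies.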
