
se-scraper's Introduction

Search Engine Scraper - se-scraper


This node module allows you to scrape search engines concurrently with different proxies.

If you don't have extensive technical experience or don't want to purchase proxies, you can use my scraping service.


Se-scraper supports the following search engines:

  • Google
  • Google News
  • Google News App version (https://news.google.com)
  • Google Image
  • Bing
  • Bing News
  • Infospace
  • Duckduckgo
  • Yandex
  • Webcrawler

This module uses puppeteer and a modified version of puppeteer-cluster. It was created by the developer of GoogleScraper, a module with 1800 stars on GitHub.

Installation

You need a working installation of node and the npm package manager.

For example, if you are using Ubuntu 18.04, you can install node and npm with the following commands:

sudo apt update;

sudo apt install nodejs;

# recent version of npm
curl -sL https://deb.nodesource.com/setup_10.x -o nodesource_setup.sh;
sudo bash nodesource_setup.sh;
sudo apt install npm;

Chrome and puppeteer need some additional libraries to run on Ubuntu.

The following command installs these dependencies:

# install the libraries needed by the chromium browser (possibly more than strictly necessary)
sudo apt-get install gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget;

Install se-scraper by entering the following command in your terminal:

npm install se-scraper

If you don't want puppeteer to download a complete Chromium browser, add this variable to your environment. Note that se-scraper is then not guaranteed to run out of the box.

export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1

Docker Support

I will maintain a public docker image of se-scraper. Pull the docker image with the command:

docker pull tschachn/se-scraper

Confirm that the docker image was correctly pulled:

docker image ls

It should show something like this:

tschachn/se-scraper             latest           897e1aeeba78        21 minutes ago      1.29GB

You can check the available tags on Docker Hub. In the example below, the tag is latest, and it will most likely remain latest in the future.

Run the docker image and map the internal port 3000 to the external port 3000:

$ docker run -p 3000:3000 tschachn/se-scraper:latest

Running on http://0.0.0.0:3000

When the image is running, you can start scrape jobs via the HTTP API:

curl -XPOST http://0.0.0.0:3000 -H 'Content-Type: application/json' \
-d '{
    "browser_config": {
        "random_user_agent": true
    },
    "scrape_config": {
        "search_engine": "google",
        "keywords": ["test"],
        "num_pages": 1
    }
}'
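
If you prefer to trigger jobs from Node instead of curl, a minimal sketch using Node's built-in http module could look like this; it simply posts the same JSON body as the curl command above, and the shape of the response depends on the dockerized API:

const http = require('http');

// same payload as the curl example above
const body = JSON.stringify({
    browser_config: {
        random_user_agent: true,
    },
    scrape_config: {
        search_engine: 'google',
        keywords: ['test'],
        num_pages: 1,
    },
});

const req = http.request({
    host: '0.0.0.0',
    port: 3000,
    path: '/',
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
    },
}, (res) => {
    // collect and print whatever the API returns
    let data = '';
    res.on('data', (chunk) => data += chunk);
    res.on('end', () => console.log(data));
});

req.write(body);
req.end();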

Many thanks go to slotix for his tremendous help in setting up the docker image.

Minimal Example

Create a file named minimal.js with the following contents:

const se_scraper = require('se-scraper');

(async () => {
    let scrape_job = {
        search_engine: 'google',
        keywords: ['lets go boys'],
        num_pages: 1,
    };

    var results = await se_scraper.scrape({}, scrape_job);

    console.dir(results, {depth: null, colors: true});
})();

Start scraping by firing up the command node minimal.js

Quickstart

Create a file named run.js with the following contents:

const se_scraper = require('se-scraper');

(async () => {
    let browser_config = {
        debug_level: 1,
        output_file: 'examples/results/data.json',
    };

    let scrape_job = {
        search_engine: 'google',
        keywords: ['news', 'se-scraper'],
        num_pages: 1,
        // add some cool google search settings
        google_settings: {
            gl: 'us', // The gl parameter determines the Google country to use for the query.
            hl: 'en', // The hl parameter determines the Google UI language to return results.
            start: 0, // Determines the results offset to use, defaults to 0.
            num: 100, // Determines the number of results to show, defaults to 10. Maximum is 100.
        },
    };

    var scraper = new se_scraper.ScrapeManager(browser_config);

    await scraper.start();

    var results = await scraper.scrape(scrape_job);

    console.dir(results, {depth: null, colors: true});

    await scraper.quit();
})();

Start scraping by firing up the command node run.js

Contribute

I really appreciate and love your help! However, scraping is a dirty business and it often takes me a lot of time to find failing selectors or missing JS logic. So if any search engine does not yield the results of your liking, please create a static test case similar to this static test of google that fails. I will then try to correct se-scraper.

That's how you would proceed:

  1. Copy the static google test case
  2. Remove all unnecessary testing code
  3. Save the HTML of a search where se-scraper does not work correctly to a file.
  4. Implement the static test case using the saved search HTML where se-scraper currently fails.
  5. Submit a new issue with the failing test case as a pull request
  6. I will fix it! (or better: you submit a pull request directly)

Proxies

se-scraper will create one browser instance per proxy. So the maximum amount of concurrency is equal to the number of proxies plus one (your own IP).

const se_scraper = require('se-scraper');

(async () => {
    let browser_config = {
        debug_level: 1,
        output_file: 'examples/results/proxyresults.json',
        proxy_file: '/home/nikolai/.proxies', // one proxy per line
        log_ip_address: true,
    };

    let scrape_job = {
        search_engine: 'google',
        keywords: ['news', 'scrapeulous.com', 'incolumitas.com', 'i work too much', 'what to do?', 'javascript is hard'],
        num_pages: 1,
    };

    var scraper = new se_scraper.ScrapeManager(browser_config);
    await scraper.start();

    var results = await scraper.scrape(scrape_job);
    console.dir(results, {depth: null, colors: true});
    await scraper.quit();
})();

With a proxy file such as

socks5://53.34.23.55:55523
socks4://51.11.23.22:22222

This will scrape with three browser instances, each having its own IP address. Unfortunately, it is currently not possible to scrape with different proxies per tab. Chromium does not support that.

Custom Scrapers

You can define your own scraper class and use it within se-scraper.

Check this example out that defines a custom scraper for Ecosia.
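
The linked example is not reproduced here, but the rough shape of a custom scraper is a class that implements the individual scraping steps itself. The following is a hedged sketch only: the base class export (se_scraper.Scraper), the set of overridden methods and the Ecosia CSS selectors are assumptions inferred from method names appearing elsewhere in this document (load_start_page, search_keyword, wait_for_results), not a verified API.

const se_scraper = require('se-scraper');

// Hypothetical sketch: consult the Ecosia example in the repository for the actual contract.
class EcosiaScraper extends se_scraper.Scraper {
    async load_start_page() {
        await this.page.goto('https://www.ecosia.org/');
        return true;
    }

    async search_keyword(keyword) {
        // mirrors the search_keyword pattern shown in the geolocation issue below
        const input = await this.page.$('input[name="q"]');
        await this.set_input_value('input[name="q"]', keyword);
        await this.sleep(50);
        await input.focus();
        await this.page.keyboard.press('Enter');
    }

    async wait_for_results() {
        // '.results' is a placeholder selector, not verified against Ecosia's markup
        await this.page.waitForSelector('.results', { timeout: this.STANDARD_TIMEOUT });
    }

    async parse_async(html) {
        // extract links and titles from the loaded page ('.result a' is also a placeholder)
        return await this.page.evaluate(() => {
            return Array.from(document.querySelectorAll('.result a')).map((el, i) => ({
                link: el.href,
                title: el.textContent.trim(),
                rank: i + 1,
            }));
        });
    }
}

(async () => {
    // Passing the class itself as search_engine mirrors the mechanism discussed in
    // the "Modularize scraper modules" issue further below.
    let results = await se_scraper.scrape({}, {
        search_engine: EcosiaScraper,
        keywords: ['renewable energy'],
        num_pages: 1,
    });
    console.dir(results, {depth: null, colors: true});
})();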

Examples

Scraping Model

se-scraper scrapes search engines only. In order to introduce concurrency into this library, we first need to define the scraping model; then we can decide how to divide and conquer.

Scraping Resources

What are common scraping resources?

  1. Memory and CPU. Necessary to launch multiple browser instances.
  2. Network bandwidth. This is rarely the bottleneck.
  3. IP addresses. Websites often block IP addresses after a certain number of requests from the same IP address. This can be circumvented by using proxies.
  4. Spoofable identifiers such as browser fingerprints or user agents. These are handled by se-scraper.

Concurrency Model

se-scraper should be able to run without any concurrency at all. This is the default case. No concurrency means that only one browser/tab is searching at a time.

For concurrent use, we will make use of a modified puppeteer-cluster library.

One scrape job is properly defined by

  • 1 search engine such as google
  • M pages
  • N keywords/queries
  • K proxies and K+1 browser instances (the extra instance scrapes with our own dedicated IP)

se-scraper will then create K+1 dedicated browser instances, each with a unique IP address. Each browser will get N/(K+1) keywords and will issue N/(K+1) * M total requests to the search engine.
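
For example, with K = 2 proxies, N = 30 keywords and M = 2 pages per keyword, se-scraper launches 3 browser instances; each instance handles 30 / 3 = 10 keywords and issues 10 * 2 = 20 requests to the search engine.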

The problem is that the puppeteer-cluster library only allows identical options for subsequently launched browser instances. Therefore, it is not trivial to launch a cluster of browsers with distinct proxy settings: right now, every browser has the same options, and it is not possible to set options on a per-browser basis.

Solution:

  1. Create an upstream proxy router.
  2. Modify the puppeteer-cluster library to accept a list of proxy strings and pop() from this list on every new call to workerInstance() in https://github.com/thomasdondorf/puppeteer-cluster/blob/master/src/Cluster.ts. I wrote an issue about this, and this is the approach I ended up taking (see the sketch below).
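
A minimal sketch of the second idea, just to illustrate the mechanism. This is not the actual patch applied to the puppeteer-cluster fork, and using Chromium's --proxy-server launch flag in exactly this way is an assumption:

// Illustrative sketch only -- not the actual puppeteer-cluster patch.
// Each time a new worker/browser is launched, one proxy string is popped from
// a shared list and turned into a Chromium --proxy-server launch argument.
const proxies = [
    'socks5://53.34.23.55:55523',
    'socks4://51.11.23.22:22222',
];

function nextLaunchOptions(baseOptions = {}) {
    const proxy = proxies.pop(); // undefined once the list is exhausted
    const args = (baseOptions.args || []).slice();
    if (proxy) {
        args.push(`--proxy-server=${proxy}`);
    }
    // The last browser gets no proxy flag and scrapes with your own IP,
    // matching the "K proxies + 1 own IP" model described above.
    return Object.assign({}, baseOptions, { args });
}

// Example: three workers, two with proxies, one without.
console.log(nextLaunchOptions({ headless: true }));
console.log(nextLaunchOptions({ headless: true }));
console.log(nextLaunchOptions({ headless: true }));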

Technical Notes

Scraping is done with a headless chromium browser using the automation library puppeteer. Puppeteer is a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.

If you need to deploy scraping to the cloud (AWS or Azure), you can contact me at [email protected]

The chromium browser is started with the following flags to prevent scraping detection.

var ADDITIONAL_CHROME_FLAGS = [
    '--disable-infobars',
    '--window-position=0,0',
    '--ignore-certifcate-errors',
    '--ignore-certifcate-errors-spki-list',
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    '--disable-accelerated-2d-canvas',
    '--disable-gpu',
    '--window-size=1920x1080',
    '--hide-scrollbars',
    '--disable-notifications',
];

Furthermore, to avoid loading unnecessary resources and to speed up scraping a great deal, we instruct Chrome not to load images, CSS and media:

await page.setRequestInterception(true);
page.on('request', (req) => {
    let type = req.resourceType();
    const block = ['stylesheet', 'font', 'image', 'media'];
    if (block.includes(type)) {
        req.abort();
    } else {
        req.continue();
    }
});

Making puppeteer and headless chrome undetectable

Consider the following resources:

se-scraper implements the countermeasures against headless Chrome detection proposed on those sites.

The most recent detection countermeasures can be found here:

se-scraper makes use of those anti-detection techniques.

To check whether the evasion works, you can test it by passing the test_evasion flag to the config:

let config = {
    // check if headless chrome escapes common detection techniques
    test_evasion: true
};

It will create a screenshot named headless-test-result.png in the directory where the scraper was started that shows whether all tests have passed.
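
A minimal sketch of running this check, reusing the scrape() call from the minimal example above:

const se_scraper = require('se-scraper');

(async () => {
    // Runs a single keyword with the evasion self-test enabled. A screenshot
    // named headless-test-result.png should appear in the working directory.
    await se_scraper.scrape({ test_evasion: true }, {
        search_engine: 'google',
        keywords: ['test'],
        num_pages: 1,
    });
})();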

Advanced Usage

Use se-scraper by calling it with a script such as the one below.

const se_scraper = require('se-scraper');

// those options need to be provided on startup
// and cannot be passed to se-scraper on scrape() calls
let browser_config = {
    // the user agent to scrape with
    user_agent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3835.0 Safari/537.36',
    // if random_user_agent is set to True, a random user agent is chosen
    random_user_agent: false,
    // whether to select manual settings in visible mode
    set_manual_settings: false,
    // log ip address data
    log_ip_address: false,
    // log http headers
    log_http_headers: false,
    // how long to sleep between requests. a random sleep interval within the range [a,b]
    // is drawn before every request. empty string for no sleeping.
    sleep_range: '',
    // which search engine to scrape
    search_engine: 'google',
    compress: false, // compress
    // whether debug information should be printed
    // level 0: print nothing
    // level 1: print most important info
    // ...
    // level 4: print all shit nobody wants to know
    debug_level: 1,
    keywords: ['nodejs rocks',],
    // whether to start the browser in headless mode
    headless: true,
    // specify flags passed to chrome here
    chrome_flags: [],
    // the number of pages to scrape for each keyword
    num_pages: 1,
    // path to output file, data will be stored in JSON
    output_file: '',
    // whether to also passthru all the html output of the serp pages
    html_output: false,
    // whether to return a screenshot of serp pages as b64 data
    screen_output: false,
    // whether to prevent images, css, fonts and media from being loaded
    // will speed up scraping a great deal
    block_assets: true,
    // path to js module that extends functionality
    // this module should export the functions:
    // get_browser, handle_metadata, close_browser
    //custom_func: resolve('examples/pluggable.js'),
    custom_func: '',
    throw_on_detection: false,
    // use a proxy for all connections
    // example: 'socks5://78.94.172.42:1080'
    // example: 'http://118.174.233.10:48400'
    proxy: '',
    // a file with one proxy per line. Example:
    // socks5://78.94.172.42:1080
    // http://118.174.233.10:48400
    proxy_file: '',
    // whether to use proxies only
    // when this is set to true, se-scraper will not use
    // your default IP address
    use_proxies_only: false,
    // check if headless chrome escapes common detection techniques
    // this is a quick test and should be used for debugging
    test_evasion: false,
    apply_evasion_techniques: true,
    // settings for puppeteer-cluster
    puppeteer_cluster_config: {
        timeout: 30 * 60 * 1000, // max timeout set to 30 minutes
        monitor: false,
        concurrency: Cluster.CONCURRENCY_BROWSER,
        maxConcurrency: 1,
    }
};

(async () => {
    // scrape config can change on each scrape() call
    let scrape_config = {
        // which search engine to scrape
        search_engine: 'google',
        // an array of keywords to scrape
        keywords: ['cat', 'mouse'],
        // the number of pages to scrape for each keyword
        num_pages: 2,

        // OPTIONAL PARAMS BELOW:
        google_settings: {
            gl: 'us', // The gl parameter determines the Google country to use for the query.
            hl: 'fr', // The hl parameter determines the Google UI language to return results.
            start: 0, // Determines the results offset to use, defaults to 0.
            num: 100, // Determines the number of results to show, defaults to 10. Maximum is 100.
        },
        // instead of keywords you can specify a keyword_file. this overwrites the keywords array
        keyword_file: '',
        // how long to sleep between requests. a random sleep interval within the range [a,b]
        // is drawn before every request. empty string for no sleeping.
        sleep_range: '',
        // path to output file, data will be stored in JSON
        output_file: 'output.json',
        // whether to prevent images, css, fonts from being loaded
        // will speed up scraping a great deal
        block_assets: false,
        // check if headless chrome escapes common detection techniques
        // this is a quick test and should be used for debugging
        test_evasion: false,
        apply_evasion_techniques: true,
        // log ip address data
        log_ip_address: false,
        // log http headers
        log_http_headers: false,
    };

    let results = await se_scraper.scrape(browser_config, scrape_config);
    console.dir(results, {depth: null, colors: true});
})();

Output for the above script on my machine.

Query String Parameters

You can add your custom query string parameters to the configuration object by specifying a google_settings key. In general: {{search engine}}_settings.

For example, you can customize your Google search with the following config:

let scrape_config = {
    search_engine: 'google',
    // use specific search engine parameters for various search engines
    google_settings: {
        google_domain: 'google.com',
        gl: 'us', // The gl parameter determines the Google country to use for the query.
        hl: 'us', // The hl parameter determines the Google UI language to return results.
        start: 0, // Determines the results offset to use, defaults to 0.
        num: 100, // Determines the number of results to show, defaults to 10. Maximum is 100.
    },
}
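
The same convention applies to the other engines. For example, a Bing job can be customized through a bing_settings key; the parameters shown here are the same ones used in the "Browser Disconnect after crash" issue further below:

let scrape_config = {
    search_engine: 'bing',
    keywords: ['example query'],
    num_pages: 1,
    // https://docs.microsoft.com/en-us/rest/api/cognitiveservices-bingsearch/bing-web-api-v5-reference#query-parameters
    bing_settings: {
        cc: 'DE',     // The cc parameter determines the country to use for the query.
        mkt: 'de-DE', // The mkt parameter determines the UI language of the results.
        offset: 0,    // Determines the results offset to use, defaults to 0.
        count: 20,    // Determines the number of results to show, defaults to 10. Maximum is 100.
    },
};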

se-scraper's People

Contributors

aularon, hugopoi, kujaomega, nikolait, slotix


se-scraper's Issues

... detected the scraping. Aborting.

Any fix for the following error?
....
at Object.scrape_google_pup (//se-scraper/se-scraper/src/modules/google.js:65:15) name: 'TimeoutError' }
Google detected the scraping. Aborting.
$

run multiple search engines and execute all promises before other action

I was trying to use express with se-scraper and have multiple search engines. It works fine, but it only runs the last search engine (for example, for ['bing', 'google', 'webcrawler'] it will return the results of 'webcrawler' three times).

The issue seems to be solved after a bit of debugging: at line 43 in node_scraper.js I just changed "config = event" to "let config = event".

Error: Proxy output ip <proxy-ip> does not match with provided one

The error I have encountered is: "Error: Proxy output ip socks5://192.169.156.211:50479 does not match with provided one"

I have been trying to run for_the_lulz.js with a proxy file, but I have failed. Why does this error occur? What is the reason for it?

Update: the proxies feature is not working. When I run test_proxyflag.js with the same proxy it works successfully, but with se-scraper it does not.

Scraping with multiple Search Engines? [Question]

I'm trying to scrape by using multiple search engines successively.
e.g.

var searchEnginesList = ['google','bing']

for (let index = 0; index < searchEnginesList.length; index++) {
    const searchEngine = searchEnginesList[index];
    let config = {
        random_user_agent: true,
        write_meta_data: 'true',
        sleep_range: '[1,1]',
        search_engine: searchEngine,
        debug: 'false',
        verbose: 'false',
        keywords: 'kw',
    };
    se_scraper.scrape(config, (err, response) => {
        if (err) {
            console.error(err)
        }
        console.dir(response.results, {
            depth: null,
            colors: true
        });
    });
}

but only one search engine is being used and I'm getting either this error:
Error: Navigation failed because browser has disconnected!
or this error:
Error: WebSocket is not open: readyState 3 (CLOSED)
Is there any way to make this possible? Am I doing something wrong?

Results from google maps = undefined

I was starting with a basic configuration and I always got the same result for my keyword. My code looks like this:

const se_scraper = require('se-scraper');

(async () => {
  let browser_config = {
    debug_level: 1,
    output_file: './maps.json',
    test_evasion: false,
    sleep_range: '[1,1]',
    block_assets: false,
    headless: false,

    google_maps_settings: {
      scrape_in_detail: false,
    }
  };

  let scrape_job = {
    search_engine: 'google_maps',
    keywords: ['fryzjer'],
    num_pages: 1,
  };

  var scraper = new se_scraper.ScrapeManager(browser_config);

  await scraper.start();

  var results = await scraper.scrape(scrape_job);

  console.dir(results, {
    depth: null,
    colors: true
  });

  await scraper.quit();
})();

And this is my results from terminal:

[i] [se-scraper] started at [Thu, 11 Jul 2019 10:41:43 GMT] and scrapes google with 1 keywords on 1 pages each.
[i] Using startUrl: https://www.google.com/maps
[i] google scrapes keyword "fryzjer" on page 1
[i] Sleeping for 1s
[i] Scraper took 7457ms to perform 1 requests.
[i] On average ms/request: 7457ms/request
[i] Writing results to ./maps.json
{ results:
   { fryzjer:
      { '1':
         { time: 'Thu, 11 Jul 2019 10:41:51 GMT', results: undefined } } },
  html_output: undefined,
  metadata:
   { elapsed_time: '7457', ms_per_keyword: '7457', num_requests: 1 } }

Do you have any idea what I can do to get the results data?

The browser does not start in docker

I followed the instructions completely; however, I get an error:

Running on http://0.0.0.0:3000
[i] [se-scraper] started at [Sun, 15 Sep 2019 20:47:56 GMT] and scrapes google with 1 keywords on 1 pages each.
[i] Using startUrl: https://www.google.com
(node:1) UnhandledPromiseRejectionWarning: Error: Navigation failed because browser has disconnected!
    at CDPSession.LifecycleWatcher._eventListeners.helper.addEventListener (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/lib/LifecycleWatcher.js:46:107)
    at CDPSession.emit (events.js:198:13)
    at CDPSession._onClosed (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/lib/Connection.js:215:10)
    at Connection._onClose (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/lib/Connection.js:138:15)
    at WebSocketTransport._ws.addEventListener.event (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/lib/WebSocketTransport.js:48:22)
    at WebSocket.onClose (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/node_modules/ws/lib/event-target.js:124:16)
    at WebSocket.emit (events.js:198:13)
    at WebSocket.emitClose (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/node_modules/ws/lib/websocket.js:191:10)
    at Socket.socketOnClose (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/node_modules/ws/lib/websocket.js:850:15)
    at Socket.emit (events.js:198:13)
  -- ASYNC --
    at Frame.<anonymous> (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/lib/helper.js:111:15)
    at Page.goto (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/lib/Page.js:629:49)
    at Page.<anonymous> (/se-scraper/src/puppeteer-cluster/node_modules/puppeteer/lib/helper.js:112:23)
    at GoogleScraper.load_start_page (/se-scraper/src/modules/google.js:177:46)
    at GoogleScraper.load_search_engine (/se-scraper/src/modules/se_scraper.js:137:27)
    at process._tickCallback (internal/process/next_tick.js:68:7)
(node:1) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)
(node:1) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Regex for keywords?

Hi,

First of all - great scraper! really, truly, awesome work! It's a joy playing around with this!

So, my question is in regard to the use of regex for keywords. I wanted to use regex for specialized search string patterns, but I can't figure out how to get this to work. If I wanted all keywords that had 0-5 characters (a-z, 0-9) and ended with the characters 'ej6', how would I make that work?

I attempted:

let config = {
    keywords: ['/^[a-z0-9]{0,5}$/ej6'],
let config = {
    keywords: ['/^[a-z0-9]{0,5}$/' + 'ej6'],

... along with a few other attempts, but I honestly have no idea how to make it work with regex, and maybe there is an even better way using JS (?).

Best,
Simon

STANDARD_TIMEOUT too low

Problem with scraping twitter.com in search engine google: TimeoutError: waiting for selector "#fbar" failed: timeout 10000ms exceeded

this.STANDARD_TIMEOUT = 10000;

Solution: extend the timeout; this value should be a parameter

this.STANDARD_TIMEOUT = 5000000;

Hey everyone.

Please leave your bug reports and requests for features here. I will maintain this package in the future.

Errors installing on Raspbian Stretch Lite (2018-11-13)

sudo apt-get install gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget

pi@raven:~ $ npm install se-scraper
(node:1569) [DEP0022] DeprecationWarning: os.tmpDir() is deprecated. Use os.tmpdir() instead.
npm ERR! tar.unpack untar error /home/pi/.npm/puppeteer/1.14.0/package.tgz
npm ERR! tar.unpack untar error /home/pi/.npm/ms/2.1.1/package.tgz
npm ERR! error rolling back Error: ENOTEMPTY: directory not empty, rmdir '/home/pi/node_modules/se-scraper/node_modules/chai/lib/chai/interface'
npm ERR! error rolling back [email protected] { Error: ENOTEMPTY: directory not empty, rmdir '/home/pi/node_modules/se-scraper/node_modules/chai/lib/chai/interface'
npm ERR! error rolling back errno: -39,
npm ERR! error rolling back code: 'ENOTEMPTY',
npm ERR! error rolling back syscall: 'rmdir',
npm ERR! error rolling back path: '/home/pi/node_modules/se-scraper/node_modules/chai/lib/chai/interface' }
npm ERR! Error: Method Not Allowed
npm ERR! at errorResponse (/usr/share/npm/lib/cache/add-named.js:260:10)
npm ERR! at /usr/share/npm/lib/cache/add-named.js:203:12
npm ERR! at saved (/usr/share/npm/node_modules/npm-registry-client/lib/get.js:167:7)
npm ERR! at FSReqWrap.oncomplete (fs.js:135:15)
npm ERR! If you need help, you may report this entire log,
npm ERR! including the npm and node versions, at:
npm ERR! http://github.com/npm/npm/issues

npm ERR! System Linux 4.14.98-v7+
npm ERR! command "/usr/bin/node" "/usr/bin/npm" "install" "se-scraper"
npm ERR! cwd /home/pi
npm ERR! node -v v8.11.1
npm ERR! npm -v 1.4.21
npm ERR! code E405
npm ERR! tar.unpack untar error /home/pi/.npm/lodash/4.17.11/package.tgz
npm ERR! tar.unpack untar error /home/pi/.npm/domhandler/2.4.2/package.tgz
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR! /home/pi/npm-debug.log
npm ERR! not ok code 0
pi@raven:~ $

pi@raven:~ $ node --version v8.11.1
pi@raven:~ $ npm --version 1.4.21

How can I use only proxies?

Hello. The documentation says that the application uses the proxy list plus my IP. Can I somehow disable this option so that the application uses only proxies?

Get number of results from Google News

Hi Nikolai,

I want to get the total number of search results (articles) and the top 10 recent news articles related to a keyword for the last six months, not specific to a single region, from Google News.

Please guide me regarding this.

Thanks and Regards,
Mayank

using a keyword file

Hello,
It appears to be a great tool. Can you please explain how to use a keyword file? If possible, I would also like to write the output to a separate .json file for each keyword instead of one file for all keywords.

Thanks.

Docker support

Awesome module!
Do you plan to build "se-scraper" docker image?
Thank you.

Safe per IP Google limits?

Hi all, does anyone have up-to-date data for how many searches you can perform per IP per time period before getting blocked?

We have a limited scraping need, and spare server resources, so figured this would be a good solution, but would like to know how best to split the work.

I found old posts suggesting around 300 regular (not Google dorks) searches per 24 hours, but have no idea if this is still correct.

Any input greatly appreciated.

Noob Question. Best way to restart search after crash..

Hi all.

If the scraper crashes halfway through a search, how do I reset the scraper to start from the search results page it was last on?

Is this set through "Query String Parameters"?
If it is, I have tried to add this to the advanced usage script but had no luck; I keep getting this error when starting the script.

/test.js:26
verbose: true,
^

SyntaxError: Unexpected token :
at Module._compile (internal/modules/cjs/loader.js:721:23)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:787:10)
at Module.load (internal/modules/cjs/loader.js:653:32)
at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
at Function.Module._load (internal/modules/cjs/loader.js:585:3)
at Function.Module.runMain (internal/modules/cjs/loader.js:829:12)
at startup (internal/bootstrap/node.js:283:19)
at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)

Google results capture extra text

Using the following scrape job config:

let scrape_job = {
        search_engine: 'google',
        keywords: ['trump'],
        num_pages: 1,
        // add some cool google search settings
        google_settings: {
            gl: 'us', // The gl parameter determines the Google country to use for the query.
            hl: 'en' // The hl parameter determines the Google UI language to return results.
        },
    };

Some results have extra text in the title and snippet.
For example, the following result has the link https://www.foxnews.com/politics/trump-says-appointing-sessions-was-is-biggest-mistake as part of the title:

{
  "link": "https://www.foxnews.com/politics/trump-says-appointing-sessions-was-is-biggest-mistake",
  "title": "Trump says appointing Sessions was his biggest mistake | Fox Newshttps://www.foxnews.com/politics/trump-says-appointing-sessions-was-is-biggest-mistake",
  "snippet": "President Trump admitted in an interview aired Sunday that he views appointing Jeff Sessions as his attorney general was his “biggest mistake” as president ...",
  "visible_link": "https://www.foxnews.com/politics/trump-says-appointing-sessions-was-is-biggest-mistake",
  "date": "",
  "rank": 7
}

This one has the date "3 hours ago" in the snippet:

{  
   link:'https://www.cnn.com/2019/06/24/politics/iran-immigration-donald-trump-2020-reelection/index.html',
   title:"Iran and immigration: How politics explains Donald Trump's U-turns ...https://www.cnn.com/2019/06/24/politics/iran-immigration...trump-2020.../index.html",
   snippet:"3 hours ago - The whims and last minute reversals of President Donald Trump's impulsive leadership style are sending America and the rest of the world on a wild and sometimes dangerous ride. From Iran to immigration policy and North Korea to trade wars, Trump's tactic of escalating ...",
   visible_link:'https://www.cnn.com/2019/06/24/politics/iran-immigration...trump-2020.../index.html',
   date:'3 hours ago - ',
   rank:4
}

Modularize scraper modules

We don't need this but I think it's a cool idea.

function getScraper(searchEngine, args) {
    return new {
        google: google.GoogleScraper,
        google_news_old: google.GoogleNewsOldScraper,
        google_news: google.GoogleNewsScraper,
        google_image: google.GoogleImageScraper,
        bing: bing.BingScraper,
        bing_news: bing.BingNewsScraper,
        amazon: amazon.AmazonScraper,
        duckduckgo: duckduckgo.DuckduckgoScraper,
        duckduckgo_news: duckduckgo.DuckduckgoNewsScraper,
        infospace: infospace.InfospaceScraper,
        webcrawler: infospace.WebcrawlerNewsScraper,
        baidu: baidu.BaiduScraper,
        youtube: youtube.YoutubeScraper,
        yahoo_news: tickersearch.YahooFinanceScraper,
        reuters: tickersearch.ReutersFinanceScraper,
        cnbc: tickersearch.CnbcFinanceScraper,
        marketwatch: tickersearch.MarketwatchFinanceScraper,
    }[searchEngine](args);
}

Instead of hardcoding the list of possible search engines, allow the use of any class that extends the base class. This can be a polymorphic addition to the current codebase.

var obj = getScraper(this.config.search_engine, {
    config: this.config,
    context: {},
    pluggable: this.pluggable,
});

let obj;
if (typeof this.config.search_engine === 'string') {
  obj = getScraper(this.config.search_engine, { 
     config: this.config, 
     context: {}, 
     pluggable: this.pluggable, 
 });
} else {
  obj = new this.config.search_engine({
    config: this.config,
    context: {},
    pluggable: this.pluggable
  })
}

Then it would be possible for the community to write their own scraper modules:

const se_scraper = require('se-scraper');
const thirdPartyScraper = require('se-scraper-third-party');

(async () => {
    let scrape_job = {
        search_engine: thirdPartyScraper,
        keywords: ['lets go boys'],
        num_pages: 1,
    };

    var results = await se_scraper.scrape({}, scrape_job);

    console.dir(results, {depth: null, colors: true});
})();

Trouble with offsetting search results

Hi,

I'm running into issues when specifying an offset for the start location of the search. When applying different offsets the scrape returns only the results from the first page of the search.

Has anyone else run into this issue? And is there any way that I can go about solving this? When manually inputting the start URL into the search bar it redirects to https://google.com/webhp rather than the default search path. I'm not sure if this is the issue but any help would be greatly appreciated.

A bit more help for total noobs

Thanks for sharing this. I was able to follow the Quickstart instructions until it said "then create a file with the following contents and start scraping." So I installed Node.js and npm on my Ubuntu Bionic, created that environment variable with the export command, and installed se-scraper. I then went into nano to create a file with the content. But then what? Do we save it as a .sh file, make it executable and then run it (that's what I tried, but that doesn't work)? Or is there an intermediate step that is obvious for cognoscenti, but not for semi-geeks like many of us? :) I'd be grateful if you could put me on the right track (and save me many hours). Thanks!
Update - I suspect it has to be a .js file. Looking for instructions on Node.js... If you could add that to your readme file, that'd be great!

Docker image not working at download and a build

Hi,

I can't get the docker image:
1/ Downloaded with the command specified in the README.md:
docker pull tschachn/se-scraper
=> it is missing a tag.
Since there is no image with the tag "latest", it doesn't work.

It works with:
docker pull tschachn/se-scraper:firsttry

2/ Built locally using the Dockerfile.
It ends with:

Step 10/17 : RUN npm install     && npm run build
 ---> Running in 04260d193eb5

> [email protected] install /se-scraper/node_modules/puppeteer
> node install.js

Chromium downloaded to /se-scraper/node_modules/puppeteer/.local-chromium/linux-686378
added 274 packages from 642 contributors and audited 548 packages in 24.965s
found 0 vulnerabilities

npm ERR! missing script: build

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2019-09-17T09_35_36_210Z-debug.log

Did I miss something?
I built it with:
sudo docker build -t se_scraper .

Unable to use with authenticated proxy

Hello there,

I'm trying to make my requests using an authenticated proxy but I'm unable to make it work.

The formats that I've tried:

http://username:password@ip:port
http://ip:port:username:password
username:password@ip:port
ip:port:username:password

Here's the output that I'm getting (regardless of the format used):

my_server_ip vs http://username:password@ip:port
Proxy not working properly.
Failed to load the search engine: load_search_engine()

Any ideas on how I can use an authenticated proxy with se-scraper?

Thanks,
Nicolas

Duckduckgo wait_for_results selector doesn't work in case of no results

Hello
It seems like the selector you are currently using to scrape the DuckDuckGo results page is not working properly when the search doesn't produce any results:

[i] duckduckgo scrapes keyword ... on page 1
Problem with scraping ... in search engine duckduckgo: TimeoutError: waiting for selector ".result__body" failed: timeout 10000ms exceeded

I think the issue here is the selector you are using:

async wait_for_results() {
        await this.page.waitForSelector('.result__body', { timeout: this.STANDARD_TIMEOUT });
    }

If the search produces no results, the DuckDuckGo results page doesn't have a div with the class .result__body.
As a fix, I suggest using one of the following selectors instead (I have not tested them, but they are present in both cases):

#links
.serp__results

Is it possible to remove filetype:pdf and get only links?

Congratulations on your work! It is amazing.

I have one question about the pdf filetype. Is it possible to add a tag like 'filter_file_extensions' in run.js to get only links or a specific filetype?

I need to exclude PDF files; scraping them is unnecessary in my context.

Could you help me, please?

Install Fail Unable to launch browser for worker, error message: Failed to launch chrome!

Installation fails on Debian 9

Some puppeteer issue.

node v10.15.3
npm v6.4.1

Is this related to some recent changes? I remember installing se-scraper about 3 weeks ago without any problems.

user@host:~/sescrap$ node run.js
[se-scraper] started at [Sun, 24 Mar 2019 18:10:37 GMT] and scrapes google with 2 keywords on 3 pages each.
(node:1932) UnhandledPromiseRejectionWarning: Error: Unable to launch browser for worker, error message: Failed to launch chrome!
[0324/191038.478962:FATAL:zygote_host_impl_linux.cc(116)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/master/docs/linux_suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.
#0 0x562eaff70f29 base::debug::CollectStackTrace()
#1 0x562eafed6593 base::debug::StackTrace::StackTrace()
#2 0x562eafeead1e logging::LogMessage::~LogMessage()
#3 0x562eb155390e service_manager::ZygoteHostImpl::Init()
#4 0x562eafb2c1b7 content::ContentMainRunnerImpl::Initialize()
#5 0x562eafb5efca service_manager::Main()
#6 0x562eafb2a791 content::ContentMain()
#7 0x562eb41c6178 headless::(anonymous namespace)::RunContentMain()
#8 0x562eb41c6205 headless::HeadlessBrowserMain()
#9 0x562eafb5dca3 headless::HeadlessShellMain()
#10 0x562eada951ac ChromeMain
#11 0x7fb4da22c2e1 __libc_start_main
#12 0x562eada9502a _start

Received signal 6
#0 0x562eaff70f29 base::debug::CollectStackTrace()
#1 0x562eafed6593 base::debug::StackTrace::StackTrace()
#2 0x562eaff70ab1 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fb4e03bb0e0 <unknown>
#4 0x7fb4da23efff gsignal
#5 0x7fb4da24042a abort
#6 0x562eaff6f8e5 base::debug::BreakDebugger()
#7 0x562eafeeaf61 logging::LogMessage::~LogMessage()
#8 0x562eb155390e service_manager::ZygoteHostImpl::Init()
#9 0x562eafb2c1b7 content::ContentMainRunnerImpl::Initialize()
#10 0x562eafb5efca service_manager::Main()
#11 0x562eafb2a791 content::ContentMain()
#12 0x562eb41c6178 headless::(anonymous namespace)::RunContentMain()
#13 0x562eb41c6205 headless::HeadlessBrowserMain()
#14 0x562eafb5dca3 headless::HeadlessShellMain()
#15 0x562eada951ac ChromeMain
#16 0x7fb4da22c2e1 __libc_start_main
#17 0x562eada9502a _start
  r8: 0000000000000000  r9: 00007ffdc80a7e20 r10: 0000000000000008 r11: 0000000000000246
 r12: 00007ffdc80a80b8 r13: 0000000000000161 r14: 00007ffdc80a8a20 r15: 00007ffdc80a8a18
  di: 0000000000000002  si: 00007ffdc80a7e20  bp: 00007ffdc80a8060  bx: 0000000000000006
  dx: 0000000000000000  ax: 0000000000000000  cx: 00007fb4da23efff  sp: 00007ffdc80a7e98
  ip: 00007fb4da23efff efl: 0000000000000246 cgf: 002b000000000033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Calling _exit(1). Core file will not be generated.


TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md

    at Cluster.<anonymous> (/home/user/sescrap/node_modules/se-scraper/src/puppeteer-cluster/dist/Cluster.js:143:23)
    at Generator.throw (<anonymous>)
    at rejected (/home/user/sescrap/node_modules/se-scraper/src/puppeteer-cluster/dist/Cluster.js:5:65)
    at process._tickCallback (internal/process/next_tick.js:68:7)
(node:1932) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:1932) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Set different geolocation for every request

I need to set a different geolocation for every search query to Google. I manually set permissions in Browser.js (in the puppeteer-cluster directory) because in se-scraper I don't have access to the browser object (or maybe I'm wrong?).

context = yield chrome.createIncognitoBrowserContext(); 
context.clearPermissionOverrides();  
context.overridePermissions('https://www.google.pl', ['geolocation']);

Next I overrode GoogleScraper to extend the search_keyword method with geolocation params:

async search_keyword(keyword) {
	await this.page.setGeolocation({
	           latitude: latVar,
	           longitude: longVar
	});
	const input = await this.page.$('input[name="q"]');
	await this.set_input_value(`input[name="q"]`, keyword);
	await this.sleep(50);
	await input.focus();
	await this.page.keyboard.press("Enter");
}

I noticed that it's necessary to reload the page to update the Google results with the new geolocation. When I put this code after the Enter key press in search_keyword:

await this.page.evaluate(() => {
	location.reload(true)
});

I got:
Error: Execution context was destroyed, most likely because of a navigation

I have also tried putting

await input.focus();
await this.page.keyboard.press("Enter");

but the same as above.

How do I properly refresh the page, or how do I get results related to the geo params?

google_news_old settings

I'm trying to use google_news_old_settings, but it doesn't seem to work (the same issue occurs with bing, bing_news, webcrawler):

let config = {
    search_engine: 'google_news_old',
    // use specific search engine parameters for various search engines
    google_news_old_settings: {
        gl: 'us', // The gl parameter determines the Google country to use for the query.
        hl: 'us', // The hl parameter determines the Google UI language to return results.
        start: 0, // Determines the results offset to use, defaults to 0.
        num: 100, // Determines the number of results to show, defaults to 10. Maximum is 100.
    },
}

Publish puppeteer cluster fork

We need to publish the puppeteer-cluster fork https://github.com/NikolaiT/puppeteer-cluster

Because:

  • It is easier to publish se-scraper
  • It avoids using a git submodule and an npm postinstall hook
  • It avoids downloading Chrome twice when you run npm install
  • It is easier to follow changes and rebase onto upstream puppeteer-cluster

We can create an organization on npm, like @se-scraper/puppeteer-cluster.

Or, even better, we could get our PR reworked and merged into the puppeteer-cluster project.

Help wanted


I need help guys, I think I added a couple of vulns :/

custom results selector

With the python version I recall being able to get google ads with the search results.
But with this version and after looking at the google results selectors I can tell that it is not possible.

Is there a way to give the developer the ability to provide a custom selector ?

Scrape "People also ask" from search engines

Thanks for this wonderful tool! One function I found useful is to scrape "People also ask" section from search engines. Both Google and Bing have this functionality, it often shows up when the input query is some kind of question, e.g. "capital of USA". It includes a list of question/answers related to the input query.

Thanks!

Dockerfile

Hi,

I'm trying to build the docker image from a cloned repo (I needed to add some small changes) but the build crashes due to:

npm ERR! missing script: build npm ERR! A complete log of this run can be found in: npm ERR! /root/.npm/_logs/2019-11-27T14_55_05_361Z-debug.log The command '/bin/sh -c npm install && npm run build' returned a non-zero code: 1

I've tried adding "build": "npm build ." to the scripts section of package.json; the build now passes but I can't run it.

Any idea?

Baidu wrong

Hello. Baidu is widely used, but there are a lot of problems with it: it doesn't support multiple pages, and multiple search keywords cause duplicate results. I guess the input is not cleared before typing.

Unable to enable google's

By default google searches will filter duplicate/similar results. Normally, to override this behavior the filter=0 HTTP parameter is included in the google search request's URL.

I believed the following scrape_job config would provide this override:

let scrape_job = {
    keywords: ["my duplicate generating search"],
    search_engine: 'google',
    google_settings: {
	filter: 0, 
    },
};

However, se-scraper still iterates over the Google search results as if Google's duplicate filtering were still enabled.

Any suggestions to force se-scraper / google to not omit similar search results?

Limit the number of search result

How can I limit the number of search results, for example to take only 2 images for google_image?
I can't find the property to set the result count.
Thanks !

Examples require statements

I think examples containing the following

const se_scraper = require('./../src/node_scraper.js');

Need to be replaced with

const se_scraper = require('se-scraper');

json output error

the json output seems to have some error... when processing the output json file

Formatting Scraper Results

Hey Everyone!

Loving the tool but slightly new to it all. Just wondering if anyone could help we out.

I'm wanting to be able to format the results data from the scrape into a table, either HTML or a formatted CSV, just to make it a little easier to read.

Does anyone know a way in which I can do this? Apologies if this seems like a stupid question.

Thanks.

Rotating proxy available?

Hi,

Is there any possibility to retry a request if the response code is != 200, or if the connection to the proxy failed?

/BR

Browser Disconnect after crash

I am using Bing search to crawl search results, but after a while it throws an error and fails to continue, saying the browser is disconnected.
Here is the log:

Problem with scraping Marketing-Literatur filetype:pdf in search engine bing: Error: Navigation failed because browser has disconnected!
(node:11024) UnhandledPromiseRejectionWarning: Error: Timeout hit: 120000
    at Object.<anonymous> (C:\Users\neyazee\node_modules\se-scraper\src\puppeteer-cluster\dist\util.js:67:23)
    at Generator.next (<anonymous>)
    at fulfilled (C:\Users\neyazee\node_modules\se-scraper\src\puppeteer-cluster\dist\util.js:4:58)
    at <anonymous>
(node:11024) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:11024) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
[i] bing scrapes keyword "Sachbuchgeschichte filetype:pdf" on page 7

My code to crawl Bing:

var fs = require('fs');
var path = require('path');
var os = require("os");

const se_scraper = require('se-scraper');

var filepath_de = path.join(__dirname, '/data/keywords_de.txt');
var filepath_fr = path.join(__dirname, '/data/keywords_fr.txt');

function read_keywords_from_file(fpath) {
    let kws =  fs.readFileSync(fpath).toString().split(os.EOL);
    // clean keywords
    kws = kws.filter((kw) => {
        return kw.trim().length > 0;
    });
    return kws;
}

let keywords_fr = read_keywords_from_file(filepath_fr);
let keywords_de = read_keywords_from_file(filepath_de);

const Cluster = {
    CONCURRENCY_PAGE: 1, // shares cookies, etc.
    CONCURRENCY_CONTEXT: 2, // no cookie sharing (uses contexts)
    CONCURRENCY_BROWSER: 3, // no cookie sharing and individual processes (uses contexts)
};


// those options need to be provided on startup
// and cannot give to se-scraper on scrape() calls
let browser_config = {
    // the user agent to scrape with
    user_agent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    // if random_user_agent is set to True, a random user agent is chosen
    random_user_agent: true,
    verbose: true,
    // whether to start the browser in headless mode
    headless: false,
    // whether debug information should be printed
    // level 0: print nothing
    // level 1: print most important info
    // ...
    // level 4: print all shit nobody wants to know
    debug_level: 2,
    is_local: false,
    throw_on_detection: false,
    puppeteer_cluster_config: {
        headless: false,
        timeout: 2 * 60 * 1000, // max timeout set to 2 minutes
        monitor: false,
        concurrency: 3, // one scraper per tab
        maxConcurrency: 5, // scrape with 5 tabs
    }
};

(async () => {
    // scrape config can change on each scrape() call

    // scrape config can change on each scrape() call
    let scrape_config_bing_de = {
        // which search engine to scrape
        search_engine: 'bing',
        // an array of keywords to scrape
        keywords: keywords_de,
        // the number of pages to scrape for each keyword
        num_pages: 10,

        // OPTIONAL PARAMS BELOW:
        // https://docs.microsoft.com/en-us/rest/api/cognitiveservices-bingsearch/bing-web-api-v5-reference#query-parameters
        bing_settings: {
            cc: 'DE', // The cc parameter determines the country to use for the query.
            mkt: 'de-DE', // The mkt parameter determines the UI language to return results.
            offset: 0, // Determines the results offset to use, defaults to 0.
            count: 20, // Determines the number of results to show, defaults to 10. Maximum is 100.
        },
        // how long to sleep between requests. a random sleep interval within the range [a,b]
        // is drawn before every request. empty string for no sleeping.
        sleep_range: '',
        // path to output file, data will be stored in JSON
        output_file: 'results/bing_de.json',
        // whether to prevent images, css, fonts from being loaded
        // will speed up scraping a great deal
        block_assets: true,
        // check if headless chrome escapes common detection techniques
        // this is a quick test and should be used for debugging
        test_evasion: false,
        apply_evasion_techniques: true,
        // log ip address data
        log_ip_address: false,
        // log http headers
        log_http_headers: false,
    };

    let results = await se_scraper.scrape(browser_config, scrape_config_bing_de);
    console.dir(results, {depth: null, colors: true});

})();

Is there something I am missing here?
