
php-webmaster-tools-downloads's Introduction

GWTdata: Download website data from Google Webmaster Tools as CSV.

Introduction

This project provides an easy way to automate downloading data tables from Google Webmaster Tools as CSV files. It is meant as a PHP alternative to the Python script available at http://code.google.com/p/webmaster-tools-downloads/.

Unlike the Python script (or a straight clone of it), this solution does NOT require any extra client library or Zend package to be installed in order to run. It also offers some additional functionality.

Features

The official download list (used by the Python script) only returns download URLs for 1) Top Search Queries and 2) Top Pages, even though many more downloads are available via the web interface. The GWTdata class has therefore been extended, so you can now download website data for:

  • TOP_PAGES
  • TOP_QUERIES
  • CRAWL_ERRORS
  • CONTENT_ERRORS
  • CONTENT_KEYWORDS
  • LATEST_BACKLINKS
  • INTERNAL_LINKS
  • EXTERNAL_LINKS
  • SOCIAL_ACTIVITY

Update notice

In case you want to automate downloading crawl errors, please go here: https://github.com/eyecatchup/GWT_CrawlErrors-php

Usage

This document explains how to automate the file download process from Google Webmaster Tools, using examples based on the PHP class GWTdata.

Get started

To get started, the steps are as follows:

  • Download the PHP file gwtdata.php.
  • Create a folder and add the gwtdata.php script to it.

Example 1 - DownloadCSV()

To download CSV data for a single domain name of choice, the steps are as follows:

  • In the same folder where you added the gwtdata.php, create and run the following PHP script.
    You'll need to replace the example values for "email" and "password" with valid login details for your Google Account and for "website" with a valid URL for a site registered in your GWT account.
<?php
	include 'gwtdata.php';
	try {
		$email = "[email protected]";
		$password = "******";

		# If hardcoded, don't forget trailing slash!
		$website = "http://www.domain.com/";

		$gdata = new GWTdata();
		if($gdata->LogIn($email, $password) === true)
		{
			$gdata->DownloadCSV($website);
		}
	} catch (Exception $e) {
		die($e->getMessage());
	}

This will download and save 9 CSV files to your hard disk:

  • ./TOP_PAGES-www.domain.com-YYYYmmdd-H:i:s.csv
  • ./TOP_QUERIES-www.domain.com-YYYYmmdd-H:i:s.csv
  • ./CRAWL_ERRORS-www.domain.com-YYYYmmdd-H:i:s.csv
  • ./CONTENT_ERRORS-www.domain.com-YYYYmmdd-H:i:s.csv
  • ./CONTENT_KEYWORDS-www.domain.com-YYYYmmdd-H:i:s.csv
  • ./LATEST_BACKLINKS-www.domain.com-YYYYmmdd-H:i:s.csv
  • ./INTERNAL_LINKS-www.domain.com-YYYYmmdd-H:i:s.csv
  • ./EXTERNAL_LINKS-www.domain.com-YYYYmmdd-H:i:s.csv
  • ./SOCIAL_ACTIVITY-www.domain.com-YYYYmmdd-H:i:s.csv

For an example of how to limit the download to top search queries, top pages, etc. only, take a look at example 4.

By default, the files are saved to the same folder where you added gwtdata.php (and ran the script). However, the DownloadCSV() method has an optional second parameter to adjust the save path - see the inline comments in gwtdata.php and/or the second example.
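For instance, a minimal sketch of using that second parameter (the "./csv" folder name is just an example and must already exist before the script runs):

	# Save the downloaded CSV files to the ./csv sub-folder instead
	# of the current working directory.
	$gdata->DownloadCSV($website, "./csv");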

Example 2 - GetSites()

To download CSV data for all domains registered in your Google Webmaster Tools Account and save the downloaded files to a separate folder, the steps are as follows:

  • In the same folder where you added the gwtdata.php, create a folder named csv.
  • In the same folder where you added the gwtdata.php, create and run the following PHP script.
    You'll need to replace the example values for "email" and "password" with valid login details for your Google Account.
<?php
	include 'gwtdata.php';
	try {
		$email = "[email protected]";
		$password = "******";

		$gdata = new GWTdata();
		if($gdata->LogIn($email, $password) === true)
		{
			$sites = $gdata->GetSites();
			foreach($sites as $site)
			{
				$gdata->DownloadCSV($site, "./csv");
			}
		}
	} catch (Exception $e) {
		die($e->getMessage());
	}

This will download 9 CSV files (see example 1) for each domain that is registered in your Google Webmaster Tools Account and save them to the csv folder.

Example 3 - GetDownloadedFiles()

Same as example 2, but using the GetDownloadedFiles() method to get feedback on which files have been saved to your hard disk (returned as absolute paths).

<?php
	include 'gwtdata.php';
	try {
		$email = "[email protected]";
		$passwd = "******";

		$gdata = new GWTdata();
		if($gdata->LogIn($email, $passwd) === true)
		{
			$sites = $gdata->GetSites();
			foreach($sites as $site)
			{
				$gdata->DownloadCSV($site, "./csv");
			}

			$files = $gdata->GetDownloadedFiles();
			foreach($files as $file)
			{
				print "Saved $file\n";
			}
		}
	} catch (Exception $e) {
		die($e->getMessage());
	}

Example 4 - SetTables()

To download CSV data for a single domain name of choice, limited to top search query data only, the steps are as follows:

  • In the same folder where you added the gwtdata.php, create and run the following PHP script.
    You'll need to replace the example values for "email" and "password" with valid login details for your Google Account and for "website" with a valid URL for a site registered in your GWT account.
<?php
	include 'gwtdata.php';
	try {
		$email = "[email protected]";
		$password = "******";

		# If hardcoded, don't forget trailing slash!
		$website = "http://www.domain.com/";

		# Valid values are "TOP_PAGES", "TOP_QUERIES", "CRAWL_ERRORS",
		# "CONTENT_ERRORS", "CONTENT_KEYWORDS", "INTERNAL_LINKS",
		# "EXTERNAL_LINKS", "SOCIAL_ACTIVITY", and "LATEST_BACKLINKS".
		$tables = array("TOP_QUERIES");

		$gdata = new GWTdata();
		if($gdata->LogIn($email, $password) === true)
		{
			$gdata->SetTables($tables);
			$gdata->DownloadCSV($website);
		}
	} catch (Exception $e) {
		die($e->getMessage());
	}

This will download and save one file only: ./TOP_QUERIES-www.domain.com-YYYYmmdd-H:i:s.csv
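Since SetTables() takes an array, you can also request several tables at once; a minimal sketch, using table names as listed in the comment above:

	# Limit the download to top queries and top pages only.
	$tables = array("TOP_QUERIES", "TOP_PAGES");
	$gdata->SetTables($tables);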

Example 5 - SetDaterange()

To download CSV data for all domains that are registered in your Google Webmaster Tools Account and for a specific date range only, the steps are as follows:

  • In the same folder where you added the gwtdata.php, create and run the following PHP script.
    You'll need to replace the example values for "email" and "password" with valid login details for your Google Account.
<?php
	include 'gwtdata.php';
	try {
		$email = "[email protected]";
		$password = "******";

		# Dates must be in valid ISO 8601 format.
		$daterange = array("2012-01-10", "2012-01-12");

		$gdata = new GWTdata();
		if($gdata->LogIn($email, $password) === true)
		{
			$gdata->SetDaterange($daterange);

			$sites = $gdata->GetSites();
			foreach($sites as $site)
			{
				$gdata->DownloadCSV($site);
			}
		}
	} catch (Exception $e) {
		die($e->getMessage());
	}

This will download 9 CSV files (see example #1) for each domain that is registered in your Google Webmaster Tools Account containing data for the specified date range.

Example 6 - SetLanguage()

To download top search query data only, for all domains registered in your Google Webmaster Tools Account, limited to a specific date range and with a custom language for the CSV header row, the steps are as follows:

  • In the same folder where you added the gwtdata.php, create and run the following PHP script.
    You'll need to replace the example values for "email" and "password" with valid login details for your Google Account.
<?php
	include 'gwtdata.php';
	try {
		$email = "[email protected]";
		$passwd = "******";

		# Language must be set as valid ISO 639-1 language code.
		$language = "de";

		# Dates must be in valid ISO 8601 format.
		$daterange = array("2012-01-01", "2012-01-02");

		# Valid values are "TOP_PAGES", "TOP_QUERIES", "CRAWL_ERRORS",
		# "CONTENT_ERRORS", "CONTENT_KEYWORDS", "INTERNAL_LINKS",
		# "EXTERNAL_LINKS", "SOCIAL_ACTIVITY" and "LATEST_BACKLINKS".
		$tables = array("TOP_QUERIES");

		$gdata = new GWTdata();
		if($gdata->LogIn($email, $passwd) === true)
		{
			$gdata->SetLanguage($language);
			$gdata->SetDaterange($daterange);
			$gdata->SetTables($tables);

			$sites = $gdata->GetSites();
			foreach($sites as $site)
			{
				$gdata->DownloadCSV($site);
			}
		}
	} catch (Exception $e) {
		die($e->getMessage());
	}

This will download one CSV file for each domain that is registered in your Google Webmaster Tools Account, containing top queries data for the specified date range and with a German header row.

That's it.

php-webmaster-tools-downloads's People

Contributors

adduc, eoghanmurray, eyecatchup


php-webmaster-tools-downloads's Issues

No CRAWL_ERRORS returned

I'm not getting any CRAWL_ERRORS CSVs generated in my tests; can someone check whether they are seeing any?

I do, however, see them in the crawl-errors-specific version: https://github.com/eyecatchup/GWT_CrawlErrors-php
Though that one uses the API's XML feed method instead of the CSV download, which at the moment doesn't seem to give the full results that the CSV would.

Jobrapido Fork

Hi, I'm trying to use the fork made by Jobrapido to download data from Search Analytics. I'm trying to use only the gwtdata.php file from that fork, because I don't have the other prerequisites like the PDO and XML libraries for PHP. Does anyone currently use this fork? Does it work? When I call the DownloadCSV function from my code with the "SEARCH_ANALYTICS" value, the script downloads only an HTML page from Google, but when I use "TOP_PAGES" or "TOP_QUERIES" it works great.

PS: in my code I fixed the login problem by removing the string "-php" in the gwtdata.php script. I suggest that Jobrapido do the same in his fork.

Login is not working any longer

I now get "HTTP 403 Forbidden" when I try to log in to Webmaster Tools using this library. The login information is correct and there is no further description of the error.

Impossible to retrieve CONTENT_ERRORS

Hi,

I'm using the latest version of your library, but I'm unable to retrieve the "CONTENT_ERRORS" CSV, which is the one I'm most interested in.
Is there something wrong with the current code?
Has Google changed the name of this report?

Thanks

403 response

Hi

When requesting from WAMP on my local machine it works, but when requesting from the production server a 403 is returned.

WAMP response (Apache on local Windows):

Array
(
    [url] => https://www.google.com/accounts/ClientLogin
    [content_type] => text/plain
    [http_code] => 200
    [header_size] => 299
    [request_size] => 203
    [filetime] => -1
    [ssl_verify_result] => 20
    [redirect_count] => 0
    [total_time] => 2.215
    [namelookup_time] => 0.046
    [connect_time] => 0.062
    [pretransfer_time] => 0.124
    [size_upload] => 611
    [size_download] => 1202
    [speed_download] => 542
    [speed_upload] => 275
    [download_content_length] => 1202
    [upload_content_length] => 611
    [starttransfer_time] => 2.121
    [redirect_time] => 0
    [certinfo] => Array
        (
        )

    [primary_ip] => 77.214.53.152
    [primary_port] => 443
    [local_ip] => 192.168.1.101
    [local_port] => 51516
    [redirect_url] => 
)

Production response:

Array
(
    [url] => https://www.google.com/accounts/ClientLogin
    [content_type] => text/plain
    [http_code] => 403
    [header_size] => 339
    [request_size] => 203
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 0
    [total_time] => 1.088959
    [namelookup_time] => 0.004291
    [connect_time] => 0.008242
    [pretransfer_time] => 0.024296
    [size_upload] => 611
    [size_download] => 362
    [speed_download] => 332
    [speed_upload] => 561
    [download_content_length] => 362
    [upload_content_length] => 611
    [starttransfer_time] => 1.025485
    [redirect_time] => 0
    [redirect_url] => 
    [primary_ip] => 2a00:1450:4001:808::1013
    [certinfo] => Array
        (
        )

    [primary_port] => 443
    [local_ip] => 2a01:4f8:a0:24e3::2
    [local_port] => 40319
)

WMT API deprecated

Your code no longer works because Google has upgraded the Webmaster Tools / Search Console API. It is possible to query some analytics data via the new API, e.g. search queries for your site, but it does not appear possible to download other useful data, like external links, index status, etc. Do you know of any other way to achieve this?
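For the search-query part, a rough sketch of what a query against the newer Search Console API could look like, assuming Google's official PHP client (the google/apiclient Composer package with its classic Google_Service_Webmasters classes) and a service-account key; the key file name, site URL and dates below are placeholders, not values from this project:

<?php
	require 'vendor/autoload.php';

	# Authenticate with a service account that has been added as a
	# (restricted) user of the property in Search Console.
	$client = new Google_Client();
	$client->setAuthConfig('service-account.json');
	$client->addScope('https://www.googleapis.com/auth/webmasters.readonly');

	$service = new Google_Service_Webmasters($client);

	# Query clicks per search query for a date range.
	$request = new Google_Service_Webmasters_SearchAnalyticsQueryRequest();
	$request->setStartDate('2015-06-01');
	$request->setEndDate('2015-06-30');
	$request->setDimensions(array('query'));

	$response = $service->searchanalytics->query('http://www.domain.com/', $request);
	foreach ($response->getRows() as $row) {
		print implode(',', $row->getKeys()) . ',' . $row->getClicks() . "\n";
	}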

Max Sites Retrieved

Running the script for Top Queries only stalls after 28 out of ~100 domains have been downloaded. Is this related to connection timeouts or something else?
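One thing worth testing, assuming (not confirmed) that the stall is caused by PHP's execution time limit or by request throttling on Google's side:

<?php
	# Lift PHP's default max_execution_time for a long-running export.
	set_time_limit(0);

	# ...log in as in example 2, then pause briefly between domains...
	$sites = $gdata->GetSites();
	foreach ($sites as $site) {
		$gdata->DownloadCSV($site, "./csv");
		sleep(2);  # small delay to stay under possible rate limits
	}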

Top Queries wrong

The Top Queries download is not returning the right data; there's always a difference, and it's not a small one.

As far as I can see, the data returned from the download URL itself is already wrong, so I don't think it's a problem with the program.

404 error

I have a problem with the script. Last week it worked perfectly for me, but now the login generates the following response and I don't understand why:

Array
(
    [url] => https://www.google.com/accounts/ClientLogin
    [content_type] => text/plain
    [http_code] => 404
    [header_size] => 338
    [request_size] => 203
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 0
    [total_time] => 1.116964
    [namelookup_time] => 4.8E-5
    [connect_time] => 0.019938
    [pretransfer_time] => 0.074707
    [size_upload] => 609
    [size_download] => 65
    [speed_download] => 58
    [speed_upload] => 545
    [download_content_length] => 65
    [upload_content_length] => 609
    [starttransfer_time] => 1.075156
    [redirect_time] => 0
    [certinfo] => Array
        (
        )

    [primary_ip] => 216.58.192.4
    [primary_port] => 443
    [local_ip] => 184.168.200.182
    [local_port] => 36531
    [redirect_url] => 
)

Can anyone help me?

Data do not match

Hi,

I tried to create a dynamic listing of "trending keywords" using this GWT API. Then I compared the information returned by the API with what I see directly in Google Webmaster Tools. There is not a single match!

Where GWT shows positions between 1.0 and 15 for the best keywords, the API results are between 15 and 500!

I already tried setting the date range to the same values as in GWT - but the results are the same. What happened here?
