xsitemap

Description

This R package aims to ease work with XML sitemaps and SEO-related tasks. Tutorials will come later.

Install

# GitHub (dev version)
library(devtools)
devtools::install_github("pixgarden/xsitemap")

Getting started

Load the xsitemap package:

library(xsitemap)

xsitemap Functions

1. xsitemapGet()

This is the main function. Pass a domain hostname or an XML sitemap URL as the parameter.

xsitemap_urls <- xsitemapGet("https://www.nationalarchives.gov.uk/")

2. xsitemapCheckHTTP()

Checks whether the sitemap URLs return an HTTP 200 status code. Beware: this can take some time, depending on the number of URLs.

xsitemap_urls_http <- xsitemapCheckHTTP(xsitemap_urls)
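Once status codes are available, the URLs that do not return 200 can be isolated with base R. The sketch below uses a hand-built data frame in place of real output; the column names `loc` and `http` are assumptions for illustration and may not match what xsitemapCheckHTTP() actually returns.

```r
# Hypothetical stand-in for a checked sitemap; the column names
# `loc` and `http` are assumptions, not the package's documented output.
checked <- data.frame(
  loc = c(
    "https://example.com/",
    "https://example.com/old-page",
    "https://example.com/moved"
  ),
  http = c(200L, 404L, 301L),
  stringsAsFactors = FALSE
)

# Keep only the URLs that did not answer with HTTP 200
not_ok <- checked[checked$http != 200L, ]

# Count how many URLs fall under each status code
status_counts <- table(checked$http)

print(not_ok)
print(status_counts)
```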

3. xsitemapGuess()

Tries to guess the XML sitemap URL, testing these file names in order:

sitemap_index.xml, sitemaps.xml, sitemap.xml, sitemap-index.xml, sitemap.xml.gz
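To make the guessing order concrete, here is a minimal sketch of how the candidate URLs could be built for a site. The helper name `build_guess_urls()` is hypothetical and is not part of xsitemap; only the file names and their order come from the list above.

```r
# File names in the order listed above
guess_candidates <- c(
  "sitemap_index.xml",
  "sitemaps.xml",
  "sitemap.xml",
  "sitemap-index.xml",
  "sitemap.xml.gz"
)

# Hypothetical helper: build the candidate sitemap URLs for a site root
build_guess_urls <- function(site) {
  base <- sub("/+$", "", site)  # drop trailing slashes to avoid "//"
  paste0(base, "/", guess_candidates)
}

guess_urls <- build_guess_urls("https://www.nationalarchives.gov.uk/")
print(guess_urls)
```

Each candidate would then be requested in turn, stopping at the first one that responds successfully.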

4. xsitemapGetFromRobotsTxt()

Searches for XML sitemap URLs inside robots.txt.
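robots.txt files conventionally declare sitemaps on lines starting with `Sitemap:`. The sketch below shows one way such lines could be extracted from text that has already been downloaded. The helper name `extract_sitemaps_from_robots()` is hypothetical, and this is not necessarily how the package implements the search.

```r
# Hypothetical helper: pull sitemap URLs out of robots.txt lines
extract_sitemaps_from_robots <- function(robots_lines) {
  # The directive name is matched case-insensitively
  pattern <- "^[[:space:]]*sitemap[[:space:]]*:[[:space:]]*"
  hits <- grep(pattern, robots_lines, ignore.case = TRUE, value = TRUE)
  trimws(sub(pattern, "", hits, ignore.case = TRUE))
}

# Example robots.txt content (made up for illustration)
robots_lines <- c(
  "User-agent: *",
  "Disallow: /admin/",
  "Sitemap: https://example.com/sitemap.xml",
  "sitemap: https://example.com/news.xml  "
)

robots_sitemaps <- extract_sitemaps_from_robots(robots_lines)
print(robots_sitemaps)
```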

5. xsitemapCheckWordpress()

Checks classic WordPress sitemap URLs.

Tutorials

/!\ Work in progress /!\

English : https://www.gokam.co.uk/xsitemap-package/

French : https://www.gokam.fr/xsitemap/

Feedback

Questions and feedback are welcome!

Want to contribute? Open a pull request ;-) If you encounter a bug or want to suggest an enhancement, please open an issue.

  • François

Contributors

elalbaicin, pixgarden

xsitemap's Issues

Doesn't work when sitemap URLs end with .xml.gz

https://www.booking.com/sitembk-reviews-index-hotel-review.xml

<sitemap>
  <loc>http://www.booking.com/sitembk-reviews-hotel-review.en-gb.0000.xml.gz</loc>
  <lastmod>2018-01-15</lastmod>
</sitemap>
<sitemap>
  <loc>http://www.booking.com/sitembk-reviews-hotel-review.de.0000.xml.gz</loc>
  <lastmod>2018-01-15</lastmod>
</sitemap>
<sitemap>
  <loc>http://www.booking.com/sitembk-reviews-hotel-review.en-us.0000.xml.gz</loc>
  <lastmod>2018-01-15</lastmod>
</sitemap>
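One possible way to handle such entries, sketched with base R only: detect the `.xml.gz` suffix, save the file to disk, and read it through `gzfile()`, which decompresses gzip files transparently. Both helper names are hypothetical and not part of xsitemap, and the round trip uses a locally written file rather than a real download.

```r
# Hypothetical helper: TRUE when a sitemap URL points at a gzip-compressed file.
# trimws() guards against <loc> values that carry surrounding whitespace.
is_gz_sitemap <- function(url) {
  grepl("\\.xml\\.gz$", trimws(url), ignore.case = TRUE)
}

# Hypothetical helper: read a sitemap that was already saved to disk.
# gzfile() reads gzip-compressed files and also plain uncompressed ones.
read_sitemap_file <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

# Local round trip (no network): write a tiny compressed sitemap, read it back
tmp <- tempfile(fileext = ".xml.gz")
out <- gzfile(tmp, open = "wt")
writeLines("<urlset><url><loc>https://example.com/</loc></url></urlset>", out)
close(out)

xml_text <- read_sitemap_file(tmp)
print(xml_text)
```

In a real run, the compressed sitemap would first be saved with something like `download.file(url, tmp, mode = "wb")` before being read this way.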

Error in curl::curl_fetch_memory when Drupal two pages sitemap.

When a site has more than 2000 links, Drupal splits the sitemap into two pages.

Examples:
https://www.tm.gov.lv/sitemap.xml
https://www.sam.gov.lv/sitemap.xml

xsitemap finds the correct second URL, "http://www.tm.gov.lv/sitemap.xml?page=1", but fails to fetch its content and shows this error:

Reaching for XML sitemap... http://www.tm.gov.lv/sitemap.xml?page=1
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Failure when receiving data from the peer

XML Parsing Issue

When using xsitemapGet(x), I'm getting a lot of URLs that return "xmlParseEntityRef: no name" along with opening and ending tag mismatches, although this package works flawlessly for the vast majority of URLs.
After reading this Stack Overflow post: https://stackoverflow.com/questions/7604436/xmlparseentityref-no-name-warnings-while-loading-xml-into-a-php-file
I think the issue may be caused by invalid XML on the target domain. Does that post give any clues on how to modify the function so that it extracts more sitemaps through better handling of poorly constructed XML sitemaps?
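The "xmlParseEntityRef: no name" message typically appears when the XML contains a bare `&`, for example in a query string, instead of `&amp;`. One possible workaround, sketched below, is to escape bare ampersands in the raw text before parsing. The helper name `escape_bare_ampersands()` is hypothetical, this is not how xsitemap currently works, and it would not repair mismatched tags.

```r
# Hypothetical helper: escape "&" characters that do not start a valid
# entity reference (named, decimal, or hexadecimal).
escape_bare_ampersands <- function(xml_text) {
  gsub(
    "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)",
    "&amp;",
    xml_text,
    perl = TRUE  # negative lookahead needs the PCRE engine
  )
}

broken <- "<loc>https://example.com/?a=1&b=2</loc>"
fixed <- escape_bare_ampersands(broken)
print(fixed)

# Text that is already escaped is left untouched
already_ok <- "<loc>https://example.com/?a=1&amp;b=2&#38;c=3</loc>"
unchanged <- escape_bare_ampersands(already_ok)
```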
