generate-sitemap

The generate-sitemap GitHub action generates a sitemap for a website hosted on GitHub Pages, and has the following features:

  • Support for both xml and txt sitemaps (you choose using one of the action's inputs).
  • When generating an xml sitemap, it uses the last commit date of each file to generate the <lastmod> tag in the sitemap entry.
  • Supports URLs for html and pdf files in the sitemap, and has inputs to control the included file types (defaults include both html and pdf files in the sitemap).
  • Checks the content of html files for <meta name="robots" content="noindex"> directives, and excludes any files that contain one from the sitemap.
  • Parses a robots.txt, if present at the root of the website, excluding any URLs from the sitemap that match Disallow: rules for User-agent: *.
  • Sorts the sitemap entries in a consistent order: URLs are sorted first by depth in the directory structure (i.e., pages at the website root appear first), and pages at the same depth are then sorted alphabetically.
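
The ordering described in the last bullet can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the action's actual source; the function names sitemap_sort_key and sort_sitemap_entries are hypothetical.

```python
from typing import List, Tuple


def sitemap_sort_key(url_path: str) -> Tuple[int, str]:
    """Hypothetical sort key: directory depth first, then alphabetical."""
    # Depth is the number of "/" separators after the leading slash,
    # so "/index.html" has depth 0 and "/docs/guide.html" has depth 1.
    depth = url_path.lstrip("/").count("/")
    return (depth, url_path)


def sort_sitemap_entries(url_paths: List[str]) -> List[str]:
    """Return the paths ordered by depth, then alphabetically."""
    return sorted(url_paths, key=sitemap_sort_key)
```

For example, sort_sitemap_entries(["/docs/api/index.html", "/index.html", "/docs/guide.html", "/about.html"]) places /about.html and /index.html first, then /docs/guide.html, then /docs/api/index.html.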

The generate-sitemap GitHub action is designed to be used in combination with other GitHub Actions. For example, it does not commit and push the generated sitemap. See the Examples section for workflows that combine it with other actions.

Requirements

This action relies on actions/checkout@v2 with fetch-depth: 0. Setting fetch-depth to 0 for the checkout action gives the generate-sitemap action access to the full commit history, which it uses to generate the <lastmod> tags in the sitemap.xml file. If you instead use the checkout action's default, which fetches only the most recent commit, the <lastmod> tags will be incorrect. So be sure to include the following as a step in your workflow:

    steps:
    - name: Checkout the repo
      uses: actions/checkout@v2
      with:
        fetch-depth: 0 
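
To see why the full history matters, here is a minimal Python sketch of one way to look up a file's last commit date with git. Whether the action uses this exact command is an assumption; the function names are hypothetical. In a clone that contains only the most recent commit, git log would report that single commit for every file, so every <lastmod> would carry the same date.

```python
import subprocess
from typing import List


def last_commit_date_command(file_path: str) -> List[str]:
    """Build a git command that prints the last commit date of one file."""
    # %cI is git's placeholder for the committer date in strict ISO 8601 format.
    return ["git", "log", "-1", "--format=%cI", "--", file_path]


def last_commit_date(file_path: str) -> str:
    """Run the command inside a git repository and return the date string."""
    result = subprocess.run(
        last_commit_date_command(file_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```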

Inputs

path-to-root

Required The path to the root of the website relative to the root of the repository. Default . is appropriate in most cases, such as whenever the root of your Pages site is the root of the repository itself. If you are using this for a GitHub Pages site in the docs directory, such as for a documentation website, then just pass docs for this input.

base-url-path

Required This is the URL of your website. You must specify this for your sitemap to be meaningful. It defaults to https://web.address.of.your.nifty.website/ for demonstration purposes.
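
As a rough illustration of how path-to-root and base-url-path relate, the following Python sketch maps a file in the repository to a sitemap URL. It assumes the URL is simply the base URL followed by the file's path relative to the website root; the action's real mapping may differ, and url_for_file is a hypothetical name.

```python
import posixpath


def url_for_file(file_path: str, path_to_root: str, base_url_path: str) -> str:
    """Map a repository-relative file path to a URL (illustrative only)."""
    # Path of the file relative to the website root, e.g. "api/index.html".
    relative = posixpath.relpath(file_path, path_to_root)
    # Join with exactly one slash, whether or not the base URL ends in one.
    return base_url_path.rstrip("/") + "/" + relative
```

With path-to-root set to docs, url_for_file("docs/api/index.html", "docs", "https://THE.URL.TO.YOUR.PAGE/") gives https://THE.URL.TO.YOUR.PAGE/api/index.html.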

include-html

Required This flag determines whether html files are included in your sitemap. Default: true.

include-pdf

Required This flag determines whether pdf files are included in your sitemap. Default: true.

sitemap-format

Required Use this to specify the sitemap format. Default: xml. The sitemap.xml generated by the default will contain lastmod dates generated from the last commit date of each file. Setting this input to anything other than xml will generate a plain text sitemap.txt that simply lists the URLs.
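
The difference between the two formats can be sketched in Python as follows. The XML structure follows the public sitemaps.org protocol; the function names, and the exact whitespace of the output, are assumptions made for illustration rather than the action's actual output.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape


def txt_sitemap(urls: List[str]) -> str:
    """Plain text sitemap: one URL per line."""
    return "\n".join(urls) + "\n"


def xml_sitemap(entries: List[Tuple[str, str]]) -> str:
    """XML sitemap from (url, lastmod) pairs, per the sitemaps.org schema."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, lastmod in entries:
        lines.append("<url>")
        # Escape &, < and > so that the URL is valid XML character data.
        lines.append(f"<loc>{escape(url)}</loc>")
        lines.append(f"<lastmod>{lastmod}</lastmod>")
        lines.append("</url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```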

Outputs

sitemap-path

The generated sitemap is placed in the root of the website. This output is the path to the generated sitemap file relative to the root of the repository. If you didn't use the path-to-root input, then this output should simply be the name of the sitemap file (sitemap.xml or sitemap.txt).

url-count

This output provides the number of URLs in the sitemap.

excluded-count

This output provides the number of URLs excluded from the sitemap due to <meta name="robots" content="noindex"> within html files.
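
A noindex check of the kind counted here can be sketched with Python's standard html.parser module. This is a minimal illustration under the assumption that any robots meta tag whose content includes noindex excludes the page; the class and function names are hypothetical, and the action's own check may be implemented differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a robots meta tag with noindex was seen."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), hence the "or".
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def has_noindex(html_text: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.noindex
```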

Examples

Example 1: Minimal Example

In this example workflow, we use the defaults for all inputs except base-url-path. The result will be a sitemap.xml file in the root of the repository. After the sitemap is generated, the workflow simply echoes the outputs.

name: Generate xml sitemap

on:
  push:
    branches:
      - master

jobs:
  sitemap_job:
    runs-on: ubuntu-latest
    name: Generate a sitemap
    steps:
    - name: Checkout the repo
      uses: actions/checkout@v2
      with:
        fetch-depth: 0 
    - name: Generate the sitemap
      id: sitemap
      uses: cicirello/[email protected]
      with:
        base-url-path: https://THE.URL.TO.YOUR.PAGE/
    - name: Output stats
      run: |
        echo "sitemap-path = ${{ steps.sitemap.outputs.sitemap-path }}"
        echo "url-count = ${{ steps.sitemap.outputs.url-count }}"
        echo "excluded-count = ${{ steps.sitemap.outputs.excluded-count }}"

Example 2: Webpage for API Docs

This example workflow illustrates how you might use this to generate a sitemap for a Pages site in the docs directory of the repository. It also demonstrates excluding pdf files, and configuring a plain text sitemap.

name: Generate API sitemap

on:
  push:
    branches:
      - master

jobs:
  sitemap_job:
    runs-on: ubuntu-latest
    name: Generate a sitemap
    steps:
    - name: Checkout the repo
      uses: actions/checkout@v2
      with:
        fetch-depth: 0 
    - name: Generate the sitemap
      id: sitemap
      uses: cicirello/[email protected]
      with:
        base-url-path: https://THE.URL.TO.YOUR.PAGE/
        path-to-root: docs
        include-pdf: false
        sitemap-format: txt
    - name: Output stats
      run: |
        echo "sitemap-path = ${{ steps.sitemap.outputs.sitemap-path }}"
        echo "url-count = ${{ steps.sitemap.outputs.url-count }}"
        echo "excluded-count = ${{ steps.sitemap.outputs.excluded-count }}"

Example 3: Combining With Other Actions

Presumably you want to do something with your sitemap once it is generated. In this example workflow, we combine it with the peter-evans/create-pull-request action. First, the cicirello/generate-sitemap action generates the sitemap. Then the peter-evans/create-pull-request action checks for changes and, if the sitemap has changed, creates a pull request.

name: Generate xml sitemap

on:
  push:
    branches:
      - master

jobs:
  sitemap_job:
    runs-on: ubuntu-latest
    name: Generate a sitemap
    steps:
    - name: Checkout the repo
      uses: actions/checkout@v2
      with:
        fetch-depth: 0 
    - name: Generate the sitemap
      id: sitemap
      uses: cicirello/[email protected]
      with:
        base-url-path: https://THE.URL.TO.YOUR.PAGE/
    - name: Create Pull Request
      uses: peter-evans/create-pull-request@v3
      with:
        title: "Automated sitemap update"
        body: > 
          Sitemap updated by the [generate-sitemap](https://github.com/cicirello/generate-sitemap) 
          GitHub action. Automated pull-request generated by the 
          [create-pull-request](https://github.com/peter-evans/create-pull-request) GitHub action.

License

The scripts and documentation for this GitHub action are released under the MIT License.

Contributors

cicirello, muratkalenderoglu
