Toimik.UrlNormalization

.NET 8 C# URL normalizer.

Features

URL normalization, also known as URL canonicalization, is the process of normalizing (standardizing) the text representation of a URL to determine whether differently formatted URLs are equivalent.

All URLs

  • Duplicate slashes are removed (optional, via isAdjacentSlashesCollapsed)
    file://example.com/foo//bar.html → file://example.com/foo/bar.html

  • Default port is removed
    ftp://example.com:21/ → ftp://example.com/

  • Dot-segments are removed
    file://example.com/foo/./bar/baz/../qux → file://example.com/foo/bar/qux

  • Empty path is converted to "/"
    ftp://example.com → ftp://example.com/

  • Percent-encoded triplets are uppercased
    ftp://example.com/foo%2a → ftp://example.com/foo%2A

  • Percent-encoded triplets of unreserved characters are decoded
    ftp://example.com/%7Efoo → ftp://example.com/~foo

  • Scheme and host are lowercased
    FTP://User@Example.COM/Foo → ftp://User@example.com/Foo
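The path-level rules above can be illustrated with a few plain string functions. This is a minimal sketch under stated assumptions, not the library's implementation: the helper names (NormalizePath, NormalizePercentEncoding, RemoveDotSegments) are hypothetical, the sketch works on the path component only (e.g. "/foo//bar.html" rather than the full file:// URL), and the order in which the rules run is an assumption. Dot-segment removal follows the algorithm in RFC 3986, section 5.2.4.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helpers sketching some "All URLs" rules on a path string.
// NOT taken from Toimik.UrlNormalization; the rule order is an assumption.
static string NormalizePath(string path)
{
    if (path.Length == 0) return "/";          // empty path becomes "/"
    path = NormalizePercentEncoding(path);     // %2a -> %2A, %7E -> ~
    path = Regex.Replace(path, "/{2,}", "/");  // collapse duplicate slashes
    return RemoveDotSegments(path);            // resolve "." and ".."
}

// Uppercases every %XX triplet and decodes those that encode an RFC 3986
// unreserved character (ALPHA / DIGIT / "-" / "." / "_" / "~").
static string NormalizePercentEncoding(string path) =>
    Regex.Replace(path, "%[0-9A-Fa-f]{2}", m =>
    {
        char c = (char)Convert.ToInt32(m.Value.Substring(1), 16);
        bool unreserved = c is (>= 'A' and <= 'Z') or (>= 'a' and <= 'z')
            or (>= '0' and <= '9') or '-' or '.' or '_' or '~';
        return unreserved ? c.ToString() : m.Value.ToUpperInvariant();
    });

// RFC 3986 section 5.2.4 "Remove Dot Segments" (rules A to E).
static string RemoveDotSegments(string input)
{
    var output = new StringBuilder();
    while (input.Length > 0)
    {
        if (Has(input, "../")) input = input[3..];          // rule A
        else if (Has(input, "./")) input = input[2..];      // rule A
        else if (Has(input, "/./")) input = input[2..];     // rule B
        else if (input == "/.") input = "/";                // rule B
        else if (Has(input, "/../") || input == "/..")      // rule C
        {
            input = "/" + input[(input == "/.." ? 3 : 4)..];
            RemoveLastSegment(output);
        }
        else if (input is "." or "..") input = "";          // rule D
        else                                                // rule E
        {
            // Move the first segment (with its leading "/") to the output.
            int next = input.IndexOf('/', 1);
            if (next < 0) next = input.Length;
            output.Append(input, 0, next);
            input = input[next..];
        }
    }
    return output.ToString();
}

// Drops the last "/segment" from the output buffer.
static void RemoveLastSegment(StringBuilder sb)
{
    int i = sb.Length - 1;
    while (i >= 0 && sb[i] != '/') i--;
    sb.Length = Math.Max(i, 0);
}

// Ordinal prefix test (avoids culture-sensitive StartsWith).
static bool Has(string s, string prefix) => s.StartsWith(prefix, StringComparison.Ordinal);

Console.WriteLine(NormalizePath("/foo//bar.html"));        // /foo/bar.html
Console.WriteLine(NormalizePath("/foo/./bar/baz/../qux")); // /foo/bar/qux
Console.WriteLine(NormalizePath("/%7Efoo%2a"));            // /~foo%2A
```

Percent-encoding is normalized before dot segments are resolved, so an encoded dot such as "%2E" is decoded first and then treated like a literal ".".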

HTTP-specific URLs

  • Directory index can be removed (optional, via removableDirectoryIndexNames)
    http://example.com/default.asp → http://example.com/
    http://example.com/a/index.html → http://example.com/a/

  • Fragment can be removed (optional, via isFragmentIgnored)
    http://example.com/bar.html#section1 → http://example.com/bar.html

  • Scheme can be changed (optional, via preferredScheme)
    https://example.com/ → http://example.com/

  • Query parameters are sorted
    http://example.com/display?lang=en&article=fred → http://example.com/display?article=fred&lang=en

  • User-info can be removed (optional, via isUserInfoIgnored)
    http://user:password@example.com → http://example.com/

  • Empty query is removed
    http://example.com/display? → http://example.com/display
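Several of the HTTP-specific rules above (directory index, fragment, query sorting, empty query) can be sketched on the part of a URL after the host. This is a hypothetical illustration, not the library's code: the function NormalizeHttpTail and the index names "index.html" and "default.asp" are assumptions (the library's default names are not listed here), parameters are assumed to be sorted by name with ordinal comparison, and the scheme and user-info rules are not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of some HTTP-specific rules; NOT Toimik.UrlNormalization code.
// Assumes the URL already has a non-empty path (e.g. "http://example.com/").
static string NormalizeHttpTail(string url, ISet<string> indexNames, bool isFragmentIgnored)
{
    // Split off the fragment first: "#" always starts it.
    string fragment = "";
    int hash = url.IndexOf('#');
    if (hash >= 0) { fragment = url[hash..]; url = url[..hash]; }

    // Then split off the query (without its leading "?").
    string query = "";
    int question = url.IndexOf('?');
    if (question >= 0) { query = url[(question + 1)..]; url = url[..question]; }

    // Remove a trailing directory index such as "index.html".
    int slash = url.LastIndexOf('/');
    if (slash >= 0 && indexNames.Contains(url[(slash + 1)..]))
    {
        url = url[..(slash + 1)];
    }

    // Sort parameters by name; OrderBy is stable, so repeated names keep
    // their relative order. An empty query produces no "?" at all.
    if (query.Length > 0)
    {
        var sorted = query.Split('&').OrderBy(p => ParamName(p), StringComparer.Ordinal);
        url += "?" + string.Join("&", sorted);
    }

    return isFragmentIgnored ? url : url + fragment;
}

// The parameter name is the text before the first "=" (or the whole parameter).
static string ParamName(string parameter)
{
    int eq = parameter.IndexOf('=');
    return eq < 0 ? parameter : parameter[..eq];
}

var names = new HashSet<string> { "index.html", "default.asp" }; // hypothetical index names
Console.WriteLine(NormalizeHttpTail("http://example.com/a/index.html", names, true));
// http://example.com/a/
Console.WriteLine(NormalizeHttpTail("http://example.com/display?lang=en&article=fred", names, true));
// http://example.com/display?article=fred&lang=en
Console.WriteLine(NormalizeHttpTail("http://example.com/bar.html#section1", names, true));
// http://example.com/bar.html
```

Sorting by name alone with a stable sort avoids reordering repeated parameters such as `a=2&a=1`, whose order some servers treat as significant.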

Quick Start

Installation

Package Manager

PM> Install-Package Toimik.UrlNormalization

.NET CLI

> dotnet add package Toimik.UrlNormalization

Usage

UrlNormalizer.cs

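using Toimik.UrlNormalization; // namespace assumed to match the package name

// Use default arguments
// var normalizer = new UrlNormalizer();

// Use custom arguments: keep adjacent slashes (e.g. "/foo//bar") as they are
var normalizer = new UrlNormalizer(isAdjacentSlashesCollapsed: false);

var url = ...; // the URL to normalize
var normalizedUrl = normalizer.Normalize(url);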

HttpUrlNormalizer.cs

// Use default arguments
// var normalizer = new HttpUrlNormalizer();

// Use custom arguments
var normalizer = new HttpUrlNormalizer(
    preferredScheme: "https",
    isUserInfoIgnored: false,
    removableDirectoryIndexNames: new HashSet<string>(0), // empty set overrides the default names, so no directory index is removed
    isFragmentIgnored: false);

var url = ...; // the URL to normalize
var normalizedUrl = normalizer.Normalize(url);
