csv-validator's Introduction

CSV Validator

A .NET Core validator for CSV text files. It enables quick verification of column-separated data files, and each column can be checked against multiple requirements for correctness.

The validator is available as a CLI and as a NuGet package. Details for using both are provided below.

CLI Usage

The application is command-line based and takes two arguments:

validate --file "input-datafile.csv" --with "configuration.json"

Configuration

Validation is configured with a JSON file in the following format:

{
	"rowSeperator": "\r\n",
	"columnSeperator": ",",
	"hasHeaderRow": true,
	"columns": {
		"1": {
			"name": "ID",
			"isRequired": true,
			"unique": true
		},
		"2": {
			"name": "DOB",
			"pattern": "^\\d\\d\\d\\d-\\d\\d-\\d\\d$"
		},
		"3": {
			"name": "NOTES",
			"maxLength": "250"
		}
	}
}

The pattern property takes a regular expression. Backslashes must be escaped for JSON (for example \\d rather than \d), otherwise the application will fail when reading the configuration file.
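To illustrate the escaping requirement, here is a minimal C# sketch of how a JSON parser turns the escaped pattern from the configuration into the regular expression that is actually applied. It uses System.Text.Json purely for demonstration; which JSON parser the validator uses internally is not stated here, and the class and method names are hypothetical.

```csharp
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the csvvalidator package.
public static class PatternEscapingDemo
{
    // Decodes a JSON string literal (quotes included) into its plain value.
    public static string DecodeJsonString(string jsonLiteral)
    {
        return JsonSerializer.Deserialize<string>(jsonLiteral);
    }

    // Returns false when the literal is not valid JSON, for example
    // because it contains an unescaped backslash sequence such as \d.
    public static bool IsValidJsonString(string jsonLiteral)
    {
        try
        {
            JsonSerializer.Deserialize<string>(jsonLiteral);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }

    // Decodes the pattern as written in the JSON file and applies it to a value.
    public static bool MatchesConfiguredPattern(string jsonLiteral, string value)
    {
        string pattern = DecodeJsonString(jsonLiteral);
        return Regex.IsMatch(value, pattern);
    }
}
```

With the DOB pattern from the sample configuration, written in the file as "^\\d\\d\\d\\d-\\d\\d-\\d\\d$", DecodeJsonString yields ^\d\d\d\d-\d\d-\d\d$, which matches a value such as 1984-06-30. The same pattern written with single backslashes is rejected by the parser before any validation takes place.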

rowSeperator can be any number of characters. Rows can be separated by arbitrary characters, so the input file does not need to contain newline characters.

columnSeperator can be one or more characters.

Each entry in columns is keyed by a number, which is the ordinal of the column in the input file. You do not need to specify every column, only those that are to be validated.
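As a rough illustration of what multi-character separators mean in practice, the following C# sketch splits raw text into rows and columns without relying on newline characters. It is only an assumption about the general approach, not the validator's actual implementation, and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper, not part of the csvvalidator package.
public static class SeparatorDemo
{
    // Splits raw text into rows, then each row into columns. Both
    // separators may be one or more characters long.
    public static string[][] SplitRowsAndColumns(
        string content, string rowSeparator, string columnSeparator)
    {
        string[] rows = content.Split(
            new[] { rowSeparator }, StringSplitOptions.None);

        string[][] result = new string[rows.Length][];
        for (int i = 0; i < rows.Length; i++)
        {
            // Empty columns are kept so that ordinals stay stable.
            result[i] = rows[i].Split(
                new[] { columnSeparator }, StringSplitOptions.None);
        }
        return result;
    }
}
```

For example, with "rowSeperator": "|" and "columnSeperator": "::" in the configuration, input such as 1::Ann|2::Bob would be read as two rows of two columns each, even though the file contains no line breaks.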

Supported validation

{
    // validates the column has content
    "isRequired": true|false,
    // validates the content is unique in this column across the full file
    "unique": true|false,
    // validates a string against a regular expression
    "pattern": "regular expression string",
    // Maximum allowable length for a column
    "maxLength": "int",
    // Check if content is numerical
    "isNumeric": true|false
}

API Usage

CSV Validator is also available as a NuGet package, enabling in-application validation of text files. The API targets .NET Standard 2.0 (netstandard2.0).

Installation

dotnet add package csvvalidator

Alternatively, add the package reference to the project file:

<ItemGroup>
  <PackageReference Include="csvvalidator" Version="1.0.1" />
</ItemGroup>

Usage

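// Build a validator from the JSON configuration
Validator validator = Validator.FromJson(config);
// Validate the input stream and collect any row-level errors
RowValidationError[] errors = validator.Validate(inputStream);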

Dealing with validation errors

Errors are reported hierarchically, by row and then by column.

foreach (RowValidationError current in errors)
{
	// each row error provides the row number and the row content
	Console.WriteLine($"Errors in row[{current.Row}]: {current.Content}");
	foreach (ValidationError error in current.Errors)
	{
		// all errors that occur on that row are reported in the Errors collection
		Console.WriteLine($"{error.Message} at {error.AtCharacter}");
	}
}

csv-validator's People

Contributors

alexander-sml, barry-jones, limking24

csv-validator's Issues

Improve validator instantiation

Update validator configuration to make constructing validator configurations easier.

Examples:

ValidatorConfiguration config = new ValidatorConfiguration();
config.Set(1, isNumeric: true);
config.Set(2, maxLength: 20, isRequired: true);
Validator validator = Validator.FromConfiguration(config);

Provide a summary of the errors

After validation completes, the application currently reports only the overall success or failure of the file. When a failure occurs, it should display a summary of the errors.

Feature Request - Header validation

If a header is expected:

  • Validate the number of columns expected (e.g. if we specify 10 columns in the definition, check that we have 10 headers)
  • Validate the header name if the 'name' property is defined in the schema

Does not support header rows

Header rows are not validated, and when a header row is checked against the row requirements it can produce spurious failures.

Handle quoted columns

When a file uses a comma to separate fields, commas inside quoted fields are not treated as part of the field, so each one is processed as the start of a new column.

Might be worth looking at CsvHelper, as they seem to handle this.
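For reference, a quote-aware split can be sketched in a few lines of C#. This is a minimal illustration of the behaviour the issue asks for, assuming a single-character separator and double quotes as the quoting character. It is neither the project's implementation nor CsvHelper's, and the names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the csvvalidator package.
public static class QuotedSplitDemo
{
    // Splits one row on the separator, treating separators inside
    // double-quoted fields as literal text. A doubled quote ("")
    // inside a quoted field is read as one literal quote.
    public static List<string> SplitRow(string row, char separator)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < row.Length; i++)
        {
            char c = row[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < row.Length && row[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote
                    i++;
                }
                else
                {
                    inQuotes = false;    // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field has no trailing separator
        return fields;
    }
}
```

Given the row a,"b, c",d and a comma separator, SplitRow returns three fields, with the middle one holding b, c intact.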

NuGet package

Hi, this looks awesome.

Any chance of deploying as a NuGet package to use as a library within another application?

Quoted column regex weirdness

Looks like the Regex is a bit too capture happy :(

Example with the following input:
somthing,before,\"just, a string\",seperated,after

It actually returns:
[image: screenshot of the output]

Provide an indicator of progress

With large files, especially those that will pass validation, the user gets no indication that the application is progressing.

Max column check

a) Is there any way to reject if the file has additional columns which are not defined?
or
b) Can I specify maxColumns, along with columnSeperator, which is checked for each row?
