ya-csv's Introduction

ya-csv

An event-based CSV parser and writer for Node.js, suitable for processing large CSV streams.

  • Designed for high performance and ease of use.
  • RFC 4180 compliance with optional extensions.
  • Zero dependencies.

Example

// A simple echo program:
var csv = require('ya-csv');

var reader = csv.createCsvStreamReader(process.openStdin());
var writer = csv.createCsvStreamWriter(process.stdout);

reader.addListener('data', function(data) {
    writer.writeRecord(data);
});

reader.addListener('error', function(e) {
    console.error('Oops!');
});

Installation

npm install ya-csv

The current version requires at least Node.js v0.2.3 and is tested with Node.js v0.4.12, v0.6.11, v0.7.5 and v0.10.24. It should work with the versions in between as well.

Features

  • event-based, suitable for processing big CSV streams
  • configurable separator, quote and escape characters (comma, double quote and double quote by default)
  • ignores lines starting with a configurable comment character (off by default)
  • supports memory-only streaming

More examples

Echo the first column of the data.csv file:

// the options below are the defaults, so this is equivalent
// to csv.createCsvFileReader('data.csv')
var reader = csv.createCsvFileReader('data.csv', {
    'separator': ',',
    'quote': '"',
    'escape': '"',
    'comment': '',
});
var writer = new csv.CsvWriter(process.stdout);
reader.addListener('data', function(data) {
    writer.writeRecord([ data[0] ]);
});

Return the data in objects rather than arrays, either by grabbing the column names from the header row (in that case the first row is not passed to the data listener):

var reader = csv.createCsvFileReader('data.csv', { columnsFromHeader: true });
reader.addListener('data', function(data) {
    // assuming the source file has columns named col1 and col2
    console.log(data.col1 + " ... " + data.col2);
});

... or by providing the column names from the client code (the first row is passed to the data listener in this case):

var reader = csv.createCsvFileReader('data.csv');
reader.setColumnNames([ 'col1', 'col2' ]);
reader.addListener('data', function(data) {
    console.log(data.col1 + " ... " + data.col2);
});

Note that calling reader.setColumnNames() with no arguments resets the column names, so the next invocation of the data listener will again receive the data in an array rather than an object.

Convert the /etc/passwd file to comma-separated format, drop commented lines and dump the results to the standard output:

var reader = csv.createCsvFileReader('/etc/passwd', {
    'separator': ':',
    'quote': '"',
    'escape': '"',
    'comment': '#',
});
var writer = new csv.CsvWriter(process.stdout);
reader.addListener('data', function(data) {
    writer.writeRecord(data);
});

Parsing an upload as the data comes in, using node-formidable:

upload_form.onPart = function(part) {
    if (!part.filename) { upload_form.handlePart(part); return; }

    // no stream argument: the reader is fed manually via parse()
    var reader = csv.createCsvStreamReader({ 'comment': '#' });
    reader.addListener('data', function(data) {
        saveRecord(data);
    });

    part.on('data', function(buffer) {
        // pipe incoming data into the reader
        reader.parse(buffer);
    });
    part.on('end', function() {
        reader.end();
    });
};

CsvReader Options

Note: the defaults are based on the values from RFC 4180 - https://tools.ietf.org/html/rfc4180

  • separator - field separator (delimiter), default: ',' (comma)
  • quote - the character used to enclose fields with white space characters, escaping etc., default: '"' (double quote)
  • escape - the character used to escape the quote inside a field, default: '"' (double quote). If you change the quote character, you may want to change the escape character to the same value
  • comment - the parser ignores this character and all following characters on the same line, default: none
  • columnNames - an array of column names; if used, the rows sent to the data listener are represented as hashes instead of arrays (see the sketch after this list), default: none
  • columnsFromHeader - boolean value indicating whether the first row should be interpreted as a list of header names. If used, the rows sent to the data listener are represented as hashes instead of arrays, default: false
  • nestedQuotes - boolean value indicating whether the parser should try to process a file with unescaped quote characters inside fields, default: false
  • flags - a string with flags to be passed through to fs.createReadStream/fs.createWriteStream (only supported via the createCsvFileReader and createCsvFileWriter methods), default: none
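
A sketch combining several of these options; the file name data.tsv and the column names are made up for illustration:

var csv = require('ya-csv');

// tab-separated input, records delivered as hashes thanks to columnNames,
// unescaped quotes inside fields tolerated via nestedQuotes
var reader = csv.createCsvFileReader('data.tsv', {
    'separator': '\t',
    'quote': '"',
    'escape': '"',
    'comment': '#',
    'columnNames': [ 'id', 'name' ],
    'nestedQuotes': true
});
reader.addListener('data', function(record) {
    console.log(record.id + ': ' + record.name);
});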

CsvWriter Options

  • separator - field separator (delimiter), default: ',' (comma)
  • quote - the character used to enclose fields with white space characters, escaping etc., default: '"' (double quote)
  • escape - the character used to escape the quote inside a field, default: '"' (double quote). If you change the quote character, you may want to change the escape character to the same value
  • escapeFormulas - boolean value indicating whether the writer should prefix fields starting with '=', '+' or '-' with an apostrophe, to prevent some programs from treating the content as an executable formula (see the sketch after this list), default: false
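
A writer sketch using these options; the file name output.csv and the record below are made up. Note the writeStream.end() call, which (as one of the issues below points out) closes the underlying file descriptor:

var csv = require('ya-csv');

var writer = csv.createCsvFileWriter('output.csv', {
    'separator': ',',
    'quote': '"',
    'escape': '"',
    'escapeFormulas': true    // prefix '=', '+' and '-' fields with an apostrophe
});
writer.writeRecord([ '1', '=SUM(A1:A9)', 'plain text' ]);
writer.writeStream.end();     // close the underlying file descriptor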

ya-csv's People

Contributors

73rhodes, amilajack, blakmatrix, cstigler, dominykas, esatterwhite, freewil, heycalmdown, jason-cooke, koles, leesei, miracle2k, nerdytoddgerdy, tedeh, tootallnate, tzellman

ya-csv's Issues

publish 0.9.3 to npm

It seems that while the version in package.json was bumped and 0.9.3 was tagged, it was never published to npm.

Escaped Quotes at the end of a chunk are incorrectly ignored.

If the last character of a chunk is a double quote and the first character of the next chunk is also a double quote, both characters are dropped and nothing is emitted for them in the output.

I believe this has to do with the lookahead using data.charAt(i+1), which is not carried over to the next chunk when i+1 falls past the end of the current one (charAt then returns an empty string).

Example:

test,123,"quote""d field here"

If the stream chunk ends here:

test,123,"quote""d field here"
               ^

The output is

test,123,"quoted field here

when it should read

test,123,"quote"d field here

Console.logged reader output for a large file, and got "... 924 more items" instead of the other items.

It's a pretty big CSV file. Instead of all the columns, a lot of the output ends with "... X more items". How do I output all of them?

var reader = csv.createCsvFileReader('FB-PageFocus-July2017-To-Sept112017.csv', {
    'separator': ',',
    'quote': '"',
    'escape': '"',
    'comment': '',
});
var writer = new csv.CsvWriter(process.stdout);
// reader.addListener('data', function(data) {
//     console.log([ data ]);
// });

reader.addListener('data', function(data) {
    console.log(data);
});

reader.addListener('error', function(e) {
    console.error('Oops!');
});
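
The truncation most likely comes from console.log itself, which formats arrays with util.inspect and by default prints at most 100 array elements (its maxArrayLength option), rather than from ya-csv dropping columns. A sketch that prints everything, reusing the reader from the snippet above:

var util = require('util');

reader.addListener('data', function(data) {
    // null disables the array-length cap entirely
    console.log(util.inspect(data, { maxArrayLength: null }));
    // or: console.log(JSON.stringify(data));
});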

Support writing object records

Hi,

The CsvWriter object should support two additional options:
a) an option that defines the column names of objects to be written with writeRecord (e.g. { columnNames : ["name", "age"] } ).
b) an option that defines whether the column names should be written to the output as a first line (e.g. {columnsAsHeader : true} ).

This way, CsvWriter could be used to write objects (dictionaries) instead of just arrays.
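
A sketch of how the proposed usage might look; note that the columnNames and columnsAsHeader writer options are this request's proposal, not an existing ya-csv API:

// hypothetical: these writer options do not exist (yet)
var writer = csv.createCsvFileWriter('people.csv', {
    'columnNames': [ 'name', 'age' ],
    'columnsAsHeader': true
});
writer.writeRecord({ 'name': 'John Doe', 'age': 42 });  // would emit: John Doe,42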

Don't crash when passed a binary file

Right now it crashes the whole Node.js app:

node_modules/ya-csv/lib/ya-csv.js:131
 throw new Error("separator expected after a closin
Error: separator expected after a closing quote; found 

It should handle the error gracefully.

how to run this project ?

Hi,
I am new here and just downloaded your project. I would like to run it; can you please show me how?
Thanks

Add unicode support!

At the moment the parser does not support anything but the simplest ASCII file format (as far as I can tell). E.g. Scandinavian åäöÅÄÖ characters all come out as 0xFFFD.

How to catch error

Hi,

first of all, thanks for this excellent module.
Question: how are errors supposed to be handled?
The API seems to provide an 'error' event.
But even if you add an 'error' listener, there are a lot of throw new Error("...") statements in the code base that make error handling difficult.
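
Since some errors are thrown synchronously from the parsing code rather than emitted, one workaround sketch (using the memory-only reader.parse() API from the upload example above) is to combine the 'error' listener with a try/catch around the feeding calls:

var csv = require('ya-csv');

var reader = csv.createCsvStreamReader({ 'comment': '#' });
reader.addListener('error', function(e) {
    console.error('emitted error: ' + e.message);
});

var chunk = 'a,"b"x,c\n';    // malformed: text after a closing quote

try {
    reader.parse(chunk);
    reader.end();
} catch (e) {
    // catches e.g. the "separator expected after a closing quote" error above
    console.error('thrown error: ' + e.message);
}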

Add ability to parse strings

Right now it only parses streams or files, but there is no way to parse a string (and there's no easy way to convert a string to a stream either).
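
As a workaround, the memory-only streaming mode shown in the upload example above can parse an in-memory string; a sketch:

var csv = require('ya-csv');

var reader = csv.createCsvStreamReader({});
reader.addListener('data', function(data) {
    console.log(data);    // [ 'a', 'b', 'c' ], then [ '1', '2', '3' ]
});

reader.parse('a,b,c\n1,2,3\n');   // feed the whole string as one chunk
reader.end();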

Improvement: Allow periods in column names for property grouping

Would be nice if it were possible to group properties into objects, e.g. like this:

reader.setColumnNames([ 'address.street', 'address.zip', 'address.city' ]);

reader.addListener('data', function(data) {
    console.log(data.address.street);
    console.log(data.address.zip);
    console.log(data.address.city);
});

reader.pause() doesn't seem to work

I'm trying to pause the read stream after 500 lines, but it does not seem to work :/
This is my code (in CoffeeScript):

reader.addListener 'data', (data) ->
    lines++
    console.log "passed line #{lines}", data
    @Pause()
    return

I expect this code to read only 1 line and then pause, but it doesn't stop. Am I missing something?

There is no clear way to close a CsvFileWriter

After creating a CsvFileWriter and writing data to file there is no documented way to close the writer, resulting in a file descriptor leak. I've managed to close it by using writer.writeStream.end().

Example:

    var writer = csv.createCsvFileWriter('file.csv', { 'flags': 'a' });
    var csvData = [];
    csvData.push(id);    // id and name defined elsewhere
    csvData.push(name);
    writer.writeRecord(csvData);
    writer.writeStream.end(); // this is how to close the file descriptor

NPM install not loading

When I try to require the parser after installing it through npm I get:

Type '.help' for options.
node> require('ya-csv')
Error: Cannot find module 'ya-csv'
    at loadModule (node.js:275:15)
    at require (node.js:411:14)
    at cwdRequire (repl:27:10)
    at [object Context]:1:1
    at Interface.<anonymous> (repl:78:19)
    at Interface.emit (events:26:26)
    at Interface._ttyWrite (readline:281:12)
    at Interface.write (readline:123:27)
    at Stream.<anonymous> (repl:59:9)
    at Stream.emit (events:26:26)

perhaps an index.js problem?

NPM verifies that it's installed:

$ npm ls ya-csv
npm it worked if it ends with ok
npm cli [ 'ls', 'ya-csv' ]
npm version 0.1.26
npm config file /Users/jlarson/.npmrc
npm config file /usr/local/etc/npmrc
npm GET /
ya-csv@…                             =koles remote
ya-csv@…                             =koles remote
ya-csv@…                             =koles remote
ya-csv@…                             =koles installed latest remote
npm ok

Last field is set as undefined if it is empty

If the last field in the CSV is empty, it will be set to undefined instead of the expected ''.

Example file:

id,name,specials
1,John Doe,
2,Jane Doe,Javascript
3,Jake Doe,

Expected output:

{ id: '1', name: 'John Doe', specials: '' }
{ id: '2', name: 'Jane Doe', specials: 'Javascript' }
{ id: '3', name: 'Jake Doe', specials: '' }

Received output:

{ id: '1', name: 'John Doe', specials: '' }
{ id: '2', name: 'Jane Doe', specials: 'Javascript' }
{ id: '3', name: 'Jake Doe', specials: undefined }
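
Until this is fixed, a small workaround sketch is to normalize missing values in the data listener (this assumes the key is present with an undefined value, as in the output above):

reader.addListener('data', function(data) {
    Object.keys(data).forEach(function(key) {
        if (data[key] === undefined) data[key] = '';
    });
    console.log(data);
});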

Option to make escaping optional

I think it would be good to quote/escape fields only when it is required.

I have seen some CSV implementations doing this, and unless it goes against any rule, I think it's great to save some space.

createCsvStreamReader: readStream set to undefined?

This is more of a question than an issue. This method (createCsvStreamReader) sets the readStream to undefined if no options are passed through:

csv.createCsvStreamReader = function(readStream, options) {
    if (options === undefined && typeof readStream === 'object') {
        options = readStream;
        readStream = undefined;
    }
    options = options || {};
    if (readStream) readStream.setEncoding(options.encoding || 'utf8');
    return new CsvReader(readStream, options);
};

It seems like really strange behaviour, and my reader was not receiving any events because of it. Is this actually intended behaviour?
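
The overload detection above treats any single object argument as the options hash, and a readable stream is itself an object, so a call like createCsvStreamReader(stream) with no options discards the stream. A workaround sketch is to always pass an explicit (even empty) options object:

// misfires: the stream is mistaken for the options hash and discarded
var broken = csv.createCsvStreamReader(process.openStdin());

// works: the explicit options object disambiguates the call
var reader = csv.createCsvStreamReader(process.openStdin(), {});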

incorrectly parsed file with 3 double quotes and no end of lines

The following CSV contains two lines, each with 4 fields. However, the ya-csv parser reports only 1 line with 5 columns.

== begin ==
one,"""two"", two-and-half","three","four"
1,2,3,4
== end == (no EOL after the previous line)

Caused by two bugs:

  • a field starting with """ is not marked as quoted
  • the last line is ignored when it lacks a trailing end-of-line marker

Thanks to zd at gooddata for reporting this.
