secretgeek / csvz

The hot new standard in open databases

License: Creative Commons Zero v1.0 Universal

csv data-science no-sql rdbms sql zip

csvz's People

Forkers

doekman 0xflotus

csvz's Issues

Change type to data type

csvz-meta-relations composite keys

Sometimes (not often) keys are defined as a combination of two columns. In the columns.csv data it would be possible to identify two columns as being "primary-key". Should this mean that the combination of those two columns constitutes a primary key?

Likewise, how would a foreign key relationship in the relations.csv data targeting that primary key be represented?
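One possible reading, sketched in Python: every column flagged as a primary key within the same table is treated as part of a single composite key. The header names `table`, `column` and `key` and the sample rows are assumptions made for illustration, not taken from the spec.

```python
import csv
import io
from collections import defaultdict

# Hypothetical columns.csv content; the header names are assumptions.
COLUMNS_CSV = """table,column,key
order_lines,order_id,primary-key
order_lines,line_no,primary-key
order_lines,quantity,
orders,order_id,primary-key
"""

def primary_keys(columns_csv_text):
    """Group columns marked primary-key by table, preserving file order."""
    keys = defaultdict(list)
    for row in csv.DictReader(io.StringIO(columns_csv_text)):
        if row["key"] == "primary-key":
            keys[row["table"]].append(row["column"])
    return dict(keys)

print(primary_keys(COLUMNS_CSV))
```

Under this reading, a table with more than one flagged column has a composite key. How relations.csv should point at such a key is still an open question.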

how to specify delimiter types

Could there be a way to specify the delimiters, qualifiers, and other specifics/options?
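To make the question concrete, here is a rough Python sketch of how a consumer might apply such options if they were recorded somewhere. The `OPTIONS` dictionary and its key names are hypothetical. The spec does not currently define where or how these settings would be stored.

```python
import csv
import io

# Hypothetical per-file options, e.g. as they might be recorded in a meta file.
OPTIONS = {"delimiter": ";", "quotechar": "'"}

def read_rows(text, options):
    """Parse CSV text using the delimiter and qualifier given in options."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=options.get("delimiter", ","),
        quotechar=options.get("quotechar", '"'),
    )
    return list(reader)

rows = read_rows("id;name\n1;'Smith; John'\n", OPTIONS)
```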

Make the spec more meta

I think the specification is now too broad.

Imagine the spec only says there can be meta csv, tables, columns and relations tables (specifying the names). Tool makers can then come up with profiles that describe which data goes in which meta-tables, and what the semantics are. These profiles can then be registered and published in this repository. There could be discussion, of course.

You could have an ANSI-96-SQL-IMPORT-EXPORT profile (hope they come up with a better name). But you could also have a MY_OPEN_SOURCE_FORM_APPLICATION profile, that describes how data on forms are validated.

The advantage for tool makers (implementers): you only create what is being used. I do think there should be a csvz implementers forum.

So I propose a more minimalistic base specification, with extensions as profiles. The _meta/csv.csv profile can be built in and be called the "localized" profile (if you want to call it that).

What do you think?

change relationships.csv to relations.csv

less typing forever.

How can a .tar.z file full of csvs fit into the csvz specs?

Related to #4 ...

Example:

If someone:

unzipped a compliant .csvz file to a folder "MyData"
ran tar -z "MyData" (Todo: correct syntax here to specify output name, e.g. MyData.csv.t.z ??)

...then what standards would this now comply with?

Suggestion: there could be an optional fragment

csvz-0-tz

...which also invites/allows other mutually exclusive 0 sub standards....

csvz-0-t

... for a compliant tar that is not gzip'd

csvz-0-7z

...for a compliant 7z file? (Details needed for such)
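For reference, the scenario above can be sketched in Python with the standard `zipfile` and `tarfile` modules. The function name `csvz_to_targz` is made up for this example, and nothing here settles whether the result would count as compliant with any fragment.

```python
import io
import tarfile
import zipfile

def csvz_to_targz(csvz_bytes):
    """Repack the members of a zip-based csvz archive into a gzip'd tar, in memory."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(csvz_bytes)) as zf, \
            tarfile.open(fileobj=out, mode="w:gz") as tf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            data = zf.read(name)
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return out.getvalue()
```

The file names and contents survive the round trip unchanged. Only the container differs, which is what a fragment such as the suggested csvz-0-tz would have to describe.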

MIME type

As adoption inevitably grows, a MIME type should be appropriately registered.

link to prior art

see https://github.com/secretGeek/csvz#a-list-of-csvz-compliant-tools-and-libraries -- perhaps add to list

Include Datatypes encoding suggestions

Notes on how to encode:

Binary data (hint base64)
Datetime
Boolean (true/false)
Date
Time
String

Etc
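As a starting point for those notes, here is a minimal Python sketch of one possible encoding convention. Base64 for binary and true/false for booleans come from the list above. Using ISO 8601 text for datetime, date and time values is an assumption of this example, not something the spec states.

```python
import base64
from datetime import date, datetime, time

def encode_cell(value):
    """Encode a Python value as CSV cell text (one possible convention)."""
    if isinstance(value, bool):  # check bool before falling through to str()
        return "true" if value else "false"
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, (datetime, date, time)):
        return value.isoformat()  # ISO 8601 text (assumed convention)
    return str(value)
```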

Is .csv.z an acceptable variant?

If I take a single csv file, and then zip it to end up with a .csv.z, is it conformant with the basic standard, or must the file end with a .csvz extension?

Note interop with csvs standard

very very rough draft of https://github.com/secretGeek/csvs has been created.

it interoperates with this standard, so link/mention it.

Add `csvz-meta-meta`

A file can have

_meta/meta.csv

describing which of the standards you claim conformance with. e.g.

fragment	conformance	notes
csvz-0	strict	this csvz file claims strict adherence with csvz-0
csvz-meta-tables	strict	yes we have a _meta folder with a tables.csv file in it

(If they only claimed that those two rules were followed, then it would be up to the consumer to read the files and determine for themselves how to make sense of them.)

(Suggestion: tools could generate this, or at least a draft of this, and tools can use this for configuring their own expectations.)
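A tool could read the claims roughly like this. This Python sketch assumes the file is comma-separated, UTF-8 encoded, and uses the `fragment` and `conformance` headers from the example above. The function name `claimed_fragments` is hypothetical.

```python
import csv
import io
import zipfile

def claimed_fragments(csvz_bytes):
    """Return {fragment: conformance} from _meta/meta.csv, or {} if absent."""
    with zipfile.ZipFile(io.BytesIO(csvz_bytes)) as zf:
        if "_meta/meta.csv" not in zf.namelist():
            return {}
        text = zf.read("_meta/meta.csv").decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))
    return {row["fragment"]: row["conformance"] for row in reader}
```

An empty result means the archive makes no claims, so the consumer has to inspect the files itself.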

Could there be a csvz folder without zip?

When you unzip a csvz file, you end up with a folder with csv-files.

Can this be considered a csvz-container?
I think it could be useful. For example, when putting datafiles into git.

To differentiate a csvz-container from a folder containing some csv files, I propose such a folder to have the .csvd extension. So if you extract the my_data.csvz file, you get the my_data.csvd folder.

Alternatively, the folder could keep the same .csvz extension (thus not introducing a new extension). The disadvantage is that you can't extract a .csvz file into the same folder without deleting/moving the original file.

Tools are not expected to open these .csvd-folders directly. It's only to denote that when zipped, it's automatically a .csvz file.
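The "when zipped, it's automatically a .csvz file" step could look like the following Python sketch. The `.csvd` extension is the one proposed in this issue, and `pack_csvd` is a made-up helper name.

```python
import zipfile
from pathlib import Path

def pack_csvd(folder):
    """Zip a my_data.csvd folder into a sibling my_data.csvz file."""
    folder = Path(folder)
    if folder.suffix != ".csvd":
        raise ValueError("expected a folder with the .csvd extension")
    target = folder.with_suffix(".csvz")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder root, with forward slashes.
                zf.write(path, path.relative_to(folder).as_posix())
    return target
```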

Haven't defined zip file

The specification doesn't specify what a Zip file is.

At the least, there should be a link to either the 2015 ISO standard or alternatively PKWARE's technical note (including a version number).

Is there a GZip variant?
Similarly, are there 7Zup, bzip2, and rzip variants?
Is Zip64 supported, to allow more than 4GB files?

Make it clear via headings what is required/suggested

... and for each suggestion, create a todo lower in the file or an issue in here, or specify it further down.

csvz-meta-columns column ordinal

With column headers being optional in the RFC 4180 spec, it might also be useful to specify/require the column ordinal in the meta-columns file. This would allow attaching headers to a csv via the column metadata.

For that matter, in the tables metadata it would be useful to include a "HasHeaders" boolean column.
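A rough Python sketch of attaching headers via ordinals follows. The meta-columns header names `table`, `column` and `ordinal` are assumptions made for this example, as is the helper name `attach_headers`.

```python
import csv
import io

def attach_headers(data_text, columns_meta_text, table):
    """Return rows of a headerless CSV as dicts, naming fields by ordinal."""
    meta = csv.DictReader(io.StringIO(columns_meta_text))
    cols = sorted(
        (int(r["ordinal"]), r["column"]) for r in meta if r["table"] == table
    )
    names = [name for _, name in cols]
    return [dict(zip(names, row)) for row in csv.reader(io.StringIO(data_text))]
```

Sorting by ordinal means the rows in the meta-columns file may appear in any order.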

Add example of using 7z to put csv files into new zip saved as csvz

`meta-per-file` -- allow individual meta files for each file?

Have you considered using a columns meta file per-table instead of putting all columns into a single csv?

So instead of:
_meta/tables.csv
_meta/columns.csv
states.csv
cities.csv

It would be something like:
_meta/tables.csv
_meta/states_columns.csv
_meta/cities_columns.csv
states.csv
cities.csv

The advantage is that it would be easier to get the schema for a single table.
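The two layouts carry the same information, so a tool could convert between them. Here is a hedged Python sketch of the split direction, assuming the combined columns.csv has a `table` column (an assumption; the per-table file naming follows the example above).

```python
import csv
import io
from collections import defaultdict

def split_columns(columns_csv_text):
    """Split one columns.csv into {"_meta/<table>_columns.csv": csv_text}."""
    reader = csv.DictReader(io.StringIO(columns_csv_text))
    # The table name moves into the file name, so drop it from the rows.
    fields = [f for f in reader.fieldnames if f != "table"]
    by_table = defaultdict(list)
    for row in reader:
        by_table[row["table"]].append([row[f] for f in fields])
    files = {}
    for table, rows in by_table.items():
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(fields)
        writer.writerows(rows)
        files[f"_meta/{table}_columns.csv"] = out.getvalue()
    return files
```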

Update toc

(What is a way to do that automatically in VS Code?)

Update tool table to list /MarkPflug/Sylvan.Data.CsvZip

- list its claims regarding spec fragments, plus a short description.

csvz-meta-columns column types

The spec for this file feels rather useless unless some minimal set of standard types is defined. A well-defined schema would allow a database import tool to construct the appropriate table in the database. Without a standard set, a fallback to "string" would be needed whenever an unknown type was encountered.

I would propose as a minimum:

boolean (true/false, 0/1)
int (byte/short/long?)
float (float/double)
date (datetime)
string (might be worth specifying ascii vs unicode)
binary (Base64)

Possibly also include:

guid
time (timespan/duration)
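To illustrate the fallback behaviour, here is a minimal Python sketch of an importer that maps the proposed minimum type names to parsers and treats anything unknown as a string. Parsing dates as ISO 8601 text is an assumption of this example, and `PARSERS` and `parse_cell` are made-up names.

```python
import base64
from datetime import datetime

# Parsers for the proposed minimum set of type names.
PARSERS = {
    "boolean": lambda s: s.strip().lower() in ("true", "1"),
    "int": int,
    "float": float,
    "date": datetime.fromisoformat,  # ISO 8601 text (assumed convention)
    "string": str,
    "binary": base64.b64decode,
}

def parse_cell(text, type_name):
    """Convert cell text using the declared type; unknown types stay strings."""
    return PARSERS.get(type_name, str)(text)
```

With this table, a column declared as `guid` would simply come through as text until a parser for it is added.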