secretgeek / csvz
The hot new standard in open databases
License: Creative Commons Zero v1.0 Universal
Sometimes (not often) keys are defined as a combination of two columns. In the columns.csv data it would be possible to identify two columns as being "primary-key". Should this mean that the combination of those two columns constitutes a primary key?
Likewise, how would a foreign key relationship in the relations.csv data targeting that primary key be represented?
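One possible reading of the composite-key question can be sketched as follows. This is only an illustration: the column names `table`, `column` and `primary-key`, and the sample rows, are assumptions made for the example and are not taken from the spec. Under this reading, every column flagged as primary-key within one table is treated as part of that table's key.

```python
import csv
import io
from collections import defaultdict

# Hypothetical columns.csv content. The header names "table", "column" and
# "primary-key" are assumptions for illustration, not taken from the spec.
COLUMNS_CSV = (
    "table,column,primary-key\r\n"
    "order_lines,order_id,true\r\n"
    "order_lines,line_no,true\r\n"
    "order_lines,quantity,false\r\n"
    "orders,order_id,true\r\n"
)

def primary_keys(columns_csv_text):
    """Group the columns flagged as primary-key by table, in file order."""
    keys = defaultdict(list)
    for row in csv.DictReader(io.StringIO(columns_csv_text)):
        if row["primary-key"].strip().lower() == "true":
            keys[row["table"]].append(row["column"])
    return dict(keys)

keys = primary_keys(COLUMNS_CSV)
print(keys)
```

With this interpretation a table with two flagged columns simply has a two-column key. How a row in relations.csv would point at such a key is left open here, as it is in the question above.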
Could there be a way to specify the delimiters, qualifiers, and other specifics or options?
I think the specification is now too broad.
Imagine the spec only says there can be meta csv, tables, columns and relations tables (specifying the name). Tool makers can then come up with profiles that describe what data is put in which meta-tables, and what the semantics are. These profiles can then be registered and published in this repository. There could be discussion, of course.
You could have an ANSI-96-SQL-IMPORT-EXPORT profile (hope they come up with a better name). But you could also have a MY_OPEN_SOURCE_FORM_APPLICATION profile, that describes how data on forms are validated.
The advantage for tool makers (implementers): you only create what is being used. I do think there should be a csvz implementers forum.
So I propose a more minimalistic base specification, with extensions as profiles. The _meta/csv.csv profile can be built in and be called the "localized" profile (if you want to call it that).
What do you think?
less typing forever.
Related to #4 ...
Example:
If someone:
.csvz
file, to a folder “MyData”...then what standards would this now comply with?
Suggestion: there could be an optional fragment
csvz-0-tz
...which also invites/allows other mutually exclusive 0 sub standards....
csvz-0-t
... for a compliant tar that is not gzip’d
csvz-0-7z
...for a compliant 7z file? (Details needed for such)
As adoption inevitably grows, a MIME type should be appropriately registered.
see https://github.com/secretGeek/csvz#a-list-of-csvz-compliant-tools-and-libraries -- perhaps add to list
Notes on how to encode:
Binary data (hint base64)
Datetime
Boolean (true/false)
Date
Time
String
Etc
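The encoding notes above could be exercised with a small decoder table like the one below. It is a sketch under assumptions: the type names (`binary`, `datetime`, and so on) are hypothetical labels, and ISO 8601 text for dates and times is an assumed choice, since the notes only give base64 for binary data and true/false for booleans. Unknown type names fall back to plain string.

```python
import base64
from datetime import date, datetime, time

# Hypothetical type names; the notes above do not fix a vocabulary.
# ISO 8601 text for datetime/date/time is an assumption of this sketch.
DECODERS = {
    "binary": base64.b64decode,
    "datetime": datetime.fromisoformat,
    "date": date.fromisoformat,
    "time": time.fromisoformat,
    "boolean": lambda text: {"true": True, "false": False}[text.strip().lower()],
    "string": str,
}

def decode(text, type_name):
    """Decode one CSV cell; unknown type names fall back to plain string."""
    return DECODERS.get(type_name, str)(text)

print(decode("aGVsbG8=", "binary"))
print(decode("2020-01-02T03:04:05", "datetime"))
print(decode("true", "boolean"))
```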
If I take a single csv file, and then zip it to end up with a .csv.z, is it conformant with the basic standard, or must the file end with a .csvz extension?
A very rough draft of https://github.com/secretGeek/csvs has been created.
It interoperates with this standard, so link to it or mention it.
A file can have _meta/meta.csv describing which of the standards you claim conformance with, e.g.
| fragment | conformance | notes |
|---|---|---|
| csvz-0 | strict | this csvz file claims strict adherence with csvz-0 |
| csvz-meta-tables | strict | yes we have a _meta folder with a tables.csv file in it |
(If they only claimed those two rules were followed, then it would be up to the consumer to read the files and determine for themselves how to make sense of them.)
(Suggestion: tools could generate this, or at least a draft of this, and tools can use this for configuring their own expectations.)
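A consumer could read such claims roughly as sketched below. The column names `fragment`, `conformance` and `notes` come from the example table above; the function name `read_claims`, the UTF-8 encoding, and the in-memory demo archive are assumptions made for illustration.

```python
import csv
import io
import zipfile

def read_claims(csvz_file):
    """Return {fragment: conformance} from _meta/meta.csv, or {} if absent."""
    with zipfile.ZipFile(csvz_file) as archive:
        if "_meta/meta.csv" not in archive.namelist():
            return {}
        with archive.open("_meta/meta.csv") as raw:
            # UTF-8 is an assumption of this sketch.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return {row["fragment"]: row["conformance"] for row in csv.DictReader(text)}

# Build a tiny in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "_meta/meta.csv",
        "fragment,conformance,notes\r\n"
        "csvz-0,strict,claims strict adherence\r\n"
        "csvz-meta-tables,strict,has _meta/tables.csv\r\n",
    )
buffer.seek(0)
claims = read_claims(buffer)
print(claims)
```

A tool could compare the returned fragments with the ones it supports and decide what to expect from the rest of the archive.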
When you unzip a csvz file, you end up with a folder with csv-files.
Can this be considered a csvz-container?
I think it could be useful. For example, when putting datafiles into git.
To differentiate a csvz-container from a folder containing some csv files, I propose such a folder to have the .csvd extension. So if you extract the my_data.csvz file, you get the my_data.csvd folder.
The folder could have the same extension (thus not introducing a new extension). The disadvantage is you can't extract a .csvz file into the same folder without deleting/moving the original file.
Tools are not expected to open these .csvd-folders directly. It's only to denote that when zipped, it's automatically a .csvz file.
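The round trip from a .csvd folder back to a .csvz file could look roughly like this. It is a sketch of the proposal, not an existing tool: the function name `pack_csvd` and the demo file contents are made up for the example.

```python
import tempfile
import zipfile
from pathlib import Path

def pack_csvd(folder):
    """Zip every file under a .csvd folder into a sibling .csvz file."""
    folder = Path(folder)
    if folder.suffix != ".csvd":
        raise ValueError("expected a folder with the .csvd extension")
    target = folder.with_suffix(".csvz")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder, with forward slashes.
                archive.write(path, path.relative_to(folder).as_posix())
    return target

# Demo with a throwaway folder holding hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "my_data.csvd"
    (source / "_meta").mkdir(parents=True)
    (source / "states.csv").write_bytes(b"code,name\r\nWA,Washington\r\n")
    (source / "_meta" / "tables.csv").write_bytes(b"name\r\nstates\r\n")
    packed = pack_csvd(source)
    packed_name = packed.name
    with zipfile.ZipFile(packed) as archive:
        names = sorted(archive.namelist())
print(packed_name, names)
```

Because the archive is written next to the folder rather than inside it, the my_data.csvd folder and the my_data.csvz file can sit side by side, which is the point of using a separate extension.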
The specification doesn't specify what a Zip file is.
At the least, there should be a link to either the 2015 ISO standard or alternatively Pkware's technical note (including a version number).
Is there a GZip variant?
Similarly, are there 7-Zip, bzip2, and rzip variants?
Is Zip64 supported, to allow files larger than 4 GB?
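If several container variants were allowed, a reader would first have to tell them apart. A minimal sketch, assuming detection by the well-known leading signature bytes of each format (the function name `sniff_container` is hypothetical, and rzip is left out):

```python
import bz2
import gzip
import io
import zipfile

# Leading signature bytes of each format. An empty zip archive starts with
# a different signature (PK\x05\x06) and is reported as "unknown" here.
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
]

def sniff_container(data):
    """Guess the container format from the first bytes of a file."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return "unknown"

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("states.csv", "code,name\r\n")

results = [
    sniff_container(buffer.getvalue()),
    sniff_container(gzip.compress(b"a,b\r\n")),
    sniff_container(bz2.compress(b"a,b\r\n")),
    sniff_container(b"a,b\r\n"),
]
print(results)
```

On the Zip64 question, Python's `zipfile` module enables Zip64 extensions by default (`allowZip64=True`), so at least that implementation can read and write archives beyond the 4 GB limit; whether the spec should require it is a separate decision.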
... and for each suggestion, create a todo lower in the file or an issue in here, or specify it further down
With column headers being optional in the 4180 spec, it might also be useful to specify/require the column ordinal in the meta-columns file. This would allow attaching headers to a csv via the column metadata.
For that matter, in the tables metadata it would be useful to include a "HasHeaders" boolean column.
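Attaching headers through an ordinal could work roughly as sketched here. The metadata header names `table`, `column` and `ordinal`, and the sample data, are assumptions for the example; the spec does not define them in this form.

```python
import csv
import io

# Hypothetical metadata rows; "table", "column" and "ordinal" are assumed names.
COLUMNS_CSV = "table,column,ordinal\r\nstates,name,2\r\nstates,code,1\r\n"
# A data file without a header row.
STATES_CSV = "WA,Washington\r\nOR,Oregon\r\n"

def headers_for(table, columns_csv_text):
    """Return the column names of one table, sorted by their ordinal."""
    rows = [
        row
        for row in csv.DictReader(io.StringIO(columns_csv_text))
        if row["table"] == table
    ]
    return [row["column"] for row in sorted(rows, key=lambda r: int(r["ordinal"]))]

def read_headerless(table, data_text, columns_csv_text):
    """Read a headerless csv and name its fields from the column metadata."""
    names = headers_for(table, columns_csv_text)
    return [dict(zip(names, row)) for row in csv.reader(io.StringIO(data_text))]

records = read_headerless("states", STATES_CSV, COLUMNS_CSV)
print(records)
```

A "HasHeaders" flag in the tables metadata would tell the reader whether to take this path or to trust the first row of the data file.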
Have you considered using a columns meta file per-table instead of putting all columns into a single csv?
So instead of:
_meta/tables.csv
_meta/columns.csv
states.csv
cities.csv
It would be something like:
_meta/tables.csv
_meta/states_columns.csv
_meta/cities_columns.csv
states.csv
cities.csv
The advantage is that it would be easier to get the schema for a single table.
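The two layouts are mechanically convertible, which a short sketch can show. The header names in the combined columns.csv (`table`, `column`) are assumptions for illustration; the output file names follow the `_meta/<table>_columns.csv` pattern from the listing above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical combined columns.csv; "table" and "column" are assumed names.
COLUMNS_CSV = "table,column\r\nstates,code\r\nstates,name\r\ncities,name\r\n"

def split_columns(columns_csv_text):
    """Return {"_meta/<table>_columns.csv": csv_text} for each table."""
    reader = csv.DictReader(io.StringIO(columns_csv_text))
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["table"]].append(row)
    files = {}
    for table, rows in grouped.items():
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        files[f"_meta/{table}_columns.csv"] = out.getvalue()
    return files

files = split_columns(COLUMNS_CSV)
print(sorted(files))
```

With the single-file layout, getting one table's schema is the same filter step without the write-out, so the trade-off is mostly about convenience for readers versus the number of meta files.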
(What is a way to do that automatically in vs code?)
-list its claims re spec frag and short description.
The spec for this file feels rather useless unless some minimal set of standard types is defined. A well-defined schema would allow a database import tool to construct the appropriate table in the database. Without a standard set, a fallback to "string" would be needed whenever an unknown type was encountered.
I would propose as a minimum:
Possibly also include: