Code Monkey home page Code Monkey logo

Comments (6)

MarkPflug avatar MarkPflug commented on May 20, 2024 1

So I propose a more minimalistic base specification, with extensions as profiles.

I'm not sure I know what you mean by this. Do you mean more minimalistic than it already is? Or do you mean don't specify any more than is already specified?

Personally, I think if the columns specification listed the expected set of supported types then the spec would cover the 99% use case: export data out of one database and import it into a different one while preserving the rough schema. Extensions could remain unspecified, as long as the structure of files covered by the csvz spec were correct then any tool could read it.

Consider an example:
Tool A handles several custom profiles. A file created with tool A is then processed by tool B, which doesn't know anything about the custom profiles. Should tool B be responsible for preserving the metadata from those profiles? It would require some specification for how a profile defines what/how its custom metadata needs to be preserved. That sounds hard to specify, let alone implement. Certainly B cannot produce/provide metadata for the unknown profiles, but it could conceivably preserve them.

Maybe consider a concrete example: let's say I want my csvz tool to carry custom metatdata about the indices that existed in my database. How would that be handled by a profile? How would the abstract concept of profile be specified in this context? How would the concrete "indices" profile be specified?

Not that I disagree with this idea, I just think the beauty of csvz is in the simplicity. I worry that the concept of profiles starts down the road of making something too complex for anyone to bother implementing.

from csvz.

secretGeek avatar secretGeek commented on May 20, 2024

as always i am certain that your idea is good and valid and helpful.

the first thing i know to say yes to is having a "csvz implementers forum." How do we do that?

i created a wiki and asked there in case potential forum joiners look there. i'll tell people in the contribution to join as well. https://github.com/secretGeek/csvz/wiki

i am having trouble knowing how to translate this into changes to the specification.

does this mean we would take out existing sections? (which ones) or does it mean we would change existing sections? which ones and what sort of changes?

it should be assumed that these locations are reserved:

_meta/*.csv
(*.csv) included anywhere recursively under meta
all folder names reserved under there unless otherwise specified

These are the files that are currently partially defined (all of which are optional).

_meta/tables.csv
-meta/columns.csv
_meta/relations.csv
_meta/csv.csv 
_meta/datatypes.csv

Where:

  • csv.csv is coming soon, but is not yet sufficiently described that anyone could say they comply with it beyond having the file.
  • datatypes.csv is quite far away from being understood well enough to be shared. again optional.

Maybe _meta/meta.csv which described which of the standards you claim conformance with. e.g.

fragment;conformance;notes
csvz-0;strict;this csvz file claims strict adherence with csv-0
csvz-meta-tables;strict;yes we have a _meta folder with a tables.csv file in it.

(If they only claimed those two rules were followed then it would be up to the consumer to read the files and determine for themselves how to make sense of them.)

Coming soon also we'll likely propose that these paths are added, that allow for per-file meta files.

_meta/tables/*.csv
-meta/columns/*.csv
_meta/relations/*.csv
_meta/csv/*.cs

What if there was:

  • a list of "registered tools" -- stored at this repository.
  • each tool was effectively registering a short name (< 50 characters) which they have then reserved for use.

for example if someone reserved "to-sql-server" then they would have reserved the right to define the naming, structure and meanings of all files/folders that go in this location:

_meta/tools/to-sql-server

Scalability problem: what if there are too many tools?

then we'd use software to solve that problem.

Or guids. (Guids are better)

Once they've registered a name (or a guid) they can then publish standards that someone generating a file can claim to have conformed with.

e.g.

csv-tools-to-sql-server--indexes would be a spec fragment defined by whichever repository is the official owner of the to-sql-server tool (thus they get ownership of the csv-tools-to-sql-server--* spec fragments.)

So if I've stored a csvz file that i think the "to-sql-server" tool can read, and i want it to have very specific indexes built when it is converted to sql server, then i'll write the expected files in the csv-tools-to-sql-server--indexes folder.

An alternative is that the concept of "tool" can be replaced with "vendor". (This all reminds me of vendor prefixes in css standards.) i prefer "little" tools and "little" spec fragments to "big" vendors. the littler the better.

from csvz.

doekman avatar doekman commented on May 20, 2024

That's quite a reply. Not sure if I'm able to reply to all of it right now.

Implementors forum serves a couple of purposes. Visibility what is being worked on is an important one. It's nice to know what people are working on, so you know what to expect in the near future. Also, it should be a place where you can ask questions and discuss them. If a database stores a NULL value; how to store that in a CSV file, and handle it within CSVZ.

I myself would like to have a tool to load/save data from the command line (also makefile) to/from a database. I do have a lot of questions still; not sure if I want to do this as a side project. I put my thoughts on the implementors forum page.

I think the core csvz-specification would be the part under csvz-0. There are still some additions to make. You need to say things about the _meta folder. And does it need to exist within a zip? (thinking of git).

The other parts are "guidance" to the implementors, and can later be used

I just don't want to be this specification a theoretical exercise. I really hope some nice tools will evolve from it. This spec should be a framework for tool implementors, not a todo-list.

You have a lot ideas about this framework (_meta/tools folder), but maybe it's better to wait if an implementor needs something. And invite them to discuss those needs here on this forum.

One more thing about the profiles: each profile describes what meta-files and -columns the profile needs. For example, the database-import-export-profile and the ms-access-clone-profile both use the file _meta/tables.csv. Some columns overlap, but some are specific to a profile (control-type column for the ms-access-clone profile).

from csvz.

doekman avatar doekman commented on May 20, 2024

@MarkPflug I didn't read your reply until now. Yeah, I favour simplicity too.

However, the sky is the limit. Say, you want to transport avatar-images with your csvz file, because they are stored in the database. You could store them as base64 in and csv file, but also in a folder <tablename>/avatar1.jpg next to the csv <tablename>.csv, which contains a column avatar with filename avatar1.jpg.

Would you add that to the spec to, or do you put it into a separate profile?

Anyways, these are just my thoughts.

from csvz.

secretGeek avatar secretGeek commented on May 20, 2024

Interesting interesting.

If this was done:

Say, you want to transport avatar-images with your csvz file, because they are stored in the database. You could store them as base64 in and csv file, but also in a folder /avatar1.jpg next to the csv .csv, which contains a column avatar with filename avatar1.jpg.

... then perhaps that column would be described as having a data type of (I don’t know... ) internal file reference... and there would need to be a spec fragment covering what that means (for example it couldn’t point to a file outside the csvz. )

And if you were writing a tool that could read csvz and could write to (for example) Sql server data bases... then you’d decide for your technology what’s the most effective way to stick those into the resulting database.

If someone else was writing a “from Sql” to csvz file... and they grabbed your database and wrote it all to a csvz file... you’d probably hope/expect/not be shocked if those files you stored ended up as being file reference types there too. But there’s no real guarantee on that sort of thing. If it succeeds then I guess you could tick a box that says that pair of tools have the “round tripping” achievement in the csvz tool store.

from csvz.

MarkPflug avatar MarkPflug commented on May 20, 2024

github discussions are still coming, right? Enabling discussions on this repo seems like the easiest solution. Until they do, you could just create (and pin) an issue with the title "Discussion". Interested parties could subscribe to that issue.

FWIW, I already have a very rough implementation on .NET (as a library, not tool) of the csvz-0, csvz-meta-tables and csvz-meta-columns (but not relations):
https://github.com/MarkPflug/Sylvan.Data.CsvZip

I would imagine avatar images would be carried base64 encoded in a column with type = "binary". That is how my implementation would handle it anyway, assuming it was exported from a binary column in a database.

from csvz.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.