cperryk commented on August 23, 2024

This may be outside the scope of this discovery, but I think we can use this opportunity to also rejigger import and export slightly to make them a little more straightforward. Right now, they have too many options, many of which are nonsensical in certain combinations, and which demand too much rote memorization and steepen the learning curve. I recommend we simplify both in two ways:

  • Keep them strictly separated. export writes asset data to stdout and import simply expects it from stdin. import should not be concerned with pulling data out from anywhere; it should only be concerned with putting it somewhere. This is similar to how other data migration tools work: e.g. mongorestore doesn't also do a mongodump.
  • Automatically infer asset type from a specified url instead of providing an option for each. I should just be able to do clay export foo.com/pages to get many pages, clay export foo.com/pages/1 to get one page, clay export foo.com for the whole site, clay export foo.com/components/a/instances/1 for a single cmpt instance, clay export foo.com/users for all users, etc.

Examples:

  • clay export foo.com/pages/1 > bootstrap.yml (export a single page to a file)
  • clay import bar.com < bootstrap.yml (import from a file)
  • clay export foo.com/users | clay import bar.com (copy users from foo.com to bar.com)
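To make the URL-inference idea concrete, here is a minimal sketch that classifies an export URL by its path. It assumes URLs follow the /pages, /components/<name>/instances/<id>, /users and /lists patterns from the examples above, with or without Amphora's leading underscores; the function name `infer_asset_type` and the returned labels are hypothetical.

```python
from urllib.parse import urlparse


def infer_asset_type(url: str) -> str:
    """Guess what a claycli-style export URL points at (hypothetical helper)."""
    # urlparse only splits out the path when a scheme is present,
    # so add one for bare hosts like "foo.com/pages".
    if "://" not in url:
        url = "http://" + url
    # Drop empty segments and the optional leading underscore (_pages, _users...).
    parts = [p.lstrip("_") for p in urlparse(url).path.split("/") if p]

    if not parts:
        return "site"  # e.g. foo.com: the whole site
    kind = parts[0]
    if kind == "components":
        if len(parts) >= 4 and parts[2] == "instances":
            return "component instance"
        return "component"  # all instances of one component
    if kind in ("pages", "users", "lists"):
        # foo.com/pages -> many pages; foo.com/pages/1 -> one page
        return kind[:-1] if len(parts) > 1 else kind
    raise ValueError(f"cannot infer asset type from {url!r}")
```

For example, `infer_asset_type("foo.com/pages/1")` returns `"page"` and `infer_asset_type("foo.com")` returns `"site"`, so a single `export <url>` entry point could dispatch on the result instead of needing one flag per asset type.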

from claycli.

nelsonpecora commented on August 23, 2024

A dispatch is a collection of components, pages, users, and other data that can be represented in an object. It uses prefix-free uris and is formatted as YAML (the written-out YAML files are known as bootstrap files). Dispatches can be exported from Clay installs and imported into them. Third-party exporters should create a dispatch and send it to stdout, so it can be piped into claycli.

wordpress-export domain.com/blog | clay import my-clay-site

The object passed through the pipe looks like this:

pages:
  index:
    customUrl: /
    main:
      - /_components/feed/instances/index
  post1:
    customUrl: /2017/first-post
    main:
      - /_components/article/instances/post1
components:
  feed:
    instances:
      index:
        query:
          match_all: {}
  article:
    instances:
      post1:
        title: First Post
        content:
          - _ref: /_components/paragraph/instances/1
          - _ref: /_components/paragraph/instances/2
  paragraph:
    instances:
      1:
        text: Lorem ipsum dolor sit amet
      2:
        text: consectetur adipisicing elit
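To show how this nested, prefix-free structure maps back onto individual API routes, here is a minimal sketch that flattens a loaded dispatch (a Python dict, i.e. the YAML above after parsing) into (uri, data) pairs. It assumes underscored routes, as used in the `main` lists above, and that the site prefix is prepended later; `flatten_dispatch` is a hypothetical name, and only pages and component instances are handled.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten_dispatch(dispatch: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (prefix-free uri, data) pairs from a nested dispatch dict."""
    for page_id, page in dispatch.get("pages", {}).items():
        yield f"/_pages/{page_id}", page
    for name, component in dispatch.get("components", {}).items():
        for instance_id, data in component.get("instances", {}).items():
            # YAML keys like `1` load as ints; the f-string turns them into text.
            yield f"/_components/{name}/instances/{instance_id}", data


# A trimmed version of the example above, as it would look after YAML parsing.
dispatch = {
    "pages": {"index": {"customUrl": "/", "main": ["/_components/feed/instances/index"]}},
    "components": {
        "paragraph": {"instances": {1: {"text": "Lorem ipsum dolor sit amet"}}},
    },
}
for uri, data in flatten_dispatch(dispatch):
    print(uri, data)
```

Each pair corresponds to one request that an importer could send, which is the trade-off discussed further down the thread: the nested form is readable, the flat form is what the API consumes.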

config

config -k|s <alias> [value]

Set aliases for api keys and site prefixes
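A minimal sketch of how an alias might be resolved at run time, assuming the config holds two simple maps (one for api keys, one for site prefixes). The storage layout, the `resolve` helper, and the sample values are hypothetical, since the thread doesn't specify a file format. Anything that isn't a known alias passes through unchanged, so a literal key or site prefix still works.

```python
from typing import Dict

# Hypothetical in-memory view of what `config -k` / `config -s` would have saved.
CONFIG: Dict[str, Dict[str, str]] = {
    "keys": {"local": "local-api-key"},
    "sites": {"domain-local": "localhost:3001/domain"},
}


def resolve(kind: str, value: str) -> str:
    """Return the aliased value if `value` is a known alias, else `value` itself."""
    return CONFIG[kind].get(value, value)


print(resolve("sites", "domain-local"))  # localhost:3001/domain
print(resolve("sites", "domain.com"))    # domain.com (not an alias)
```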

touch

touch <url>

Run GET requests against all instances of a specified component (parsed from the url provided)
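One way `touch` could work, sketched under stated assumptions: the component name is taken from the /_components/<name> segment of the url, every instance is listed at <site>/_components/<name>/instances, and that listing returns instance uris without a protocol. The helper names are hypothetical, and the actual GET requests are left to the caller so the parsing logic stays testable.

```python
from typing import List, Tuple

COMPONENTS = "/_components/"


def parse_component_url(url: str) -> Tuple[str, str]:
    """Split 'site/_components/<name>[/...]' into (site prefix, component name)."""
    site, sep, rest = url.partition(COMPONENTS)
    name = rest.split("/")[0]
    if not sep or not name:
        raise ValueError(f"no component name in {url!r}")
    return site, name


def instances_url(url: str) -> str:
    """URL assumed to list every instance of the component named in `url`."""
    site, name = parse_component_url(url)
    return f"{site}{COMPONENTS}{name}/instances"


def urls_to_touch(url: str, listed: List[str]) -> List[str]:
    """Keep only listed instance uris that belong to the parsed component."""
    site, name = parse_component_url(url)
    prefix = f"{site}{COMPONENTS}{name}/instances/"
    return [uri for uri in listed if uri.startswith(prefix)]
```

A caller would GET `instances_url(url)`, pass the returned list to `urls_to_touch`, and then issue one GET per remaining uri.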

import

import accepts a dispatch from stdin and sends it to a site. The dispatch may come from a file, an importer, or another script.

import domain.com < bootstrap.yml
cq-exporter | import domain-local -k local (site alias, apikey alias)
import domain-local -p < pages.yml (auto-publish pages)

export

export prints a dispatch to stdout. Its output may be redirected to a file or piped to an importer or another script.

export domain.com/_components/foo/instances/bar > foo-bar.yml
export domain.com/_components/foo > foo.yml
export domain.com/_pages/foo > foo.yml
export domain.com/_users > users.yml
export domain.com/_lists/tags > tags.yml
export domain.com/_lists > lists.yml
export domain.com/_users | import domain-local (import to site alias)

Passing a site prefix (rather than a specific url) means searching the pages index (the underscoring of routes is determined afterwards).

export domain.com > pages.yml (100 pages, no layouts)
export domain-prod -l 5 -L > pages.yml (site alias, 5 pages, layouts)
export domain-prod -l 5 -o 10 > pages.yml (limit + offset)
export domain-prod -l 5 < path/to/query.yml > result.yml (custom query)
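When a bare site prefix is exported, limit and offset map naturally onto Elasticsearch's `size` and `from` request-body parameters. A minimal sketch, assuming the pages index has a date field to sort on (`updateTime` here is a hypothetical field name), that the default of 100 pages mirrors the example above, and that a custom query file supplies only the `query` clause.

```python
import json
from typing import Any, Dict, Optional


def pages_query(limit: int = 100, offset: int = 0,
                custom: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an Elasticsearch request body for the most recent pages."""
    return {
        # A custom query (e.g. loaded from query.yml) replaces match_all.
        "query": custom if custom is not None else {"match_all": {}},
        "size": limit,   # -l / --limit
        "from": offset,  # -o / --offset
        # Hypothetical sort field: newest pages first.
        "sort": [{"updateTime": {"order": "desc"}}],
    }


print(json.dumps(pages_query(limit=5, offset=10)))
```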

nelsonpecora commented on August 23, 2024

Per @amycheng:

These are the reasons why I don't use clay-cli 100% of the time:

  • lack of site-to-site import
  • messes up page import (see GitHub issue)
  • doesn't publish the page(s) I just imported
  • doesn't do ambrose imports

nelsonpecora commented on August 23, 2024

After reviewing various custom scripts, the old claycli github issues, and interviews with devs, this looks like the main feature set claycli needs to gain widespread mindshare:

  • import/export
    • automatically handle unpublished pages' url property (should be customUrl, and should be prefixed correctly)
    • test claycli import -f against all current first-run bootstraps
    • don't import layout automatically when importing page(s)
    • allow auto-publishing (via flag) when importing page(s)
      • note: should use original publish date + url
    • multi-page import (using pages index)
      • allow batched publishing (using auto-publish flag)
      • limit/offset/query flags
      • lists/users flags
    • multi-site export
      • limit/offset/query flags
      • lists/users flags
  • programmatic api
  • create (not yet, wait for styles)
  • clone (remove)
  • --page/--component → --url for individual items; should work with users/lists/pages/components
  • better logging (using clay-log)

The Way Forward

  • Sketch out a redesigned API for claycli
  • remove dead code / docs (create, clone)
  • rewrite utils based on Chris's PR (incl. logging)
  • add programmatic hooks
  • build + test updates to single-url import
  • build + test updates to single-url export
  • build + test multi-page import
  • build + test multi-page export
  • build + test CQ importer
  • build + test Wordpress importer

nelsonpecora commented on August 23, 2024

Arguments

  • --key, -k api key or alias
  • --source-key, -K source api key, used when doing a multi-page import from a site (thus querying the source's pages index) if the source site requires a different key than --key
  • --site, -s site prefix or alias
  • --file, -f file path or alias
  • --version, -v print version and exit
  • --verbose, -V debug mode
  • --help, -h print help and exit
  • --url, -u import, export, or lint specific url (component, page, or public url)
  • --users, -U import or export users
  • --lists, -L import or export lists
  • --limit, -l set number of most recent pages to import/export
  • --offset, -o set offset of most recent pages to import/export
  • --query, -q path to JSON/YAML elasticsearch query to run against pages index (rather than fetching latest w/ limit + offset)
  • --publish, -p boolean to publish imported page(s) / component(s)
  • --dry-run, -n list effects of config/touch/import/export that would be performed without doing the action
  • --force, -F when importing pages, overwrite layouts with imported data. when exporting, overwrite file if it exists
  • --recursive, -r recursively lint children when linting
  • --amphora-legacy, -a use non-underscored api routes (applies like --key)
  • --source-amphora-legacy, -A use non-underscored api routes (applies like --source-key)

Commands

config -k|s|f [-n] <name> [value]
touch [-n] <url>
import [-u|s|f] (or stdin) [-kKaA] [-ULpnF] [-loq] <destination site>
export -u|s [-ka] [-ULnF] [-loq] <destination path> (or stdout) 
lint [-u|f] (or stdin) [-r]

amycheng commented on August 23, 2024

What are the use cases for the --site argument? I don't find myself using the site prefix much while importing and exporting data, because the tool, as it is right now, can derive the site from the url argument.

With scratch-cli, I found it a hassle to add acceptable items to its sites list. This led to an annoying number of errors where scratch told me the site I passed in wasn't accepted.

nelsonpecora commented on August 23, 2024

That's for, say, importing or exporting multiple pages.

e.g. clay import -s di-prod -l 10 -K prod -k local di-local means "import the latest 10 pages from prod DI into my local DI (both set as aliases via clay config)"

e.g. clay export -s localhost:3001/selectall -l 0 -U path/to/user-backup.yml would be "export only the users (limit 0 pages) from my local selectall site to user-backup.yml"

amycheng commented on August 23, 2024

Setting site aliases in an external config is a great idea. It was a hassle to maintain that list in scratch-cli.

nelsonpecora commented on August 23, 2024

yuuup. The config is one of my better ideas 🙃

amycheng commented on August 23, 2024

Maybe clay-cli could accept an opts file like mocha.opts. It just sweeps the arguments under the rug, but at least the dev would only have to pass in one argument instead of several.

This has the added benefit of letting devs share opts files and create different opts files for different tasks.

clay import foo.com/pages -opts cut-import
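A rough sketch of the opts-file idea, assuming the file holds whitespace-separated arguments with shell-style quoting and `#` comments, and that `-opts <name>` refers to a file called `<name>.opts`. The helper names and the `.opts` lookup are hypothetical. Saved options are spliced in right after the command name, so with a parser where the last occurrence of a flag wins, flags typed on the command line still override them.

```python
import shlex
from pathlib import Path
from typing import List


def splice_opts(argv: List[str], opts_text: str) -> List[str]:
    """Insert saved options right after the command name."""
    saved = shlex.split(opts_text, comments=True)
    # Saved options come first so explicit CLI flags (parsed later) win.
    return argv[:1] + saved + argv[1:]


def expand_opts(argv: List[str]) -> List[str]:
    """Replace `-opts <name>` in argv with the contents of <name>.opts."""
    if "-opts" not in argv:
        return argv
    i = argv.index("-opts")
    if i + 1 >= len(argv):
        raise ValueError("-opts needs a file name")
    text = Path(argv[i + 1] + ".opts").read_text()
    return splice_opts(argv[:i] + argv[i + 2:], text)
```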

cperryk commented on August 23, 2024

@nelsonpecora Wouldn't you have to collect all the data first to write the dispatch? That'll be too memory intensive.

nelsonpecora commented on August 23, 2024

Hmm, yes, that's the balancing act (between memory and number of required api calls).

cperryk commented on August 23, 2024

@nelsonpecora I don't see how this approach reduces the number of API calls unless we add some kind of "dispatch import" endpoint to Amphora. Is that what you're suggesting?

My thought was to have each line of clay export's stdout represent something that can be PUT into something else within one request. clay export foo.com/pages/1 would stream something like:

{"/pages/1": <page base data>}
{"/components/a/instances/1": <composed root-level cmpt data>}
{"/components/b/instances/1": <composed root-level cmpt data>}

And you could pipe this directly into the import command; it would just go line by line, issuing PUTs.

(The example above assumes cascading page PUTs don't work... I tried after you told me about them but could not get them to work. Show me?)

If your concern is duplicate PUTs (e.g. a layout appearing a thousand times in a stream because a bunch of pages use it), we can use highland's uniqBy method to prevent that.
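The line-per-request stream described above can be sketched as follows, assuming each line is a JSON object mapping one uri to its data, as in the example. A `seen` set stands in for highland's stream deduplication, and `put` is a hypothetical callback that would issue the actual HTTP PUT.

```python
import json
from typing import Any, Callable, Iterable, Set


def import_stream(lines: Iterable[str], put: Callable[[str, Any], None]) -> int:
    """Issue one PUT per unique uri in a newline-delimited dispatch stream."""
    seen: Set[str] = set()
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        for uri, data in json.loads(line).items():
            if uri in seen:
                continue  # e.g. a layout shared by many pages
            seen.add(uri)
            put(uri, data)
            count += 1
    return count
```

Wired to a real shell pipeline, this could be called as `import_stream(sys.stdin, put)`, so memory use is bounded by the set of uris seen rather than by the whole export.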

nelsonpecora commented on August 23, 2024

The problem with that format is that it's really brittle and not human readable, and cannot be used to interact with bootstrap files. The problem with bootstrap files is that they aren't composable (hence, memory hogs). The problem with smaller chunks is that they require too many API calls.

We need something better.

cperryk commented on August 23, 2024

🤔 clay bootstrap myBootstrapDir | clay import foo.com

bootstrap reads the specified bootstrap file / dir and streams asset objects.

nelsonpecora commented on August 23, 2024

@cperryk we currently allow both importing and exporting via files, and there are very strong use cases for both types of action.

option 1: we abandon piping to/from files and go back to using --file arguments for that, which would allow stdin/stdout to use that machine-readable format rather than a human-readable one

option 2: we figure out a compromise format that can be used by both machines (to stream export → import) and humans (to import/export from/to files)

cperryk commented on August 23, 2024

Redirecting export's output with an arrow (>) would allow us to export data to files and import from files; it just wouldn't be in bootstrap format. It wouldn't be instantly readable, but it would allow us to "save" data to put somewhere later. Do we need the ability to export to bootstrap format? Have we ever needed to generate a bootstrap file from a site?

If we do need it, here's how it could work:

  • clay export foo.com/pages | clay toBootstrap bootstrap.yml
  • clay fromBootstrap someDir | clay import foo.com
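A toBootstrap step would need to regroup a stream of uri-keyed objects into the nested pages / components shape of a bootstrap file. A minimal sketch, assuming prefix-free, underscored uris; the function name echoes the proposed toBootstrap command but is hypothetical, and writing the result out as YAML (which needs a third-party library) is left out.

```python
from typing import Any, Dict, Iterable, Tuple


def to_bootstrap(items: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Nest (uri, data) pairs into bootstrap-style pages/components maps."""
    result: Dict[str, Any] = {}
    for uri, data in items:
        parts = [p for p in uri.split("/") if p]
        if len(parts) == 2 and parts[0] == "_pages":
            result.setdefault("pages", {})[parts[1]] = data
        elif len(parts) == 4 and parts[0] == "_components" and parts[2] == "instances":
            instances = (result.setdefault("components", {})
                         .setdefault(parts[1], {})
                         .setdefault("instances", {}))
            instances[parts[3]] = data
        else:
            raise ValueError(f"unsupported uri {uri!r}")
    return result
```

fromBootstrap would be the inverse: walk the nested maps and emit one (uri, data) pair per page or instance.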

cperryk commented on August 23, 2024

We could also come up with some kind of compromise format and make export's streaming to stdout and import's reading from stdin more intelligent. e.g. for import there would be logic like "read lines until condition x is met when I know I have a complete asset, parse, import, continue to next line"
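One way to implement "read lines until I know I have a complete asset" is to accumulate lines in a buffer and try to parse after each one, emitting an asset as soon as the buffer holds a complete JSON value. A minimal sketch, assuming each asset is a JSON object that may be pretty-printed across several lines; it relies on the standard library's `json.JSONDecoder.raw_decode`, which returns the parsed value together with the index where it ends.

```python
import json
from typing import Any, Iterable, Iterator

_decoder = json.JSONDecoder()


def read_assets(lines: Iterable[str]) -> Iterator[Any]:
    """Yield each complete JSON object as soon as enough lines have arrived."""
    buffer = ""
    for line in lines:
        buffer += line
        while True:
            buffer = buffer.lstrip()  # raw_decode rejects leading whitespace
            if not buffer:
                break
            try:
                asset, end = _decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # incomplete so far: wait for more lines
            yield asset
            buffer = buffer[end:]
    if buffer.strip():
        raise ValueError("stream ended in the middle of an asset")
```

Malformed input is only detected when the stream ends, which is the main cost of this approach compared with a strict one-asset-per-line format.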

nelsonpecora commented on August 23, 2024

I think we've figured out the compromise solution:

--yaml, -y flag for both import and export, specifying that the stdin/stdout format should be YAML (normalized, bootstrap-like objects) rather than the API-optimized dispatch format. (I also propose we call these formats "bootstrap" and "dispatch", for clarity.) We should also heavily discourage people from mucking with the dispatch format manually, since it's not human-friendly and they absolutely will muck it up.

clay import -y domain.com < bootstrap.yml
clay export -y domain.com/_pages/1 > backup.yml
clay import domain.com < db_dump.clay (newline-delimited, so not technically JSON)
clay export domain.com/users > user_dump.clay (or .txt, or no extension?)

nelsonpecora commented on August 23, 2024

Hmm, this is as simple as I can make the api. @cperryk do you think there's a way to simplify the export api any more? (The --limit/--offset and query stuff sorta conflict)

Arguments

  • --key, -k api key or alias
  • --site, -s site prefix or alias
  • --version, -v print version and exit
  • --verbose, -V debug mode
  • --help, -h print help and exit
  • --limit, -l set number of most recent pages to export
  • --offset, -o set offset of most recent pages to export
  • --publish, -p boolean to publish pages/components while importing
  • --layouts, -L boolean to export pages' layouts (false by default)
  • --concurrency, -c run api calls concurrently
  • --yaml, -y handle yaml files in stdin/stdout (note: queries for exporting are always specified as yaml)

Commands

config (-k|-s) <alias> [value]
import [-k <apikey>] [-c <concurrency>] [-py] <site alias>
export [-k <apikey>] [-c <concurrency>] [-Ly] [-l <limit>] [-o <offset>] <url or site alias>
lint [-y] (<url>|stdin)
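To make the proposed export flags concrete, here is a sketch of the same argument surface using Python's argparse. It only mirrors the flags listed above and is not claycli's actual (Node) parser; the concurrency default of 10 follows the later agreement in this thread.

```python
import argparse


def build_export_parser() -> argparse.ArgumentParser:
    """Mirror the proposed `export` flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="clay export")
    parser.add_argument("-k", "--key", help="api key or alias")
    parser.add_argument("-c", "--concurrency", type=int, default=10,
                        help="number of concurrent api calls")
    parser.add_argument("-L", "--layouts", action="store_true",
                        help="also export pages' layouts")
    parser.add_argument("-y", "--yaml", action="store_true",
                        help="write yaml (bootstrap) instead of dispatch")
    parser.add_argument("-l", "--limit", type=int,
                        help="number of most recent pages")
    parser.add_argument("-o", "--offset", type=int, default=0,
                        help="offset into the most recent pages")
    parser.add_argument("target", help="url or site alias")
    return parser


args = build_export_parser().parse_args(["-Ly", "-l", "10", "domain.com"])
print(args.layouts, args.yaml, args.limit, args.target)  # True True 10 domain.com
```

Combined boolean short flags such as -Ly work out of the box here, which matches the bracketed `[-Ly]` notation in the command summary.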

Exports Examples

Exporting single items

export domain.com/_components/foo/instances/bar
export domain.com/_pages/foo # page only, no layout
export -L domain.com/_pages/foo # page + layout
export domain.com/_lists/authors > authors.clay

Exporting multiple items

export -y domain.com/_users > users.yml
export -l 10 domain.com # latest 10 pages
export -l 10 -o 10 domain.com # next 10 pages
export domain.com < query.yml > dump.clay

cperryk commented on August 23, 2024

Yes, this looks great! Some thoughts:

  • offset and limit should work with any export, e.g. export foo.com/users -l 10 should export 10 users.
  • lint foo.com/pages should lint all pages.
  • (re: off-thread conversations) --concurrency is important when doing big data migration (e.g. nymag's SoU import) and we should keep it

nelsonpecora commented on August 23, 2024

per our discussions:

  • lint and export should accept a newline-delimited stream of uris (pages/components/etc.) rather than a query
  • --limit and --offset will work on stdin if it exists, then fall back to the pages index (if passed a site for multi-page export)
  • concurrency will default to 10, but can be controlled via a CLAYCLI_CONCURRENCY env variable
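A minimal sketch of the two agreed behaviours: --limit/--offset applied to a newline-delimited stream of uris using `itertools.islice`, and the concurrency read from CLAYCLI_CONCURRENCY with a fallback of 10. The function names are hypothetical, and the fallback to the pages index is omitted.

```python
import os
from itertools import islice
from typing import Iterable, Iterator, Mapping, Optional


def select_uris(lines: Iterable[str], limit: Optional[int] = None,
                offset: int = 0) -> Iterator[str]:
    """Apply --offset/--limit to a newline-delimited stream of uris."""
    uris = (line.strip() for line in lines if line.strip())
    stop = None if limit is None else offset + limit
    return islice(uris, offset, stop)


def concurrency(env: Mapping[str, str] = os.environ) -> int:
    """Read CLAYCLI_CONCURRENCY, defaulting to 10."""
    return int(env.get("CLAYCLI_CONCURRENCY", "10"))
```

Because `islice` is lazy, a limit of 0 (as in the users-only export example earlier in the thread) reads nothing from the stream.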

nelsonpecora commented on August 23, 2024

Note: All claycli env variables should begin with CLAYCLI_ to prevent naming collisions
