
ckan-client-js's People

Contributors

amercader, anuveyatsu, cotts, dependabot[bot], kmanaseryan, mariorodeghiero, pdelboca, risenw, rufuspollock


ckan-client-js's Issues

Use browser file object as data instead of whole buffer on upload

Originally this issue came from
frictionlessdata/frictionless-js#89

We had two issues with this:

  • Unable to generate a hash for large files (this is fixed in data.js)
  • Unable to upload large files (this needs to be fixed here)

Acceptance criteria

  • Able to upload large files (600 MB+)
  • The existing tests should pass

Tasks

  • Update the upload method so that it uses the original browser file object instead of the buffer

Analysis

When we upload a file, we read it into a buffer and pass that to the HTTP request. When the file is too large (e.g. 600 MB+), the browser can fail to read it, depending on memory capacity. Instead of passing the buffer, we should pass the original browser file object and avoid reading the file content ourselves; the rest is handled by the HTTP request. It is worth mentioning that the original issue did not surface at this stage: it happened even earlier, while generating the hash. Since hashing also used the whole buffer, it initially failed there first. That part has since been improved: instead of reading the whole buffer, data.js now supports reading the file chunk by chunk, which lets us hash a file of arbitrary size. So only this last checkpoint remains to cover.
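The analysis above can be sketched roughly as follows. This is an illustrative snippet, not the client's actual method: `uploadFile` is a hypothetical helper name, it assumes a fetch-capable environment (browser or Node 18+), and the upload URL and headers would come from the authz negotiation in the real client. The key point is that the body is the File/Blob itself, never a buffer of its contents:

```javascript
// Illustrative sketch: pass the browser File object straight through as the
// request body instead of reading it into a buffer first.
async function uploadFile(file, uploadUrl, headers = {}) {
  const response = await fetch(uploadUrl, {
    method: 'PUT',
    headers,
    // `file` is a File/Blob; fetch streams it without loading the whole
    // content into memory. Do NOT do `await file.arrayBuffer()` here -
    // that is exactly the buffering that fails for 600 MB+ files.
    body: file,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response;
}
```

For progress reporting, XMLHttpRequest would still be needed, since plain fetch does not expose upload progress events.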

Refactor SDK to be more elegant

This issue is about refactoring the SDK to be more elegant and easier to use.

Acceptance

  • refactor the code to be more elegant, using the Client and Open methods
  • change all necessary files and fix the tests

Tasks

  • change webpack to call the variable Client
  • rename the main class to Client
  • change the import and export so that only Open is needed to call methods in the browser or Node
  • be able to pass the URL to the CkanAuthz request, so it can be reused in the datapub request
    • throw an error if the user didn't pass a URL
  • fix all tests to use the new format
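The constructor-related tasks above can be sketched like this. The class shape and parameter names are illustrative assumptions, not the SDK's final API:

```javascript
// Hypothetical sketch: the main class is exported as `Client`, and the
// constructor throws if no base URL is passed, so the CkanAuthz (and later
// datapub) requests always have a URL to target.
class Client {
  constructor(apiKey, organizationId, datasetId, apiUrl) {
    if (!apiUrl) {
      throw new Error('Missing API URL: the CKAN base URL is required');
    }
    this.apiKey = apiKey;
    this.organizationId = organizationId;
    this.datasetId = datasetId;
    this.api = apiUrl;
  }
}

module.exports = { Client };
```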

Method to call CKAN action API

We can copy this from old okfn/ckan.js

Acceptance

  • async client.action(actionName, payload, useHttpGet=false) calls CKAN action API and returns the result
  • (?) Support for forcing HTTP GET

  • [ ] The method is documented in code and README (a separate issue was created for this: #22)

Tasks

  • Tests (copy from okfn/ckan.js)
    • Node

    • Browser (?) - I don't think we need this for now, because the implementation of action() is platform agnostic, i.e. there is no platform-specific code yet.

  • Implement the method
  • Support for postType=GET (like ckan.js has)
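A sketch of what the method could look like (assumptions: a fetch-capable environment such as Node 18+, CKAN's standard `/api/3/action/<name>` path with the API key in the Authorization header; the standalone-function shape is illustrative, as the real method would live on the client):

```javascript
// Illustrative sketch of client.action as a standalone function.
// CKAN actions are invoked as POST /api/3/action/<actionName> with a JSON
// payload; `useHttpGet` switches to GET with the payload as query params.
async function action(baseUrl, apiKey, actionName, payload = {}, useHttpGet = false) {
  const url = new URL(`api/3/action/${actionName}`, baseUrl);
  const init = { headers: { Authorization: apiKey } };
  if (useHttpGet) {
    init.method = 'GET';
    for (const [key, value] of Object.entries(payload)) {
      url.searchParams.set(key, value);
    }
  } else {
    init.method = 'POST';
    init.headers['Content-Type'] = 'application/json';
    init.body = JSON.stringify(payload);
  }
  const res = await fetch(url, init);
  const json = await res.json();
  // CKAN wraps responses as { success, result, error }
  if (!json.success) {
    throw new Error(`Action ${actionName} failed: ${JSON.stringify(json.error)}`);
  }
  return json.result;
}
```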

Implement get resource

Implement get resource function as described in EPIC #4

Acceptance criteria

  • Given a resource id, or a dataset id plus resource name, return the resource JSON

Tasks

  • Write a function which receives either a resource id, or a dataset id and resource name, and makes the relevant request to the CKAN server to get the resource.
  • Write the test to assert the above implementation
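The lookup described in the tasks could look like the following sketch. `getResource` and the injected `action` callback are hypothetical names; `resource_show` and `package_show` are the standard CKAN actions for fetching a resource and a dataset:

```javascript
// Illustrative sketch: `action` is assumed to be a function that calls the
// CKAN action API and returns the unwrapped result.
async function getResource(action, { resourceId, datasetId, resourceName }) {
  if (resourceId) {
    // Direct lookup: resource_show returns the resource JSON by id
    return action('resource_show', { id: resourceId });
  }
  // Otherwise fetch the dataset and find the resource by name
  const dataset = await action('package_show', { id: datasetId });
  const resource = dataset.resources.find((r) => r.name === resourceName);
  if (!resource) {
    throw new Error(`Resource "${resourceName}" not found in dataset "${datasetId}"`);
  }
  return resource;
}
```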

Refactor LfsClient.GitLfsClient to use code from `giftless-client-js` and remove it from `ckan-client-js`

In datopian/giftless-client-js#6 the code for LfsClient.GitLfsClient from ckan-client-js was moved to giftless-client-js (specifically to https://github.com/datopian/giftless-client-js/blob/feature/upload/src/index.js and https://github.com/datopian/giftless-client-js/blob/feature/upload/src/transfer.js). This code is no longer needed here and can be used from giftless-client-js instead.

Acceptance

  • The code from this repo for LfsClient.GitLfsClient has been refactored to use giftless-client-js.

Tasks

  • TBC

pushBlob method to push resource data to blob storage

This is a refactoring of the existing push method into pushBlob.

Pseudo code looks a bit like:

client.pushBlob = async (resource, onProgressFn) => {
  const response = await client._doBlobAuthz()
  await client._pushDataToBlobStorage(resource, response.result.token, onProgressFn)
}

Acceptance

  • refactor push to pushBlob method
  • refactor authz method

Tasks

  • Refactor the push() method to be pushBlob()
  • Move the ckanAuthz method into push, renamed doBlobAuthz()
  • Rename uploadToStorage to pushDataToBlobStorage()
  • The resource parameter should be a file coming from datajs.open()

retrieve method for dataset metadata

Implement retrieve dataset function as described in EPIC #4

Acceptance criteria

  • client.retrieve(datasetNameOrId) returning a frictionless formatted dataset object.

Tasks

  • Write a function which receives the dataset id or name as a parameter and makes the relevant request to the CKAN server to get the dataset.
  • Converts the result to frictionless format
  • Write the test to assert the above implementation

Analysis

Pseudo-code

client.retrieve = async (nameOrId) => {
  // package_show is the CKAN action that returns a dataset by id or name
  const result = await this.action('package_show', { id: nameOrId })
  return ckan2f11s.dataset(result)
}

Use Yarn by default instead of NPM

Just for consistency we'd like to use Yarn and yarn.lock file instead of NPM and package-lock.json going forward.

Tasks

  • Remove package-lock.json file
  • Generate yarn.lock file
  • Commit

[epic] v0.2 Development of the Client SDK

v0.1 was about basic "push" functionality which is done.

v0.2 is about refactoring this to be more elegant and support for reading/writing metadata

API design should follow https://github.com/datopian/ckan-client-guide (but tweaked for js/nodejs idioms)

https://github.com/okfn/ckan.js could probably be reused for the plumbing layer.

Acceptance

  • client.push(datasetMetadata) function works which takes dataset JS object (in f11s format) and puts it to CKAN datastore
  • client.pushBlob(resource, onProgressFn): uploads a resource's data to the Blob Storage. resource is a data.js File object of a specific type e.g. FileInterface
  • client.retrieve(datasetIdOrName)

Porcelain

  • client.action(...)

Bonus

  • client.pushResource(resource, dataset-name) takes resource in data.js and saves data to storage and metadata
  • client.pushResourceMetadata

Tasks

  • client.action: wrapper for CKAN action API #10
  • client.push: put dataset metadata #11
  • client.pushBlob: refactor existing code #12
  • client.retrieve: #7

API design

This is indicative - see individual issues for precise specifications

Uses data.js and its Frictionless (f11s) data interface as the way to access files and datasets. Metadata follows Frictionless formats.

Plumbing

These are relatively raw, low-level APIs.

  • Storage: getting authz, interaction with giftless etc
  • MetaStore: CKAN action API v3

client = Client(...)
client.action_api(...)

Porcelain

// create a dataset (or overwrite an existing one) - just puts metadata
// (file uploading is separate)
client.push(datasetMetadata)

// longform of the previous item
resource = open(path) // f11s.Resource object
// this uploads just the data to storage ...
// resource has .stream
client.pushBlob(resource)
// f11s.Dataset({metadata})
// so we can do dataset.descriptor
dataset = client.retrieve(dataset_name)
dataset.resources.push(resource.descriptor)
client.push(dataset)

// bonus
// convenience function; the dataset must exist
client.pushResource(f11s_resource, dataset_name)
// convenience
client.pushResourceMetadata(...) // only save resource metadata ...

// GET functions
client.retrieve(dataset_id_or_name)

// BONUS
client.retrieveResource(resource_id OR dataset_id + resource_name)

Example usage

import f11s from 'data.js'
import ckanclient from 'ckanclient'

// resource is a data.js file object from a browser file selection
resource = ... // FileReader object wrapped ...

dataset = {
  descriptor: {
    name: ...,
    resources: ...
  }
}

What is an f11s_dataset?

dataset.descriptor => Frictionless formatted object for a dataset ...

put

Client.action_api = async function (actionName, payload) {
  // ...
}

Client.put = async function (f11s_dataset) {
  const ckan_metadata = f11s2ckan.dataset(f11s_dataset.descriptor)
  // https://docs.ckan.org/en/2.9/api/index.html#ckan.logic.action.create.package_create
  // see existing code https://github.com/okfn/ckan.js#catalog
  const res = await client.action_api('package_create', ckan_metadata)
  return res
}

Clean up this module

It seems to me that there is some code in this module that is either no longer in use or is duplicated somehow, and a lot of the naming is confusing:

  1. lib/file.js - I think this is no longer in use as we now rely on data.js.
  2. lib/util/action-api.js - How is this different from the stuff in index.js (Client.action), and do we need both? Should they be merged?
  3. The stuff in lib/util/ckan-auth-api.js is very much related to the other CKAN API wrappers and should probably be merged or at least there is some code that we can DRY out of it and rely on the same logic, no?
  4. In general, why is this stuff under util? These are not really utilities but core functionality that we are implementing here.
  5. I don't even know what is in lib/util/general.js, but anything that has both util and general in its name must be misplaced. (OK, I checked what is in there, and I think we should use an npm module for this rather than reinvent the wheel.)
  6. Naming - there are methods with very confusing names like push and pushBlob. This module needs to follow clear semantics, e.g. if this is a CKAN client it should have methods like createResource or updateResource or uploadResourceFile. When push comes to shove, push is just meaningless IMHO.

Update README for better explanation of using methods

Acceptance criteria

  • In the README I can see how to use it in the browser, including it as a script
  • jsdoc2md is used to document the methods in the README
  • pushBlob is documented
  • The Node section on how to use it is clearer

Tasks

  • Update README for the browser
  • Use jsdoc2md to convert the comments into md in README
  • Document pushBlob method in README
  • Update Node part re how to use it

Compatibility with trailing / in the URL

In the Python version, I added a small tweak to handle a trailing slash in the URLs. That allows the client to work seamlessly with both https://my.ckan.server.org/ and https://my.ckan.server.org as input when creating the client instance.

This way we avoid building URLs such as https://my.ckan.server.org/api/3/action//create_package, which returns 404.

I think all we need is to change this line:

this.api = api

To:

    this.api = api
    if (!this.api.endsWith('/')) {
        this.api += '/'    
    }

And remove the leading slash here:

const path = `/api/3/action/${actionName}`

What do you think?
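As a self-contained illustration of the tweak (the function names are made up for the example; in the client this logic would sit in the constructor and in the action path builder):

```javascript
// Ensure exactly one trailing slash so path joins never produce '//'
function normalizeApiUrl(api) {
  return api.endsWith('/') ? api : api + '/';
}

// Paths are then joined without a leading slash:
function actionUrl(api, actionName) {
  return normalizeApiUrl(api) + `api/3/action/${actionName}`;
}
```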

Example script for testing against demo.ckan.org

Have a simple example script we can use both as demo and test:

// in examples/demo.ckan.org.js

const APIKEY = 'xxx' // or get from environment
const client = new Client(...)
const dataset = await client.get('some-dataset-on-demo.ckan.org')
dataset.resources.push({ path: 'https://somecsvonline.com/somecsv.csv', ... })
await client.push(dataset)

console.log('visit ...')

Acceptance

  • Example script in examples (or just in the README) that we know works
  • Have run script and verified by hand that dataset is updated on demo.ckan.org

Publish into @datopian org on npm

Acceptance criteria

  • Can install by running yarn add @datopian/ckan-client-js

Tasks

  • Set up a tag so that on release the package is published to the npm registry

Push dataset metadata client.push(datasetMetadata)

client.push(datasetMetadata) function works which takes dataset JS object (in f11s format) and puts it to CKAN datastore

Acceptance

  • client.push(datasetMetadata) that updates an existing dataset; raises an exception with that info if the dataset does not exist
  • client.create(datasetNameOrMetadata) that creates a "project" (with associated dataset).

Tasks

  • Convert from f11s metadata to ckan metadata
  • Call action api method and save that metadata and return response
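The conversion task could start from something like this minimal sketch. Only a few common fields are mapped here, and the function name is illustrative; a real converter (e.g. an f11s2ckan module) would handle many more fields:

```javascript
// Minimal, illustrative mapping from a Frictionless (f11s) descriptor to
// CKAN package metadata. Field coverage is an assumption for the example.
function f11sToCkan(descriptor) {
  return {
    name: descriptor.name,
    title: descriptor.title,
    notes: descriptor.description, // CKAN calls the description "notes"
    resources: (descriptor.resources || []).map((r) => ({
      name: r.name,
      url: r.path, // Frictionless "path" maps to CKAN "url"
      format: r.format,
    })),
  };
}
```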

Implement Git LFS basic-multipart transfer support

Following the addition in Giftless (datopian/giftless#11), we should add support for the new transfer mode in ckan-client-js so that it is possible to efficiently upload very large files.

Acceptance

  • ckan-client-js can negotiate either a basic or multipart-basic transfer mode with a server that supports it
  • ckan-client-js can decide whether to prefer a basic or multipart-basic transfer based on file size or other configuration
  • ckan-client-js can successfully upload a large (>1 GB) file to storage using multipart-basic

Technical Requirements

  • Refactor current LFS logic to a separate module, with a single class wrapping the transfer logic and also encapsulating batch API / transfer mode negotiation
  • Move basic transfer protocol implementation details into an adapter that can be replaced
  • Implement a new multipart-basic transfer adapter
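The second acceptance point (deciding which transfer mode to prefer) can be sketched as a pure function. The threshold value and names are assumptions for illustration; the Git LFS batch API response reports which transfer mode the server accepted:

```javascript
// Assumed default cutoff above which multipart upload is preferred
const MULTIPART_THRESHOLD = 100 * 1024 * 1024; // 100 MB

// Pick a transfer mode based on file size and the modes the server supports.
function chooseTransferMode(fileSize, serverTransfers, threshold = MULTIPART_THRESHOLD) {
  const supportsMultipart = serverTransfers.includes('multipart-basic');
  if (supportsMultipart && fileSize >= threshold) {
    return 'multipart-basic';
  }
  return 'basic';
}
```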
