data-connector-server's People

Contributors

dependabot[bot], nick-verida, tahpot

Forkers

snyata

data-connector-server's Issues

Refactor to use a single sync() endpoint

We are changing the security model to assume the data connector server is operating within a secure enclave.

As such, this server can be modified to expose a single sync() command that takes a user's DID and their Verida: Vault private key.

Create basic dashboard

Requirements:

  • /dashboard/connections: connect a provider, sync a provider, view sync status
  • /dashboard/data: view data for a schema, list common schemas

Handle sync errors and timeouts

Need to handle the following scenarios:

  • Sync has an error, retry X times, then fail permanently
  • Sync has lost credentials, user needs to re-authenticate
  • Sync timed out and needs to be reset after X time
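The three scenarios above can be sketched as a small state-transition function. This is only an illustration: the status names, the `SyncState` shape, and the values chosen for "X times" and "X time" are all assumptions, not taken from the server's code.

```typescript
// Hypothetical status values; the real server may use different names.
type SyncStatus = "active" | "syncing" | "error" | "failed" | "auth-required";

interface SyncState {
  status: SyncStatus;
  retryCount: number;
  // Epoch milliseconds when the current sync started, if one is running.
  syncStartedAt?: number;
}

type SyncEvent =
  | { kind: "sync-error" }
  | { kind: "credentials-lost" }
  | { kind: "tick"; now: number };

const MAX_RETRIES = 3; // the "X times" above; value is assumed
const SYNC_TIMEOUT_MS = 15 * 60 * 1000; // the "X time" above; value is assumed

function nextSyncState(state: SyncState, event: SyncEvent): SyncState {
  switch (event.kind) {
    case "sync-error": {
      const retryCount = state.retryCount + 1;
      // Fail permanently once the retry budget is exhausted.
      return retryCount > MAX_RETRIES
        ? { status: "failed", retryCount }
        : { status: "error", retryCount };
    }
    case "credentials-lost":
      // Retrying cannot help here; the user must re-authenticate.
      return { status: "auth-required", retryCount: state.retryCount };
    case "tick": {
      const timedOut =
        state.status === "syncing" &&
        state.syncStartedAt !== undefined &&
        event.now - state.syncStartedAt > SYNC_TIMEOUT_MS;
      // Reset a stuck sync so the next scheduled run can start again.
      return timedOut
        ? { status: "active", retryCount: state.retryCount }
        : state;
    }
  }
}
```

Keeping the transitions in one pure function makes each scenario easy to unit test without touching a provider API.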

Implement YouTube connector

Implement YouTube handlers in the Google provider.

Phase 1:

Use the facebook/following.ts and google/gmail.ts files as examples to follow when building the YouTube handler.

  • Create a new youtube-following.ts handler within the existing Google provider.
  • Map YouTube subscriptions data to the Verida following schema (https://common.schemas.verida.io/social/following/v0.1.0/schema.json). See facebook/following.ts for an example.
  • Add the necessary YouTube scopes to the existing Google provider getScopes() method.
  • Add a getYoutube() method to the youtube-following.ts library to instantiate a connection to the YouTube library.
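As a rough sketch of the mapping step, the function below converts one YouTube Data API v3 subscription resource into a following record. The input fields (`snippet.title`, `snippet.publishedAt`, `snippet.resourceId.channelId`, `snippet.thumbnails.default.url`) come from the YouTube API; the `FollowingRecord` field names are an assumption and should be checked against the schema linked above and against facebook/following.ts.

```typescript
// Subset of a YouTube Data API v3 subscription resource.
interface YoutubeSubscription {
  id: string;
  snippet: {
    title: string;
    publishedAt: string; // when the subscription was created
    resourceId: { channelId: string };
    thumbnails?: { default?: { url: string } };
  };
}

// Hypothetical target shape; verify field names against the following schema.
interface FollowingRecord {
  _id: string;
  name: string;
  uri: string;
  icon?: string;
  sourceApplication: string;
  sourceId: string;
  followedTimestamp: string;
}

function mapSubscription(sub: YoutubeSubscription): FollowingRecord {
  const channelId = sub.snippet.resourceId.channelId;
  return {
    // Prefix the id so records from different providers cannot collide.
    _id: `youtube-${sub.id}`,
    name: sub.snippet.title,
    uri: `https://www.youtube.com/channel/${channelId}`,
    icon: sub.snippet.thumbnails?.default?.url,
    sourceApplication: "https://www.youtube.com/",
    sourceId: sub.id,
    followedTimestamp: sub.snippet.publishedAt,
  };
}
```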

Then, let's have another call to work on next steps.

Refactor for efficiency

The current architecture has the server create a new database for every sync request.

It then gives the Wallet access to that database, so the wallet can sync this new data. This is inefficient from a storage perspective, creates timing issues, and makes it difficult to sync incrementally so that only new items are fetched.

Now that we support accessTokens, we can give the server time-limited access to read and write directly to the user's vault storage collections. This does increase the security risk surface area a little, but the performance and user experience gains are worth it.

Data Connector New Architecture

It will work like this:

  1. User authenticates via the data connector server (no change)
  2. Verida Vault saves the user's accessToken and refreshToken for the connection into its own private database of connections (no change)
  3. Verida Vault periodically calls a sync() method on the data connector server
  4. Data connector server handles the sync() request, pulls the latest data from the connector API (e.g. Facebook) and saves the new data directly into the user's Verida Vault database

For this to work, the data connector server's sync() method will need to accept the following parameters:

  1. veridaDatabaseEndpointUri: The endpoint of the Verida database to connect to
  2. veridaDatabaseAccessToken: The Verida database access token, which has a 20 minute expiry
  3. veridaDatabaseEncryptionKey: The Verida database encryption key used to encrypt / decrypt data in the Verida database
  4. accessToken: The access token for the connection API (e.g. Facebook)
  5. refreshToken: The refresh token for the connection API (e.g. Facebook)

Here, veridaDatabase refers to an encrypted database on the Verida network, owned and controlled by the Verida Vault application.
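The parameter list above could be captured as a typed request plus a small validator, along these lines. The `SyncRequest` and `parseSyncRequest` names are hypothetical, and treating all five parameters as required non-empty strings is an assumption.

```typescript
// The five sync() parameters listed above.
interface SyncRequest {
  veridaDatabaseEndpointUri: string;
  veridaDatabaseAccessToken: string;
  veridaDatabaseEncryptionKey: string;
  accessToken: string;
  refreshToken: string;
}

const REQUIRED_FIELDS: (keyof SyncRequest)[] = [
  "veridaDatabaseEndpointUri",
  "veridaDatabaseAccessToken",
  "veridaDatabaseEncryptionKey",
  "accessToken",
  "refreshToken",
];

// Validates an untrusted request body and returns a typed SyncRequest.
// Assumption: every parameter is required and must be a non-empty string.
function parseSyncRequest(body: Record<string, unknown>): SyncRequest {
  const missing = REQUIRED_FIELDS.filter(
    (field) => typeof body[field] !== "string" || body[field] === ""
  );
  if (missing.length > 0) {
    throw new Error(`Missing sync parameters: ${missing.join(", ")}`);
  }
  return {
    veridaDatabaseEndpointUri: body.veridaDatabaseEndpointUri as string,
    veridaDatabaseAccessToken: body.veridaDatabaseAccessToken as string,
    veridaDatabaseEncryptionKey: body.veridaDatabaseEncryptionKey as string,
    accessToken: body.accessToken as string,
    refreshToken: body.refreshToken as string,
  };
}
```

Rejecting incomplete requests up front keeps the sync handler from starting work it cannot finish, which matters when the database access token only lasts 20 minutes.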

[New handler] Google Drive Documents

Add GSuite drive files into a new "documents" datastore.

Request the scope https://www.googleapis.com/auth/drive.readonly

Support the following document types:

  • Docs (Google documents, Word documents, PDFs)
  • Slides
  • Spreadsheets

We don't want to store the actual document, but do want to store the text of the document and a link to the actual document.
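One way to decide how to obtain text for each file is to branch on its Drive MIME type, as sketched below. Native Google Docs, Slides, and Sheets files can be exported by the Drive API as text/plain or text/csv, while PDFs and Word documents have to be downloaded and parsed separately. The `TextSource` type and `textSourceFor` name are hypothetical.

```typescript
type TextSource =
  | { method: "export"; exportMimeType: string } // use the Drive export endpoint
  | { method: "download" } // download the bytes and parse locally
  | { method: "unsupported" };

// Native Google formats and the text format to request on export.
const EXPORT_AS_TEXT: Record<string, string> = {
  "application/vnd.google-apps.document": "text/plain",
  "application/vnd.google-apps.presentation": "text/plain",
  "application/vnd.google-apps.spreadsheet": "text/csv",
};

// Binary formats that need a local text extraction step.
const DOWNLOAD_AND_PARSE = new Set<string>([
  "application/pdf",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
]);

function textSourceFor(mimeType: string): TextSource {
  const exportMimeType = EXPORT_AS_TEXT[mimeType];
  if (exportMimeType !== undefined) {
    return { method: "export", exportMimeType };
  }
  if (DOWNLOAD_AND_PARSE.has(mimeType)) {
    return { method: "download" };
  }
  return { method: "unsupported" };
}
```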

Note: PDF files can be converted to text with this code:

static async parsePdfAttachment(base64Data: string): Promise<string> {

Session data isn't being saved

The connect() endpoint sets session data which is then accessed in the callback() endpoint.

However, as you can see when running the code, the session information isn't retained.

This is a problem as the redirect URL is specified in connect(), but then can't be used in callback() to redirect the user back to the relevant application that initiated the connect request.

Is there an issue with session config / initialization?

Steps to reproduce:

  1. Run the server (yarn run start)
  2. Hit the /connect endpoint (e.g. http://localhost:5021/connect/discord?did=did:vda:mainnet:0xCDEdd96AfA6956f0299580225C2d9a52aca8487A&key=abc82b917d4f44708f35a618247e70d5243b1c66c0b60a939bca0bc67eadddef&redirect=http://localhost:3001/connections)
  3. Hit the /callback endpoint (e.g. http://localhost:5021/callback/discord?code=abc123)
  4. Check the server's console output: the session data seen in the /callback endpoint doesn't match what was set in the /connect endpoint

[New connector] Implement Telegram connector

Investigate implementing a Telegram connector.

  • Pull down all the user's chatrooms
  • Pull down all the user's chats

Will need to design a schema for each of those data sets.

Telegram provides the tdlib library, which exposes full access to Telegram functionality, enough to support creating new Telegram clients.

There are node.js wrappers around this library:

Telegram doesn't support OAuth and has rather complex auth options, so we will need to implement a dedicated Telegram page in the data connector server that handles auth.

Twitter app breaks auth

Platform: iOS

If a user has the official Twitter app installed and attempts to authorize the data connector, the Twitter app hijacks the authorization request, opens the Twitter app and shows the consent screen in an embedded browser in the Twitter app.

That is fine, however, when the user clicks authorize they are redirected back to the data connector server -- but the query params appear to have been stripped so the data connector server can't obtain the tokens.

Memory leak?

The production server seems to have a memory leak that spirals out of control.

Support syncing all user data for facebook and twitter

We currently limit requests to a single page (20 results).

Need to implement pagination via the underlying APIs to support syncing much larger data sets (e.g. the full history of a user's posts).

Keep a maximum of 3,000 entries for now.

Ideally, implement this so that the pagination pattern is generic code that can be reused across other APIs.
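A generic helper along the following lines could cover cursor-based APIs. It is a sketch under assumptions: the `Page` shape, the `FetchPage` callback, and the `fetchAllPages` name are hypothetical, and each provider would supply its own callback that translates its paging tokens into `nextCursor`.

```typescript
interface Page<T> {
  items: T[];
  // Cursor for the next page; undefined when there are no more pages.
  nextCursor?: string;
}

// Provider-specific callback that fetches one page starting at a cursor.
type FetchPage<T> = (cursor?: string) => Promise<Page<T>>;

// Follows cursors until the API runs out of pages or maxItems is reached.
async function fetchAllPages<T>(
  fetchPage: FetchPage<T>,
  maxItems = 3000
): Promise<T[]> {
  const results: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: Page<T> = await fetchPage(cursor);
    // Stop on an empty page so a misbehaving API cannot loop forever.
    if (page.items.length === 0) break;
    results.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== undefined && results.length < maxItems);
  // The last page may overshoot the cap, so trim to exactly maxItems.
  return results.slice(0, maxItems);
}
```

A Facebook or Twitter handler would then only need to wrap its own list call in a `FetchPage` function and pass it in.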

Support access token instead of private keys

Currently the /connect and /sync endpoints expect the private key (or seed phrase) of an identity.

This was acceptable only for the PoC and is poor security practice. The endpoints need to be upgraded to accept one of the following instead:

  • Recent access token for the Verida: Vault context (Permits time limited read / write access)
  • Keychain entropy for the Verida: Vault context (Permits permanent read / write access)
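The two options could be modelled as a tagged union so that handlers must state which kind of access they were given. This is an illustrative sketch only: the `VaultCredential` type, the `parseCredential` name, and the `accessToken` / `entropy` request field names are assumptions.

```typescript
type VaultCredential =
  | { kind: "accessToken"; accessToken: string } // time-limited read / write
  | { kind: "keychainEntropy"; entropy: string }; // permanent read / write

// Accepts exactly one of the two credential types from an untrusted body.
function parseCredential(body: Record<string, unknown>): VaultCredential {
  const { accessToken, entropy } = body;
  const hasToken = typeof accessToken === "string" && accessToken !== "";
  const hasEntropy = typeof entropy === "string" && entropy !== "";
  // Both present or both absent is ambiguous, so reject it.
  if (hasToken === hasEntropy) {
    throw new Error("Provide exactly one of accessToken or entropy");
  }
  return hasToken
    ? { kind: "accessToken", accessToken: accessToken as string }
    : { kind: "keychainEntropy", entropy: entropy as string };
}
```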
