crews-be's Introduction

crews-be is an Express application that finds credits for Spotify albums in Discogs. It exposes one endpoint that, given a Spotify album id, requests the album data from the Spotify API and immediately afterwards searches for it in Discogs. As it finds credits, it updates the state (in app.locals) with that information. It's the backend for the crews app, also available in my GitHub profile.

Application keys are needed for both Spotify and Discogs.

The application is deployed on Heroku.

Try it out

Configuration

The app accepts the following environment configuration variables:

| Name | Description | Required | Default |
|---|---|---|---|
| consumerKey | Discogs consumerKey value, given when an app is created in the Discogs developers site | | |
| consumerSecret | Discogs consumerSecret value, given when an app is created in the Discogs developers site | | |
| throttleTime | Number of milliseconds that the general queue of Discogs operations waits between operations. This is related to Discogs API rate limits | | 1100 (this is also the recommended value) |
| PAUSE_NEEDED_AFTER_429 | Number of milliseconds that the queue of operations is paused after getting a 429 from Discogs. No Discogs requests are performed during that time | | 30000 |
| clientId | Given by Spotify when creating a new application | | |
| clientSecret | Given by Spotify when creating a new application | | |
| PORT | Port where Express listens to requests | | 3001 |
| CORS_ALLOW_ORIGIN | Value of the Access-Control-Allow-Origin header in the response | | * |

Usage

  1. Clone repo and npm install
  2. Create your Spotify and Discogs applications and have their keys handy
  3. Drop your env vars in a .env file.
  4. Run npm start.
  5. Request an album at localhost:<PORT>/data/album/:spotifyAlbumId

You'll notice the app responds very quickly to the client with an empty bestMatch and a progress of 0. That just means the search has started. You can keep requesting the album to see how the progress goes; the console also reports it. Ideally, a web client should poll the endpoint for new data and stop when progress is 100. Since the Discogs API is the main limiting factor here and it's restricted to one request per second, requesting the album more than once per second is not recommended: there won't be anything new before that.

Once the search finishes the data remains in the state for as long as the app is in memory, so subsequent requests for the album should show all the data found about the album.
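
The polling described above could look roughly like the sketch below. Only the `progress` and `bestMatch` fields come from this README. The `pollAlbum` helper, its options and the `fetchAlbum` callback are hypothetical names introduced for illustration.

```javascript
// Minimal polling sketch. `fetchAlbum` is a hypothetical function that requests
// /data/album/:spotifyAlbumId and resolves to the parsed JSON body.
async function pollAlbum(fetchAlbum, { intervalMs = 1000, onUpdate = () => {} } = {}) {
  for (;;) {
    const album = await fetchAlbum();
    onUpdate(album); // let the caller render whatever bestMatch holds so far
    // Stop once the search has performed all of its operations.
    if (album.progress >= 100) return album;
    // Discogs serves about one request per second, so polling faster
    // than that would not surface anything new.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With Node 18+ (global `fetch`) and the default port, `fetchAlbum` could be `() => fetch('http://localhost:3001/data/album/' + albumId).then((r) => r.json())`.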

Search logic

The Discogs API exposes a search endpoint. The details of the results then have to be fetched one by one using the release endpoint.

Discogs API requests are rate limited. Some albums can return tens or even hundreds of results, which can make some searches take very long to complete. It is possible though, and quite likely, that the first results are the most relevant. With this in mind, the app exhibits the following behaviors:

  • All the requests made to Discogs are throttled to avoid reaching the limit. I made a specific NPM package (throxy) solely for this.
  • A 429 Too Many Requests response has still been observed on rare occasions (known issue), so it's handled.
  • The album endpoint response has a progress value. This does not reflect found data, just how many operations have been performed to complete the search. The total amount of operations is given by the sum of the times the Discogs search endpoint has to be called and the amount of releases that need to be individually requested.
  • The album endpoint response has a bestMatch object that contains all the found data for the album. Subsequent requests to the album endpoint will have it updated or not as the progress increases.
  • Search operations are performed in strict sequence. Otherwise a single search would hog the operations queue (the one that throxy handles).
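
As a rough illustration of how the progress value described above can be derived, here is a small sketch. The function and parameter names are hypothetical and not taken from the crews-be code base, and rounding to a whole percentage is an assumption.

```javascript
// Hypothetical helper, for illustration only.
// searchPages: times the Discogs search endpoint has to be called
// releases:    releases that have to be requested individually
// completed:   operations (of either kind) performed so far
function searchProgress({ searchPages, releases, completed }) {
  const total = searchPages + releases;
  if (total === 0) return 0; // nothing is known yet, the search has just started
  return Math.min(100, Math.round((completed / total) * 100));
}

searchProgress({ searchPages: 2, releases: 98, completed: 25 }); // 25
```

Note that the value only counts operations: a search can sit at 90 with an empty bestMatch if none of the releases fetched so far carried credits.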

Clients are supposed to poll the album endpoint until progress reaches 100. All this also allows many searches to start at the same time and uses the Discogs API resources efficiently for all the clients.
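
The throttling and 429 handling described in this section can be sketched as a single shared queue. This is not the throxy API: `createQueue`, `enqueue` and the option names are made up for illustration, and both retrying forever after a 429 and detecting it through `error.status` are assumptions. The defaults mirror throttleTime and PAUSE_NEEDED_AFTER_429.

```javascript
// Sketch of a throttled operations queue shared by all searches.
function createQueue({ throttleTime = 1100, pauseAfter429 = 30000 } = {}) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  let tail = Promise.resolve(); // every new operation chains onto the previous one

  return function enqueue(operation) {
    const run = async () => {
      for (;;) {
        try {
          return await operation();
        } catch (error) {
          // Assumption: a rate-limit failure is signalled as error.status === 429.
          if (error.status !== 429) throw error;
          await sleep(pauseAfter429); // nothing else runs while the queue is paused
        }
      }
    };
    const result = tail.then(run);
    // Wait throttleTime after each operation, successful or not, before the
    // next one starts. A failed operation must not block the queue.
    tail = result.catch(() => {}).then(() => sleep(throttleTime));
    return result;
  };
}
```

If every search enqueues its next operation only after the previous one has resolved, the operations of concurrent searches interleave in the queue instead of one search occupying it until it finishes.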

Logging

The app outputs info about the searches both to the console and to disk (/log folder). In the console, a tag with the name of the album and artist being searched tells you what every message is about. On disk, that folder contains one file per album, named after the Spotify album ID. Logging is done using winston.

Tests

Tests are contained in *.spec.js files alongside the module they test. The frameworks used are mocha, sinon and Node's assert.

crews-be's People

Contributors

pedro-otero

crews-be's Issues

Taylor Swift - 1989

| Album | Album Id | Track | Track Id |
|---|---|---|---|
| 1989 | 34OkZVpuzBa9y40DCy0LPR | Shake It Off | 3fthfkkvy9av3q3uAGVf7U |

Finds no credits.

Some editions in Discogs have credits. Find out why they aren't matching.

Paradise Circus - Heligoland

| Album | Album Id | Track | Track Id |
|---|---|---|---|
| Heligoland | 1F8y2bg9V9nRoy8zuxo3Jt | Paradise Circus | 2BndJYJQ17UcEeUFJP5JmY |

Finds credits only once the search has progressed beyond 90%. Usually the first results have better data, such as populated credits information. Make sure the match is correct and, if it is, find out why it is so far down the list of results.

Rename searches actions

Searches actions should be renamed as follows:

setLastSearchPage => setLatestSearchPage
setLastRelease => setLatestRelease

Save searches

Currently all searches reside in memory and are lost when Heroku kills the free dyno. They need to be saved to a database. It has to be a NoSQL database such as MongoDB because of the flexible schema. After this, the search request flow has to change:

  • A first middleware checks if the search exists in the DB and retrieves it.
  • Same current flow
  • When the search finishes, it's stored in the DB and removed from the app state.
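
The proposed flow could be sketched with Express-style `(req, res, next)` middleware as below. Everything here is hypothetical: `db` stands for any store exposing async `get`/`set`, `state.searches` stands in for the state kept in app.locals, and answering straight from the DB when a saved search exists is one possible reading of the first step.

```javascript
// First middleware: answer from the DB if the search was already saved.
function loadSavedSearch(db) {
  return async (req, res, next) => {
    try {
      const saved = await db.get(req.params.spotifyAlbumId);
      if (saved) {
        res.json(saved);
        return;
      }
      next(); // not saved yet: fall through to the current in-memory flow
    } catch (error) {
      next(error); // hand failures to Express error handling
    }
  };
}

// Called when a search finishes: store it, then drop it from the app state.
async function persistFinishedSearch(db, state, albumId) {
  const search = state.searches[albumId];
  if (!search || search.progress < 100) return false;
  await db.set(albumId, search);
  delete state.searches[albumId];
  return true;
}
```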

Track endpoint

New feature: A track endpoint that outputs credits only for that track. It reuses everything that's in place for albums search. It just produces a lighter response for the client.

It can either:

  • Do an additional request to Spotify for the track and find out the album.
  • Have the album id as a parameter.

So endpoints can be respectively:

  • /data/track/<trackId>
  • /data/album/<albumId>/<trackId>

Count active searches

Keep count of active searches. Active means: not completed and actively getting results and parsing them.

This might help when implementing cached responses, since the number of active searches will be equal to the TTL of the response (in seconds, for a Discogs API limit of 60 requests per minute).

The approach could be keeping the operations list for every search in state and make the search function modify and iterate over this list.
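
A minimal sketch of the counting and the TTL derived from it, assuming a hypothetical state shape of `{ [albumId]: { progress } }` and treating every search below 100 as active:

```javascript
// Hypothetical helpers; neither name exists in the crews-be code base.
function countActiveSearches(searches) {
  return Object.values(searches).filter((search) => search.progress < 100).length;
}

// With N active searches sharing a queue that serves `requestsPerSecond`
// operations, each search advances about once every N / requestsPerSecond seconds.
function cacheTtlSeconds(searches, requestsPerSecond = 1) {
  return Math.ceil(countActiveSearches(searches) / requestsPerSecond);
}
```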

Duplicate credits

| Album | Album Id | Track | Track Id |
|---|---|---|---|
| Body Talk 1 | 7J4oxoeFQLTrHnjNu2ZaJ5 | Fembot | 6agCYmY7mHdd2HtzkKR0uU |

Klas Åhlund, Patrik Berger and Robyn appear as composers twice. This can't be related to the issue with accented names (the one fixed with b379850) since only Klas' name is accented.

The issue likely has to do with the fact that composers are credited under different role names, which Discogs has mapped in roles. So the problem may be in the code that handles that mapping.
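
One way to avoid this kind of duplicate is to collapse role aliases before comparing credits. The sketch below assumes a hypothetical credit shape of `{ name, role }`. The alias table is illustrative and is not the mapping the app uses.

```javascript
// Illustrative alias table: several role names collapse into one canonical role.
const ROLE_ALIASES = {
  'Written-By': 'composer',
  'Composed By': 'composer',
  Songwriter: 'composer',
};

function dedupeCredits(credits) {
  const seen = new Set();
  const result = [];
  for (const { name, role } of credits) {
    const canonicalRole = ROLE_ALIASES[role] || role;
    // Normalize so that composed and precomposed accents compare equal.
    const key = `${name.normalize('NFC')}|${canonicalRole}`;
    if (seen.has(key)) continue;
    seen.add(key);
    result.push({ name, role: canonicalRole });
  }
  return result;
}
```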

Put credits directly where they belong

Currently all credits are stored in their own state array. With almost 100 albums searched, its length can be in the order of thousands. So every time the credits for an album are needed, this array has to be filtered. That will not scale well.

Put credits specific to an album either in their searches or albums entry. If in the album, store the credits for each track in their respective track item.
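
A sketch of the second option, moving credits from the flat array into their track items. The shapes are assumptions: credits are taken to be `{ albumId, trackId, name, role }` and an album `{ id, name, tracks: [{ id, name }] }`.

```javascript
// Returns a copy of the album whose tracks carry their own credits.
function attachCredits(album, credits) {
  const byTrack = new Map();
  for (const { albumId, trackId, name, role } of credits) {
    if (albumId !== album.id) continue; // credit belongs to another album
    if (!byTrack.has(trackId)) byTrack.set(trackId, []);
    byTrack.get(trackId).push({ name, role });
  }
  return {
    ...album,
    tracks: album.tracks.map((track) => ({
      ...track,
      credits: byTrack.get(track.id) || [],
    })),
  };
}
```

After this one-off pass, reading the credits of a track is a property access instead of a filter over every credit in the state.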

Madonna - Lucky Star

The highlighted artists don't seem right. They are probably from a remix edition of the album.

[image]

| Album | Album Id | Track | Track Id |
|---|---|---|---|
| Madonna | 2hWI9GNr3kBrxZ7Mphho4Q | Lucky Star | 2hWI9GNr3kBrxZ7Mphho4Q |

Merge albums and searches state objects

Currently albums has the resulting album data that will be sent to the user, and searches just the data related to the progress of the search.

Merge into a single albums object with:

  • progress: The data previously under searches
  • result: The data previously under albums
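
Assuming both state objects are keyed by Spotify album id (an assumption, since the actual state layout is not shown here), the merge could be sketched as:

```javascript
// Hypothetical migration from { albums, searches } to a single albums object.
function mergeState({ albums, searches }) {
  const merged = {};
  const ids = new Set([...Object.keys(albums), ...Object.keys(searches)]);
  for (const id of ids) {
    merged[id] = {
      progress: searches[id] ?? null, // the data previously under searches
      result: albums[id] ?? null, // the data previously under albums
    };
  }
  return merged;
}
```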

Admin interface

Have an admin interface endpoint that renders a view displaying the state. This view should allow discarding previous searches.

New endpoints:

  • GET /admin/state
  • DELETE /admin/search/:spotifyAlbumId

Store less about the albums

Albums are put in the store in full, yet very little of them is used. Store only the id, the name and the array of tracks (without pagination). Every track should have only its id and name.
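
A sketch of that trimming step. It relies on the Spotify album object carrying its tracks in a paging object under `tracks.items`. The `slimAlbum` name is hypothetical, and albums whose track list spans several pages are not handled here.

```javascript
// Keep only what the app uses from a Spotify album object.
function slimAlbum(album) {
  return {
    id: album.id,
    name: album.name,
    // Drop the paging wrapper and every track field except id and name.
    tracks: album.tracks.items.map(({ id, name }) => ({ id, name })),
  };
}
```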
