
discogs2pg's People

Contributors

clrnd


discogs2pg's Issues

failed to parse command status

When I try to parse releases, I get the following error every time:

$ stack exec -- discogs2pg -c "host=localhost dbname=discogs_data" discogs_20161001_releases.xml
Parsing releases...
release = 7642045
release_artist = 8870676
release_extraartist = 26472485
release_label = 8651410
discogs2pg: user error (Database.PostgreSQL.Simple.Copy.putCopyEnd: failed to parse command status)

The following tables do not get populated.
release_company
release_format
release_identifier
release_video
track
track_artist
track_extraartist

slow execution after running script on a new data dump

Thanks for building this tool! It's saved me a ton of time in gathering and migrating the Discogs data for a pet project of mine.

I have a question about the intended usage of the package AFTER an initial data dump parse: If I run the script on a batch of xml files from October and then run it again on the next set of files in November, is there an intermediate action I should take?

I'm not sure whether it's due to the indices on the release-related tables, but I've been running the script on the newest data dump after filling the tables with last month's dump, and the COPY process is pretty slow: it has been running for over 12 hours and still hasn't finished.

Am I better off just dropping the tables altogether and refilling them with the new data, so that it can do a plain bulk copy?
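For reference, PostgreSQL's documentation on populating a database recommends bulk-loading a table with `COPY` before creating its indexes, because building an index over existing data is faster than updating it row by row. Below is a minimal sketch of the "empty the tables first" approach. It assumes the release-related table names that appear in these issues, so adjust them to your schema. `releaseTables` and `truncateStmt` are hypothetical names and are not part of discogs2pg.

```haskell
import Data.List (intercalate)

-- Hypothetical list: the release-related tables named in these issues.
-- Adjust it to match the schema you actually imported.
releaseTables :: [String]
releaseTables =
  [ "release", "release_artist", "release_extraartist", "release_label"
  , "release_company", "release_format", "release_identifier", "release_video"
  , "track", "track_artist", "track_extraartist" ]

-- Hypothetical helper: build a single TRUNCATE statement that empties
-- every listed table, so the next import loads into empty tables.
truncateStmt :: [String] -> String
truncateStmt ts = "TRUNCATE " ++ intercalate ", " ts ++ ";"
```

If you run the generated statement (for example with `psql -c`) before the next import, the previous month's rows are discarded. That only fits if each dump should replace the last one rather than be merged into it.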

Error while importing

Hi,

I'm trying to import the latest dump (20180401) and I'm getting the following error:

discogs2pg: user error (Database.PostgreSQL.Simple.Copy.putCopyEnd: failed to parse command status Connection error: ERROR: value "999999999999" is out of range for type integer CONTEXT: COPY release_format, line 9856798, column qty: "999999999999"

I don't really understand what is happening. Is the latest dump too big?
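For context, PostgreSQL's `integer` type is a 4-byte signed integer with a maximum of 2,147,483,647. The `qty` value 999999999999 therefore cannot be stored in that column, so the error points at one oversized field value rather than at the size of the dump. Below is a minimal sketch of how such values could be detected before they reach `COPY`. It assumes a hypothetical `parseQty` helper that is not part of discogs2pg.

```haskell
import Data.Int (Int32)
import Text.Read (readMaybe)

-- PostgreSQL's `integer` (int4) holds [-2147483648, 2147483647],
-- the same range as Haskell's Int32.
int4Min, int4Max :: Integer
int4Min = toInteger (minBound :: Int32)
int4Max = toInteger (maxBound :: Int32)

-- Hypothetical helper: parse the quantity as an arbitrary-precision
-- Integer first, so huge values cannot wrap around, then accept it
-- only if it fits in an int4 column. Non-numeric input gives Nothing.
parseQty :: String -> Maybe Int32
parseQty s = do
  n <- (readMaybe s :: Maybe Integer)
  if n >= int4Min && n <= int4Max
    then Just (fromInteger n)
    else Nothing
```

In GHCi, `parseQty "999999999999"` evaluates to `Nothing` and `parseQty "1"` evaluates to `Just 1`. Widening the column to `bigint` (maximum 9,223,372,036,854,775,807) would accept 999999999999. It would still reject the 64-digit quantity in the segmentation-fault report below, so some range check is needed either way.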

Release - Master mapping is missing

I imported the schema and ran discogs2pg -d 20180201. This takes a while and fills up the database, but after it finishes none of the releases are mapped to a master:

simon=# select * from release where master_id is not null;
 id | status | title | country | released | notes | genres | styles | master_id | data_quality
----+--------+-------+---------+----------+-------+--------+--------+-----------+--------------
(0 rows)

simon=# select count(*) from release;
  count
---------
 9442718
(1 row)

simon=#

Am I doing something wrong? Perhaps something has changed in the exports from Discogs?

Segmentation Fault Error

Hi, I'm getting a segmentation fault and I'm not sure how to fix it. I can get other tables such as artist and release to work, but with the releases dump I keep getting this error:

~/Documents/discogs2023$ discogs2pg -g -c dbname=discogs discogs_20231001_releases.xml.gz
Parsing releases...
Avoiding entry for ridiculous relase - One Vigintillion(10^63): Release {_releaseId = "8262262", _releaseMasterId = "", _releaseStatus = "Accepted", _releaseTitle = "One Vigintillion(10^63) Songs", _releaseCountry = "Ukraine", _releaseDate = "2016-03-17", _releaseQuality = "Needs Vote", _releaseNotes = "The album comes in a 926.2 Kb 7z archive containing 10\226\129\182\194\179+1 similar five-second tracks for a total of about 158,5\195\151&10\226\129\181\226\129\180 years.\n\nWas published at Archive.org", _releaseArtists = [ArtistRelation {_artistRelId = "2868829", _artistRelAnv = "", _artistRelJoin = "", _artistRelRole = ""}], _releaseExArtists = [ArtistRelation {_artistRelId = "2234455", _artistRelAnv = "", _artistRelJoin = "", _artistRelRole = "Painting [Uncredited]"}], _releaseLabels = [ReleaseLabel {_reLabLabel = "Genetic Trance", _reLabCatno = "GT662"}], _releaseFormats = [ReleaseFormat {_reFmtName = "File", _reFmtText = "32 kbps", _reFmtQty = "1000000000000000000000000000000000000000000000000000000000000001", _reFmtDescriptions = ["MP3","Album","Mono"]}], _releaseTracks = [ReleaseTrack {_reTrkIdx = "1", _reTrkTitle = "\226\136\158", _reTrkPosition = "1", _reTrkDuration = "0:05", _reTrkArtists = [], _reTrkExArtists = []},ReleaseTrack {_reTrkIdx = "2", _reTrkTitle = "\226\136\158", _reTrkPosition = "2\226\128\146&10\226\129\182\194\179+1", _reTrkDuration = "", _reTrkArtists = [], _reTrkExArtists = []}], _releaseIdentifiers = [], _releaseVideos = [], _releaseCompanies = [], _releaseGenres = ["Electronic"], _releaseStyles = ["Ambient","Abstract"]}
Segmentation fault (core dumped)

Add Foreign Keys to the tables

The foreign keys are commented out. What's the reason for this? They can be very handy when using GraphQL tools such as Hasura.
