clrnd / discogs2pg

Discogs to PostgreSQL importer

License: BSD 3-Clause "New" or "Revised" License

Haskell 88.28% PLpgSQL 11.72%

discogs2pg's Issues

Composite primary keys fail in index.sql

The composite primary keys in indexes.sql are not unique and are not reliably non-null.
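A quick way to confirm this kind of problem is to check the candidate key columns for NULLs and duplicates before adding the constraint. The sketch below does that over rows already loaded into Python. The helper name `check_composite_key` and the column names `release_id` and `artist_id` are assumptions for illustration only; they are not taken from the discogs2pg schema.

```python
from collections import Counter


def check_composite_key(rows, key_columns):
    """Return (null_rows, duplicate_keys) for a candidate composite key.

    rows is a list of dicts; key_columns names the columns that would
    form the primary key. Both names are hypothetical examples.
    """
    # A primary key column may not contain NULL (None here).
    null_rows = [
        row for row in rows
        if any(row.get(col) is None for col in key_columns)
    ]
    # Count each fully populated key tuple to find repeated keys.
    counts = Counter(
        tuple(row[col] for col in key_columns)
        for row in rows
        if all(row.get(col) is not None for col in key_columns)
    )
    duplicate_keys = sorted(key for key, n in counts.items() if n > 1)
    return null_rows, duplicate_keys


# Hypothetical sample data, not real Discogs rows.
sample_rows = [
    {"release_id": 1, "artist_id": 10},
    {"release_id": 1, "artist_id": 10},
    {"release_id": 2, "artist_id": None},
    {"release_id": 3, "artist_id": 30},
]
null_rows, duplicate_keys = check_composite_key(
    sample_rows, ["release_id", "artist_id"]
)
```

If either result is non-empty, a primary key over those columns cannot be created as-is.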

failed to parse command status

When trying to parse releases I get the following error every time:

$ stack exec -- discogs2pg -c "host=localhost dbname=discogs_data" discogs_20161001_releases.xml
Parsing releases...
release = 7642045
release_artist = 8870676
release_extraartist = 26472485
release_label = 8651410
discogs2pg: user error (Database.PostgreSQL.Simple.Copy.putCopyEnd: failed to parse command status)

The following tables do not get populated:
release_company
release_format
release_identifier
release_video
track
track_artist
track_extraartist

slow execution after running script on a new datadump

Thanks for building this tool! It's saved me a ton of time in gathering and migrating the Discogs data for a pet project of mine.

I have a question about the intended usage of the package AFTER an initial data dump parse: if I run the script on a batch of XML files from October and then run it again on the next set of files in November, is there an intermediate action I should take?

Not sure if it's due to the indices on the release-related tables, but I've been running the script on the newest data dump after having filled the tables with last month's dump, and I'm finding the COPY process to be pretty slow. It has been running for over 12 hours and still hasn't finished.

Am I better off just dropping the tables altogether and then refilling them with the new data, so that it can just bulk copy?

--aggressive flag doesn't work or not implemented?

Passing the --aggressive flag results in an error. I also can't find anything about it in the source code. Is this implemented?

Table release: barcode column not found

The release table has no barcode column. Is there a reason for this?

Error while importing

Hi,

I'm trying to import the latest dump (20180401) and getting the following error while importing:

discogs2pg: user error (Database.PostgreSQL.Simple.Copy.putCopyEnd: failed to parse command status
Connection error: ERROR: value "999999999999" is out of range for type integer
CONTEXT: COPY release_format, line 9856798, column qty: "999999999999"

I don't really understand what is happening. Is the latest dump too big?
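For context on the error text: PostgreSQL's integer type is a 32-bit signed value, so any qty above 2147483647 is rejected during COPY, regardless of how large the dump is. A minimal sketch of a pre-check on the raw qty text follows. `fits_pg_integer` is a hypothetical helper written for illustration; it is not part of discogs2pg.

```python
# Range of PostgreSQL's 4-byte signed integer type.
PG_INTEGER_MIN = -2_147_483_648
PG_INTEGER_MAX = 2_147_483_647


def fits_pg_integer(text):
    """Return True if text parses as an int within the integer range."""
    try:
        value = int(text.strip())
    except ValueError:
        # Non-numeric text cannot be stored in an integer column either.
        return False
    return PG_INTEGER_MIN <= value <= PG_INTEGER_MAX


# The value from the error message above does not fit.
qty_ok = fits_pg_integer("999999999999")
```

Rows that fail such a check would need to be skipped or clamped, or the column would need a wider type such as bigint.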

Release - Master mapping is missing

I imported the schema and ran discogs2pg -d 20180201. This takes a while and fills up the database, but after it finishes none of the releases are mapped to a master:

simon=# select * from release where master_id is not null;
 id | status | title | country | released | notes | genres | styles | master_id | data_quality
----+--------+-------+---------+----------+-------+--------+--------+-----------+--------------
(0 rows)

simon=# select count(*) from release;
  count
---------
 9442718
(1 row)

simon=#

Am I doing something wrong? Perhaps something has changed in the exports from Discogs?

Segmentation Fault Error

Hi, I'm getting a segmentation fault and am not sure how to fix it. I can get the other tables like artist, release to work; however, when importing releases I keep getting this error:

~/Documents/discogs2023$ discogs2pg -g -c dbname=discogs discogs_20231001_releases.xml.gz
Parsing releases...
Avoiding entry for ridiculous relase - One Vigintillion(10^63): Release {_releaseId = "8262262", _releaseMasterId = "", _releaseStatus = "Accepted", _releaseTitle = "One Vigintillion(10^63) Songs", _releaseCountry = "Ukraine", _releaseDate = "2016-03-17", _releaseQuality = "Needs Vote", _releaseNotes = "The album comes in a 926.2 Kb 7z archive containing 10\226\129\182\194\179+1 similar five-second tracks for a total of about 158,5\195\151&10\226\129\181\226\129\180 years.\n\nWas published at Archive.org", _releaseArtists = [ArtistRelation {_artistRelId = "2868829", _artistRelAnv = "", _artistRelJoin = "", _artistRelRole = ""}], _releaseExArtists = [ArtistRelation {_artistRelId = "2234455", _artistRelAnv = "", _artistRelJoin = "", _artistRelRole = "Painting [Uncredited]"}], _releaseLabels = [ReleaseLabel {_reLabLabel = "Genetic Trance", _reLabCatno = "GT662"}], _releaseFormats = [ReleaseFormat {_reFmtName = "File", _reFmtText = "32 kbps", _reFmtQty = "1000000000000000000000000000000000000000000000000000000000000001", _reFmtDescriptions = ["MP3","Album","Mono"]}], _releaseTracks = [ReleaseTrack {_reTrkIdx = "1", _reTrkTitle = "\226\136\158", _reTrkPosition = "1", _reTrkDuration = "0:05", _reTrkArtists = [], _reTrkExArtists = []},ReleaseTrack {_reTrkIdx = "2", _reTrkTitle = "\226\136\158", _reTrkPosition = "2\226\128\146&10\226\129\182\194\179+1", _reTrkDuration = "", _reTrkArtists = [], _reTrkExArtists = []}], _releaseIdentifiers = [], _releaseVideos = [], _releaseCompanies = [], _releaseGenres = ["Electronic"], _releaseStyles = ["Ambient","Abstract"]}
Segmentation fault (core dumped)

How compatible is the postgres output with discogs-xml2db

You have done exactly what I wanted to do with philipmat/discogs-xml2db#35
I'm tempted to switch to your project, but I know even less about Haskell than I did about Python, so I am hesitant. Do you know what differences there are between the tables generated by the two approaches?

Add Foreign Keys to the tables

The foreign keys are commented out. What's the reason for this? (They can be very handy when using GraphQL tools like Hasura.)
