clrnd / discogs2pg
Discogs to PostgreSQL importer
License: BSD 3-Clause "New" or "Revised" License
The composite primary keys in indexes.sql are not unique and are not reliably non-null.
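One way to confirm this kind of problem before adding a primary key is to look for repeated key tuples and for NULLs in the candidate key columns. The following is a minimal sketch using an in-memory SQLite table so that it is self-contained; the table layout and column names (`release_id`, `label`, `catno`) are assumptions for illustration and not necessarily what indexes.sql defines. The same `GROUP BY ... HAVING COUNT(*) > 1` query shape also works in PostgreSQL.

```python
import sqlite3

# Hypothetical miniature of a release_label-style table. The column names
# are assumptions for illustration, not the project's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE release_label (release_id INTEGER, label TEXT, catno TEXT)"
)
conn.executemany(
    "INSERT INTO release_label VALUES (?, ?, ?)",
    [
        (1, "Label A", "A001"),
        (1, "Label A", "A001"),  # exact duplicate of the row above
        (2, "Label B", None),    # NULL in a would-be key column
        (3, "Label C", "C001"),
    ],
)

# Key tuples that occur more than once would violate a PRIMARY KEY.
duplicates = conn.execute(
    "SELECT release_id, label, catno, COUNT(*) FROM release_label "
    "GROUP BY release_id, label, catno HAVING COUNT(*) > 1"
).fetchall()

# Rows with a NULL in any key column would violate the NOT NULL
# requirement that a PRIMARY KEY implies.
null_rows = conn.execute(
    "SELECT COUNT(*) FROM release_label "
    "WHERE release_id IS NULL OR label IS NULL OR catno IS NULL"
).fetchone()[0]

print(duplicates)
print(null_rows)
```

If either check returns anything, the column set cannot serve as a primary key as-is.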
When trying to parse releases, I get the following error every time:
$ stack exec -- discogs2pg -c "host=localhost dbname=discogs_data" discogs_20161001_releases.xml
Parsing releases...
release = 7642045
release_artist = 8870676
release_extraartist = 26472485
release_label = 8651410
discogs2pg: user error (Database.PostgreSQL.Simple.Copy.putCopyEnd: failed to parse command status)
The following tables do not get populated.
release_company
release_format
release_identifier
release_video
track
track_artist
track_extraartist
Thanks for building this tool! It's saved me a ton of time in gathering and migrating the Discogs data for a pet project of mine.
I have a question about the intended usage of the package AFTER an initial data dump parse: If I run the script on a batch of xml files from October and then run it again on the next set of files in November, is there an intermediate action I should take?
Not sure if it's due to the indices on the release-related tables, but I've been running the script on the newest data dump after having filled the tables with last month's dump, and I'm finding the COPY process to be pretty slow. It has been over 12 hours now and it still hasn't finished.
Am I better off just dropping the tables altogether and then refilling them with the new data? That way it can just bulk copy?
Passing the --aggressive flag results in an error. I also can't find anything about it in the source code. Is this implemented?
In the release table there is no barcode column. Is there any reason for that?
Hi,
I'm trying to import the latest dump (20180401) and I'm getting the following error during the import:
discogs2pg: user error (Database.PostgreSQL.Simple.Copy.putCopyEnd: failed to parse command status Connection error: ERROR: value "999999999999" is out of range for type integer CONTEXT: COPY release_format, line 9856798, column qty: "999999999999"
I don't really understand what is happening. Is the latest dump too big?
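The error message itself points at a single value rather than at the size of the dump: "999999999999" does not fit in PostgreSQL's 4-byte `integer` type, whose range is -2147483648 to 2147483647. The sketch below only illustrates that range check; the helper name `smallest_fitting_type` is hypothetical and is not part of discogs2pg.

```python
# Documented ranges of PostgreSQL's integer (4-byte) and bigint (8-byte) types.
PG_INT_MIN, PG_INT_MAX = -2_147_483_648, 2_147_483_647
PG_BIGINT_MIN, PG_BIGINT_MAX = -(2**63), 2**63 - 1


def smallest_fitting_type(raw: str) -> str:
    """Return the narrowest PostgreSQL numeric type that can hold raw.

    Hypothetical helper for illustration only.
    """
    value = int(raw)
    if PG_INT_MIN <= value <= PG_INT_MAX:
        return "integer"
    if PG_BIGINT_MIN <= value <= PG_BIGINT_MAX:
        return "bigint"
    # Beyond 8 bytes, only arbitrary-precision numeric can store the value.
    return "numeric"


print(smallest_fitting_type("2"))             # a typical qty value
print(smallest_fitting_type("999999999999"))  # the value from the error
```

Under this check the reported qty needs at least a `bigint` column, which suggests the failure comes from one outlier row in release_format and not from the overall dump size.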
I imported the schema and ran discogs2pg -d 20180201. This takes a while and fills up the database, but after it finishes none of the releases are mapped to a master:
simon=# select * from release where master_id is not null;
id | status | title | country | released | notes | genres | styles | master_id | data_quality
----+--------+-------+---------+----------+-------+--------+--------+-----------+--------------
(0 rows)
simon=# select count(*) from release;
count
---------
9442718
(1 row)
simon=#
Am I doing something wrong? Perhaps something has changed in the exports from Discogs?
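A quick way to tell whether the dump or the importer is at fault is to count how many releases in the XML carry a master reference at all. The sketch below assumes the master is stored as a `<master_id>` child element of each `<release>`; that layout and the tiny sample document are assumptions for illustration and should be checked against the actual dump. For a full-size dump, a streaming parser such as `xml.etree.ElementTree.iterparse` would be needed in place of `fromstring`.

```python
import xml.etree.ElementTree as ET

# Made-up two-release sample. The <master_id> child element is an assumed
# layout, not something confirmed by this thread.
SAMPLE = """<releases>
  <release id="1" status="Accepted">
    <title>With master</title>
    <master_id is_main_release="true">42</master_id>
  </release>
  <release id="2" status="Accepted">
    <title>No master</title>
  </release>
</releases>"""


def count_master_ids(xml_text):
    """Return (total releases, releases with a non-empty master_id)."""
    root = ET.fromstring(xml_text)
    total = with_master = 0
    for release in root.iter("release"):
        total += 1
        node = release.find("master_id")
        if node is not None and (node.text or "").strip():
            with_master += 1
    return total, with_master


print(count_master_ids(SAMPLE))
```

If the real dump shows a non-zero count here while the table has only NULLs in master_id, the mapping is being lost during import and not missing from the export.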
Hi, I'm getting a segmentation fault and I'm not sure how to fix it. I can get the other tables like artist, release to work; however, with releases I keep getting this error:
~/Documents/discogs2023$ discogs2pg -g -c dbname=discogs discogs_20231001_releases.xml.gz
Parsing releases...
Avoiding entry for ridiculous relase - One Vigintillion(10^63): Release {_releaseId = "8262262", _releaseMasterId = "", _releaseStatus = "Accepted", _releaseTitle = "One Vigintillion(10^63) Songs", _releaseCountry = "Ukraine", _releaseDate = "2016-03-17", _releaseQuality = "Needs Vote", _releaseNotes = "The album comes in a 926.2 Kb 7z archive containing 10\226\129\182\194\179+1 similar five-second tracks for a total of about 158,5\195\151&10\226\129\181\226\129\180 years.\n\nWas published at Archive.org", _releaseArtists = [ArtistRelation {_artistRelId = "2868829", _artistRelAnv = "", _artistRelJoin = "", _artistRelRole = ""}], _releaseExArtists = [ArtistRelation {_artistRelId = "2234455", _artistRelAnv = "", _artistRelJoin = "", _artistRelRole = "Painting [Uncredited]"}], _releaseLabels = [ReleaseLabel {_reLabLabel = "Genetic Trance", _reLabCatno = "GT662"}], _releaseFormats = [ReleaseFormat {_reFmtName = "File", _reFmtText = "32 kbps", _reFmtQty = "1000000000000000000000000000000000000000000000000000000000000001", _reFmtDescriptions = ["MP3","Album","Mono"]}], _releaseTracks = [ReleaseTrack {_reTrkIdx = "1", _reTrkTitle = "\226\136\158", _reTrkPosition = "1", _reTrkDuration = "0:05", _reTrkArtists = [], _reTrkExArtists = []},ReleaseTrack {_reTrkIdx = "2", _reTrkTitle = "\226\136\158", _reTrkPosition = "2\226\128\146&10\226\129\182\194\179+1", _reTrkDuration = "", _reTrkArtists = [], _reTrkExArtists = []}], _releaseIdentifiers = [], _releaseVideos = [], _releaseCompanies = [], _releaseGenres = ["Electronic"], _releaseStyles = ["Ambient","Abstract"]}
Segmentation fault (core dumped)
You have done exactly what I wanted to do with philipmat/discogs-xml2db#35
I'm tempted to switch to your project, but I know even less about Haskell than I did about Python, so I am hesitant. Do you know what the differences are between the tables generated by the two approaches?
Foreign keys are commented out. What's the reason for this? (They can be very handy when using GraphQL tools like Hasura.)