This sucks:
SELECT X.* FROM no_chain_samplemodel as X
JOIN (SELECT user_id, MAX(timestamp) AS timestamp
FROM no_chain_samplemodel
GROUP BY user_id) AS Y
ON (X.user_id = Y.user_id AND X.timestamp = Y.timestamp)
WHERE X.staff_id = %s
This is great:
SELECT DISTINCT ON (user_id) * FROM no_chain_samplemodel
WHERE timestamp <= '2017-01-01' ORDER BY user_id ASC, timestamp DESC;
With the Django ORM:
SampleModel.objects.distinct('user_id').\
    filter(timestamp__lte=mydate).order_by('user_id', '-timestamp')
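To make the semantics of DISTINCT ON concrete, here is a minimal pure-Python sketch of what the query computes: keep rows up to a cutoff, order by user ascending and timestamp descending, then keep the first row per user. The sample rows and the function name `latest_per_user` are made up for illustration and are not from the project.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (user_id, timestamp, payload).
# ISO date strings compare correctly as plain strings.
rows = [
    (1, "2016-12-01", "a"),
    (2, "2016-11-15", "b"),
    (1, "2016-12-25", "c"),
    (2, "2017-03-01", "d"),  # later than the cutoff, filtered out
]


def latest_per_user(rows, cutoff):
    """Emulate DISTINCT ON (user_id) ... ORDER BY user_id ASC, timestamp DESC."""
    # WHERE timestamp <= cutoff
    kept = [row for row in rows if row[1] <= cutoff]
    # ORDER BY user_id ASC, timestamp DESC, done as two stable sorts
    kept.sort(key=itemgetter(1), reverse=True)
    kept.sort(key=itemgetter(0))
    # DISTINCT ON (user_id): keep the first row of each user_id group
    return [next(group) for _, group in groupby(kept, key=itemgetter(0))]


result = latest_per_user(rows, "2017-01-01")
print(result)  # [(1, '2016-12-25', 'c'), (2, '2016-11-15', 'b')]
```

The ORDER BY is what makes the result deterministic: DISTINCT ON keeps whichever row comes first in each group, so without the descending timestamp the "latest" row is not guaranteed.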
From this
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        ...
    }
}
To this
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        ...
    }
}
Massive fintech Django project:
- 190k lines of Python code
- > 100 Apps
- 383 tables to migrate
- 3000 tests running in ~3 minutes
- adapt code
- migrate data
- profit!
Pgloader to the rescue:
No live replication == Downtime!!!
- drop foreign keys (ForeignKey → IntegerField)
- adapt queries
- write a database router
- get Postgres on CI (stable tests)
- search for untested raw queries
- manual testing on real data
- use migrations for everything
- test everything
- NEVER rely on implicit ordering
- make Django apps really independent
- split that monolith ASAP
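The "write a database router" step above could look roughly like the sketch below. It assumes an app-by-app migration with two database aliases, 'default' (MySQL) and 'postgres'; the alias names, the `MIGRATED_APPS` set, the class name, and the 'billing' app label are all hypothetical, and 'no_chain' is only inferred from the table name in the queries. The four method names and signatures are the ones Django's router protocol expects; the class needs no Django import because routers are plain classes listed in the DATABASE_ROUTERS setting.

```python
from types import SimpleNamespace

# Hypothetical: labels of the apps whose tables already live in Postgres.
MIGRATED_APPS = {"no_chain"}


class MigrationRouter:
    """Route migrated apps to 'postgres' and everything else to 'default'."""

    def _alias_for(self, app_label):
        return "postgres" if app_label in MIGRATED_APPS else "default"

    def db_for_read(self, model, **hints):
        return self._alias_for(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self._alias_for(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations between objects stored in the same database.
        return (self._alias_for(obj1._meta.app_label)
                == self._alias_for(obj2._meta.app_label))

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Create each app's tables only in the database it is routed to.
        return db == self._alias_for(app_label)


# Stand-ins for model classes, just enough to try the router outside Django.
migrated = SimpleNamespace(_meta=SimpleNamespace(app_label="no_chain"))
legacy = SimpleNamespace(_meta=SimpleNamespace(app_label="billing"))

router = MigrationRouter()
print(router.db_for_read(migrated))           # postgres
print(router.db_for_write(legacy))            # default
print(router.allow_relation(migrated, legacy))  # False
```

Because `allow_relation` rejects cross-database relations, a router like this only works once foreign keys between migrated and unmigrated apps have been replaced, which is why dropping foreign keys appears in the same list.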
Hofstadter’s Database Migration Law:
Migrating from MySQL to Postgres always takes longer than you expect, even when you take into account Hofstadter’s Law.
@andreacrotti https://www.iwoca.co.uk/