pulp / pulp-2to3-migration
A migration tool from Pulp 2 to Pulp 3
License: GNU General Public License v2.0
⚠️ ⛔️ Pulp 2 is EOL as of November 30, 2022; for more info visit https://pulpproject.org/2022/09/19/pulp-2-eol/. ⛔️ Pulp is a platform for managing repositories of content, such as software packages, and pushing that content out to large numbers of consumers. For more information, check out the project website: http://www.pulpproject.org
Version
0.11.6
Description of problem:
Having many docker_blobs, pulp2to3 migration fails with:
Processing Pulp2 repositories, importers, distributors 3253/3283
Pre-migrating Pulp 2 docker_blob content 10000/25670
Initial Migration steps complete. Migration failed, You will want to investigate: https://satellite.example.com/foreman_tasks/tasks/fd24355e-9fea-4aab-8a49-4751e9dfb9ab rake aborted!
ForemanTasks::TaskError: Task fd24355e-9fea-4aab-8a49-4751e9dfb9ab: Katello::Errors::Pulp3Error: Sort exceeded memory limit of 104857600 bytes
The cause is that the aggregation requires more memory from MongoDB than its in-memory sort limit allows (100 MB).
How reproducible:
100% in a scaled environment
Steps to Reproduce:
Actual results:
The migration fails with the above error, and /var/log/messages contains this backtrace:
Feb 8 10:44:06 satellite pulpcore-worker-1: pulp: rq.worker:ERROR: Traceback (most recent call last):
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Feb 8 10:44:06 satellite pulpcore-worker-1: rv = job.perform()
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Feb 8 10:44:06 satellite pulpcore-worker-1: self._result = self._execute()
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
Feb 8 10:44:06 satellite pulpcore-worker-1: return self.func(*self.args, **self.kwargs)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/tasks/migrate.py", line 77, in migrate_from_pulp2
Feb 8 10:44:06 satellite pulpcore-worker-1: pre_migrate_all_content(plan)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/pre_migration.py", line 70, in pre_migrate_all_content
Feb 8 10:44:06 satellite pulpcore-worker-1: pre_migrate_content_type(content_model, mutable_type, lazy_type, premigrate_hook)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/pre_migration.py", line 124, in pre_migrate_content_type
Feb 8 10:44:06 satellite pulpcore-worker-1: pulp2_content_ids = premigrate_hook()
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/docker/utils.py", line 17, in find_tags
Feb 8 10:44:06 satellite pulpcore-worker-1: result = pulp2_models.Tag.objects.aggregate([sort_stage, group_stage1, group_stage2])
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib/python3.6/site-packages/mongoengine/queryset/base.py", line 1318, in aggregate
Feb 8 10:44:06 satellite pulpcore-worker-1: return collection.aggregate(final_pipeline, cursor={}, **kwargs)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib64/python3.6/site-packages/pymongo/collection.py", line 2458, in aggregate
Feb 8 10:44:06 satellite pulpcore-worker-1: **kwargs)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib64/python3.6/site-packages/pymongo/collection.py", line 2377, in _aggregate
Feb 8 10:44:06 satellite pulpcore-worker-1: retryable=not cmd._performs_write)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1471, in _retryable_read
Feb 8 10:44:06 satellite pulpcore-worker-1: return func(session, server, sock_info, slave_ok)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib64/python3.6/site-packages/pymongo/aggregation.py", line 148, in get_cursor
Feb 8 10:44:06 satellite pulpcore-worker-1: user_fields=self._user_fields)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib64/python3.6/site-packages/pymongo/pool.py", line 694, in command
Feb 8 10:44:06 satellite pulpcore-worker-1: exhaust_allowed=exhaust_allowed)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib64/python3.6/site-packages/pymongo/network.py", line 162, in command
Feb 8 10:44:06 satellite pulpcore-worker-1: parse_write_concern_error=parse_write_concern_error)
Feb 8 10:44:06 satellite pulpcore-worker-1: File "/usr/lib64/python3.6/site-packages/pymongo/helpers.py", line 168, in _check_command_response
Feb 8 10:44:06 satellite pulpcore-worker-1: max_wire_version)
Feb 8 10:44:06 satellite pulpcore-worker-1: pymongo.errors.OperationFailure: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in., full error: {'ok': 0.0, 'errmsg': 'Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.', 'code': 16819, 'codeName': 'Location16819'}
Expected results:
Clean pre-migration.
Additional info:
I guess /usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/docker/utils.py, line 17, should be:
result = pulp2_models.Tag.objects.aggregate([sort_stage, group_stage1, group_stage2], allowDiskUse=True)
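For context, mongoengine forwards extra keyword arguments from QuerySet.aggregate() to pymongo's Collection.aggregate(), which accepts allowDiskUse. A minimal sketch of the corrected pipeline construction (the live aggregate() call itself is omitted, since it needs a running MongoDB):

```python
# Sketch of the proposed fix: opt in to external (on-disk) sorting so the
# $sort stage is no longer bound by MongoDB's 100 MB in-memory limit.
# The stages mirror find_tags() in pulp_2to3_migration/app/plugin/docker/utils.py.

def build_find_tags_aggregation():
    # sort schema versions descending so schema2 tags come first
    sort_stage = {'$sort': {'schema_version': -1}}
    # keep one tag per (name, repo_id) pair
    group_stage1 = {'$group': {'_id': {'name': '$name', 'repo_id': '$repo_id'},
                               'tags_id': {'$first': '$_id'}}}
    group_stage2 = {'$group': {'_id': None, 'tags_ids': {'$addToSet': '$tags_id'}}}
    pipeline = [sort_stage, group_stage1, group_stage2]
    # allowDiskUse=True is forwarded by mongoengine to pymongo unchanged
    kwargs = {'allowDiskUse': True}
    return pipeline, kwargs
```

With these, the call becomes pulp2_models.Tag.objects.aggregate(pipeline, **kwargs).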
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118537
Version
satellite-6.9.9-1.el7sat.noarch
tfm-rubygem-katello-3.18.1.53-1.el7sat.noarch
tfm-rubygem-pulp_2to3_migration_client-0.10.0-1.el7sat.noarch
Describe the bug
This appears to be an error caused by too many tags being returned in a single document, exceeding the 16 MB BSON size limit.
Content migration starting. These steps may take a while to complete. Refer to `foreman-maintain content migration-stats` for an estimate.
Initial Migration steps complete.
...
Initial Migration steps complete.Migration failed, You will want to investigate: https://satellite.example.com/foreman_tasks/tasks/xxxxx-xxxxx-xxxx-xxxxxxx
rake aborted!
ForemanTasks::TaskError: Task xxxxx-xxxxx-xxxx-xxxxxxx: Katello::Errors::Pulp3Error: BSONObj size: 32817579 (XXXXXX) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 0, full error: {'ok': 0.0, 'errmsg': 'BSONObj size: 32817579 (XXXXXX) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 0', 'code': XXXX, 'codeName': 'LocationXXXX'}
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.53/lib/katello/tasks/pulp3_migration.rake:43:in `block (2 levels) in <top (required)>'
/opt/rh/rh-ruby25/root/usr/share/gems/gems/rake-12.3.3/exe/rake:27:in `<top (required)>'
Tasks: TOP => katello:pulp3_migration
(See full trace by running task with --trace)
[FAIL]
Failed executing preserve_output=true foreman-rake katello:pulp3_migration, exit status 1
--------------------------------------------------------------------------------
Scenario [Prepare content for Pulp 3] failed.
The following steps ended up in failing state:
[content-prepare]
Traceback:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
rv = job.perform()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
self._result = self._execute()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
return self.func(*self.args, **self.kwargs)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/tasks/migrate.py", line 77, in migrate_from_pulp2
pre_migrate_all_content(plan)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/pre_migration.py", line 70, in pre_migrate_all_content
pre_migrate_content_type(content_model, mutable_type, lazy_type, premigrate_hook)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/pre_migration.py", line 124, in pre_migrate_content_type
pulp2_content_ids = premigrate_hook()
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/docker/utils.py", line 18, in find_tags
[sort_stage, group_stage1, group_stage2], allowDiskUse=True
File "/usr/lib/python3.6/site-packages/mongoengine/queryset/base.py", line 1318, in aggregate
return collection.aggregate(final_pipeline, cursor={}, **kwargs)
File "/usr/lib64/python3.6/site-packages/pymongo/collection.py", line 2458, in aggregate
**kwargs)
File "/usr/lib64/python3.6/site-packages/pymongo/collection.py", line 2377, in _aggregate
retryable=not cmd._performs_write)
File "/usr/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1471, in _retryable_read
return func(session, server, sock_info, slave_ok)
File "/usr/lib64/python3.6/site-packages/pymongo/aggregation.py", line 148, in get_cursor
user_fields=self._user_fields)
File "/usr/lib64/python3.6/site-packages/pymongo/pool.py", line 694, in command
exhaust_allowed=exhaust_allowed)
File "/usr/lib64/python3.6/site-packages/pymongo/network.py", line 162, in command
parse_write_concern_error=parse_write_concern_error)
File "/usr/lib64/python3.6/site-packages/pymongo/helpers.py", line 168, in _check_command_response
max_wire_version)
pymongo.errors.OperationFailure: BSONObj size: 32817579 (XXXXX) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 0, full error: {'ok': 0.0, 'errmsg': 'BSONObj size: 32817579 (XXXXX) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 0', 'code': XXXX, 'codeName': 'LocationXXXX'}
Mongo query
# mongo pulp_database
> db.units_docker_tag.count()
1369273
The following MongoDB query fetches all unique tag IDs and returns them all in one document.
/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/docker/utils.py
def find_tags():
    """
    Find tags that have same name within the repo.

    Return only one tag out of 2 tags with the same name.
    Prefer schema2 over schema1.
    """
    # sort the schema version in desc mode.
    sort_stage = {'$sort': {'schema_version': -1}}
    # group tags by name and repo_id; take just first result out of the 2 tags with the same name
    group_stage1 = {'$group': {'_id': {'name': '$name', 'repo_id': '$repo_id'},
                               'tags_id': {'$first': '$_id'}}}
    group_stage2 = {'$group': {'_id': None, 'tags_ids': {'$addToSet': '$tags_id'}}}
    result = pulp2_models.Tag.objects.aggregate(
        [sort_stage, group_stage1, group_stage2], allowDiskUse=True
    )
    if result._has_next():
        return result.next()['tags_ids']
    return []
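allowDiskUse addresses the sort-memory failure, but the final $group still packs every tag id into a single result document, which is what hits the 16 MB BSON limit. One hedged way around it, sketched below, is to drop that last stage and collect the ids client-side from the cursor; `aggregate` here is an injected stand-in for pulp2_models.Tag.objects.aggregate, not the plugin's actual fix:

```python
# Sketch: stream one small {_id, tags_id} document per (name, repo_id) group
# instead of aggregating every id into a single (possibly >16 MB) document.
# `aggregate` is injected so the sketch can be exercised without MongoDB.

def find_tags(aggregate):
    sort_stage = {'$sort': {'schema_version': -1}}
    group_stage = {'$group': {'_id': {'name': '$name', 'repo_id': '$repo_id'},
                              'tags_id': {'$first': '$_id'}}}
    # No second $group: each cursor document stays far below the BSON limit.
    return [doc['tags_id']
            for doc in aggregate([sort_stage, group_stage], allowDiskUse=True)]
```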
To Reproduce
The condition that produces the bug is similar to https://bugzilla.redhat.com/show_bug.cgi?id=2020473, but this issue occurs during content migration rather than during repository sync.
To reproduce: have a custom repository (e.g. Fedora 35 synced from https://dl.fedoraproject.org/pub/fedora/linux/releases/35/Everything/x86_64/os/) that contains an RPM with files whose names include '{{' or similar characters. filelists.xml will then look like:
In "6dc8dae1be904c2613d5aa3667dacd2554d05077eb1ce4296b6edfa3c4db3a46-filelists.xml.gz"
<package pkgid="f2056c2614a2e47efd63138cf9ad2fac9ea5ea193d736fc35069c1bf6723d549" name="R-rlang" arch="x86_64">
<version epoch="0" ver="0.4.11" rel="3.fc35"/>
<file type="dir">/usr/lib/.build-id</file>
<file type="dir">/usr/lib/.build-id/42</file>
<file>/usr/lib/.build-id/42/5049f5a9e0d1ac25c21de795f28fe886bfa0ca</file>
<file type="dir">/usr/lib64/R/library/rlang</file>
<file>/usr/lib64/R/library/rlang/DESCRIPTION</file>
<file>/usr/lib64/R/library/rlang/INDEX</file>
<file>/usr/lib64/R/library/rlang/LICENSE</file>
<file type="dir">/usr/lib64/R/library/rlang/Meta</file>
snip...
<file>/usr/lib64/R/library/rlang/help/wref_key.html</file>
<file>/usr/lib64/R/library/rlang/help/wref_value.html</file>
<file>/usr/lib64/R/library/rlang/help/zap.html</file>
<file>/usr/lib64/R/library/rlang/help/zap_srcref.html</file>
<file>/usr/lib64/R/library/rlang/help/{{.html</file> <================= TemplateSyntaxError: Empty variable tag on line 548
<file>/usr/lib64/R/library/rlang/help/{{}}.html</file> <=================
This causes the TemplateSyntaxError: Empty variable tag when the migration plugin calls render_filelists.
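A Django-style template engine parses '{{' as the start of a variable tag, so a bare '{{' in a file name aborts rendering. One illustrative workaround, assuming the file names are substituted into a template, is to escape brace characters first; this helper is a sketch for illustration, not the actual upstream fix:

```python
def escape_template_braces(path):
    """Escape '{' and '}' so a Django-style template engine cannot misread
    '{{' or '{%' sequences in file names as template tags.

    Hypothetical helper for illustration only.
    """
    return path.replace("{", "&#123;").replace("}", "&#125;")
```

For example, escape_template_braces("/usr/lib64/R/library/rlang/help/{{.html") yields a brace-free string that a template engine will pass through as literal text.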
In summary, the migration failed with:
error:
traceback: |2
File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
rv = job.perform()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
self._result = self._execute()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
return self.func(*self.args, **self.kwargs)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/tasks/migrate.py", line 81, in migrate_from_pulp2
migrate_content(plan, skip_corrupted=skip_corrupted)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 55, in migrate_content
plugin.migrator.migrate_content_to_pulp3(skip_corrupted=skip_corrupted)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/docker/migrator.py", line 106, in migrate_content_to_pulp3
loop.run_until_complete(dm.create())
File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py", line 89, in create
await pipeline
File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py", line 225, in create_pipeline
await asyncio.gather(*futures)
File "/usr/lib/python3.6/site-packages/pulpcore/plugin/stages/api.py", line 43, in call
await self.run()
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/docker/migrator.py", line 154, in run
thru = self.relate_manifest_to_list(dc)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/docker/migrator.py", line 234, in relate_manifest_to_list
thru = ManifestListManifest(manifest_list=item, image_manifest=dc.content,
description: local variable 'item' referenced before assignment
See details here pulp/pulp_rpm#2305.
pulp-2to3-migration needs to provide the digest during DistributionTree creation.
This fix will be a compatibility release with pulp_rpm containing a fix for pulp/pulp_rpm#2305.
Version
0.11.12
Describe the bug
The fix for #550 failed under test with the following stacktrace:
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: pulp: rq.worker:ERROR: Traceback (most recent call last):
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: rv = job.perform()
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: self._result = self._execute()
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: return self.func(*self.args, **self.kwargs)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 201, in complex_repo_migration
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: create_repo_version(progress_rv, pulp2_repo, pulp3_remote)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 453, in create_repo_version
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: resolve_path_overlap(new_version)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 419, in resolve_path_overlap
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: Content.objects.filter(pk__in=cas_with_conflicts[0].content.pk)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: return getattr(self.get_queryset(), name)(*args, **kwargs)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/query.py", line 892, in filter
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: return self._filter_or_exclude(False, *args, **kwargs)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/query.py", line 910, in _filter_or_exclude
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: clone.query.add_q(Q(*args, **kwargs))
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/sql/query.py", line 1290, in add_q
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: clause, _ = self._add_q(q_object, self.used_aliases)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/sql/query.py", line 1318, in _add_q
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: split_subq=split_subq, simple_col=simple_col,
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/sql/query.py", line 1251, in build_filter
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: condition = self.build_lookup(lookups, col, value)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/sql/query.py", line 1116, in build_lookup
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: lookup = lookup_class(lhs, rhs)
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/lookups.py", line 20, in __init__
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: self.rhs = self.get_prep_lookup()
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: File "/usr/lib/python3.6/site-packages/django/db/models/lookups.py", line 204, in get_prep_lookup
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: for rhs_value in self.rhs:
Nov 02 05:07:04 dhcp-3-191.vms.sat.rdu2.redhat.com pulpcore-worker-2[127894]: TypeError: 'UUID' object is not iterable
To Reproduce
In pulp2, create a file-repo with 2 isos, 1.iso and 2.iso.
In mongo, rename 2.iso to "1.iso/22.iso".
Migrate.
Expected behavior
Migration should succeed.
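The TypeError above comes from passing a bare UUID to a Django `__in` lookup, which expects an iterable; the likely one-line fix is wrapping the pk in a list, e.g. Content.objects.filter(pk__in=[cas_with_conflicts[0].content.pk]). A tiny illustrative helper (hypothetical, not the plugin's actual patch):

```python
import uuid

def as_pk_list(pk_or_pks):
    """Normalize a single primary key (or an iterable of them) into a list
    suitable for a Django ``pk__in`` lookup. Illustrative helper only."""
    if isinstance(pk_or_pks, (list, tuple, set, frozenset)):
        return list(pk_or_pks)
    # a bare UUID is not iterable, which is exactly what the traceback shows
    return [pk_or_pks]
```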
Version
satellite-6.9.9-1.el7sat.noarch
tfm-rubygem-katello-3.18.1.53-1.el7sat.noarch
tfm-rubygem-pulp_2to3_migration_client-0.10.0-1.el7sat.noarch
Describe the bug
Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=2097204
The migration can spend a long time, or get stuck, calculating the total number of repository RPMs to be migrated when the total is extremely large (e.g., 2 million+).
As shown below, the migration was stuck at "Migrating content to Pulp 3" for 3 days without any progress:
2022-06-09 19:49:08 +1000: Migrating docker_tag content to Pulp 3 57057/62272
2022-06-09 19:49:18 +1000: Migrating docker_tag content to Pulp 3 60060/62272
2022-06-09 19:49:38 +1000: Initial Migration steps complete.
2022-06-09 19:49:48 +1000: Migrating content to Pulp 3 606012/3485647
...
2022-06-10 10:14:14 +1000: Migrating content to Pulp 3 606012/3485647
2022-06-10 10:14:24 +1000: Migrating content to Pulp 3 606012/3485647
...
...
2022-06-11 14:16:21 +1000: Migrating content to Pulp 3 606012/3485647
Checking the Postgres activity, we found that the count() query had been active for 3 days:
# su - postgres -c "psql foreman -c \"select * from pg_stat_activity where datname = 'pulpcore' and state <> 'idle';\""
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin |
query | backend_type
---------+----------+-------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+------------------------------------------------+-----------------
123456 | pulpcore | XXXXXX | XXXXXX | pulp | | XX.XX.XX.XX | | XXXXXX | 2022-06-08 11:57:11.798522+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186243+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | client backend
123456 | pulpcore | XXXXXX | XXXXXX | pulp | | | | | 2022-06-09 19:49:35.191817+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.196112+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | parallel worker
123456 | pulpcore | XXXXXX | XXXXXX | pulp | | | | | 2022-06-09 19:49:35.192686+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.186242+10 | 2022-06-09 19:49:35.196497+10 | | | active | | XXXXXX | SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2c
ontent" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false))) | parallel worker
(3 rows)
We can see one RQ worker is busy:
# ps aux | grep '/usr/bin/rq'
pulp XXXXX 64.1 0.3 1468496 836400 ? Sl Jun08 1854:22 /usr/bin/python3 /usr/bin/rq worker -w pulpcore.tasking.worker.PulpWorker -c pulpcore.rqconfig --disable-job-desc-logging
Finally, the count() query completed after running for 4.98 days:
2022-06-13 00:53:05 UTC LOG: duration: 430300515.900 ms statement: SELECT COUNT(*) AS "__count" FROM "pulp_2to3_migration_pulp2rpm" INNER JOIN "pulp_2to3_migration_pulp2content" ON ("pulp_2to3_migration_pulp2rpm"."pulp2content_id" = "pulp_2to3_migration_pulp2content"."pulp_id") WHERE NOT (NOT ("pulp_2to3_migration_pulp2content"."pulp3_content_id" IS NULL) AND NOT ("pulp_2to3_migration_pulp2content"."pulp2_id" IN (SELECT DISTINCT U0."pulp2_unit_id" FROM "pulp_2to3_migration_pulp2lazycatalog" U0 WHERE U0."is_migrated" = false)))
Expected behavior
Better performance
Additional context
I think the poor performance is caused by the large amount of content combined with the subquery over a large Pulp2LazyCatalog table.
# /usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/content.py
if is_lazy_type:
    # go through all of the content that haven't been migrated OR have been migrated
    # but have new lazy catalog entries.
    units_with_new_lces = Pulp2LazyCatalog.objects.filter(
        is_migrated=False).values('pulp2_unit_id').distinct()
    already_migrated = ~Q(pulp2content__pulp3_content=None)
    no_new_lces = ~Q(pulp2content__pulp2_id__in=units_with_new_lces)
    pulp_2to3_detail_qs = content_model.objects.exclude(already_migrated & no_new_lces)  # <=============== this query
else:
    # go through all of the content that haven't been migrated
    pulp_2to3_detail_qs = content_model.objects.filter(pulp2content__pulp3_content=None)

# order by pulp2_repo if it's set
if content_model.set_pulp2_repo:
    pulp_2to3_detail_qs = pulp_2to3_detail_qs.order_by('repo_id')

with ProgressReport(
    message='Migrating {} content to Pulp 3'.format(content_type),
    code='migrating.{}.content'.format(self.migrator.pulp2_plugin),
    total=pulp_2to3_detail_qs.count()  # <=================================== stuck here for days
) as pb:
    select_extra = [
        'pulp2content',
        'pulp2content__pulp3_content',
    ]
    if content_model.set_pulp2_repo:
        select_extra.append('pulp2content__pulp2_repo')
    pulp_2to3_detail_qs = pulp_2to3_detail_qs.select_related(*select_extra)
    for pulp_2to3_detail_content in pulp_2to3_detail_qs.iterator(chunk_size=800):  # <======= and then stuck here again for days
        dc = None
        pulp2content = pulp_2to3_detail_content.pulp2content
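To make the slow exclude() predicate explicit, here is a plain-Python restatement of the logic it encodes, using hypothetical dict records in place of ORM rows (a reading aid, not a fix):

```python
# A unit should still be migrated unless it is already migrated AND has no
# new (unmigrated) lazy catalog entries -- the inverse of the exclude() above.

def needs_migration(unit, units_with_new_lces):
    already_migrated = unit['pulp3_content_id'] is not None
    has_new_lces = unit['pulp2_id'] in units_with_new_lces
    return not (already_migrated and not has_new_lces)
```

Postgres has to evaluate essentially this predicate per row, including the NOT ... IN over a DISTINCT subquery, which is plausibly where the multi-day count() goes.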
Version
0.11+
Describe the bug
This is a second place where requesting everything from MongoDB at once generates a result that is too large.
To Reproduce
See #579 (comment) for details.
Author: quba42 (@quba42 )
Redmine Issue: 9616, https://pulp.plan.io/issues/9616
Despite this issue, the migration may complete without errors, or it may cause errors during repo publish as follows:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
rv = job.perform()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
self._result = self._execute()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
return self.func(*self.args, **self.kwargs)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 246, in complex_repo_migration
migrated_repo.pulp3_repository_version, signing_service
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 496, in migrate_repo_distributor
pulp2dist, repo_version, signing_service)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/plugin/deb/repository.py", line 80, in migrate_to_pulp3
signing_service_pk=signing_service_pk
File "/usr/lib/python3.6/site-packages/pulp_deb/app/tasks/publishing.py", line 168, in publish
prc.package
File "/usr/lib/python3.6/site-packages/pulp_deb/app/tasks/publishing.py", line 207, in add_package
self.package_index_files[package.architecture][0]
KeyError: 'amd64'
Independently of whether the above error is thrown or not, the resulting Pulp3 repo versions may be missing a large part of the content needed for the "structured" publish mode.
The content needed for the pulp3_deb "simple" publish mode is never affected.
The issue can only occur if multiple Pulp2 repos being migrated have Releases or Components with the same values for the uniqueness constraints of the corresponding Pulp3 types. (In Pulp2 the repo ID is part of the uniqueness constraints, whereas in Pulp3 there is, for example, just one Release that is simply added to many repo versions.) In such cases multiple Pulp2 units correspond to just one Pulp3 unit, and the migration will only add that Pulp3 unit to the repo versions corresponding to one of the original Pulp2 units (instead of all of them).
The fix appears to be for the affected deb 2to3 models to use the uniqueness constraints like the corresponding Pulp2 type rather than the corresponding Pulp3 type.
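The collapse described above can be illustrated with plain Python: keying units by the Pulp3-style uniqueness constraint (no repo id) merges distinct Pulp2 units into one group, while the Pulp2-style key keeps them separate. The data and field names below are hypothetical:

```python
def group_by_key(units, key_fields):
    """Group unit dicts by a tuple of the given fields."""
    groups = {}
    for unit in units:
        groups.setdefault(tuple(unit[f] for f in key_fields), []).append(unit)
    return groups

# Two Pulp2 Release units from different repos with the same codename.
units = [
    {'codename': 'stable', 'repo_id': 'repo-a'},
    {'codename': 'stable', 'repo_id': 'repo-b'},
]
```

With the Pulp3-style key ('codename',) both units land in one group, so only one repo version receives the Release; including 'repo_id' in the key, as the proposed fix suggests, keeps one unit per repo.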
Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=2074099
Version
satellite-6.9.9-1.el7sat.noarch
tfm-rubygem-katello-3.18.1.53-1.el7sat.noarch
tfm-rubygem-pulp_2to3_migration_client-0.10.0-1.el7sat.noarch
Describe the bug
Description of problem:
The following error was supposed to be fixed via https://bugzilla.redhat.com/show_bug.cgi?id=2003888, but it still exists after upgrading to 6.9.8 when the number of errata to be migrated is very high.
pymongo.errors.DocumentTooLarge: BSON document too large (18938832 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
This happens even if PULP_CONTENT_PREMIGRATION_BATCH_SIZE is configured as low as 25 and the migration is retried repeatedly.
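For reference, PULP_CONTENT_PREMIGRATION_BATCH_SIZE is a dynaconf-style environment variable; the corresponding settings-file form is presumably the key without the PULP_ prefix. This is a sketch only, and the exact file path and key name may vary by install:

```python
# e.g. in /etc/pulp/settings.py (path assumed)
CONTENT_PREMIGRATION_BATCH_SIZE = 25
```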
Version-Release number of selected component (if applicable):
Satellite 6.9.8
How reproducible:
When the errata count is very high (assumed).
Steps to Reproduce:
Same as Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=2003888.
Sync a large number of repositories so that a huge amount of errata must be migrated during the Pulp 2 to Pulp 3 content migration, then attempt the migration on Satellite 6.9.8.
Actual results:
Mar 28 14:06:50 satellite69 pulpcore-worker-3: RuntimeWarning)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: pulp: rq.worker:ERROR: Traceback (most recent call last):
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
Mar 28 14:06:50 satellite69 pulpcore-worker-3: rv = job.perform()
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
Mar 28 14:06:50 satellite69 pulpcore-worker-3: self._result = self._execute()
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
Mar 28 14:06:50 satellite69 pulpcore-worker-3: return self.func(*self.args, **self.kwargs)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/tasks/migrate.py", line 77, in migrate_from_pulp2
Mar 28 14:06:50 satellite69 pulpcore-worker-3: pre_migrate_all_content(plan)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/pre_migration.py", line 70, in pre_migrate_all_content
Mar 28 14:06:50 satellite69 pulpcore-worker-3: pre_migrate_content_type(content_model, mutable_type, lazy_type, premigrate_hook)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/pre_migration.py", line 301, in pre_migrate_content_type
Mar 28 14:06:50 satellite69 pulpcore-worker-3: record.id: record for record in batched_mongo_content_qs.no_cache()
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/pre_migration.py", line 300, in <dictcomp>
Mar 28 14:06:50 satellite69 pulpcore-worker-3: pulp2_content_by_id = {
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib/python3.6/site-packages/mongoengine/queryset/base.py", line 1590, in __next__
Mar 28 14:06:50 satellite69 pulpcore-worker-3: raw_doc = next(self._cursor)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib64/python3.6/site-packages/pymongo/cursor.py", line 1207, in next
Mar 28 14:06:50 satellite69 pulpcore-worker-3: if len(self.__data) or self._refresh():
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib64/python3.6/site-packages/pymongo/cursor.py", line 1124, in _refresh
Mar 28 14:06:50 satellite69 pulpcore-worker-3: self.__send_message(q)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib64/python3.6/site-packages/pymongo/cursor.py", line 1001, in __send_message
Mar 28 14:06:50 satellite69 pulpcore-worker-3: address=self.__address)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1372, in _run_operation_with_response
Mar 28 14:06:50 satellite69 pulpcore-worker-3: exhaust=exhaust)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1471, in _retryable_read
Mar 28 14:06:50 satellite69 pulpcore-worker-3: return func(session, server, sock_info, slave_ok)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib64/python3.6/site-packages/pymongo/mongo_client.py", line 1366, in _cmd
Mar 28 14:06:50 satellite69 pulpcore-worker-3: unpack_res)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib64/python3.6/site-packages/pymongo/server.py", line 116, in run_operation_with_response
Mar 28 14:06:50 satellite69 pulpcore-worker-3: sock_info.send_message(data, max_doc_size)
Mar 28 14:06:50 satellite69 pulpcore-worker-3: File "/usr/lib64/python3.6/site-packages/pymongo/pool.py", line 711, in send_message
Mar 28 14:06:50 satellite69 pulpcore-worker-3: (max_doc_size, self.max_bson_size))
Mar 28 14:06:50 satellite69 pulpcore-worker-3: pymongo.errors.DocumentTooLarge: BSON document too large (18938832 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
Expected results:
The Satellite should properly honor the PULP_CONTENT_PREMIGRATION_BATCH_SIZE value and should be able to handle a large amount of errata migration without errors like this.
Additional info:
I think it is because the query document is too large when using "id__in=". Based on what I observed in the tests below, PyMongo sends the full query to the server once and then follows up with "getMore" operations. Neither the batch_size we set nor the fields we limit affects the size of that initial query document, so if we query half a million ids in one shot, we will hit the BSON limit.
I am using RPMs for testing because they always have more records than errata, although I still don't understand why the migration only failed while migrating errata.
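Given that only the initial query document size matters, one plausible mitigation is to cap the number of ids per id__in filter and issue several bounded queries. This is only a sketch of the idea (the helper names are hypothetical, not pulp_2to3_migration code); the 5000-id chunk size matches the limit used in Test 3:

```python
def chunked(ids, size):
    """Yield successive fixed-size chunks of an id list."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def fetch_by_ids(model, ids, chunk_size=5000):
    """Fetch documents via several bounded id__in queries instead of one
    huge one, keeping each query document far below the 16 MB BSON cap."""
    results = {}
    for chunk in chunked(list(ids), chunk_size):
        for record in model.objects(id__in=chunk).no_cache():
            results[record.id] = record
    return results
```

With chunk_size=5000 each query document stays around 230 KB (per Test 3 below) instead of ~12 MB, regardless of how many ids are fetched in total.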
### Test 1: no batch size and no limit fields ###
PULP_SETTINGS=/etc/pulp/settings.py pulpcore-manager shell
>>> from pulp_2to3_migration.pulp2 import connection
>>> from pulp_2to3_migration.app.plugin.rpm.pulp2_models import RPM
>>> connection.initialize()
>>> rpm_ids = RPM.objects.only("id").all().values_list("id")
>>> len(rpm_ids)
243594 <======================================= ~244K rpms
>>> batched_mongo_content_qs = RPM.objects(id__in=rpm_ids)
>>> pulp2_content_by_id = {record.id: record for record in batched_mongo_content_qs.no_cache()}
>>>>operation
<pymongo.message._GetMore object at 0x7fae547edf48>
max_doc_size: 48 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._Query object at 0x7fae6b272eb8>
max_doc_size: 11825054 vs max_bson_size: 16777216 <================== size of the query almost hits the 16MB limit
>>>>operation
<pymongo.message._GetMore object at 0x7fae921daec8> <================ subsequent getMore Ops are fine
max_doc_size: 48 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7fae89c7eec8>
max_doc_size: 48 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7fae543fef48>
max_doc_size: 48 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7fae886859c8>
max_doc_size: 48 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7fae930b8ac8>
max_doc_size: 48 vs max_bson_size: 16777216
...
### Test 2: Setting a batch size and limiting fields makes no difference ###
>>> from pulp_2to3_migration.pulp2 import connection
>>> from pulp_2to3_migration.app.plugin.rpm.pulp2_models import RPM
>>> connection.initialize()
pulp: pulp_2to3_migration.pulp2.connection:INFO: Attempting to connect to localhost:27017
>>> rpm_ids = RPM.objects.only("id").all().values_list("id")
>>> len(rpm_ids)
243594
>>> mongo_fields = set(['id', '_storage_path', '_last_updated', '_content_type_id'])
>>> batched_mongo_content_qs = RPM.objects(id__in=rpm_ids).only(*mongo_fields).batch_size(50)
>>> pulp2_content_by_id = {record.id: record for record in batched_mongo_content_qs.no_cache()}
<pymongo.message._Query object at 0x7fdf81476a98>
>>>>operation
<pymongo.message._Query object at 0x7fdf81476a98>
max_doc_size: 11825155 vs max_bson_size: 16777216 <======================
>>>>operation
<pymongo.message._GetMore object at 0x7fdf813c40c8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7fdf813c40c8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7fdf813c40c8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7fdf813c40c8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7fdf813c40c8>
...
### Test 3: Reducing the query does reduce the BSON size ###
>>> from pulp_2to3_migration.pulp2 import connection
>>> from pulp_2to3_migration.app.plugin.rpm.pulp2_models import RPM
>>> connection.initialize()
pulp: pulp_2to3_migration.pulp2.connection:INFO: Attempting to connect to localhost:27017
>>> rpm_ids = RPM.objects.only("id").all().limit(5000).values_list("id") <================= limit 5000
>>> len(rpm_ids)
5000
>>> mongo_fields = set(['id', '_storage_path', '_last_updated', '_content_type_id'])
>>> batched_mongo_content_qs = RPM.objects(id__in=rpm_ids).only(*mongo_fields).batch_size(50)
>>> pulp2_content_by_id = {record.id: record for record in batched_mongo_content_qs.no_cache()}
<pymongo.message._Query object at 0x7f39078bf1a8>
>>>>operation
<pymongo.message._Query object at 0x7f39078bf1a8>
max_doc_size: 234049 vs max_bson_size: 16777216 <=========================== query size is much smaller
>>>>operation
<pymongo.message._GetMore object at 0x7f39071d6cc8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7f39071d6cc8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7f39071d6cc8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7f39071d6cc8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7f39071d6cc8>
max_doc_size: 63 vs max_bson_size: 16777216
>>>>operation
<pymongo.message._GetMore object at 0x7f39071d6cc8>
max_doc_size: 63 vs max_bson_size: 16777216
...
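The observed sizes are internally consistent: Test 1's 243,594 ids produce an ~11.8 MB initial query, roughly 48.5 bytes per id, so the query document would cross the 16 MB cap at roughly 345,000 ids. A quick back-of-the-envelope check using only the numbers from the output above:

```python
ids_test1 = 243_594        # ids in the id__in filter (Test 1)
query_bytes = 11_825_054   # observed initial query document size (Test 1)
bson_limit = 16_777_216    # server-side max BSON document size

bytes_per_id = query_bytes / ids_test1      # ~48.5 bytes per id
ids_at_limit = int(bson_limit / bytes_per_id)

print(f"{bytes_per_id:.1f} bytes/id, limit reached around {ids_at_limit} ids")
```

Test 3's figures (234,049 bytes for 5,000 ids, ~46.8 bytes per id) are in the same ballpark, which supports the conclusion that the initial query size scales linearly with the number of ids in the filter.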
The plugin is reaching its end of life at the end of 2022.
Author: ggainey (@ggainey )
Redmine Issue: 9191, https://pulp.plan.io/issues/9191
Advisory-migration only allows datetime formats. Unfortunately, createrepo allows date-fields to be timestamps, which happens "in the wild" with some SUSE repositories.
Teach advisory-migration to try decoding-as-timestamp if decoding-as-datetime fails.
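That fallback can be sketched as a small parse helper (a hypothetical standalone version; the real change lives in the advisory pre-migration code, and the exact datetime format string here is an assumption):

```python
from datetime import datetime, timezone

def parse_updaterecord_date(value):
    """Parse a createrepo date field that may be either a datetime string
    or a raw epoch timestamp (seen in the wild in some SUSE repos)."""
    try:
        return datetime.strptime(str(value), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Fall back to treating the value as integer seconds since the epoch.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)

# Both forms parse to the same moment:
parse_updaterecord_date("2022-05-27 17:08:43")
parse_updaterecord_date("1653671323")  # 1653671323 == 2022-05-27T17:08:43Z
```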
This was originally reported at https://bugzilla.redhat.com/show_bug.cgi?id=2091438 (pasted below for easier reference). It fixes an issue in the migration process where pulpcore's repository.py would fail with a traceback stating that the argument passed to the remove_content() function did not implement the count() method. This happens because remove_content() expects a QuerySet object, but in this situation migration.py passes a single Content object instead.
(Pasting the original bug report at https://bugzilla.redhat.com/show_bug.cgi?id=2091438 that includes the observed traceback.)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-
Description of problem:
During the migration from pulp2 to pulp3 on a Satellite 6.9.9 an error happens during the content-prepare phase while migrating RPMs, logging this traceback:
"pulp_href": "/pulp/api/v3/tasks/e3ef9998-a19d-4def-8f69-305ffff669b3/",
"pulp_created": "2022-05-27T17:08:43.443046Z",
"state": "failed",
"name": "pulp_2to3_migration.app.migration.complex_repo_migration",
"started_at": "2022-05-27T17:19:30.170348Z",
"finished_at": "2022-05-27T17:19:31.831923Z",
"error": {
"traceback": " File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job\n rv = job.perform()\n File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform\n self._result = self._execute()\n File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute\n return self.func(*self.args, **self.kwargs)\n File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 201, in complex_repo_migration\n create_repo_version(progress_rv, pulp2_repo, pulp3_remote)\n File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 451, in create_repo_version\n resolve_path_overlap(new_version)\n File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 418, in resolve_path_overlap\n version.remove_content(cas_with_conflicts[0].content)\n File "/usr/lib/python3.6/site-packages/pulpcore/app/models/repository.py", line 618, in remove_content\n if not content or not content.count():\n",
"description": "'Content' object has no attribute 'count'"
Reformatting the traceback for readability:
"traceback": " File "/usr/lib/python3.6/site-packages/rq/worker.py", line 936, in perform_job
rv = job.perform()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 684, in perform
self._result = self._execute()
File "/usr/lib/python3.6/site-packages/rq/job.py", line 690, in _execute
return self.func(*self.args, **self.kwargs)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 201, in complex_repo_migration
create_repo_version(progress_rv, pulp2_repo, pulp3_remote)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 451, in create_repo_version
resolve_path_overlap(new_version)
File "/usr/lib/python3.6/site-packages/pulp_2to3_migration/app/migration.py", line 418, in resolve_path_overlap
version.remove_content(cas_with_conflicts[0].content)
File "/usr/lib/python3.6/site-packages/pulpcore/app/models/repository.py", line 618, in remove_content
if not content or not content.count():",
"description": "'Content' object has no attribute 'count'
Inspecting the use of content.count() in app/models/repository.py, we find that the Content class really doesn't implement a count() method, so this condition is bound to fail whenever a bare Content instance is passed. I wonder if content.count() here was intended as a test of whether content is iterable, or maybe as a check that content contains more than 0 items (and is, by definition, iterable).
Assuming the former, I modified the test in app/models/repository.py, from:
if not content or not content.count():
to:
if not content or not hasattr(content, '__iter__'):
The modified code ran the migration successfully.
I don't know yet what would cause content to be iterable or not -- I suppose it could hold either a single Content object or a collection of Content objects.
Version-Release number of selected component (if applicable):
satellite-6.9.9-1.el7sat.noarch
python3-pulpcore-3.7.9-1.el7pc.noarch
python3-pulp-2to3-migration-0.11.10-1.el7pc.noarch
python3-pulp-certguard-1.0.3-1.el7pc.noarch
python3-pulp-container-2.1.2-1.el7pc.noarch
python3-pulp-file-1.3.0-1.el7pc.noarch
python3-pulp-rpm-3.11.4-1.el7pc.noarch
How reproducible:
100% in this particular Satellite attempting a pulp-2to3 migration.
Steps to Reproduce:
Actual results:
Some RPMs into the migration process, the traceback above printed as part of the pulp task failure.
Expected results:
The content prepare step works.
Additional info:
As an actual first step in the investigation for this support ticket, I simply removed the not content.count() condition and restarted the content prepare task, which then failed at a different point with a traceback stating "content object is not iterable". So I assumed the content.count() condition was meant to check for iterability and went with my modified condition, not hasattr(content, '__iter__'), which in this particular case resolved the issue.
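The failure is easy to reproduce standalone. The classes below are simplified stand-ins (not pulpcore code) that mirror the guard in repository.py's remove_content(), showing why a bare Content instance blows up while a QuerySet-like collection passes:

```python
class FakeQuerySet(list):
    """Stand-in for a Django QuerySet: a collection that supports count()."""
    def count(self):
        return len(self)

class Content:
    """Stand-in for a bare model instance: truthy, but no count() method."""
    pass

def remove_content(content):
    # Mirrors the guard in pulpcore's repository.py remove_content():
    if not content or not content.count():
        return "nothing to remove"
    return f"removed {content.count()} item(s)"

# Passing a single Content object reproduces the reported error:
try:
    remove_content(Content())
except AttributeError as e:
    print(e)  # 'Content' object has no attribute 'count'

# Wrapping it in a QuerySet-like collection satisfies the guard:
print(remove_content(FakeQuerySet([Content()])))  # removed 1 item(s)
```

This also suggests why the cleanest fix is on the caller's side: migration.py should hand remove_content() a QuerySet (e.g. one filtered down to the conflicting content's pk) rather than a single model instance.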
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=--=-=-=-=-=-=-=-=-=-
See details here pulp/pulp_rpm#2305.
pulp-2to3 migration needs to use the new stage provided by pulp_rpm in the fix for pulp/pulp_rpm#2326.