Comments (10)
There's a couple logging lines in both bookwyrm_export_job and Bookwyrm_import_job. Not done by me.
If you'd like to go further. It's been a little while but, in terms of error handling and logging, you can override the on_failure method on Celery classes/tasks. If you want a generic handling of failure say to just log the exception then you can put this in job.py > Job class. You can also look at this example.
If you want more detailed logging say to log the size of an export you can put that closer to the logic say in BookwyrmExportJob.
If you want you can message me on Matrix or here and I can give it a shot if there's any information you specfically want logged.
from bookwyrm.
Not sure how exactly is it implemented in Bookwyrm, but connection also could be lost because of a simple timeout on DBMS side if something takes a very long time and makes no queries or keepalives in the mean time.
from bookwyrm.
I have a tingly feeling this is the bug to discuss my failed export on bookwyrm.social.
from bookwyrm.
Yes, almost certainly
from bookwyrm.
I'm trying to understand what's happening here, and how difficult it will be to fix. A few questions:
- is this saving the export file as a local file (on disk), or in memory (RAM)?
- is it only doing this whilst the export is running, or is the file for download being downloaded from server storage rather than in S3?
- If it's saving to disk whilst running but then uploading to S3, can we tell if the tarfile is then deleted from local?
- Do we have any useful error logs?
A cleanup process for old exports was on the original list but we left it for another day. Possibly that needs to be revisted sooner rather than later.
Some things I've found that may or may not be helpful for this:
- https://github.com/xtream1101/s3-tar
- https://stackoverflow.com/questions/73045584/python-stream-a-tarfile-to-s3-using-multipart-upload
If I am remembering correctly, @dannymate did most of the work setting up Celery jobs, and @CSDUMMI did most of the work setting up the tarfile code, so they may be able to provide some suggestions.
from bookwyrm.
It does unfortunately seem like a sooner rather than later issue. I'm seeing the disk usage creep up and then revert back to previous levels once the task fails -- I believe it's on disk not in RAM, but anything sys admin is in my weaker area of knowledge.
The file does seem to remain on disk if the task runs successfully; I think this because I see gradual increases in disk usage in the monitoring that are consistent with and export as opposed to a backup, which would create a single jump in usage.
Traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/opentelemetry/instrumentation/psycopg2/__init__.py", line 249, in execute
return _cursor_tracer.traced_execution(
File "/usr/local/lib/python3.9/site-packages/opentelemetry/instrumentation/dbapi/__init__.py", line 459, in traced_execution
return query_method(*args, **kwargs)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/bookwyrm/models/bookwyrm_export_job.py", line 46, in start_export_task
json_data = json_export(job.user)
File "/app/bookwyrm/models/bookwyrm_export_job.py", line 121, in json_export
for edition in editions:
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 280, in __iter__
self._fetch_all()
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1324, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/usr/local/lib/python3.9/site-packages/model_utils/managers.py", line 38, in __iter__
yield from iter
File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 51, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/django/__init__.py", line 596, in execute
return real_execute(self, sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.9/site-packages/opentelemetry/instrumentation/psycopg2/__init__.py", line 249, in execute
return _cursor_tracer.traced_execution(
File "/usr/local/lib/python3.9/site-packages/opentelemetry/instrumentation/dbapi/__init__.py", line 459, in traced_execution
return query_method(*args, **kwargs)
django.db.utils.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 277, in _set_autocommit
self.connection.autocommit = autocommit
psycopg2.InterfaceError: connection already closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 275, in _inner
reraise(*exc_info)
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/_compat.py", line 60, in reraise
raise value
File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 270, in _inner
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
return self.run(*args, **kwargs)
File "/app/bookwyrm/models/bookwyrm_export_job.py", line 51, in start_export_task
job.set_status("failed")
File "/app/bookwyrm/models/job.py", line 82, in set_status
self.stop_job(reason="failed")
File "/app/bookwyrm/models/job.py", line 63, in stop_job
self.save(update_fields=["status", "complete", "updated_date"])
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 739, in save
self.save_base(using=using, force_insert=force_insert,
File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 772, in save_base
with context_manager:
File "/usr/local/lib/python3.9/site-packages/django/db/transaction.py", line 207, in __enter__
connection.set_autocommit(False, force_begin_transaction_with_broken_autocommit=True)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/base/base.py", line 415, in set_autocommit
self._set_autocommit(autocommit)
File "/usr/local/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 277, in _set_autocommit
self.connection.autocommit = autocommit
File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.9/site-packages/django/db/backends/postgresql/base.py", line 277, in _set_autocommit
self.connection.autocommit = autocommit
django.db.utils.InterfaceError: connection already closed
from bookwyrm.
At the moment, the archive files are stored in FileField
s. Which appear to be stored on the local disk.
https://github.com/bookwyrm-social/bookwyrm/blob/a4599d0374e2719b50d7d381ca61fbb8a6a3f4cb/bookwyrm/models/bookwyrm_export_job.py#L26C19-L26C28
from bookwyrm.
I was wondering about FileField
vs ImageField
, which is what everything else is stored as. Also I notice there's no upload_to
value set, though the Django docs are a bit unclear whether that's relevant.
from bookwyrm.
I'll take a look at this tomorrow.
from bookwyrm.
Ok I've run a test in dev with Digital Ocean S3, and read @mouse-reeve's traceback.
If I'm reading the traceback correctly, this is an error with the database connection: it's saying it couldn't even properly mark the job as failed, because that means updating the job
instance and therefore connecting to the DB, but the connection had already closed (unexpectedly). Mouse is suggesting that this connection error seems to have happened because the host ran out of space.
I've run an (admittedly very small) use export in a dev environment and can confirm:
- the final export file is definitely stored on s3
- the final export file is not saved to local disk after the job has run
That leaves open the question of how much disk space is required whilst the export is running. I feel like that may not be the whole story though, because the traceback tells us that this is failing on json_export
:
File "/app/bookwyrm/models/bookwyrm_export_job.py", line 46, in start_export_task
json_data = json_export(job.user)
File "/app/bookwyrm/models/bookwyrm_export_job.py", line 121, in json_export
for edition in editions:
The json_export
function doesn't do anything with files - that's tar_export
, which runs after json_export
. The JSON export is just a text file and whilst sure, it might get big, it's unlikely to be a "large file" by any reasonable definition. If we take the traceback at face value, there's a problem with the for edition in editions
loop. Reading back through it, I can see we're making seven database calls within each iteration of the loop, which ...seems like a lot. So maybe this could be assisted/resolved by adding more select_related
collections in get_books_for_user
? i.e. could this be a problem of too many database connections within a single thread?
from bookwyrm.
Related Issues (20)
- Alt view options for faster navigation of longer lists
- Upgrading from version 0.7.1 to version 0.7.2 HOT 2
- Separate Instance "Limit" / "Silence" / "Mute" option in addition to currently available "Block"
- Better clarification about the production nginx configuration
- Support Mastodon API-compatible login to external clients. HOT 2
- Account import from another instance fails HOT 3
- 2FA should have an option to persist in the current device
- Show that link to the book is being moderated HOT 6
- Cannot clone HOT 2
- embed a list into a websit throws errors on the console HOT 1
- [Enhancement] Integrate Mangrove Review Standard
- Automatically parse series name/number from book title in Inventaire connector
- Profile picture not displayed on Mastodon HOT 1
- Replace now-deprecated imghdr module HOT 1
- Move series membership from Edition to Work HOT 1
- hashtag "#books" links to 3rd party domain HOT 2
- pytest failure because it uses ALLOWED_HOSTS from development .env HOT 3
- Moving to a new machine HOT 2
- Deletion of Instance HOT 2
- Ignore filetype when checking for duplicate FileLink
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bookwyrm.