
Comments (11)

tonioo avatar tonioo commented on June 12, 2024 1

@carragom So, BinaryField is definitely not the answer because we can't filter on it... I finally added a new setting to configure the encoding to use (default value is now LATIN1) and I removed some useless convert_from calls.
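For readers following along, here is a minimal sketch of how a configurable encoding could be threaded into the remaining convert_from calls. The helper name `convert_from_expr` and the constant `DEFAULT_ENCODING` are hypothetical and are not taken from the modoboa-amavis code; only PostgreSQL's `convert_from(bytea, encoding)` function and the LATIN1 default come from the discussion.

```python
import re

# Default mentioned in the comment above; the constant name is hypothetical.
DEFAULT_ENCODING = "LATIN1"

# Conservative patterns so that neither value can inject SQL.
_ENCODING_RE = re.compile(r"[A-Za-z0-9_]+")
_COLUMN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def convert_from_expr(column, encoding=DEFAULT_ENCODING):
    """Build a PostgreSQL expression that decodes a bytea column to text."""
    if not _COLUMN_RE.fullmatch(column):
        raise ValueError("unexpected column name: %r" % column)
    if not _ENCODING_RE.fullmatch(encoding):
        raise ValueError("unexpected encoding name: %r" % encoding)
    return "convert_from({}, '{}')".format(column, encoding)


# Example: the fragment that would replace a hard-coded UTF8 conversion.
where_clause = convert_from_expr("maddr.email") + " LIKE %s"
```

The encoding name is validated rather than passed as a query parameter because it is interpolated into the SQL text in this sketch; a real implementation could equally pass it as a bound parameter.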

from modoboa-amavis.

tonioo avatar tonioo commented on June 12, 2024

Hi,

As for the Quarantine.mail_text field, we could try using a BinaryField and removing all calls to convert_from.

Do you think you could try it?


tonioo avatar tonioo commented on June 12, 2024

@carragom ping


tonioo avatar tonioo commented on June 12, 2024

From @carragom on December 18, 2015 19:39

Hi @tonioo, sorry for the absence. At least in version 1.2.2, Quarantine.mail_text is already a BinaryField, as shown here, or am I looking at the wrong place? Maybe I did not understand what you meant?


tonioo avatar tonioo commented on June 12, 2024

Hi @carragom, your problem seems to be related to the email field:
https://github.com/modoboa/modoboa-amavis/blob/master/modoboa_amavis/models.py#L20


tonioo avatar tonioo commented on June 12, 2024

From @carragom on December 18, 2015 22:25

@tonioo Yes that's the field causing the problems. Like I said, I see two options to fix this:

1- Keep the conversion in the database as it is now, but ask the database to convert to LATIN1 instead of UTF8 in every occurrence of the convert_from function here. I did this and it's working for me. I'm not sure this is the right encoding, but it is certainly better than UTF8, which we already know breaks.

2- Switch to a BinaryField for Maddr.email and handle the conversion in Python. Keep in mind that the number of rows in the Maddr table grows fast: when I reported this a month ago the table had around 7K rows, and right now it has 12K. So doing this conversion outside the database could be a performance issue.

I have been looking around here for any indication of which encoding is actually used, without luck. But it does mention that the option $sql_allow_8bit_address needs to be set to use this field as bytea. So LATIN1 sounds like a safe encoding to use.

If you ask me, I would just go with option 1, which is simple to implement and has no performance issues.

I can provide a quick PR for option 1 if you decide to go with it.
Let me know.
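To illustrate why UTF8 breaks where LATIN1 does not, here is a small Python sketch. The sample address is made up; the point is only that a lone byte such as 0xE9 is not valid UTF-8, while Python's latin-1 codec maps every possible byte value to a character and so never raises on decode.

```python
# Made-up 8-bit address: 0xE9 is "é" in LATIN1 but is not valid UTF-8 here.
raw = b"jos\xe9@example.com"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(raw, "utf-8")      # fails, so None
as_latin1 = try_decode(raw, "latin-1")  # always succeeds
```

Succeeding is not the same as being correct: if the bytes were really in some other charset, LATIN1 yields the wrong characters, but it does not raise.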


tonioo avatar tonioo commented on June 12, 2024

@carragom Using LATIN1 as encoding won't cover all cases. I guess we will encounter the same issues with another encoding soon. I still think a BinaryField is the right answer and I do hope Django uses the appropriate field when it generates queries. The manual conversion you see in the current code would also disappear.


tonioo avatar tonioo commented on June 12, 2024

@carragom BTW, the right place for this issue is the https://github.com/modoboa/modoboa-amavis repository.


tonioo avatar tonioo commented on June 12, 2024

From @carragom on January 27, 2016 23:40

@tonioo I agree that LATIN1 does not cover all cases and is far from ideal. One thing is certain: this problem renders all of Modoboa unusable, not just the amavis module, so it should be fixed by whatever means necessary.

It is possible that amavis does not intend for these fields to be used as text. From the amavis README.sql-pg.txt:

Upgrade note: field quarantine.mail_text should be of data type 'bytea'
and not 'text' as suggested in earlier documentation; this is to prevent
it from being unjustifiably associated with a character set, and to be
able to store any byte value; to convert existing field from type 'text'
to type 'bytea' the following clause may be used:
ALTER TABLE quarantine ALTER mail_text TYPE bytea
USING decode(replace(mail_text,'','\'),'escape');

Thanks a lot for your time.


tonioo avatar tonioo commented on June 12, 2024

Please look at this thread (the end of the page is interesting):
https://code.djangoproject.com/ticket/2417
And this commit (django source code):
django/django@8ee1edd

And tell me what you think :)


tonioo avatar tonioo commented on June 12, 2024

From @carragom on January 29, 2016 19:2

Yes, using a BinaryField is definitely an option, see here. But switching to BinaryField alone is not enough. Every custom query using convert_from needs to be replaced with something that fetches the data from the table and filters it on the Python side. This probably means rewriting this entire class.

In any case, it does not matter what type of field is used or where the conversion happens (db or app): at some point the bytes in the database will have to be converted to text in order to be useful, and that conversion requires a character set. UTF8 is not the right charset for that data and currently breaks the entire application. The main objective here is to find a way for the application not to break even if the conversion fails.

Again I see two options:

1- Find a way to handle the conversion gracefully at the database level (maybe a stored procedure would help here, or just use LATIN1 as the charset, which is working for me and seems to be what amavis is using).
2- Use a BinaryField and move the entire logic of converting/filtering the data to the web app, which is inefficient, a lot of work, and will still break if we keep trying to use UTF8 as the charset.

Again thanks for your time, I hope I was a bit more clear this time.
Cheers.
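As a rough illustration of option 2 combined with the "do not break if the conversion fails" goal, here is a sketch of decoding and filtering on the Python side. The function names `decode_address` and `filter_addresses` and the sample rows are hypothetical; this is not code from modoboa-amavis. It tries UTF-8 first and falls back to latin-1, which cannot fail, so a bad row never raises.

```python
def decode_address(raw, encodings=("utf-8", "latin-1")):
    """Decode raw bytes without ever raising, trying each encoding in turn."""
    data = bytes(raw)  # also accepts memoryview/bytearray values from a driver
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if the caller passed a list without a catch-all encoding.
    return data.decode("ascii", errors="replace")


def filter_addresses(rows, needle):
    """Case-insensitive substring filter applied after decoding in Python."""
    needle = needle.lower()
    decoded = (decode_address(row) for row in rows)
    return [address for address in decoded if needle in address.lower()]


# Made-up rows: one LATIN1-encoded, one UTF-8-encoded, one plain ASCII.
rows = [
    b"jos\xe9@example.com",
    "jos\u00e9@example.org".encode("utf-8"),
    b"bob@example.net",
]
matches = filter_addresses(rows, "jos\u00e9")
```

This decodes every row before filtering, which is exactly the cost described above for a fast-growing table; it is a sketch of the trade-off, not a recommendation.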

