NiKiZe commented on July 20, 2024

I have seen this as well (an attachment filename in a Swedish message). I guess the cause is that messages are downloaded and handled as strings before, and inside, FromMIME822.

At the time I didn't have time to care about the name, so I have not looked into this any further. But I did browse some other branches and found 8e21408; maybe that would be a quick fix for this issue?

But I think the correct way would be to use byte arrays. The first step I wanted to take is to create a unit test that reproduces this issue.
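A unit test along those lines could start from a sketch like this one, which assumes only the .NET base class library and no S22.Imap types. It builds the raw bytes of an unencoded Swedish subject line and decodes them twice: once with Encoding.ASCII, which replaces every byte above 0x7F with a fallback character ('?' by default), and once with ISO-8859-1 (code page 28591). The class name AsciiLossRepro is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical reproduction (not part of S22.Imap): shows what happens to an
// unencoded 8-bit header once its bytes are decoded as ASCII.
public static class AsciiLossRepro
{
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591); // ISO-8859-1

    // Raw bytes a server would send for "Subject: förfallet" when the sender
    // did not encode the header (0xF6 is 'ö' in ISO-8859-1).
    public static byte[] RawSubject() => Latin1.GetBytes("Subject: f\u00F6rfallet");

    // Decoding with ASCII: each byte above 0x7F becomes a fallback character.
    public static string DecodeAsAscii(byte[] raw) => Encoding.ASCII.GetString(raw);

    // Decoding with the charset the sender actually used keeps the 'ö'.
    public static string DecodeAsLatin1(byte[] raw) => Latin1.GetString(raw);
}
```

Asserting that DecodeAsLatin1 returns the original subject while DecodeAsAscii does not would make a failing test that pins down where the character is lost.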

from s22.imap.

smiley22 commented on July 20, 2024

Hello!

Can you post a sample mail message (headers included) so that I can reproduce the problem on my end?

Thanks,
smiley22

NiKiZe commented on July 20, 2024

Example of the raw data (source from Thunderbird), give or take some.
I have made ImapClient.GetMessageData public to get the "message source", and in it the "ö" in the Subject and the "å" in the attachment name are replaced with "?".

Going further, ImapClient.GetData uses Encoding.ASCII.GetString; is this where the data is lost?

Date: Wed, 27 Feb 2013 11:44:50 +0100
From: hidden 
Subject: förfallet 20130227
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="682469891-26617-1361961890=:2752"
--682469891-26617-1361961890=:2752
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
F=F6rfallit till betalning=
=0A=
=0A=
Med v=E4nlig h=E4lsning=0A=
--682469891-26617-1361961890=:2752
Content-Type: application/pdf;
    name="Påminnelse 1.pdf"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment

A dirty patch that "works for me" in this case (not tested with anything else) is:

diff --git a/ImapClient.cs b/ImapClient.cs
index d0ff5ac..572444b 100644
--- a/ImapClient.cs
+++ b/ImapClient.cs
@@ -535,7 +535,7 @@ public class ImapClient : IImapClient
                        byteCount = byteCount - read;
                    }
                }
-               return Encoding.ASCII.GetString(mem.ToArray());
+               return Encoding.GetEncoding("iso8859-1").GetString(mem.ToArray());
            }
        }
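The patch avoids the loss because ISO-8859-1 maps each of the 256 byte values to the code point with the same number, so a string decoded with it can always be turned back into the exact original bytes. The sketch below demonstrates that property; Latin1RoundTrip and Redecode are hypothetical names, and code page 28591 is ISO-8859-1.

```csharp
using System;
using System.Text;

// Hypothetical helper: ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto
// U+0000-U+00FF, so decoding with it is lossless and reversible.
public static class Latin1RoundTrip
{
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    public static string BytesToString(byte[] raw) => Latin1.GetString(raw);

    public static byte[] StringToBytes(string s) => Latin1.GetBytes(s);

    // Re-interpret a Latin-1 "byte string" with the charset the message declares.
    public static string Redecode(string latin1Text, Encoding actual) =>
        actual.GetString(StringToBytes(latin1Text));
}
```

The Latin-1 string is only an intermediate form: a UTF-8 body decoded this way shows up as mojibake until it is re-decoded with UTF-8, but no bytes have been destroyed along the way.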

smiley22 commented on July 20, 2024

Thanks, will take a look at it.

edit: The subject line in your sample is not Q-encoded, though, so naturally the "ö" character gets lost. E-mails must not contain any non-ASCII characters.

NiKiZe commented on July 20, 2024

@smiley22 Correct, the data should be encoded, but it is not.
I agree with you that it is not part of the standard and that the one to blame is the sender... (but it works in Thunderbird, Outlook, etc.).
It is possible for us to make it work here as well, so we should.

This also means that the IMAP server accepts unencoded data, so is it then the IMAP server that is at fault? (In my case it is a courier-imap server.)

I don't know if this is a good idea or not, but maybe we could, for now, add

        /// <summary>Encoding used to get data from server</summary>
        public static Encoding DefaultEncoding = Encoding.ASCII;

in ImapClient.cs and change return Encoding.ASCII.GetString(mem.ToArray());
in GetData to return DefaultEncoding.GetString(mem.ToArray());

This way we could change this from the calling code if needed. (If I remember correctly, Encoding.GetEncoding("iso8859-1") does not work with Mono.)
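A minimal sketch of how such a setting could behave, modelling only the decoding step of GetData. ServerDataDecoder is a hypothetical stand-in, not an S22.Imap type. It uses a per-instance property rather than the static field proposed above, so one caller cannot change the encoding for every client in the process, and it selects ISO-8859-1 by code page number (28591); whether that behaves differently from the name lookup on Mono has not been checked here.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the proposed DefaultEncoding setting.
public class ServerDataDecoder
{
    Encoding encoding = Encoding.ASCII; // matches the current GetData behaviour

    // Encoding used to turn bytes received from the server into a string.
    public Encoding DefaultEncoding
    {
        get => encoding;
        set => encoding = value ?? throw new ArgumentNullException(nameof(value));
    }

    public string Decode(byte[] received) => encoding.GetString(received);
}
```

Usage: `new ServerDataDecoder { DefaultEncoding = Encoding.GetEncoding(28591) }` would decode the Swedish sample without replacing "ö" and "å".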

NiKiZe commented on July 20, 2024

It seems like 8-bit is actually supported in IMAP? http://tools.ietf.org/html/rfc3501#section-4.3.1

smiley22 commented on July 20, 2024

No, mail headers only ever contain ASCII characters. Anything else wouldn't make much sense really, since there's no way of knowing which character set should be used for decoding.

If you try to decode the above subject line using, say, Big5, you will just get garbage instead of the expected data. That is why, among other reasons, the "encoded-word" and "base64" schemes were devised in the first place: so that arbitrary 8-bit data can be transmitted over a 7-bit communication link.
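For comparison, this is roughly what a compliant sender does with a non-ASCII subject: it wraps the bytes in an RFC 2047 encoded-word, `=?charset?B?base64?=`, which names the charset and is pure ASCII on the wire. The sketch below handles only the "B" (base64) form and does not fold words longer than the 75-character limit RFC 2047 sets; EncodedWord is a hypothetical helper.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical RFC 2047 "B" encoded-word helpers. Sketch only: no "Q" form,
// no splitting of long words, one encoded-word per call.
public static class EncodedWord
{
    // Produces e.g. "=?utf-8?B?...?=" using the encoding's IANA name.
    public static string EncodeB(string text, Encoding charset) =>
        "=?" + charset.WebName + "?B?" + Convert.ToBase64String(charset.GetBytes(text)) + "?=";

    static readonly Regex Pattern =
        new Regex(@"^=\?([^?]+)\?[Bb]\?([A-Za-z0-9+/=]*)\?=$");

    // The charset travels inside the word, so the decoder knows how to read the bytes.
    public static string DecodeB(string word)
    {
        Match m = Pattern.Match(word);
        if (!m.Success) throw new FormatException("Not a B-encoded word: " + word);
        Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
        return charset.GetString(Convert.FromBase64String(m.Groups[2].Value));
    }
}
```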

Some IMAP servers support the notion of transmitting 8-bit data via the literal form, but only if a charset identifier has been negotiated. This does not apply to mail headers though.
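A literal announces its length in octets (`{n}` followed by CRLF and then exactly n octets, RFC 3501 section 4.3), so a client can read it as bytes without choosing a charset first. A sketch of that step, assuming a plain Stream; LiteralSize and ReadOctets are hypothetical helpers, not S22.Imap methods.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helpers: read an IMAP literal as raw octets and leave the
// charset decision to later processing.
public static class ImapLiteral
{
    static readonly Regex SizePattern = new Regex(@"\{(\d+)\}$");

    // Returns n if the response line (without CRLF) ends with "{n}", else null.
    public static int? LiteralSize(string line)
    {
        Match m = SizePattern.Match(line);
        return m.Success ? int.Parse(m.Groups[1].Value) : (int?)null;
    }

    // Reads exactly count octets; Stream.Read may return fewer bytes than requested.
    public static byte[] ReadOctets(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException("Connection closed inside literal.");
            offset += read;
        }
        return buffer;
    }
}
```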

NiKiZe commented on July 20, 2024

Yes, different encodings can cause different problems (and I hate to deal with it as well). :(

Unfortunately we can't change or correct how different servers behave, and as such the fact remains that there are IMAP servers that do this, and there really are messages that contain unencoded data in the headers (specifically the Subject line in my case).

So yes, data will be garbled in some other cases that we currently do not have examples of, but we do have examples of data like this, where the header and body are not in ASCII (and the result is incorrect).

Thunderbird gives the option to change the encoding, and that affects the whole message, including the subject. I recommend implementing this; one way to do that might be as described above, with DefaultEncoding or whatever you want to call that variable/property.

smiley22 commented on July 20, 2024

I'm not really convinced this is what happens. I have never come across such mail messages, and I would think the majority of MTAs don't relay such mail. This is from the Courier introduction:

The Courier mail server will not accept mail with raw 8-bit characters in the headers,
because they are illegal. There are well-defined protocols that must be used to
encode 8-bit text in mail headers. Non-compliant messages may result in the Courier
mail server itself issuing corrupted delivery status notifications, or mishandling the
message in several other ways. Because of that corrupted mail will simply not be accepted.

I don't think this is the cause of hochleitner's missing umlauts.

NiKiZe commented on July 20, 2024

OK, it might be a different issue.
But just a quick question, @smiley22: where would you suggest that I look for my problem with the message shown above? That really is what the source looks like (sanitized, but not otherwise altered), both in Thunderbird and in what is returned after changing the encoding in the GetData function.

I'm not running the Courier mail server, just the courier-imapd server; Postfix is delivering messages to a Maildir. Once again, I agree and think that it is wrong of any server to actually accept, or of a client to send, such a message, but it's still there on my server.

smiley22 commented on July 20, 2024

To be honest, I have no idea what's up with that message... There's a standards-track extension which allows UTF-8 in header fields, but I don't believe it's been widely adopted. I don't really know much about it; you can check it out here: http://tools.ietf.org/html/rfc6532

NiKiZe commented on July 20, 2024

Thunderbird bug with 8bit message headers: https://bugzilla.mozilla.org/show_bug.cgi?id=90584

smiley22 commented on July 20, 2024

Ah okay, how do they deal with it? Not sure it's really worth the trouble, tbh.

NiKiZe commented on July 20, 2024

It seems like they check the message body for the encoding and use that (if it is not found in the header itself; see Comment 18). At least that's what makes most sense to me.
To do this we probably need two "passes": the first to check for the encoding in the message, and a second to decode the message with that encoding. If the data was saved internally as a byte array, FromMIME822 could be extended to handle this... For now this is clearly not worth it ;)

That's why I suggested: "Thunderbird gives the option to change the encoding and that affects the whole message, as well as the subject. I recommend that this is implemented, one way to do that might be as described above with DefaultEncoding or whatever you want to call that variable/property."

If one wants to get a message with a different encoding (like me wanting to default to iso8859-1 or win-1252; I don't really know for sure yet), that encoding parameter could be changed.
And then, further down the road, maybe we could save the data as bytes and add detection etc. when the need appears?
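The first pass mostly amounts to finding the declared charset, which is pure ASCII even when the body is not. A sketch of that lookup on a single Content-Type header value; CharsetSniffer is a hypothetical helper, and it ignores RFC 2231 parameter continuations and header comments.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical first-pass helper: extract the charset parameter from a
// Content-Type header value, quoted or unquoted, case-insensitively.
public static class CharsetSniffer
{
    static readonly Regex CharsetParam = new Regex(
        @"charset\s*=\s*(?:""([^""]*)""|([^;\s]+))",
        RegexOptions.IgnoreCase);

    // Returns the charset name, or null when the header does not declare one.
    public static string FindCharset(string contentType)
    {
        Match m = CharsetParam.Match(contentType);
        if (!m.Success) return null;
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }
}
```

When it returns null, the caller still has to pick a fallback, which is where a user-selectable default like the DefaultEncoding idea would come in.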

hochleitner commented on July 20, 2024

Okay guys, I think you lost me here. If I understand correctly, the discussion drifted from umlauts in the body being garbled to whether umlauts are allowed in message headers at all. I understand the latter is an issue due to e-mail standards, but umlauts in the message body should be allowed and should therefore be displayed correctly by S22.Imap.
I had a look at the fix proposed by NiKiZe, but it is already a few months old and Util.cs has changed since then; I'm not sure if it's still applicable.
Do you still need me to supply a message example? I really didn't do much more than send a message containing umlauts (Ä, Ö, Ü, etc.) in various encodings (ISO-8859-1, UTF-8), and I always got "?" characters instead.

NiKiZe commented on July 20, 2024

The commit I referred to at first turned out not to have anything to do with this. Sorry about that.

But maybe you can test changing ASCII to iso8859-1; see the proposed change containing:

Encoding.GetEncoding("iso8859-1").GetString(mem.ToArray());

I believe the issue is 8-bit characters sent unencoded in the message. This would fix it for us, but other people might get different issues.
To be sure what the cause is in your case, an example message would be good.

hochleitner commented on July 20, 2024

I'm really not doing much here. I have a game running that does periodic e-mail checks in the background and allows users to send an e-mail to a predefined address to spawn an enemy. All the message has to contain is the name of the enemy, so it's usually one word. For testing purposes I used the word "Österreich" (but you can insert any word with umlauts).

Such a message would look like this (source from Thunderbird):

Message-ID: <[email protected]>
Date: Wed, 20 Mar 2013 11:23:15 +0100
From: XXX <[email protected]>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: [email protected]
Subject: Text
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Österreich

In this case the message is sent in UTF-8, but as reported before, the problem also occurs with ISO-8859-1 (I haven't tested any other encodings so far...).

All I do is retrieve the message using
MailMessage message = imap.GetMessage(uid);
and then access the body via message.Body.

In my understanding, this should contain "Österreich", but it contains "??sterreich" (or "?sterreich" when using ISO-8859-1, which encodes "Ö" as a single byte).
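The difference between "??sterreich" and "?sterreich" comes from byte counts: an ASCII decoder replaces each non-ASCII byte, and "Ö" (U+00D6) is the two bytes C3 96 in UTF-8 but the single byte D6 in ISO-8859-1. A small sketch that makes this visible; UmlautBytes is a hypothetical helper.

```csharp
using System;
using System.Text;

// Hypothetical helper showing how many bytes of "Österreich" an ASCII
// decoder would replace under each encoding.
public static class UmlautBytes
{
    public const string Word = "\u00D6sterreich"; // "Österreich"

    public static byte[] Utf8() => Encoding.UTF8.GetBytes(Word);

    public static byte[] Latin1() => Encoding.GetEncoding(28591).GetBytes(Word);

    // Number of bytes above 0x7F, i.e. bytes an ASCII decoder replaces.
    public static int NonAsciiBytes(byte[] raw)
    {
        int n = 0;
        foreach (byte b in raw)
            if (b > 0x7F) n++;
        return n;
    }
}
```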

As you can see, there's no magic or anything really special here, just non-ASCII characters being interpreted wrongly. I have read through the code, but to be honest, I'm not familiar enough with it to see where this charset problem actually happens.

NiKiZe commented on July 20, 2024

This is the same problem I have with filenames and my headers. The diff I suggested above (the change in ImapClient.cs, currently on line 541) will make this work for you, but instead of iso-8859-1 you would want to change the encoding to UTF-8 for this message.

If you get messages in different encodings, then this encoding needs to be changed for every message.

So even if the standard does not allow anything other than ASCII in messages, we still need to handle it, since it's out in the wild :(

hochleitner commented on July 20, 2024

Okay, I understand. The problem in my case is that I don't really know which encoding will be used, since everyone playing this game can send an e-mail and people will most likely use different encodings. What I don't understand: the header information does contain the encoding used, and it is known before the body is decoded. Why not simply take this encoding and decode the message body accordingly?

NiKiZe commented on July 20, 2024

Not quite. Currently the message is fetched from the server and converted to a string. After this the encoding is checked, which works correctly if the message is transferred in a non-8-bit form.

To get this working correctly, the internal representation needs to be changed to binary (bytes), because the conversion might be needed several times:
first to get the headers and encoding (ASCII), and possibly again to decode the byte array with the correct/found encoding if it is 8-bit (it should probably always be done, because of messages without correct headers).
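A sketch of that second pass, assuming a single-part message held as a byte array with CRLF line endings and a charset already found by the first pass. It decodes the header block with ISO-8859-1 (lossless) and the body with the declared charset, falling back to ISO-8859-1 when no charset is given or the name is unknown, so no bytes are replaced. TwoPassDecoder and its members are hypothetical names, not S22.Imap APIs.

```csharp
using System;
using System.Text;

// Hypothetical second-pass decoder: the message stays as bytes until the
// body charset is known.
public static class TwoPassDecoder
{
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Index of the first byte after the blank line (CRLF CRLF), or -1.
    public static int BodyStart(byte[] raw)
    {
        for (int i = 0; i + 3 < raw.Length; i++)
            if (raw[i] == '\r' && raw[i + 1] == '\n' && raw[i + 2] == '\r' && raw[i + 3] == '\n')
                return i + 4;
        return -1;
    }

    public static (string Header, string Body) Decode(byte[] raw, string charset)
    {
        int start = BodyStart(raw);
        if (start < 0) return (Latin1.GetString(raw), string.Empty);

        Encoding bodyEncoding = Latin1; // fallback: ISO-8859-1 never loses bytes
        if (charset != null)
        {
            try { bodyEncoding = Encoding.GetEncoding(charset); }
            catch (ArgumentException) { /* unknown charset name: keep the fallback */ }
        }

        string header = Latin1.GetString(raw, 0, start - 4); // without the blank line
        string body = bodyEncoding.GetString(raw, start, raw.Length - start);
        return (header, body);
    }
}
```

Multipart messages would need the same split applied per part, using each part's own Content-Type.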

hochleitner commented on July 20, 2024

Okay, now I have more of an impression of what's going on here. Your suggested fix doesn't work for me, though. Neither
return Encoding.GetEncoding("iso8859-1").GetString(mem.ToArray());
nor
return Encoding.UTF8.GetString(mem.ToArray());
seems to work; I get the same textual result. I guess I'll just have to live without umlauts for some time, although I'm pretty sure this worked in an earlier version...

smiley22 commented on July 20, 2024

Hochleitner,
It really depends on the format of the mail message. The problem with the mail message you posted is its Content-Transfer-Encoding of 8bit. I guess if you tried retrieving another mail from your IMAP server that does not have a Content-Transfer-Encoding of 8bit (but rather quoted-printable or base64), you would find umlauts being displayed correctly.
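Quoted-printable keeps the transport pure ASCII: non-ASCII bytes travel as `=XX` escapes, are turned back into bytes after the message has been read, and only then are decoded with the part's charset. A minimal decoder sketch (RFC 2045 section 6.7) applied to lines from the Swedish sample earlier in the thread; it does not strip trailing whitespace or reject malformed escapes. The sample part declares Windows-1252, but ISO-8859-1 (code page 28591) is used in the check because 0xE4 and 0xF6 mean "ä" and "ö" in both, and Windows-1252 needs the CodePagesEncodingProvider on .NET Core. QuotedPrintable is a hypothetical helper.

```csharp
using System;
using System.IO;

// Hypothetical quoted-printable decoder that yields bytes, not text; the
// caller decodes them with the charset from the part's Content-Type.
public static class QuotedPrintable
{
    public static byte[] DecodeToBytes(string qp)
    {
        var output = new MemoryStream();
        for (int i = 0; i < qp.Length; i++)
        {
            char c = qp[i];
            if (c != '=') { output.WriteByte((byte)c); continue; } // QP input is ASCII

            if (i + 1 == qp.Length) continue; // trailing "=": soft break at end of input
            if (qp[i + 1] == '\r' || qp[i + 1] == '\n')
            {
                i++; // soft line break: drop "=" and the line ending
                if (qp[i] == '\r' && i + 1 < qp.Length && qp[i + 1] == '\n') i++;
                continue;
            }
            if (i + 2 < qp.Length && Uri.IsHexDigit(qp[i + 1]) && Uri.IsHexDigit(qp[i + 2]))
            {
                output.WriteByte(Convert.ToByte(qp.Substring(i + 1, 2), 16));
                i += 2;
                continue;
            }
            output.WriteByte((byte)'='); // malformed escape: keep it literally
        }
        return output.ToArray();
    }
}
```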

I have started working on a solution for this. It'll take a couple of days, though, because I want to make sure that it doesn't introduce any new problems and doesn't break anything else...

NiKiZe commented on July 20, 2024

Looks like issue #79 is another version of the same problem.

phaq01 commented on July 20, 2024

There is a fork of S22.Imap which fixes the issue:
Install-Package S22.ImapWithUTF8
