imapdedup's People

Contributors

adamhorner, agners, b-luc, bjorn, cameronmurdoch, catap, devurandom, grawity, grb43, iveqy, jengels, lingfish, lpirl, melefabrizio, mhagger, quentinsf, vincentbernat

imapdedup's Issues

Too many values to unpack

I got this error. Any clue?

21200 message(s) in Sent processed
Traceback (most recent call last):
  File "./imapdedup.py", line 536, in <module>
    main(sys.argv[1:])
  File "./imapdedup.py", line 532, in main
    process(options, mboxes)
  File "./imapdedup.py", line 440, in process
    for mnum, hinfo in get_msg_headers(server, msgnums[i: i + chunkSize]):
  File "./imapdedup.py", line 316, in get_msg_headers
    _, hinfo = ms[ci * 2]
ValueError: too many values to unpack (expected 2)

xrange not defined

Trying it on iCloud IMAP:

There are 1380 messages in Archive.
No message(s) currently marked as deleted in Archive
1380 others in Archive
Reading the others... (in batches of 100)
Traceback (most recent call last):
File "/usr/bin/imapdedup", line 271, in
main()
File "/usr/bin/imapdedup", line 200, in main
for i in xrange(0, len(msgnums), chunkSize):
NameError: name 'xrange' is not defined
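
The traceback above comes from running Python 2 code under Python 3, where `xrange` no longer exists and `range` is already lazy. A minimal sketch of the same batching loop, assuming `msgnums` is a plain list; the helper name `chunks` is hypothetical and not part of imapdedup:

```python
def chunks(items, chunk_size):
    """Yield successive slices of at most chunk_size items."""
    # range() in Python 3 is lazy, like xrange() was in Python 2.
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]


msgnums = list(range(1, 251))          # stand-in for real message numbers
batches = list(chunks(msgnums, 100))
print([len(b) for b in batches])
```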

request

How about making it recursively scan all folders in the mailbox looking for dupes?

Handling of names with umlauts

Hi, the script throws an exception if I try to use a directory with umlauts, like Entw&APw-rfe or Gel&APY-schte Elemente.
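
Names such as `Entw&APw-rfe` are in IMAP's modified UTF-7 encoding, which Python's built-in `utf-7` codec does not handle directly. The following is a minimal decoding sketch under that assumption; `decode_imap_utf7` is a hypothetical helper, not a function from imapdedup or the standard library:

```python
import base64
import re


def decode_imap_utf7(name):
    """Decode an IMAP modified UTF-7 mailbox name to a normal string."""

    def _decode(match):
        chunk = match.group(1)
        if chunk == "":
            return "&"                      # "&-" is a literal ampersand
        b64 = chunk.replace(",", "/")       # modified base64 uses "," for "/"
        b64 += "=" * (-len(b64) % 4)        # restore the stripped padding
        return base64.b64decode(b64).decode("utf-16-be")

    return re.sub(r"&([^-]*)-", _decode, name)


print(decode_imap_utf7("Entw&APw-rfe"))
print(decode_imap_utf7("Gel&APY-schte Elemente"))
```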

Add option to use an admin user account for authentication

I had to find a way to connect to a Zimbra server without knowing the user's password. Fortunately Zimbra allows you to log in as one user (typically an "admin" user) and then impersonate the user that you really want. It is done via the "AUTHENTICATE PLAIN" command (which must be enabled in Zimbra).
Below is the "diff" to your original code, which is actually pretty straightforward. I added another option "-a|--authuser" and a simple code snippet to use "authenticate" instead of "login" if it is set. It works for me :)

--- imapdedup.py 2018-04-24 11:21:58.415659097 -0400
+++ imapdedup.bluc.py 2018-04-24 22:16:16.769606461 -0400
@@ -60,6 +60,7 @@
     parser.add_option("-s", "--server",dest='server',help='IMAP server')
     parser.add_option("-p", "--port", dest='port', help='IMAP server port', type='int')
     parser.add_option("-x", "--ssl", dest='ssl', action="store_true", help='Use SSL')
+    parser.add_option("-a", "--authuser", dest='authuser', help='IMAP admin user (e.g. for Zimbra)')
     parser.add_option("-u", "--user", dest='user', help='IMAP user name')
     parser.add_option("-w", "--password", dest='password', help='IMAP password (Will prompt if not specified)')
     parser.add_option("-v", "--verbose", dest="verbose", action="store_true", help="Verbose mode")
@@ -198,6 +199,10 @@

     try:
         if not options.process:
+            if options.authuser:
+                authcb = lambda resp: "{0}\x00{1}\x00{2}".format(options.user,options.authuser,options.password)
+                server.authenticate("PLAIN", authcb)
+            else:
                 server.login(options.user, options.password)
     except:
         sys.stderr.write("\nError: Login failed\n")

Criteria for choosing the "one true" message?

Looking at the code, it appears that in a given group of "matching" messages, we keep the first one found and discard the others.

For my use case, I would prefer to keep the one that is the largest, because I have duplicates where attachments etc. have been stripped.

How hard would that be?

Thanks!
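
One way the "keep the largest" rule could look, as a sketch only: it assumes each group of matching messages is available as `(msgnum, size_in_bytes)` pairs (the size could, for instance, come from the IMAP `RFC822.SIZE` fetch item). The function name `choose_keeper` and the data layout are hypothetical, not imapdedup's actual internals:

```python
def choose_keeper(group):
    """Return (keeper, discards) for a list of (msgnum, size_in_bytes) pairs."""
    # Largest size wins; ties go to the lowest message number.
    keeper = max(group, key=lambda item: (item[1], -item[0]))
    discards = [item for item in group if item is not keeper]
    return keeper, discards


group = [(11, 2048), (13, 51200), (27, 2048)]
keeper, discards = choose_keeper(group)
print("keep", keeper[0], "discard", [msgnum for msgnum, _ in discards])
```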

expunge messages?

Marking dupes is great, but actually expunging them would also be great. Any hints on how to get rid of the messages as well, to free up space?
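
For reference, a sketch of what flag-then-expunge looks like with the standard library's `imaplib`, using its `store()` and `expunge()` methods. The helper name `delete_and_expunge` is hypothetical, and the sketch assumes `server` is an `imaplib.IMAP4` (or compatible) object with a mailbox already selected read-write. Note that `EXPUNGE` removes every message flagged `\Deleted` in the selected mailbox, not only the ones flagged here:

```python
def delete_and_expunge(server, msgnums):
    """Flag the given message numbers as deleted, then expunge the mailbox."""
    if not msgnums:
        return
    message_set = ",".join(str(n) for n in msgnums)
    # Mark the messages with the \Deleted flag...
    server.store(message_set, "+FLAGS", r"(\Deleted)")
    # ...then permanently remove everything flagged \Deleted.
    server.expunge()
```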

Message could be more helpful

Message Inbox_13 is a duplicate of Inbox_11 and would be marked as deleted

Would it be possible to print the subject of those two messages please?

Folders with a large number of messages cause an error

Hi,

IMAPdedup is a great tool! Thank you for your awesome work.

Folders with a large number of messages cause an error when run on my Synology 920+ (20GB RAM). Please see below.

I hope that you can debug this. Thank you!

Tom.

=== CLI Output ===

There are 246410 messages in X_Folder.
No message(s) currently marked as deleted in X_Folder
Traceback (most recent call last):
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1022, in _command_complete
typ, data = self._get_tagged_response(tag, expect_bye=logout)
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1148, in _get_tagged_response
self._get_response()
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1050, in _get_response
resp = self._get_line()
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1158, in _get_line
line = self.readline()
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 316, in readline
raise self.error("got more than %d bytes" % _MAXLINE)
imaplib.error: got more than 1000000 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/volume1/IMAPdedup-master/imapdedup-20210504.py", line 477, in process
msgnums = get_undeleted_msgnums(server, options.sent_before)
File "/volume1/IMAPdedup-master/imapdedup-20210504.py", line 329, in get_undeleted_msgnums
return get_matching_msgnums(server, "UNDELETED", sent_before)
File "/volume1/IMAPdedup-master/imapdedup-20210504.py", line 312, in get_matching_msgnums
deleted_info = check_response(server.search(None, query))
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 725, in search
typ, dat = self._simple_command(name, *criteria)
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1205, in _simple_command
return self._command_complete(name, self._command(name, *args))
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1026, in _command_complete
raise self.error('command: %s => %s' % (name, val))
imaplib.error: command: SEARCH => got more than 1000000 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1022, in _command_complete
typ, data = self._get_tagged_response(tag, expect_bye=logout)
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1148, in _get_tagged_response
self._get_response()
File "/var/packages/py3k/target/usr/local/lib/python3.8/imaplib.py", line 1079, in _get_response
raise self.abort("unexpected response: %r" % resp)
imaplib.abort: unexpected response: b'58729 158730 158731 158732 158733 158734 158735 158736 158737 158738 158739 158740 158741 158742 158743 158744 158745 158746 158747 158748 158749 158750 158751 158752 158753 158754 158755 158756 158757 158758 158759 158760 158761 158762 158763 158764 158765 158766 158767 158768 158769 158770 158771 158772 158773 158774 158775 158776 158777 158778 158779 158780 158781 158782 158783 158784 158785 158786 158787 158788 158789 158790 158791 158792 158793 158794 158795 158796 158797 158798 158799 158800 158801 158802 158803 158804 158805 158806 158807 158808 158809 158810 158811 158812 158813 158814 158815 158816 158817 158818 158819 158820 158821 158822 158823 158824 158825 158826 158827 158828 158829 158830 158831 158832 158833 158834 158835 158836 158837 158838 158839 158840 158841 158842 158843 158844 158845 158846 158847 158848 158849 158850 158851 158852 158853 158854 158855 158856 158857 158858 158859 158860 158861 158862 158863 158864 158865 158866 158867 158868 158869 158870 158871 158872 158873 158874 158875 158876 158877 158878 158879 158880 158881 158882 158883 158884 158885 158886 158887 158888 158889 158890 158891 158892 158893 158894 158895 158896 158897 158898 158899 158900 158901 158902 158903 158904 158905 158906 158907 158908 158909 158910 158911 158912 158913 158914 158915 158916 158917 158918 158919 158920 158921 158922 158923 158924 158925 158926 158927 158928 158929 158930 158931 158932 158933 158934 158935 158936 158937 158938 158939 158940 158941 158942 158943 158944 158945 158946 158947 158948 158949 158950 158951 158952 158953 158954 158955 158956 158957 158958 158959 158960 158961 158962 158963 158964 158965 158966 158967 158968 158969 158970 158971 158972 158973 158974 158975 158976 158977 158978 158979 158980 158981 158982 158983 158984 158985 158986 158987 158988 158989 158990 158991 158992 158993 158994 158995 158996 158997 158998 158999 159000 159001 159002 159003 159004 159005 159006 159007 159008 
159009 159010 159011 159012 159013 159014 159015 159016 159017 159018 159019 159020 159021 159022 159023 159024 159025 159026 159027 159028 159029 159030 159031 159032 159033 159034 159035 159036 159037 159038 159039 159040 159041 159042 159043 159044 159045 159046 159047 159048 159049 159050 159051 159052 159053 159054 159055 159056 159057 159058 159059 159060 159061 159062 159063 159064 159065 159066 159067 159068 159069 159070 159071 159072 159073 159074 159075 159076 159077 159078 159079 159080 159081 159082 159083 159084 159085 159086 159087 159088 159089 159090 159091 159092 159093 159094 159095 159096 159097 159098 159099 159100 159101 159102 159103 159104 159105 159106 159107 159108 159109 159110 159111 159112 159113 159114 159115 159116 159117 159118 159119 159120 159121 159122 159123 159124 159125 159126 159127 159128 159129 159130 159131 159132 159133 159134 159135 159136 159137 159138 159139 159140 159141 159142 159143 159144 159145 159146 159147 159148 159149 159150 159151 159152 159153 159154 159155 159156 159157 ..
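
The first error in the traceback is raised by `imaplib` itself: it refuses to read a single response line longer than its module-level `_MAXLINE` limit, and a `SEARCH` reply listing hundreds of thousands of message numbers exceeds it. A commonly used workaround is to raise that limit before connecting. This is a sketch, not a confirmed fix for this report; `_MAXLINE` is a private attribute, and the 10 MB value is an arbitrary assumption:

```python
import imaplib

# imaplib raises "got more than N bytes" when one response line exceeds
# _MAXLINE. Raising the limit before opening the connection lets very large
# SEARCH replies through. The value below is an assumption, not a
# recommendation from the imapdedup authors.
imaplib._MAXLINE = 10_000_000
```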

expected an indented block

Hi,

Any idea what the error below could be?

./imapdup.py -h
File "./imapdup.py", line 310
main(sys.argv[1:])
^
IndentationError: expected an indented block

Config File

If I use the config file and specify multiple folders, is each folder processed individually, or does the script compare between folders?

I want to be able to run this through our public folder set, but only want it to remove duplicate emails within the same folder, not duplicate emails across folders.

Error using -P with wrapper: /bin/sh: 1: local: not in a function

I'm not sure how the wrapper script for -P is supposed to work. I am using this on Mail-in-a-Box (Dovecot) with a script called 'local':

#!/bin/bash
/usr/lib/dovecot/imap -o mail_location=maildir:/home/user-data/mail/mailboxes/domain.com/test

Error I get is:
# ./imapdedup.py -s localhost -u [email protected] -x -v INBOX -P local
/bin/sh: 1: local: not in a function
Traceback (most recent call last):
  File "./imapdedup.py", line 324, in <module>
    main(sys.argv[1:])
  File "./imapdedup.py", line 321, in main
    process(options, mboxes)
  File "./imapdedup.py", line 182, in process
    server = serverclass(options.process)
  File "/usr/lib/python2.7/imaplib.py", line 1240, in __init__
    IMAP4.__init__(self)
  File "/usr/lib/python2.7/imaplib.py", line 193, in __init__
    self.welcome = self._get_response()
  File "/usr/lib/python2.7/imaplib.py", line 928, in _get_response
    resp = self._get_line()
  File "/usr/lib/python2.7/imaplib.py", line 1028, in _get_line
    raise self.abort('socket error: EOF')
imaplib.abort: socket error: EOF

Syntax error on line 287 ??

When I try to run, I get

  File "./imapdedup.py", line 287
    resp: List[Tuple[int, bytes]] = []
        ^
SyntaxError: invalid syntax

I know nothing about Python, so I'm at a loss as to how to correct the problem...

Add option to exclude `Date:` from `--checksum` mode

I have an account which receives duplicate messages (with differing Message-IDs) a few minutes apart from various mailing-lists. It would be helpful to have an option to exclude the Date: header from checksum calculations to account for these situations.

(Processing a 416-message mailbox found no duplicates with Date: checks enabled, but 106 duplicates with the line commented out!)
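
To illustrate the request, here is a sketch of a header checksum where the set of hashed fields is a parameter, so `Date` can be left out. The field list and the function name `header_checksum` are assumptions for illustration; they are not necessarily what imapdedup's `--checksum` mode hashes:

```python
import hashlib
from email import message_from_string


def header_checksum(raw_headers, fields=("From", "To", "Subject", "Date")):
    """MD5 over the chosen header fields of a raw header block."""
    msg = message_from_string(raw_headers)
    md5 = hashlib.md5()
    for name in fields:
        value = msg.get(name) or ""     # a missing header hashes as empty
        md5.update(("%s:%s" % (name, value)).encode("utf-8"))
    return md5.hexdigest()


first = "From: list@example.org\nSubject: Hello\nDate: Mon, 1 Jan 2024 10:00:00 +0000\n\n"
second = "From: list@example.org\nSubject: Hello\nDate: Mon, 1 Jan 2024 10:03:00 +0000\n\n"
no_date = ("From", "To", "Subject")

print(header_checksum(first) == header_checksum(second))
print(header_checksum(first, no_date) == header_checksum(second, no_date))
```

With `Date` included the two copies hash differently; with it excluded they collapse to the same checksum, which matches the behaviour described above.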

Adding OAUTH 2.0 support?

I would very much like to use IMAPdedup, but my mailboxen are on Microsoft Exchange Servers, and these no longer support "basic" authentication. They need OAUTH 2.0 support. Any chance of getting that added?
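
For context, `imaplib` can already send a SASL `XOAUTH2` response through its `authenticate()` method; what it does not do is obtain the access token. The sketch below only builds the response string and callback, assuming a valid token has been acquired elsewhere and that the server advertises `AUTH=XOAUTH2`. The helper names are hypothetical:

```python
def xoauth2_string(user, access_token):
    """Build the SASL XOAUTH2 initial client response (before base64)."""
    return "user=%s\x01auth=Bearer %s\x01\x01" % (user, access_token)


def xoauth2_callback(user, access_token):
    """Return an authobject suitable for imaplib's authenticate()."""
    payload = xoauth2_string(user, access_token).encode("utf-8")
    # imaplib calls the authobject with the server challenge and
    # base64-encodes whatever bytes it returns.
    return lambda challenge: payload


# Hypothetical usage, given an imaplib.IMAP4_SSL object `server`:
#   server.authenticate("XOAUTH2", xoauth2_callback(user, token))
```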

Adapt to mboxes?

Is it possible to use this on mboxes?

I have exported several mail service mboxes that I want to merge and dedup, and start a new email life from scratch. A boy can dream, can't he?
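
imapdedup talks to IMAP servers, but for local mbox files Python's standard `mailbox` module can do a simple Message-ID based pass. The following is a sketch under the assumption that matching on `Message-ID` alone is good enough; `duplicate_keys` and `dedup_mbox` are hypothetical names, and the file should be backed up before trying anything like this:

```python
import mailbox


def duplicate_keys(items):
    """Given (key, message) pairs, return keys whose Message-ID was seen before."""
    seen = set()
    dupes = []
    for key, msg in items:
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            continue                    # no ID: leave the message alone
        msg_id = msg_id.strip()
        if msg_id in seen:
            dupes.append(key)
        else:
            seen.add(msg_id)
    return dupes


def dedup_mbox(path):
    """Remove later copies of messages sharing a Message-ID from an mbox file."""
    box = mailbox.mbox(path, create=False)
    box.lock()
    try:
        for key in duplicate_keys(box.iteritems()):
            box.remove(key)
        box.flush()
    finally:
        box.unlock()
        box.close()
```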

IndexError: list index out of range

After processing 26200 out of 42171 emails

Traceback (most recent call last):
  File "imapdedup.py", line 536, in <module>
    main(sys.argv[1:])
  File "imapdedup.py", line 532, in main
    process(options, mboxes)
  File "imapdedup.py", line 440, in process
    for mnum, hinfo in get_msg_headers(server, msgnums[i: i + chunkSize]):
  File "imapdedup.py", line 315, in get_msg_headers
    mnum = int(msg_ids[ci])
IndexError: list index out of range

Python 2 shebang required

The current shebang

#! /usr/bin/env python

will fail if the default version of Python is 3, as it is on my system. The shebang should explicitly specify the required Python version instead:

#! /usr/bin/env python2

In the long run it would of course be preferable to also support Python 3.

Python version changing results?

Hi.

Same options - Same mailbox
Different user - Different Ubuntu version

...
If you had NOT selected the 'dry-run' option,
  6 messages would now be marked as deleted.
geohei@altar:~/imapdedup$ python3 -V
Python 3.6.9
...
If you had NOT selected the 'dry-run' option,
  18 messages would now be marked as deleted.
root@vm96:~/imapdedup# python3 -V
Python 3.8.10

Does the python3 version change the results?

Thank You!

Just wanted to say thank you!

This wonderful tool saved my day when everything else failed!

Attempting to process the inbox.

I can perform a dry run, but when I change the -n option to -P I get the following:

imapdedup.py -s -u -x -P INBOX
/bin/sh: INBOX: command not found
Traceback (most recent call last):
File "/Users/ccoles/Desktop/IMAPdedup-master/imapdedup.py", line 313, in
main(sys.argv[1:])
File "/Users/ccoles/Desktop/IMAPdedup-master/imapdedup.py", line 310, in main
process(options, mboxes)
File "/Users/ccoles/Desktop/IMAPdedup-master/imapdedup.py", line 171, in process
server = serverclass(options.process)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 1241, in __init__
IMAP4.__init__(self)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 194, in __init__
self.welcome = self._get_response()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 929, in _get_response
resp = self._get_line()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 1029, in _get_line
raise self.abort('socket error: EOF')
imaplib.abort: socket error: EOF

imaplib.error: got more than 10000 bytes

Doing a dry run:

Password:
There are 8801 messages in INBOX/Archiv.
No message(s) currently marked as deleted in INBOX/Archiv
Traceback (most recent call last):
File "/usr/lib/python3.4/imaplib.py", line 957, in _command_complete
typ, data = self._get_tagged_response(tag)
File "/usr/lib/python3.4/imaplib.py", line 1077, in _get_tagged_response
self._get_response()
File "/usr/lib/python3.4/imaplib.py", line 985, in _get_response
resp = self._get_line()
File "/usr/lib/python3.4/imaplib.py", line 1087, in _get_line
line = self.readline()
File "/usr/lib/python3.4/imaplib.py", line 270, in readline
raise self.error("got more than %d bytes" % _MAXLINE)
imaplib.error: got more than 10000 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/bin/imapdedup", line 271, in
main()
File "/usr/bin/imapdedup", line 194, in main
msgnums = check_response(server.search(None, 'UNDELETED'))[0].split()
File "/usr/lib/python3.4/imaplib.py", line 660, in search
typ, dat = self._simple_command(name, *criteria)
File "/usr/lib/python3.4/imaplib.py", line 1134, in _simple_command
return self._command_complete(name, self._command(name, *args))
File "/usr/lib/python3.4/imaplib.py", line 961, in _command_complete
raise self.error('command: %s => %s' % (name, val))
imaplib.error: command: SEARCH => got more than 10000 bytes

Thank you!

This tool saved me 892 GB of space due to a sieve script going haywire and duplicating every mail including attachments. Is there any way to buy you a few coffees or a pizza or similar as thanks?

Small change to show progress ...

Specifying the "-v" option makes the script chatty, but not doing so makes you wonder whether it is actually doing anything, especially with large mailboxes. So I wrote a small patch that prints a message for every 100 emails processed.
Not knowing how to upload a patch, here is the "diff" file:

--- imapdedup.py.ORIG 2013-07-20 20:30:05.618822083 -0400
+++ imapdedup.py 2013-07-20 20:22:01.310841292 -0400
@@ -162,6 +162,9 @@
             mp = p.parsestr(m[0][1])
             if options.verbose:
                 print "Checking message", mbox, mnum
+            else:
+                if ((int(mnum) % 100) == 1):
+                    print mnum, "message(s) in", mbox, "processed"
             msg_id = get_message_id(mp, options.use_checksum)
             msg_map[mnum] = mp
             if msg_id:

Issue with imapdedup

Dear Sir,
We migrated from Lotus to Zimbra. During the migration, some mailboxes were migrated two or three times: network failures caused our large accounts to get stuck, so we re-ran the migration wizard. As a result, we found duplicate emails in our Zimbra mailboxes. We are using imapdedup, but it fails to remove emails from folders whose names contain spaces. If you have a solution, we would be very thankful.

The error is:
Error: Got response: ['SELECT failed'] from server
Error: Got response: ['SELECT failed'] from server
Error: Got response: ['SELECT failed'] from server
Traceback (most recent call last):
File "imapdedup.py", line 310, in
main(sys.argv[1:])
File "imapdedup.py", line 307, in main
process(options, mboxes)
File "imapdedup.py", line 215, in process
msgs = check_response(server.select(mbox, options.dry_run))[0]
File "/usr/lib64/python2.7/imaplib.py", line 661, in select
typ, dat = self._simple_command(name, mailbox)
File "/usr/lib64/python2.7/imaplib.py", line 1082, in _simple_command
return self._command_complete(name, self._command(name, *args))
File "/usr/lib64/python2.7/imaplib.py", line 917, in _command_complete
raise self.error('%s command error: %s %s' % (name, typ, data))
imaplib.error: SELECT command error: BAD ['parse error: zero-length content']

Using sed to quote each line for processing with xargs

If you are using an operating system with xargs you can use sed to automatically quote each line of your input by piping it through sed. For example:

cat folders.txt | sed 's/^\(.*\)$/"\1"/' | xargs ./imapdedup.py -s imap.server -w supersecretpasword -u imapusername -x -v -S

I thought that you might like to add this to examples in your readme.md

Thanks for writing this, it is a godsend :)

Folder name with blank and square bracket

How do I use a folder name like [Gmail]/Alle Nachrichten?
I tried escaping the brackets with \ but that does not work; even putting the whole folder path in single quotes does not work.
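
Shell quoting only gets the name into the script intact; the IMAP protocol itself also needs a name containing spaces to be sent as a quoted string, and `imaplib` largely passes mailbox names through as given. A minimal sketch of such quoting, with the hypothetical helper `quote_mailbox` (not imapdedup's actual code):

```python
def quote_mailbox(name):
    """Wrap a mailbox name in IMAP double quotes, escaping \\ and \"."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


print(quote_mailbox("[Gmail]/Alle Nachrichten"))

# Hypothetical usage, given an imaplib.IMAP4 object `server`:
#   server.select(quote_mailbox("[Gmail]/Alle Nachrichten"), readonly=True)
```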

mailbox list with / delimiter gives error on recursive processing

I tried to deduplicate mails in my mailbox (I don't know what kind of server it is).

I got a list of mailboxes just fine
INBOX
INBOX/subfolder
INBOX/subfolder/subfolder and so on

With the option -r it won't process:

"Traceback (most recent call last):
File "./imapdedup.py", line 535, in
main(sys.argv[1:])
File "./imapdedup.py", line 531, in main
process(options, mboxes)
File "./imapdedup.py", line 396, in process
for mb in get_mailbox_list(server, parent, pattern):
File "./imapdedup.py", line 257, in get_mailbox_list
bits = parse_list_response(mb)
File "./imapdedup.py", line 165, in parse_list_response
m = list_response_pattern.match(line)
TypeError: expected string or bytes-like object"

-> Probably the delimiter of folders is the problem?! . vs / ?

Passwords with spaces (or other special characters) don't work?

Hi, I have an account that uses a multi word password with spaces and a % character. Login always fails when trying to use IMAPdedup. I have tried adding the --password="my password" flag, tried escaping the %, and tried entering when being prompted, but all to no avail. Any suggestions?

AttributeError: 'NoneType' object has no attribute 'split'

I am getting the following when I try to run this script against a SmarterMail mail server:

There are 7850 messages in INBOX/Old Mail - Keep.
Traceback (most recent call last):
File "./imapdedup.py", line 313, in
main(sys.argv[1:])
File "./imapdedup.py", line 310, in main
process(options, mboxes)
File "./imapdedup.py", line 222, in process
deleted = check_response(server.search(None, 'DELETED'))[0].split()
AttributeError: 'NoneType' object has no attribute 'split'

Fails on large message count

It worked for all the folders. Then I did the dry-run for the INBOX folder and it found 113000 duplicates. When I remove the -n option it fails. If I try the dry-run again now it also fails.

$ ./imapdedup.py -s mail.server.com -u [email protected] -x -l
Password:
Spam
Drafts
Deleted Items
Sent
INBOX
$ ./imapdedup.py -s mail.server.com -u [email protected] -x INBOX
Password: 
There are 170714 messages in INBOX.
No message(s) currently marked as deleted in INBOX
170714 others in INBOX
Traceback (most recent call last):
  File "./imapdedup.py", line 324, in <module>
    main(sys.argv[1:])
  File "./imapdedup.py", line 321, in main
    process(options, mboxes)
  File "./imapdedup.py", line 248, in process
    ms = check_response(server.fetch(message_ids, '(RFC822.HEADER)'))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 456, in fetch
    typ, dat = self._simple_command(name, message_set, message_parts)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 1088, in _simple_command
    return self._command_complete(name, self._command(name, *args))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 912, in _command_complete
    raise self.abort('command: %s => %s' % (name, val))
imaplib.abort: command: FETCH => socket error: EOF

Python3 compatibility

Hi,
When running imapdedup with python3 (3.7.5), I get:

Traceback (most recent call last):
File "/home/iranzo/.bin/imapdedup.py", line 324, in
main(sys.argv[1:])
File "/home/iranzo/.bin/imapdedup.py", line 321, in main
process(options, mboxes)
File "/home/iranzo/.bin/imapdedup.py", line 246, in process
message_ids = ','.join(msgnums_in_chunk)
TypeError: sequence item 0: expected str instance, bytes found

The same script run with python2 works fine. It seems that under Python 3 the message numbers come back as bytes instead of regular str.
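
Under Python 3, `imaplib` returns response data as `bytes`, so the message numbers obtained by splitting a `SEARCH` reply cannot be joined with a `str` separator. A sketch of one way to normalise them; `to_message_set` is a hypothetical helper name:

```python
def to_message_set(msgnums):
    """Join message numbers (bytes, str or int) into an IMAP message set."""
    return ",".join(
        n.decode("ascii") if isinstance(n, bytes) else str(n)
        for n in msgnums
    )


search_data = [b"1 2 3 5"]              # shape of what search() returns
msgnums_in_chunk = search_data[0].split()
print(to_message_set(msgnums_in_chunk))
```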

TypeError: cannot concatenate 'str' and 'NoneType' objects

Hello there,

I get this error while running the script.

Traceback (most recent call last):
  File "./imapdedup.py", line 270, in
    main()
  File "./imapdedup.py", line 215, in main
    msg_id = get_message_id(mp, options.use_checksum, options.use_id_in_checksum)
  File "./imapdedup.py", line 99, in get_message_id
    md5.update(("Subject:" + parsed_message['Subject']).encode('utf-8'))
TypeError: cannot concatenate 'str' and 'NoneType' objects
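
The error arises because indexing a parsed message with a header it does not contain yields `None`, which cannot be concatenated to a string. A sketch of a tolerant variant, assuming a message with no `Subject` should simply hash as if the subject were empty; `subject_digest` is a hypothetical name:

```python
import hashlib
from email import message_from_string


def subject_digest(parsed_message):
    """MD5 of the Subject header, treating a missing header as empty."""
    subject = parsed_message.get("Subject") or ""
    md5 = hashlib.md5()
    md5.update(("Subject:" + subject).encode("utf-8"))
    return md5.hexdigest()


msg = message_from_string("From: someone@example.org\n\nno subject here\n")
print(subject_digest(msg))
```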

ValueError: invalid literal for int() with base 16

I am using the -c option on a mailbox and am getting this stack trace. It looks like there's a malformed email, with a word where the code expects a hex string:

  File "./imapdedup.py", line 313, in <module>
    main(sys.argv[1:])
  File "./imapdedup.py", line 310, in main
    process(options, mboxes)
  File "./imapdedup.py", line 249, in process
    msg_id = get_message_id(mp, options.use_checksum, options.use_id_in_checksum)
  File "./imapdedup.py", line 132, in get_message_id
    md5.update("Subject:" + utf8_header(parsed_message,'Subject'))
  File "./imapdedup.py", line 105, in utf8_header
    text, encoding = decode_header(parsed_message.get(name,''))[0]
  File "/usr/lib64/python2.6/email/header.py", line 93, in decode_header
    dec = email.quoprimime.header_decode(encoded)
  File "/usr/lib64/python2.6/email/quoprimime.py", line 336, in header_decode
    return re.sub(r'=\w{2}', _unquote_match, s)
  File "/usr/lib64/python2.6/re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "/usr/lib64/python2.6/email/quoprimime.py", line 324, in _unquote_match
    return unquote(s)
  File "/usr/lib64/python2.6/email/quoprimime.py", line 106, in unquote
    return chr(int(s[1:3], 16))
ValueError: invalid literal for int() with base 16: 'Th'
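
Since one malformed header can abort a whole run, a defensive wrapper around `email.header.decode_header` may be worth considering. The sketch below falls back to the raw header text when decoding fails; `safe_header_text` is a hypothetical helper, and falling back to the raw value is an assumption about the desired behaviour:

```python
from email.errors import HeaderParseError
from email.header import decode_header


def safe_header_text(raw_value):
    """Decode an RFC 2047 header, returning the raw text if it is malformed."""
    try:
        parts = decode_header(raw_value)
    except (HeaderParseError, ValueError):
        return raw_value                # malformed: keep the text as received
    decoded = []
    for text, charset in parts:
        if isinstance(text, bytes):
            try:
                text = text.decode(charset or "ascii", errors="replace")
            except LookupError:         # unknown charset name
                text = text.decode("ascii", errors="replace")
        decoded.append(text)
    return "".join(decoded)


print(safe_header_text("=?utf-8?q?caf=C3=A9?="))
print(safe_header_text("plain subject"))
```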

Mailbox doesn't exist

./imapdedup.py -s ..** -u *** -w ******* -x -l
Drafts
Sent
Sent.2011
Sent.2014
Sent.2012
Sent.2013
Sent.2016
Sent.2015
Sent.2017
Junk
Trash
INBOX.2011
INBOX.2012
INBOX.2013
INBOX.2014
INBOX.2015
INBOX.2016
Archiv
INBOX

./imapdedup.py -s ..** -u *** -w ****** -x -n INBOX
Error: Got response: ["Mailbox doesn't exist: m (0.000 + 0.000 secs)."] from server

./imapdedup.py -s ..** -u *** -w ****** -x -n "INBOX"
Error: Got response: ["Mailbox doesn't exist: m (0.000 + 0.000 secs)."] from server

./imapdedup.py -s ..** -u *** -w ****** -x -n 'INBOX'
Error: Got response: ["Mailbox doesn't exist: m (0.000 + 0.000 secs)."] from server

Problems with timezone names including non-ASCII characters

Originally raised in #19, @nigelhorne had a message with the date field:

Date: Tue, 27 May 2003 07:53:35 +0200 (Paris, Madrid (heure d'été))

This causes:

File "./imapdedup.py", line 108, in get_message_id
md5.update(("Date:" + str(parsed_message['Date'])).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 61: ordinal not in range(128)

pull password from env?

Hi. I wanted to suggest a slight change -- grabbing the password off the environment.

This avoids constantly having to type it into the commandline or writing a wrapper script...

    + import os
    ...

     if not options.password and not options.process:
    -    options.password = getpass.getpass()
    +    options.password = os.getenv('IMAPdedup_pass') or getpass.getpass()

so then you just invoke it as:

export IMAPdedup_pass=sekrit
./imapdedup.py -s foo -u bar -x "INBOX"
./imapdedup.py -s foo -u bar -x "INBOX.2"
./imapdedup.py -s foo -u bar -x "INBOX.3"

and you're not prompted for the password.

Mailbox doesn't exist

I get this exception when I try to execute on any mailbox where the name has embedded spaces:

Traceback (most recent call last):
File "./imapdedup.py", line 218, in process
msgs = check_response(server.select(mbox, options.dry_run))[0]
File "./imapdedup.py", line 50, in check_response
raise ImapDedupException("Got response: %s from server" % value)
__main__.ImapDedupException: Got response: [b"Mailbox doesn't exist: Apple"] from server

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./imapdedup.py", line 313, in
main(sys.argv[1:])
File "./imapdedup.py", line 310, in main
process(options, mboxes)
File "./imapdedup.py", line 304, in process
print >> sys.stderr, "Error:", e
TypeError: unsupported operand type(s) for >>: 'builtin_function_or_method' and '_io.TextIOWrapper'

ssl issue?

Hi,

Running imapdedup with -x yields the following

  File "/home/thirs/systeme/IMAPdedup/imapdedup.py", line 603, in <module>
    process(options, mboxes)
  File "/home/thirs/systeme/IMAPdedup/imapdedup.py", line 415, in process
    server.starttls()
  File "/usr/lib/python3.10/imaplib.py", line 822, in starttls
    self.sock = ssl_context.wrap_socket(self.sock,
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: UNEXPECTED_MESSAGE] unexpected message (_ssl.c:997)

Any hints?

TypeError: cannot concatenate 'str' and 'NoneType' objects

-c -m causes:

Traceback (most recent call last):
File "/usr/bin/imapdedup", line 271, in
main()
File "/usr/bin/imapdedup", line 216, in main
msg_id = get_message_id(mp, options.use_checksum, options.use_id_in_checksum)
File "/usr/bin/imapdedup", line 100, in get_message_id
md5.update(("Subject:" + parsed_message['Subject']).encode('utf-8'))
TypeError: cannot concatenate 'str' and 'NoneType' objects

'utf7' codec can't decode byte

Hi,
running ./imapdedup.py -s localhost -u user -l gives:

Traceback (most recent call last):
File "./imapdedup.py", line 271, in
main()
File "./imapdedup.py", line 164, in main
mb = mb.decode('utf-7')
File "/usr/lib64/python2.6/encodings/utf_7.py", line 12, in decode
return codecs.utf_7_decode(input, errors, True)
UnicodeDecodeError: 'utf7' codec can't decode byte 0x5c in position 1: unexpected special character

Not an issue at all!!

I just wanted to say thank you to the whole team, and especially to Quentin.

I had more than 5000 duplicates in years and years of moving imap folders between servers and today you saved my life!

Sorry about filing an issue, but I couldn't find your email.

Cheers.

Tono.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 11: ordinal not in range(128)

Traceback (most recent call last):
File "/usr/bin/imapdedup", line 271, in
main()
File "/usr/bin/imapdedup", line 216, in main
msg_id = get_message_id(mp, options.use_checksum, options.use_id_in_checksum)
File "/usr/bin/imapdedup", line 100, in get_message_id
md5.update(("Subject:" + parsed_message['Subject']).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 11: ordinal not in range(128)

Getting syntax error on first run

Hi,

Trying the script for the first time I'm getting the following error

File "C:\temp\dupecheck\imapdedup.py", line 67
<title>IMAPdedup/imapdedup.py at master · quentinsf/IMAPdedup · GitHub</title>
^
SyntaxError: invalid character '·' (U+00B7)

I've downloaded the latest python.

Really looking forward to putting this great utility to use!

Thank you!

email.errors.HeaderParseError

Hey, I just got this error when running with the checksum option.

There are 72414 messages in Archive.
No message(s) currently marked as deleted in Archive
72414 others in Archive
100 message(s) in Archive processed
200 message(s) in Archive processed
300 message(s) in Archive processed
400 message(s) in Archive processed
Traceback (most recent call last):
  File "./imapdedup.py", line 313, in <module>
    main(sys.argv[1:])
  File "./imapdedup.py", line 310, in main
    process(options, mboxes)
  File "./imapdedup.py", line 249, in process
    msg_id = get_message_id(mp, options.use_checksum, options.use_id_in_checksum)
  File "./imapdedup.py", line 132, in get_message_id
    md5.update("Subject:" + utf8_header(parsed_message,'Subject'))
  File "./imapdedup.py", line 105, in utf8_header
    text, encoding = decode_header(parsed_message.get(name,''))[0]
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/email/header.py", line 108, in decode_header
    raise HeaderParseError
email.errors.HeaderParseError
