Comments (11)
Hi, thanks for describing your problem in great detail. I don't see the problem immediately, but we may find out something more.
My feeling says that one of two things is probably happening: 1) It's downloading every message just once, but the file size on your local machine comes out much larger than on the server. Or: 2) It's downloading every message multiple times for an unknown reason.
How about the number of messages? Is it also 4 times the number on the server, like with the file size? From the logs, do you see any reason why getmail would download the same thing multiple times?
To make testing easier, it might be useful to create some simpler "test circumstances", where you don't test your setup with 30GB of messages, but only 5 small messages, to see what's happening more easily.
Good luck :)
from getmail6.
Thanks very much Emtiu!
- Size in Google Mail versus size locally: I tend to think they should be the same, but maybe Google shows it compressed or something? I will check that in Google Admin and report back.
- That's what I suspect, and I can't live with a suspicion, so I will find the truth :-p
Number of messages: I will check that in Google Admin too (I hope it shows that somewhere; I can't recall right now).
One thing I found is that Google Workspace limits the daily download to about 1.5 GB (https://support.google.com/a/answer/1071518?hl=en). I haven't gotten any error about that yet, but...
I can't really test with just 5 messages, because that works fine! Even now I re-downloaded a complete mailbox of about 5 GB with no problem.
The thing is what happens over time. These scripts run about once per hour, so the first time it takes a day or more to download the complete mailbox (with setlock the same script won't start twice), and after that it just downloads a few emails to update. So THERE, as days and weeks pass, I think the glitch appears and grows bigger and bigger.
I will try what you suggest, and maybe also run a duplicate finder that matches on content, to see whether I get several messages with the same content.
Thanks for the quick answer!!!
from getmail6.
I can think of these reasons for duplication:
* Gmail uses mailboxes as tags, and the same mail can appear in several mailboxes, but getmail treats them as different mails
* `oldmail-*` files got deleted between runs
* `to_oldmail_on_each_mail` is not set and something went wrong before updating the `oldmail-*` files

Could one of these apply to your case?
from getmail6.
Regarding the first, don't use ALL. The default `mailboxes` in getmail is `('INBOX',)`. All mails are tagged as `INBOX` first, on Gmail.
from getmail6.
People reported encoding issues that made the session stop without updating the `oldmail-*` files. As a consequence, mails were re-downloaded. `to_oldmail_on_each_mail` updates the `oldmail-*` files after each mail rather than at the end of the session.
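
The difference between recording delivered mails once at the end of a session and recording them after every mail can be illustrated with a toy model. This is a minimal sketch, not getmail6 code: `run_session`, `count_duplicates`, the `seen` set (standing in for an `oldmail-*` file) and the `fail_at` parameter are all hypothetical names invented for the illustration.

```python
# Toy model, NOT getmail6 internals: it only shows why the timing of the
# "already seen" bookkeeping matters when a session aborts midway.


def run_session(server_msgs, seen, save_each, fail_at=None):
    """Deliver unseen messages and return the ids delivered in this session.

    seen      -- persistent set of message ids (stands in for an oldmail-* file)
    save_each -- True: record each id right after delivery
                 False: record all ids only if the session finishes
    fail_at   -- id at which the session aborts (e.g. an encoding error)
    """
    delivered = []
    pending = []
    for msg_id in server_msgs:
        if msg_id in seen:
            continue  # already recorded, so it is skipped
        if msg_id == fail_at:
            return delivered  # session aborted: 'pending' is never saved
        delivered.append(msg_id)
        if save_each:
            seen.add(msg_id)
        else:
            pending.append(msg_id)
    seen.update(pending)  # end-of-session bookkeeping
    return delivered


def count_duplicates(save_each):
    """Run one aborted session and one clean session; count repeated deliveries."""
    msgs = ["a", "b", "c", "d"]
    seen = set()
    first = run_session(msgs, seen, save_each, fail_at="c")
    second = run_session(msgs, seen, save_each)
    all_delivered = first + second
    return len(all_delivered) - len(set(all_delivered))
```

In this model, `count_duplicates(False)` delivers "a" and "b" twice because the aborted session never recorded them, while `count_duplicates(True)` produces no duplicates.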
from getmail6.
Thanks for all your help! It seems to be running well now, with no duplicates... still going, 500 GB. (And Google Mail still has not blocked me for exceeding 1.5 GB, shhhh)
from getmail6.
Sorry, might I open it again?
from getmail6.
Yes, feel free to reopen if the problem is not resolved.
from getmail6.
Hi,
(edit: now I suspect these two options may be causing a problem in my setup, not sure...)
**delivered_to** ([boolean](https://getmail6.org/configuration.html#parameter-boolean)) — if set, getmail adds a Delivered-To: header field to the message. If unset, it will not do so. Default: True. Note that this field will contain the envelope recipient of the message if the retriever in use is a multidrop retriever; otherwise it will contain the string "unknown".
**received** ([boolean](https://getmail6.org/configuration.html#parameter-boolean)) — if set, getmail adds a Received: header field to the message. If unset, it will not do so. Default: True.
I did some more testing on the duplicated mails in my downloaded Gmail "ALL" folder. With the steps below I could find the duplicates (I think):
- Go to the ALL folder where getmail6 downloaded the complete mailbox with the config in my first post: 17335 files.
- `md5sum * > checksums.txt` (this makes a checksum of each file, so a file can be identified by its content)
- `sort checksums.txt | uniq -w32 -d --all-repeated=separate` (this sorts the checksums and separates matching ones into groups with the same content)
This is what I get: some groups have about 10 matches, others 8, others fewer, so there is no consistent pattern.
In the [retriever] section I use mailboxes = ALL. Could it be that when one email carries two or more labels in Gmail, getmail6 downloads one copy per label?
checksum / mail.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715843400.M505303P17584Q351Rde4c0d8c1bb67ca8.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715869058.M353821P49345Q351R834c067361823c79.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715879929.M133632P72767Q351R20955d99f1c591bd.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715886809.M4376P91369Q351Ra16c0036b13433ee.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715895259.M158272P111687Q351R6ce2535fbb231572.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715903811.M183685P132701Q351R2f00990e2ca509d3.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715932771.M964736P177361Q349R43d11c7911d61ad9.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715948627.M565818P202326Q349R503045e2341c96ac.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715960294.M351627P223654Q349Rbc628375b6c38c39.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715971891.M552163P247890Q349R387efe626605a429.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1715995405.M706898P310461Q348R935ae9a8b22dd6a9.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63 1716006740.M666819P334867Q348R69997b0708933dcd.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715831642.M951416P3777Q168R129204d204366e17.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715858772.M287891P36911Q168R95cfb2c6c8c3f5ad.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715876034.M900840P64961Q168R78b1a2af7eaf0e37.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715884700.M12831P86319Q168R2cfe12b13eac8868.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715892015.M174609P105501Q168R24830202d7651dc6.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715900988.M285829P126036Q168R6904db6a88e88220.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715919037.M885823P153996Q168R56b7f225f774109f.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715922657.M642205P161266Q168R848016fd3f6a1e04.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715926601.M844930P168239Q168R71fbb96f5df61720.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715941089.M493774P191982Q168R7f1693acdda1897f.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715956768.M463501P216330Q168R0b90d8c9da756c39.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715967694.M524329P239758Q168R7afd967d0d1f623b.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715990872.M511756P298169Q168R10f8abdf2c9641e8.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1715992948.M221725P304329Q168Rf01abab7111a7836.neverland
fd52351fcdf83ba4463ddb4c33b5d438 1716001726.M570899P325089Q168Rb09e10a6d99a1ed4.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715856680.M807505P34039Q1144Re0ae502a7ac78e08.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715874044.M694684P59829Q1145R39214a60d10982e9.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715883085.M940622P81787Q1145R6ba792fe6a5d8273.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715890151.M272076P100684Q1145R868baffdb8a55250.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715899104.M876762P120678Q1145R890eacbca1541edb.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715917469.M703775P151131Q1145R166c32808443c387.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715954758.M479951P212653Q1142R5380319442c5c158.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715965594.M556138P234998Q1142Ra55a0f721d0b2c48.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1715989473.M78926P294515Q1143Ra6c87024db368955.neverland
fd6ecc6589b02f4f377002dfdbe40d77 1716000640.M466403P322042Q1142R46649e69f8c804ce.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715856492.M861165P33669Q1015R2868990242dec4d4.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715873885.M108690P59285Q1016R0f0ea396095cc994.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715882897.M573335P81234Q1016Rdea754d375a19320.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715889901.M115623P99892Q1016Rb9855e121ef12961.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715898937.M758581P120225Q1016R85fefb1ff8bd8019.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715917274.M541860P150738Q1016R15501c2f78abe3d8.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715954601.M294494P212310Q1013R8329a390c0fcaadd.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715965431.M602952P234566Q1013Rb3cca89f5ecc37b2.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1715986809.M575514P286237Q1014Rfaf4437887429668.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0 1716000395.M107118P321450Q1013Re6eeead245411126.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715843853.M382660P18485Q608R7eb91c9ec349dc4a.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715869456.M980875P50074Q609R91f9ae95e3644279.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715880284.M433135P73945Q609R915da27bd9bf30f6.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715887227.M275661P92829Q609R621251dd01a3947b.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715895832.M570549P113359Q609Rf7946cfc217b47fd.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715904339.M176358P134414Q609R85d7158e4947a74b.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715933238.M380251P178514Q607Re9d8ed9a088e1b82.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715949106.M641363P203547Q607Rdc9900d677c1fe34.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715960649.M286524P224775Q607R4a99cdf9d8d5b2ab.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715978482.M156167P266325Q608Red95ca9cda93ef6b.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1715995820.M116489P311908Q607R6ed872ae9d75c63f.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf 1716007235.M422443P336506Q607R28f58534f447b769.neverland
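
The same content-based grouping can also be done in Python, which may be handy where `md5sum` or GNU `uniq` are not available. This is a minimal sketch under the assumption that all message files sit directly in one folder; `find_duplicate_groups` is a name made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_groups(folder):
    """Group regular files in `folder` by the MD5 of their full content.

    Returns a dict mapping each checksum to the sorted list of file names
    that share it, keeping only checksums seen more than once.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    # Keep only the groups that really contain duplicates.
    return {d: names for d, names in groups.items() if len(names) > 1}
```

Calling `find_duplicate_groups(".")` from inside the folder that holds the downloaded files should give the same groups as the `md5sum`, `sort`, `uniq` pipeline, one dictionary entry per checksum.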
Also, I set getmail to verbose = 2, so I get a log for each mailbox. At this verbosity I can't (easily) match each log line to a file, but here is part of one log as an example. I grepped it for the string "(14623982 bytes)", which should be fairly unique (probably only one email has exactly that size), and it repeats many times. That makes me think the message was downloaded several times.
2024-05-16 01:39:34 [[Gmail]/All Mail] msg 216/587 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 05:27:37 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 09:06:36 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 11:50:57 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 13:31:59 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 14:42:45 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 15:46:15 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 16:37:19 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 17:48:59 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 19:03:41 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 20:18:19 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 22:22:41 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 01:19:00 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 02:27:31 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 03:42:22 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 05:50:48 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 07:51:01 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 10:13:17 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 11:50:40 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 13:12:48 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 14:59:31 [[Gmail]/All Mail] msg 216/589 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 18:15:46 [[Gmail]/Sent Mail] msg 205/576 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 21:51:09 [[Gmail]/All Mail] msg 216/589 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 22:49:12 [[Gmail]/Sent Mail] msg 205/576 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 00:29:40 [[Gmail]/All Mail] msg 216/589 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 02:06:45 [[Gmail]/Sent Mail] msg 205/582 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 03:23:00 [[Gmail]/All Mail] msg 216/597 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 04:57:09 [[Gmail]/Sent Mail] msg 205/584 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 07:07:23 [[Gmail]/All Mail] msg 216/597 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 08:54:54 [[Gmail]/Sent Mail] msg 205/584 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 11:03:43 [[Gmail]/All Mail] msg 216/597 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
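
Counting repeats like these by hand gets tedious, so here is a minimal Python sketch that tallies how often each (mailbox, msgid) pair shows up as delivered. The regular expression is written against the log lines quoted above only; it is an assumption that other getmail log lines look the same, and `count_deliveries` and `repeated` are names invented for this example.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted above, for example:
# 2024-05-16 01:39:34 [[Gmail]/All Mail] msg 216/587 (14623982 bytes) msgid 11/259 from <...> delivered to Maildir ...
LINE_RE = re.compile(
    r"^\S+ \S+ \[(?P<mailbox>.+?)\] msg \d+/\d+ "
    r"\((?P<size>\d+) bytes\) msgid (?P<msgid>\S+) .*delivered to"
)


def count_deliveries(lines):
    """Count delivered log lines per (mailbox, msgid) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match["mailbox"], match["msgid"])] += 1
    return counts


def repeated(lines):
    """Return only the pairs that were delivered more than once."""
    return {key: n for key, n in count_deliveries(lines).items() if n > 1}
```

Feeding it an open log file, for example `repeated(open(path))` with `path` pointing at the getmail log, lists every message that was delivered in more than one session. If the same msgid keeps reappearing for the same mailbox, that would suggest the message is not being remembered between runs, rather than only being fetched once per label.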
from getmail6.
> Regarding the first, don't use ALL. The default `mailboxes` in getmail is `('INBOX',)`. All mails are tagged as `INBOX` first, on Gmail.
Thanks for the clarification, I will test again with just inbox and sent. I also found that ("INBOX",) is the same in every language, but SENT is not; at least I couldn't find a language-independent name, so I had to use it like this:
If in English (the folder name as it appears in my logs above):
mailboxes = ("[Gmail]/Sent Mail", )
If in Spanish:
mailboxes = ("[Gmail]/Enviados", )
So other languages will probably need their own name too :-(
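
One possible way around the localized names, sketched below, is to ask the server which folder carries the `\Sent` special-use attribute (RFC 6154) instead of guessing the name. This assumes Gmail includes that attribute in its IMAP LIST response, which is my understanding but is not confirmed in this thread. `find_special_use` is a hypothetical helper, and the sample lines in the usage note are illustrative, not captured from a real account.

```python
import re

# One LIST response line looks roughly like:
#   (\HasNoChildren \Sent) "/" "[Gmail]/Enviados"
LIST_RE = re.compile(
    r'^\((?P<flags>[^)]*)\) (?:"(?P<delim>[^"]*)"|NIL) (?P<name>.+)$'
)


def find_special_use(list_lines, flag="\\Sent"):
    """Return the name of the first mailbox carrying `flag`, or None.

    `list_lines` are LIST response lines as bytes or str, the shape that
    imaplib's IMAP4.list() returns in its data list. Names with escaped
    quotes or sent as IMAP literals are not handled in this sketch.
    """
    for raw in list_lines:
        if isinstance(raw, bytes):
            line = raw.decode("ascii", "replace")
        elif isinstance(raw, str):
            line = raw
        else:
            continue  # skip literal tuples and anything unexpected
        match = LIST_RE.match(line)
        if not match:
            continue
        flags = [f.lower() for f in match["flags"].split()]
        if flag.lower() in flags:
            return match["name"].strip('"')
    return None
```

With illustrative input such as `[b'(\\HasNoChildren) "/" "INBOX"', b'(\\HasNoChildren \\Sent) "/" "[Gmail]/Enviados"']` the function returns `[Gmail]/Enviados`, which could then be put into `mailboxes`. Folder names containing non-ASCII characters come back in IMAP's modified UTF-7 encoding and would need decoding before display.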
from getmail6.
> I can think of these reasons for duplication:
>
> * Gmail uses mailboxes as tags, and the same mail can appear in several mailboxes, but getmail treats them as different mails
> * `oldmail-*` files got deleted between runs
> * `to_oldmail_on_each_mail` is not set and something went wrong before updating the `oldmail-*` files
>
> Could one of these apply to your case?
Can you explain a little more about this option? to_oldmail_on_each_mail = true
I didn't understand the documentation for this option... what would it be useful for?
Thanks
from getmail6.