
Comments (11)

emtiu avatar emtiu commented on July 17, 2024 1

Hi, thanks for describing your problem in great detail. I don't see the problem immediately, but we may find out something more.

My guess is that one of two things is happening: 1) It's downloading every message just once, but the file size on your local machine comes out much larger than on the server. Or: 2) It's downloading every message multiple times for an unknown reason.

How about the number of messages? Is it also 4 times the number on the server, like with the file size? From the logs, do you see any reason why getmail would download the same thing multiple times?
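To compare message counts and sizes with what the server reports, a small script can total up a local Maildir. This is only a sketch: it assumes the standard Maildir layout with `new/` and `cur/` subfolders, and the function name `maildir_stats` is made up for this example.

```python
from pathlib import Path


def maildir_stats(maildir):
    """Return (message_count, total_bytes) for the new/ and cur/ subfolders."""
    count = 0
    total = 0
    for sub in ("new", "cur"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue  # tolerate a missing subfolder
        for entry in folder.iterdir():
            if entry.is_file():
                count += 1
                total += entry.stat().st_size
    return count, total
```

Calling `maildir_stats()` on the delivery folder after each run and logging the result would show whether the count keeps growing while the server-side count stays flat.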

To make testing easier, it might be useful to create some simpler "test circumstances", where you don't test your setup with 30GB of messages, but only 5 small messages, to see what's happening more easily.

Good luck :)

from getmail6.

sebadamus avatar sebadamus commented on July 17, 2024 1

Thanks very much Emtiu!

  1. Size in Google Mail versus size locally: I tend to think they should be the same, but maybe Google shows a compressed size or something? I will test that in Google Admin and report back.
  2. That's what I suspect, and I can't live with a suspicion, so I will find the truth :-p

Number of messages: I will check that in Google Admin too (I hope it shows that somewhere, I can't recall right now).
One thing I found is that Google Workspace limits the daily download to about 1.5 GB (https://support.google.com/a/answer/1071518?hl=en). I haven't got any error about that yet, but...

I can't really test with just 5 messages, because that works fine! Even now I re-downloaded a complete mailbox of about 5 GB with no problem.
The problem is what happens over time. These scripts run, for example, once per hour, so the first run takes a day or more to download the complete mailbox (with setlock, the same script won't start twice), and after that each run only downloads a few emails to catch up. So THERE, as the weeks pass, I think the glitch appears and grows bigger and bigger.

I will try what you suggest, and maybe also run a duplicate finder that matches on content, to see whether I get several messages with the same content.

Thanks for the quick answer!!!


rpuntaie avatar rpuntaie commented on July 17, 2024 1

I can think of these reasons for duplication:

  • Gmail uses mailboxes as tags, so the same mail can appear in several mailboxes,
    but getmail treats them as different mails

  • oldmail-* files got deleted between runs

  • to_oldmail_on_each_mail is not set and something went wrong before the oldmail-* files were updated

Could one of these apply to your case?


rpuntaie avatar rpuntaie commented on July 17, 2024 1

Regarding the first: don't use ALL. The default value of mailboxes in getmail is ('INBOX',). On Gmail, all mails are tagged as INBOX first.


rpuntaie avatar rpuntaie commented on July 17, 2024 1

People have reported encoding issues that made the session stop without updating the oldmail-* files. As a consequence, mails were re-downloaded. to_oldmail_on_each_mail updates the oldmail-* files after each mail rather than at the end of the session.
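The difference can be illustrated with a toy simulation. This is not getmail's code: `run_session`, the `seen` set (standing in for the oldmail-* state) and the `fail_at` parameter are all invented for this sketch, which only models "save after each mail" versus "save at the end of the session".

```python
def run_session(server_uids, seen, fail_at=None, save_each=False):
    """Simulate one retrieval session and return the ids delivered in it.

    server_uids: message ids on the server, in order.
    seen:        persisted set of already-delivered ids (stands in for oldmail-*).
    fail_at:     id at which the session aborts (e.g. an encoding error), or None.
    save_each:   if True, record each id right after it is delivered.
    """
    delivered = []
    pending = set()
    for uid in server_uids:
        if uid in seen:
            continue  # already retrieved in an earlier session
        if uid == fail_at:
            # The session aborts here; ids held only in `pending` are lost.
            return delivered
        delivered.append(uid)
        if save_each:
            seen.add(uid)
        else:
            pending.add(uid)
    seen.update(pending)  # end-of-session save
    return delivered


uids = ["a", "b", "c", "d"]
for save_each in (False, True):
    seen = set()
    first = run_session(uids, seen, fail_at="c", save_each=save_each)
    second = run_session(uids, seen, fail_at="c", save_each=save_each)
    print(save_each, first, second)
# False ['a', 'b'] ['a', 'b']   <- "a" and "b" are delivered again
# True ['a', 'b'] []            <- nothing is re-delivered
```

In this model, a message that always breaks the session causes every earlier message to be delivered again on each run unless the state is saved per mail.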


sebadamus avatar sebadamus commented on July 17, 2024 1

Thanks for all your help! It seems to be running well now, with no duplicates... still going (500 GB). (And Google Mail still hasn't blocked me for exceeding 1.5 GB, shhhh)


sebadamus avatar sebadamus commented on July 17, 2024 1

Sorry, may I reopen this?


emtiu avatar emtiu commented on July 17, 2024 1

Yes, feel free to reopen if the problem is not resolved.


sebadamus avatar sebadamus commented on July 17, 2024

Hi,

(edit: now I am wondering whether these two options are causing some problem in my setup, not sure...)
**delivered_to** ([boolean](https://getmail6.org/configuration.html#parameter-boolean)) — if set, getmail adds a Delivered-To: header field to the message. If unset, it will not do so. Default: True. Note that this field will contain the envelope recipient of the message if the retriever in use is a multidrop retriever; otherwise it will contain the string "unknown".
**received** ([boolean](https://getmail6.org/configuration.html#parameter-boolean)) — if set, getmail adds a Received: header field to the message. If unset, it will not do so. Default: True.

I did some more testing on the duplicated mails in the "ALL" folder downloaded from Gmail. With the following steps I could find the duplicates (I think):

  1. Go to the ALL folder where getmail6 downloaded the complete mailbox with the config from my first post: 17335 files.
  2. md5sum * > checksums.txt ## this computes a checksum of each file, so a file can be identified by its content.
  3. sort checksums.txt | uniq -w32 -d --all-repeated=separate ## this sorts the checksums and separates matching ones into groups of identical content.
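The same grouping can be sketched in Python, for anyone without GNU `uniq`. This is a rough equivalent under the assumption that all message files sit directly in one folder; the function name `duplicate_groups` is invented for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def duplicate_groups(folder):
    """Group files in `folder` by the MD5 of their content.

    Returns {digest: [file names]} for digests shared by more than one file.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}
```

Like `md5sum`, this only reports files that are byte-for-byte identical.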

This is what I get: some groups have about 10 matches, others 8, others fewer, so there is no consistent pattern.
In the [retriever] section I use mailboxes = ALL. Could it be that when one email carries two or more labels in Gmail, getmail6 downloads one copy for each label?

checksum / mail.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715843400.M505303P17584Q351Rde4c0d8c1bb67ca8.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715869058.M353821P49345Q351R834c067361823c79.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715879929.M133632P72767Q351R20955d99f1c591bd.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715886809.M4376P91369Q351Ra16c0036b13433ee.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715895259.M158272P111687Q351R6ce2535fbb231572.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715903811.M183685P132701Q351R2f00990e2ca509d3.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715932771.M964736P177361Q349R43d11c7911d61ad9.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715948627.M565818P202326Q349R503045e2341c96ac.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715960294.M351627P223654Q349Rbc628375b6c38c39.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715971891.M552163P247890Q349R387efe626605a429.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1715995405.M706898P310461Q348R935ae9a8b22dd6a9.neverland
fd1adb0dcfba5c0cb50dc6e8a0835b63  1716006740.M666819P334867Q348R69997b0708933dcd.neverland

fd52351fcdf83ba4463ddb4c33b5d438  1715831642.M951416P3777Q168R129204d204366e17.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715858772.M287891P36911Q168R95cfb2c6c8c3f5ad.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715876034.M900840P64961Q168R78b1a2af7eaf0e37.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715884700.M12831P86319Q168R2cfe12b13eac8868.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715892015.M174609P105501Q168R24830202d7651dc6.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715900988.M285829P126036Q168R6904db6a88e88220.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715919037.M885823P153996Q168R56b7f225f774109f.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715922657.M642205P161266Q168R848016fd3f6a1e04.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715926601.M844930P168239Q168R71fbb96f5df61720.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715941089.M493774P191982Q168R7f1693acdda1897f.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715956768.M463501P216330Q168R0b90d8c9da756c39.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715967694.M524329P239758Q168R7afd967d0d1f623b.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715990872.M511756P298169Q168R10f8abdf2c9641e8.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1715992948.M221725P304329Q168Rf01abab7111a7836.neverland
fd52351fcdf83ba4463ddb4c33b5d438  1716001726.M570899P325089Q168Rb09e10a6d99a1ed4.neverland

fd6ecc6589b02f4f377002dfdbe40d77  1715856680.M807505P34039Q1144Re0ae502a7ac78e08.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1715874044.M694684P59829Q1145R39214a60d10982e9.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1715883085.M940622P81787Q1145R6ba792fe6a5d8273.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1715890151.M272076P100684Q1145R868baffdb8a55250.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1715899104.M876762P120678Q1145R890eacbca1541edb.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1715917469.M703775P151131Q1145R166c32808443c387.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1715954758.M479951P212653Q1142R5380319442c5c158.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1715965594.M556138P234998Q1142Ra55a0f721d0b2c48.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1715989473.M78926P294515Q1143Ra6c87024db368955.neverland
fd6ecc6589b02f4f377002dfdbe40d77  1716000640.M466403P322042Q1142R46649e69f8c804ce.neverland

fdb24178ae5a2f4ecc80f75663d6f6e0  1715856492.M861165P33669Q1015R2868990242dec4d4.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1715873885.M108690P59285Q1016R0f0ea396095cc994.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1715882897.M573335P81234Q1016Rdea754d375a19320.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1715889901.M115623P99892Q1016Rb9855e121ef12961.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1715898937.M758581P120225Q1016R85fefb1ff8bd8019.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1715917274.M541860P150738Q1016R15501c2f78abe3d8.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1715954601.M294494P212310Q1013R8329a390c0fcaadd.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1715965431.M602952P234566Q1013Rb3cca89f5ecc37b2.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1715986809.M575514P286237Q1014Rfaf4437887429668.neverland
fdb24178ae5a2f4ecc80f75663d6f6e0  1716000395.M107118P321450Q1013Re6eeead245411126.neverland

fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715843853.M382660P18485Q608R7eb91c9ec349dc4a.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715869456.M980875P50074Q609R91f9ae95e3644279.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715880284.M433135P73945Q609R915da27bd9bf30f6.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715887227.M275661P92829Q609R621251dd01a3947b.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715895832.M570549P113359Q609Rf7946cfc217b47fd.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715904339.M176358P134414Q609R85d7158e4947a74b.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715933238.M380251P178514Q607Re9d8ed9a088e1b82.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715949106.M641363P203547Q607Rdc9900d677c1fe34.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715960649.M286524P224775Q607R4a99cdf9d8d5b2ab.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715978482.M156167P266325Q608Red95ca9cda93ef6b.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1715995820.M116489P311908Q607R6ed872ae9d75c63f.neverland
fe2ec3b71aa8ea2ab4eb896c8d94d7bf  1716007235.M422443P336506Q607R28f58534f447b769.neverland

checksums-sort.txt
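Checksums only match when the copies are byte-for-byte identical. If copies of the same mail differ only in headers added at delivery time, grouping by the Message-ID header would still catch them. A minimal sketch, assuming one message per file in a single folder; `group_by_message_id` is a made-up name, and messages without a Message-ID are simply skipped.

```python
from collections import defaultdict
from email.parser import BytesHeaderParser
from pathlib import Path


def group_by_message_id(folder):
    """Return {message_id: [file names]} for ids that occur in more than one file."""
    parser = BytesHeaderParser()
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            headers = parser.parse(fh)  # reads the header block only
        msgid = headers.get("Message-ID")
        if msgid is None:
            continue  # no Message-ID header: cannot be grouped this way
        groups[str(msgid).strip()].append(path.name)
    return {mid: names for mid, names in groups.items() if len(names) > 1}
```

Comparing the result with the checksum groups would show whether there are additional duplicates that the checksum approach misses.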

Also, I set getmail to verbose = 2, so I get a log for each mailbox. With this verbosity I can't (easily) match each log line to a file, but here is part of one log as an example. I grepped it for the string "(14623982 bytes)", which should be fairly unique (probably only one email has exactly that size), and it appears many times. That makes me think the message was downloaded several times.

2024-05-16 01:39:34 [[Gmail]/All Mail] msg 216/587 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 05:27:37 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 09:06:36 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 11:50:57 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 13:31:59 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 14:42:45 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 15:46:15 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 16:37:19 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 17:48:59 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 19:03:41 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 20:18:19 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-16 22:22:41 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 01:19:00 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 02:27:31 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 03:42:22 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 05:50:48 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 07:51:01 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 10:13:17 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 11:50:40 [[Gmail]/All Mail] msg 216/588 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 13:12:48 [[Gmail]/Sent Mail] msg 205/575 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 14:59:31 [[Gmail]/All Mail] msg 216/589 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 18:15:46 [[Gmail]/Sent Mail] msg 205/576 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 21:51:09 [[Gmail]/All Mail] msg 216/589 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-17 22:49:12 [[Gmail]/Sent Mail] msg 205/576 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 00:29:40 [[Gmail]/All Mail] msg 216/589 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 02:06:45 [[Gmail]/Sent Mail] msg 205/582 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 03:23:00 [[Gmail]/All Mail] msg 216/597 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 04:57:09 [[Gmail]/Sent Mail] msg 205/584 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 07:07:23 [[Gmail]/All Mail] msg 216/597 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 08:54:54 [[Gmail]/Sent Mail] msg 205/584 (14623982 bytes) msgid 5/205 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
2024-05-18 11:03:43 [[Gmail]/All Mail] msg 216/597 (14623982 bytes) msgid 11/259 from <[email protected]> delivered to Maildir /home/sebadamus/getmail/Maildir/themail/ALL/
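To see how often each message was delivered, and from which mailbox, the log can be tallied with a short script. This is a sketch that assumes every delivery line has exactly the shape shown above; the regular expression and the name `count_deliveries` are my own and not part of getmail.

```python
import re
from collections import Counter

# Matches: "<date> <time> [<mailbox>] msg <n>/<total> (<size> bytes) ..."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(.+)\] msg (\d+)/(\d+) \((\d+) bytes\)"
)


def count_deliveries(lines):
    """Count delivery log lines per (mailbox, size in bytes)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            mailbox = match.group(2)
            size = int(match.group(5))
            counts[(mailbox, size)] += 1
    return counts
```

Any key with a count above 1 was delivered more than once from the same mailbox, and the same size showing up under both `[Gmail]/All Mail` and `[Gmail]/Sent Mail` suggests one mail fetched through two labels.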


sebadamus avatar sebadamus commented on July 17, 2024

> Regarding the first, don't use ALL. The default mailboxes in getmail is ('INBOX',). All mails are tagged as INBOX first, on gmail.

Thanks for the clarification, I will test again with just inbox and sent. I also found that ("INBOX",) is the same in every language, but the Sent mailbox is not. At least I couldn't find a language-independent name, so I had to use it like this:
If in English:
mailboxes = ("[Gmail]/Sent Mail", )
If in Spanish:
mailboxes = ("[Gmail]/Enviados", )

So in other languages it is probably the same :-(
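One way to find the localized name without guessing might be to ask the server which mailbox carries the `\Sent` flag. The sketch below assumes the server returns special-use flags (RFC 6154) in its LIST response and that an app password or similar is available for login. `find_special_use` and `list_mailboxes` are hypothetical helper names, and only `imaplib` from the standard library is used.

```python
import imaplib
import re

# One LIST response line looks like: (\HasNoChildren \Sent) "/" "[Gmail]/Enviados"
LIST_RE = re.compile(rb'^\((?P<flags>[^)]*)\) (?:"[^"]*"|NIL) (?P<name>.+)$')


def find_special_use(list_lines, flag=rb"\Sent"):
    """Return the name of the first mailbox carrying `flag`, or None."""
    wanted = flag.lower()
    for line in list_lines:
        if not isinstance(line, bytes):
            continue  # skip literal-style tuples; not handled in this sketch
        match = LIST_RE.match(line)
        if not match:
            continue
        flags = match.group("flags").lower().split()
        if wanted in flags:
            return match.group("name").strip(b'"').decode("ascii")
    return None


def list_mailboxes(host, user, password):
    """Fetch raw LIST response lines from an IMAP server over SSL."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        _status, lines = conn.list()
        return lines
    finally:
        conn.logout()
```

With a Spanish account, `find_special_use(list_mailboxes(...))` would be expected to return something like `[Gmail]/Enviados`, which could then be placed in `mailboxes`.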


sebadamus avatar sebadamus commented on July 17, 2024

> I can think of these reasons for duplication:
>
> * gmail uses mailboxes as tags and the same mail can figure in more mailboxes,
>   but getmail treats them as different mails
>
> * `oldmail-*` files got deleted between runs
>
> * not `to_oldmail_on_each_mail` and something went wrong before updating `oldmail-*` files
>
> Could one of these apply to your case?

Can you explain a little more about this option? to_oldmail_on_each_mail = true
I didn't understand the documentation for this option... what would it be useful for?

Thanks

