intelmq-mailgen's People

Contributors

bernhard-herzog, bernhardreiter, dmth, gsiv, rolandgeider, swilde, th-certbund, wagner-intevation

Forkers

th-certbund

intelmq-mailgen's Issues

Modify CSV output format

As per customer request:

CSV output should:

  • be separated by commas
  • quote every field using quotation marks

I'll implement the change.
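A minimal sketch of the requested format using Python's standard csv module: `csv.QUOTE_ALL` quotes every field (numbers included), and embedded quotation marks are doubled. The function name `rows_to_csv` is hypothetical; mailgen's own CSV formatter code is not shown here.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Serialise rows as comma-separated CSV with every field quoted."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=",",          # separated by commas
        quoting=csv.QUOTE_ALL,  # quote every field, including numbers
        lineterminator="\r\n",  # CRLF line breaks, as in RFC 4180
    )
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


print(rows_to_csv(["ip", "port"], [["192.0.2.1", 23]]), end="")
# "ip","port"
# "192.0.2.1","23"
```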

Support python3-gpg (for Ubuntu 20.04 LTS)

GnuPG has published official Python bindings.

The currently used GnuPG python bindings (pygpgme) are being phased out from GNU/Linux distributions and are not very actively maintained.

We shall additionally support the official bindings (and test with Ubuntu 20.04 LTS).
They are packaged as python3-gpg (see https://packages.ubuntu.com/focal/python3-gpg).

Support for pygpgme can be dropped once Ubuntu 16.04 LTS stops getting maintenance updates (April 2021); see https://ubuntu.com/about/release-cycle and https://wiki.ubuntu.com/Releases.

details on old library

The old library is https://launchpad.net/pygpgme, which is packaged as python3-gpgme for Ubuntu 16.04 LTS (see https://packages.ubuntu.com/search?suite=default&section=all&arch=any&keywords=python3-gpgme&searchon=names).
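While both bindings are supported, mailgen has to pick whichever one is installed. A minimal sketch, assuming python3-gpg is imported as `gpg` and pygpgme as `gpgme`; `load_gpg_bindings` is a hypothetical helper. Because the two APIs differ, a thin adapter per backend would still be needed on top of it.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def load_gpg_bindings() -> Tuple[Optional[str], Optional[ModuleType]]:
    """Return (name, module) of the first importable GnuPG binding.

    Tries the official bindings (python3-gpg, module "gpg") first and
    falls back to pygpgme (python3-gpgme, module "gpgme").
    Returns (None, None) if neither is installed.
    """
    for name in ("gpg", "gpgme"):
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None, None


backend_name, backend = load_gpg_bindings()
print("GnuPG bindings in use:", backend_name)
```

Once Ubuntu 16.04 LTS support is dropped, the `"gpgme"` entry can simply be removed from the tuple.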

mailgen creates email without orig-date header

The "Date:" header (a.k.a. the orig-date field) is not only extremely useful but also required according to RFC 5322, section 3.6 "Field Definitions".

Currently, however, this header is missing from mails generated by mailgen. Mailgen should add this header with the current date and time at the moment the mail is generated. For further semantics, see the RFC.
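With Python's standard email package this amounts to one header assignment. A minimal sketch; `add_date_header` is a hypothetical helper, and it only adds the header if none is present, since RFC 5322 allows exactly one "Date:" field.

```python
from email.message import EmailMessage
from email.utils import formatdate
from typing import Optional


def add_date_header(msg: EmailMessage, timestamp: Optional[float] = None) -> EmailMessage:
    """Add an RFC 5322 "Date:" header unless the message already has one.

    timestamp defaults to the current time (seconds since the epoch);
    localtime=True renders it in the local zone with a numeric UTC offset.
    """
    if "Date" not in msg:
        msg["Date"] = formatdate(timestamp, localtime=True)
    return msg


mail = EmailMessage()
mail["Subject"] = "Notification"
add_date_header(mail)
print(mail["Date"])  # current date and time in RFC 5322 format
```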

OpenPGP/MIME signatures

Email receivers that want to get emails with attachments (e.g. X-ARF or CSV attachments) need an OpenPGP/MIME signature to be able to verify the sender in a standards-compliant way.

A module or code for this could be useful to implement parts of certtools/intelmq#534.

technical

RFCs 2015 and 3156 define a MIME-compatible solution for OpenPGP-signed emails. The advantage is that encodings and MIME types are handled properly, even when the mail user agent does not support crypto.
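The RFC 3156 structure can be sketched with Python's email package. This is a minimal sketch, not mailgen's implementation: `sign_detached` is a hypothetical callback that returns an ASCII-armored detached signature (it could wrap python3-gpg or the gpg binary), and `micalg="pgp-sha256"` assumes the signer hashes with SHA-256. Per RFC 3156, the signature covers the complete first body part, MIME headers included, with CRLF line endings.

```python
from email import policy
from email.message import Message
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Callable


def build_pgp_mime_signed(
    body_part: Message, sign_detached: Callable[[bytes], bytes]
) -> MIMEMultipart:
    """Wrap body_part in an RFC 3156 multipart/signed container."""
    # The signature covers the whole first part, MIME headers included,
    # in canonical form with CRLF line endings (policy.SMTP uses CRLF).
    signed_data = body_part.as_bytes(policy=policy.SMTP)
    signature = sign_detached(signed_data)

    sig_part = MIMEBase("application", "pgp-signature", name="signature.asc")
    sig_part.set_payload(signature.decode("ascii"))  # armored text is 7-bit

    # micalg must name the hash the signer actually used (SHA-256 assumed).
    outer = MIMEMultipart(
        "signed", micalg="pgp-sha256", protocol="application/pgp-signature"
    )
    outer.attach(body_part)
    outer.attach(sig_part)
    return outer


def _placeholder_signer(data: bytes) -> bytes:
    # Placeholder only: NOT a real signature. Replace with a GnuPG call.
    return b"-----BEGIN PGP SIGNATURE-----\n\n(placeholder)\n-----END PGP SIGNATURE-----\n"


signed = build_pgp_mime_signed(MIMEText("Hello", "plain", "utf-8"), _placeholder_signer)
print(signed.get_content_type())  # multipart/signed
```

The signed part must reach the recipient byte for byte as it was signed, which is why RFC 3156 asks for a 7-bit transfer encoding of it; MIMEText with a utf-8 charset uses base64, which satisfies this.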

Sending out x-arf emails

Mailgen should be able to send out X-ARF emails.
The specification is available at http://www.x-arf.org;
the open question is which version to use: v0.2 or the v0.3 draft.

TODO: List X-ARF senders and receivers. Look for example emails.

fast way to notice that a "load of events" has been processed (as indicator for sending)

A number of feeds arrive as one block, e.g. once a day. (This means that all events of such a block will have the same time.observation value in IntelMQ.)

Each recipient wants to get one aggregated email with all notifications for this block, as fast and as complete as possible.

technical implementation thoughts

@bernhard-herzog has implemented a way to determine when a directive was last inserted for a specific set of aggregation values. So when aggregating on time.observation, we trigger sending the email once max(d.inserted_at) is 2 hours ago.

This method has a drawback: if the first and the last event of one load (or batch) both belong to this set of aggregation values, a long time can pass before the next matching directive enters the database. So the time interval has to be quite long to reliably detect that processing has finished.

This issue is about using a better detection mechanism, that can detect the completion of processing faster, thus sending emails faster on the average.

implementation idea

Use an extra table that keeps the time of the last inserted directive for each feed.name and time.observation. This way the email aggregation script can use a simple additional query to determine with higher reliability that the batch has been processed fully.

Necessary implementation steps (roughly):

  • extend trigger to enter information in the new table
  • add new table to db schema
  • extend mail generation to use the new information
  • provide a way to deal with old pending notifications for the migration to the new system
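The decision logic behind the proposed table can be modeled in a few lines of Python. This is a sketch under stated assumptions: `BatchTracker` is a hypothetical in-memory stand-in for the new table (in PostgreSQL it would be keyed on feed.name and time.observation and updated by the extended trigger), and the quiet period is a free parameter. Unlike the per-aggregation-group max(d.inserted_at), this timestamp advances with every directive of the batch, so a short quiet period already suggests that the whole batch has been processed.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple

# (feed.name, time.observation) identifies one load/batch of events.
BatchKey = Tuple[str, datetime]


class BatchTracker:
    """Hypothetical in-memory stand-in for the proposed extra table."""

    def __init__(self) -> None:
        self.last_inserted: Dict[BatchKey, datetime] = {}

    def record_directive(self, feed_name: str, time_observation: datetime,
                         inserted_at: datetime) -> None:
        """What the extended trigger would do: keep the latest insertion time."""
        key = (feed_name, time_observation)
        previous = self.last_inserted.get(key)
        if previous is None or inserted_at > previous:
            self.last_inserted[key] = inserted_at

    def is_complete(self, feed_name: str, time_observation: datetime,
                    now: datetime, quiet_period: timedelta) -> bool:
        """Treat a batch as fully processed after quiet_period without new directives."""
        last = self.last_inserted.get((feed_name, time_observation))
        return last is not None and now - last >= quiet_period
```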

Handle unknown notification formats

We need to decide on how to handle notifications for which mailgen cannot determine the format to use for the message. E.g. if a notification specifies a format which mailgen simply does not know about, what should mailgen do?

Next step for testing: add instructions for an smtpd that saves the emails

The debugging smtpd of Python 3 only dumps the emails to stdout.
For analysis and for manual and automatic testing, it makes sense to save the emails
somewhere, maybe on disk in Maildir format, so that they can be inspected by email
clients or other Python scripts.

I'll give it a shot.
Next steps:

  • look for an smtpd python module which can already do this?
  • see if there is a python module for maildir
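Python's standard `mailbox` module already handles Maildir. A minimal sketch of the storage side; `save_to_maildir` is a hypothetical helper. It could be called with the raw message bytes from an SMTP server hook, for example the `handle_DATA` method of an aiosmtpd handler (an assumption: aiosmtpd is a third-party package, and the stdlib smtpd module was removed in Python 3.12).

```python
import mailbox


def save_to_maildir(raw_message: bytes, maildir_path: str) -> str:
    """Store one raw RFC 5322 message in a Maildir and return its key.

    The Maildir (with its cur/new/tmp subdirectories) is created if needed,
    so mail clients or Python's mailbox module can read it afterwards.
    """
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(raw_message)
    finally:
        box.close()
```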

turn intelmq-mailgen into a module to help unit test single functions

The intelmq-mailgen file contains about 20 functions right now.
In order to be able to write unittests for it, we probably should turn it into a module.

Within the tests directory I would want to import the functions and write tests for it.

E.g. move them into a directory "mailgen" with an __init__.py
and turn intelmq-mailgen into a file that just imports that module and runs it.

Allow parallel email creation to increase speed

Right now only one mailgen instance can run at a time,
and mailgen only uses one (Python) thread.

If email sending speed becomes an issue, it may be possible to enable
more email creation processes to work in parallel.

Possibilities:

  • allow starting several mailgen workers (on different machines)
  • use threads within mailgen (because sending and crypto will be I/O bound from mailgen's side)
  • make sure crypto processes can run on a different thread/core.

All of these ideas would allow a machine with several cores to utilise them better.
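The thread-based variant can be sketched with `concurrent.futures`. This is a minimal sketch assuming a hypothetical `send_one` callable that renders, signs and sends a single mail; the pool only overlaps I/O-bound work, and the results are returned to the calling thread, which would stay responsible for the database bookkeeping (marking notifications as sent).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")
R = TypeVar("R")


def send_all_parallel(
    mails: Iterable[M], send_one: Callable[[M], R], max_workers: int = 4
) -> Tuple[List[R], List[Tuple[M, BaseException]]]:
    """Run send_one for every mail on a thread pool.

    Returns the successful results and the (mail, exception) pairs of failures.
    Results are collected in completion order, not input order.
    """
    results: List[R] = []
    errors: List[Tuple[M, BaseException]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send_one, mail): mail for mail in mails}
        for future in as_completed(futures):
            mail = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # keep going; report failures to the caller
                errors.append((mail, exc))
    return results, errors
```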

If implemented, the SQL interactions must be checked for race conditions.

Technically, the way SQL selections are used right now should prevent
a second mailgen script from running at the same time.

@bernhard-herzog
https://github.com/Intevation/intelmq-mailgen/blob/master/intelmq-mailgen#L715
has FOR UPDATE NOWAIT.
Does this prevent mailgen from running twice, as you said?

Test data via database

In order to facilitate tests, we should have an SQL extract of the database (part of a dump) that we can insert
and run mailgen on.

Ideally, a few examples of every pattern would be represented in the database table.

This should be easy to produce from a test run with IntelMQ inserting the events.

Deb package build process no longer runs unit tests

Before issue #19 dpkg-buildpackage would run (a part of) mailgen's
test suite via its Makefile.

After the restructuring and switch to pybuild, this is no longer the
case:

    I: pybuild base:170: cd /2auto/intelmq-mailgen/.pybuild/pythonX.Y_3.4/build; python3.4 -m unittest discover -v

    ----------------------------------------------------------------------
    Ran 0 tests in 0.000s

    OK
    I: pybuild base:170: cd /2auto/intelmq-mailgen/.pybuild/pythonX.Y_3.5/build; python3.5 -m unittest discover -v

    ----------------------------------------------------------------------
    Ran 0 tests in 0.000s

    OK

Tests can still be run manually with make but we should probably
fix this regression.

logging: prepare for Python standard configuration methods

Right now, only the logging_level for intelmqmail can be set via cb.main()
and the configuration file. This is close to how intelmq does it right now.

If logging is used in production with more requirements, it may make sense to

  1. switch to a standard logging configuration file that Python's logging module accepts
  2. unify this with IntelMQ itself, because there should be a central place to configure logging for the whole solution (core and all components)
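Python's `logging.config` module accepts such a standard configuration via `dictConfig` (and INI-style files via `fileConfig`). A minimal sketch: the logger name `intelmqmail` follows the package name mentioned above, while the formatter, handler and the idea of keeping the dictionary in a JSON file are illustrative assumptions.

```python
import json
import logging
import logging.config
from typing import Any, Dict, Optional

# Example configuration in the dictConfig schema; only the logger name
# "intelmqmail" is taken from the issue, everything else is illustrative.
DEFAULT_LOGGING: Dict[str, Any] = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        "intelmqmail": {"level": "INFO", "handlers": ["console"], "propagate": False},
    },
}


def configure_logging(path: Optional[str] = None) -> None:
    """Apply a dictConfig-style configuration, read from a JSON file if given."""
    config = DEFAULT_LOGGING
    if path is not None:
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
    logging.config.dictConfig(config)
```

Because the same schema could be shared by IntelMQ core and its components, one central file would cover the second point as well.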

Make ticket number unguessable

Split out from #28
Make the ticket number unguessable.

Implementation idea:
Add a table that keeps the used ticket IDs,
and draw new ones randomly until you find one that has not been used yet.

Within our current design size of
100,000,000 possible numbers per day
and aiming at sending out 1,000,000 mails per day,
this process will need a redraw in at most 1% of cases.
So we are okay.
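The redraw loop can be sketched as follows. If a fraction p of the numbers is already taken, the expected number of draws is 1/(1 - p); for p = 0.01 that is about 1.01. In this sketch the `used` set is a hypothetical stand-in for the table of issued IDs, and `secrets` (rather than `random`) is used so the numbers are unpredictable. In PostgreSQL, a UNIQUE constraint with a retry on conflict would additionally protect against concurrent mailgen runs.

```python
import secrets
from typing import Callable, Set

TICKET_SPACE = 100_000_000  # possible numbers per day, as in the issue


def draw_ticket_number(used: Set[int],
                       randbelow: Callable[[int], int] = secrets.randbelow) -> int:
    """Draw an unused random ticket number and mark it as used."""
    while True:
        candidate = randbelow(TICKET_SPACE)
        if candidate not in used:  # collision: simply draw again
            used.add(candidate)
            return candidate
```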

Less attractive implementation idea:
Using a Feistel cipher as suggested on the PostgreSQL wiki.
It is harder to prove that the cipher will create no collisions
and what would be needed (in terms of the used "round" function or "key")
to make it unguessable if the source is known.

non-interactive installation for apache password

The Debian packages should provide a way to set the intelmq Apache password non-interactively.
This is a precondition for automatic tests.

Solution idea:
Just generate a password and write it into the system-wide configuration file,
so admins can look it up.
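The generation step can be sketched in Python; Debian maintainer scripts are usually shell, so this only illustrates the logic. The format of the system-wide configuration file is not given here, so the sketch writes to a dedicated file at a caller-chosen path (hypothetical), readable only by its owner, and refuses to overwrite an existing one.

```python
import os
import secrets


def write_generated_password(path: str, nbytes: int = 24) -> str:
    """Generate a random password and store it in `path`, readable only by its owner.

    O_EXCL makes this fail instead of overwriting an existing password file,
    so re-running the installation keeps a password the admin already knows.
    """
    password = secrets.token_urlsafe(nbytes)  # 24 random bytes -> 32 URL-safe chars
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(password + "\n")
    return password
```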

enforce quoted-printable encoding for text/plain emails to be sent out, even if the body is only 7-bit

In the rare case that the body contains very long lines, quoted-printable is necessary.
For other text/plain bodies it is not strictly necessary,
but we have some requests to use quoted-printable there anyway.
(Probably for compatibility with some email receivers.)

== Technical Analysis:
The version of python3 on Debian Jessie does not do quoted-printable on very long lines,
so enforcing could make the solution more robust.
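With the `EmailMessage` API, the transfer encoding can be forced through the `cte` argument of `set_content`. A minimal sketch; `make_text_part` is a hypothetical helper, and since this API was provisional in older Python 3 releases such as the one on Debian Jessie, the behaviour there should be verified.

```python
from email.message import EmailMessage


def make_text_part(body: str) -> EmailMessage:
    """Create a text/plain message that always uses quoted-printable.

    Passing cte explicitly overrides the default choice (7bit for short
    ASCII bodies), and quoted-printable soft line breaks keep every encoded
    line within the policy's max_line_length (78 by default).
    """
    msg = EmailMessage()
    msg.set_content(body, subtype="plain", charset="utf-8", cte="quoted-printable")
    return msg
```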

mailgen uses only default formatting

The current implementation of mailgen can only create two variations
of mails:

  1. when the "format" field of the record in the notifications table
    contains "feed_specific" as value, the mail will be created from
    the "specific.txt" template, with CSV data containing
    'botnet_drone_csv_columns'.
  2. for any other value of "format" a mail will be generated from the
    template specified in the notification record, with CSV data containing
    'most_csv_columns'.

Looking at the code from line 599 onward, this comes as no surprise:

    if notification["format"] == "feed_specific":
        formatter = known_formatters['feed_specific'].get(('csv',
                                                           notification["feed_name"]))
        if formatter is not None:
            formatter = known_formatters['feed_specific'].get(('csv',
                                                               'DEFAULT'))

    else:
        formatter = known_formatters['generic'].get((notification["format"],
                                                     "GENERIC"))

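Besides being hilarious, the "if formatter is not None:" check makes no sense
at all: a feed-specific formatter that was found gets overwritten by the DEFAULT
one, while a missing one leaves formatter as None. It should be deleted. The whole
mail_format_feed_specific_as_csv should be deleted, too, as there is
already a generic fallback...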
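For comparison, a minimal sketch of the lookup with the fallback the quoted code apparently intends: use the DEFAULT CSV formatter only when no feed-specific one exists. `select_formatter` is a hypothetical wrapper around the quoted logic; `known_formatters` and the notification keys are taken from the snippet.

```python
from typing import Any, Dict, Optional


def select_formatter(known_formatters: Dict[str, Dict[Any, Any]],
                     notification: Dict[str, Any]) -> Optional[Any]:
    """Pick the formatter for a notification, or None if none is known."""
    if notification["format"] == "feed_specific":
        specific = known_formatters["feed_specific"]
        formatter = specific.get(("csv", notification["feed_name"]))
        if formatter is None:
            # Fall back to the default CSV formatter only if nothing feed-specific exists.
            formatter = specific.get(("csv", "DEFAULT"))
    else:
        formatter = known_formatters["generic"].get((notification["format"], "GENERIC"))
    return formatter
```

A None result still has to be handled by the caller (see the issue on unknown notification formats).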

'source_directives' JSON object not added to column "extra" from table "events"

Hi, I have a problem with inserting into the directives table, which is later used for sending mails. As far as I understand, this trigger on the events table adds new rows to the directives table through a few nested procedures:

    create trigger events_insert_directive_trigger after insert
        on public.events for each row
        execute procedure events_insert_directives_for_row();

The events_insert_directives_for_row procedure:

    CREATE OR REPLACE FUNCTION public.events_insert_directives_for_row()
     RETURNS trigger
     LANGUAGE plpgsql
     SECURITY DEFINER
    AS $function$
    BEGIN
        PERFORM directives_from_extra(NEW.id, NEW.extra);
        RETURN NEW;
    END
    $function$
    ;

The directives_from_extra procedure:

    CREATE OR REPLACE FUNCTION public.directives_from_extra(event_id bigint, extra json)
     RETURNS void
     LANGUAGE plpgsql
    AS $function$
    DECLARE
        json_directives JSON := extra -> 'certbund' -> 'source_directives';
        directive JSON;
    BEGIN
        IF json_directives IS NOT NULL THEN
            FOR directive
            IN SELECT * FROM json_array_elements(json_directives) LOOP
                PERFORM insert_directive(event_id, directive, 'source');
            END LOOP;
        END IF;
    END
    $function$
    ;

The last procedure looks for the JSON object "source_directives", but neither CERT-bund Contact Database nor CERT-bund Contact Rules seems to add this information to the "extra" column of the "events" table. This is how my "extra" column looks, formatted:
    {
      "features": "cmd,stat_v2,shell_v2",
      "certbund": {
        "source_contacts": {
          "organisations": [
            {
              "import_source": "",
              "name": "test1",
              "id": 0,
              "managed": "manual",
              "sector": null,
              "contacts": [
                {
                  "email": "[email protected]",
                  "managed": "manual",
                  "email_status": "enabled",
                  "annotations": []
                }
              ],
              "annotations": []
            }
          ],
          "matches": [
            {
              "organisations": [0],
              "managed": "manual",
              "field": "asn",
              "annotations": []
            }
          ]
        }
      },
      "model": "SM-G960F",
      "name": "starltexx",
      "tag": "adb",
      "device": "starlte"
    }

I only have source_contacts. I added the contact "[email protected]" via the Fody application, but I didn't enter all the information, only the ASN and the email address. Could this be the problem (which I doubt)? Otherwise, could you help me find where the CERT-bund expert should add the "source_directives" information to the "extra" column?

Thanks in advance.
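To check which events would produce directives at all, the lookup in directives_from_extra can be mirrored in Python, e.g. over rows exported from the events table. A minimal diagnostic sketch; `source_directives_from_extra` is a hypothetical helper. Applied to the "extra" value shown in this issue, it returns an empty list, which matches the observation that no directives are created.

```python
import json
from typing import Any, List, Union


def source_directives_from_extra(extra: Union[str, dict]) -> List[Any]:
    """Return what directives_from_extra() would iterate over for one event.

    Mirrors extra -> 'certbund' -> 'source_directives': PostgreSQL's ->
    yields NULL for a missing key, so such events produce no directives.
    """
    if isinstance(extra, str):
        extra = json.loads(extra)
    certbund = extra.get("certbund") if isinstance(extra, dict) else None
    directives = certbund.get("source_directives") if isinstance(certbund, dict) else None
    if directives is None:
        return []
    if not isinstance(directives, list):
        # json_array_elements() would raise an error on a non-array value.
        raise ValueError("source_directives is not a JSON array")
    return directives
```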

Create a unique ticket number per email usable for help desks

Each email should have a unique ticket number.
It should be usable for help desks, this means:

  • Shorter and readable is better
  • should make it hard to guess the number of sent reports.

The idea is to use a prefix for the CERT, e.g. "CE" for cert-example,
then an ISO-like date and a unique random number,
formatted to be readable over the phone.

Example
CE-20160818-1234-5678

Variants:

  • Using a hexadecimal number, one could save one character, which could be used
    to make the ID shorter or to implement a simple checksum to prevent typos

Implementation ideas:

  • Use the PostgreSQL database with a table that records the randomly chosen ticket IDs
    to avoid collisions. This could be done by a PostgreSQL-internal function.
  • use a dict internal to one mailgen run, and limit the mailgen runs to 1 per day
    (or add the hours behind the date)
  • sync mailgen runs via an extra database (redis?) :)
  • add a small service that only draws new unique numbers.

Using PostgreSQL seems preferable, as it does not introduce more dependencies.
If the round trip to the database or to the service becomes a problem, one could draw a couple of IDs in advance and cache them for later use.
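The proposed layout can be sketched as a small formatting function. This is a minimal sketch; `format_ticket` is a hypothetical helper, the 8-digit range corresponds to the 100,000,000 numbers per day discussed in the issue on unguessable ticket numbers, and drawing the random number itself is a separate step.

```python
from datetime import date


def format_ticket(prefix: str, day: date, number: int) -> str:
    """Format a ticket ID like CE-20160818-1234-5678.

    The 8-digit number is zero-padded and split into two groups of four
    so it is easy to read out over the phone.
    """
    if not 0 <= number < 100_000_000:
        raise ValueError("ticket number must have at most 8 digits")
    digits = f"{number:08d}"
    return f"{prefix}-{day:%Y%m%d}-{digits[:4]}-{digits[4:]}"


print(format_ticket("CE", date(2016, 8, 18), 12345678))  # CE-20160818-1234-5678
```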

Reading of configuration files should be optional

To enable testing as a regular user, reading the system configuration file should be optional,
and if there is a complete system configuration file, there is no need to require an additional per-user
one.
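One way to make every file optional is to read all existing files in a fixed order and fail only if none exists. A minimal sketch, assuming the configuration files are JSON; `load_config` and the per-user path in the usage comment are hypothetical, while /etc/intelmq/intelmq-mailgen.conf is the system file installed by the Debian package.

```python
import json
import os
from typing import Any, Dict, Iterable


def _merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into base (nested dicts are merged, not replaced)."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge(base[key], value)
        else:
            base[key] = value
    return base


def load_config(paths: Iterable[str]) -> Dict[str, Any]:
    """Read every existing file in `paths` (later ones win); each file is optional.

    Fails only if none of them exists.
    """
    paths = list(paths)
    config: Dict[str, Any] = {}
    found = False
    for path in paths:
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            _merge(config, json.load(f))
        found = True
    if not found:
        raise FileNotFoundError("no configuration file found in: " + ", ".join(paths))
    return config


# Hypothetical search order: system file first, then an optional per-user file.
# config = load_config(["/etc/intelmq/intelmq-mailgen.conf",
#                       os.path.expanduser("~/.intelmq/intelmq-mailgen.conf")])
```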

Packaging-Debian: Leave system configuration untouched

@gsiv, does this line in debian/rules
```sh
sed 's@/usr/local/lib/intelmq@/usr/lib/intelmq@' \
    debian/intelmq-mailgen/usr/share/doc/intelmq-mailgen/examples/intelmq-mailgen.conf.example \
    > debian/intelmq-mailgen/etc/intelmq/intelmq-mailgen.conf
```
leave the system configuration intact?
(The replacement also seems unnecessary right now.)

Documentation flaws

The documentation needs some cleanup and rework, two especially important points:

  • the documentation contains a rather incomprehensible (at least to me) section "Specific Templates".
  • Some CSV output formatters use the classification identifier as the malware name.
    The documentation should explain this and point out that the modify expert should be
    used to fill the classification identifier with the desired values.

Adding a proper file header

In my view, the main file needs a header stating the authors, copyright and license.

And, by the way: shouldn't the shebang be #!/usr/bin/env python3 to be sure?

package not installable because of missing pyxarf package

    # apt install intelmq-mailgen
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Some packages could not be installed. This may mean that you have
    requested an impossible situation or if you are using the unstable
    distribution that some required packages have not yet been created
    or been moved out of Incoming.
    The following information may help to resolve the situation:

    The following packages have unmet dependencies:
     intelmq-mailgen : Depends: python3 (>= 3.6) but 3.5.3-1 is to be installed
                       Depends: gnupg (>= 2.2) but 2.1.18-8~deb9u4 is to be installed
                       Recommends: python3-pyxarf (>= 0.0.5) but it is not installable
    E: Unable to correct problems, you have held broken packages.

The pyxarf package is not available in any Ubuntu repository.
It is available for xenial here: http://apt.intevation.de/dists/xenial/intelmq-testing/binary-amd64/
but that is of little use nowadays.
pyxarf is highly optional, so it should not be recommended either.
