
nerd's Introduction

NERD - Network Entity Reputation Database

NERD is a software and a service which acquires, stores and aggregates various data about known malicious network entities (mostly IP addresses) and provides them in a comprehensible way to users.

The main NERD instance runs at nerd.cesnet.cz.

See the project wiki for more information.


This software was developed within the scope of the Security Research Programme of the Czech Republic 2015 - 2020 (BV III / 1 VS) granted by the Ministry of the Interior of the Czech Republic under the project No. VI20162019029 "The Sharing and analysis of security events in the Czech Republic".

nerd's People

Contributors

aisik00, cejkato2, dianwoshishi, jakubjancicka, janskto1, karyfeever, kukant, ladislavmacoun, misa11n, oltis17, sambudai, tomasmax95, vaclavbartos

nerd's Issues

Configuration in YAML

Configuration files should be rewritten to YAML format, which is much easier to write and edit than the JSON currently used. What is needed:

  • Rewrite common/config.py to parse YAML, keeping the same interface.
  • Rewrite all existing configuration files to YAML

Hostname tagging - recognition of IP encoded in hostname

Add new functionality to the HostnameClass module: find out if the hostname was derived from the corresponding IP address, i.e. if the IP (or part of it) is somehow encoded into the hostname. For example, the following situations should be recognized:

100.33.130.194 | pool-100-33-130-194.nycmny.fios.verizon.net
195.62.53.135 | 53-135.static.spheral.ru
81.214.186.168 | 81.214.186.168.dynamic.ttnet.com.tr
103.57.134.203 | 203.134.57.103-in-addr.arpa-hireachdns.com
203.192.236.35 | dhcp-192-236-35.in2cable.com
46.107.91.149 | 2E6B5B95.dsl.pool.telekom.hu

The task is:

  • Find out various formats being commonly used.
  • Prepare a regexp (or a set of regexps) to match all those formats.
  • Modify the HostnameClass module to assign the tag ip_in_hostname if the IP is found in the hostname (possibly with confidence < 1, if there are some formats where it's not completely clear that the hostname was derived from the IP, but I don't expect this).
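As a starting point for the second task, here is a minimal sketch that recognizes the six example formats listed above. The function name `ip_in_hostname` and the matching strategy (hex encoding of the whole address, or at least the last two octets in decimal, forward or reversed, separated by "." or "-") are assumptions for illustration, not the HostnameClass implementation. IPv4 only.

```python
import ipaddress
import re


def ip_in_hostname(ip: str, hostname: str) -> bool:
    """Return True if the IPv4 address (or a part of it) appears in hostname.

    Hypothetical helper: the set of recognized formats is derived only from
    the examples in this issue and will need extending with real data.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    host = hostname.lower()

    # Whole address as 8 hex digits, e.g. 46.107.91.149 -> "2e6b5b95".
    hex_ip = "".join(f"{int(o):02x}" for o in octets)
    if hex_ip in host:
        return True

    # Decimal octets: all four, the last three, or the last two, in forward
    # or reversed order, separated by "." or "-", optionally zero-padded.
    for n in (4, 3, 2):
        tail = octets[-n:]
        for seq in (tail, tail[::-1]):
            pattern = r"[.-]".join(rf"0{{0,2}}{o}" for o in seq)
            # Lookarounds keep "35" from matching inside "135".
            if re.search(rf"(?<!\d){pattern}(?!\d)", host):
                return True
    return False
```

Matching on only two octets can produce false positives for short octets (e.g. "1-1"), so a real implementation might assign a lower confidence to partial matches.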

Tagging should take into account confidence of input attributes

Currently, tagging rules ("conditions") assume that input attributes are binary (i.e. strictly true or false, with no confidence value). The confidence of the output can only be set by explicitly multiplying input values by numbers.

Support should be added for inputs that have a confidence value set. A term in a condition based on an attribute with confidence c will then also have confidence c (which can be further modified by arithmetic operations, of course). For example:

  • Record contains: "hostname_class": {v: "dynamic", c: 0.8}
  • Tag condition: 0.9*('dynamic' in hostname_class)
  • Result: tag is set with confidence = 0.72

Of course, if the input attribute is simple, i.e. it has no confidence value assigned, confidence=1 is assumed.

Confidence of a combination of individual terms in a condition is computed as follows:

  • Arithmetic operations behave normally.
  • Logical and: A and B = A * B
    • Example:
      • Rule: ('dynamic' in hostname_class) and bl.tor --> tag "dynamic_tor"
      • Inputs: ('dynamic' in hostname_class).c = 0.9, bl.tor = 1.0 (implicitly, since blacklists have no confidence values)
      • Result: dynamic_tor.c = 0.9
  • Logical or: A or B = 1 - ((1 - A) * (1 - B))
    • Example:
      • Rule: 0.9*('dynamic' in hostname_class) or 0.2*('dsl' in hostname_class) --> tag "dynamic"
      • Inputs: ('dynamic' in hostname_class).c = 1.0, ('dsl' in hostname_class).c = 1.0
      • Result: dynamic.c = 0.9*1.0 or 0.2*1.0 = 0.9 or 0.2 = 1 - (0.1 * 0.8) = 0.92
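The combination rules above can be sketched as follows. This is only an illustration under assumptions: the helper names (`split_value`, `term`, `conf_and`, `conf_or`) are hypothetical, and values with confidence are assumed to be stored as a dict with keys "v" and "c", as in the example record above.

```python
def split_value(attr):
    """Return (value, confidence); a plain attribute gets confidence 1.0."""
    if isinstance(attr, dict) and "v" in attr and "c" in attr:
        return attr["v"], float(attr["c"])
    return attr, 1.0


def term(attr, predicate):
    """Confidence of one term: the attribute's confidence if the predicate
    holds for its value, otherwise 0.0."""
    value, c = split_value(attr)
    return c if predicate(value) else 0.0


def conf_and(a: float, b: float) -> float:
    """Logical and: A and B = A * B"""
    return a * b


def conf_or(a: float, b: float) -> float:
    """Logical or: A or B = 1 - ((1 - A) * (1 - B))"""
    return 1 - (1 - a) * (1 - b)


# Record and rule from the first example (expected result: about 0.72).
record = {"hostname_class": {"v": "dynamic", "c": 0.8}}
tag_conf = 0.9 * term(record["hostname_class"], lambda v: v == "dynamic")

# "or" example from the list above (expected result: about 0.92).
dynamic_conf = conf_or(0.9 * 1.0, 0.2 * 1.0)
```

Results are floating-point numbers, so comparisons against expected confidences should use a tolerance rather than exact equality.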

See Data model for proposed specification of how values with confidence should be stored in database. The tagging scheme should automatically recognize if given input value is plain or with confidence (by its data type and presence of ".c" attribute).

Note: The definition of confidence combinations may change; I'll need to prepare some real use cases and find out if these operations are OK. So, do other issues first. I wrote this so you have an idea of what you will work on in the future.

Study FireHOL IP Lists - potential source of information

Study FireHOL IP Lists (http://iplists.firehol.org), it seems to be an extensive list of various blacklists and other data feeds. Maybe it can be somehow used directly by NERD, or at least as a source of information about potentially usable feeds, that could be downloaded in NERD directly.

I also think they have a script for downloading and processing all the data freely available.

Maybe it could also be used as a source of information about blacklists for users, i.e. to show a link to the blacklist's FireHOL "profile" as info about the blacklist (together with a link to its official website; of course, help strings/popups for blacklists will need to be implemented first).

Get data from CIRCL's BGP Ranking

Add a new module to NERD which adds attribute circl_bgprank to asn entities. The value will be taken from CIRCL's public BGP ranking service (https://www.circl.lu/projects/bgpranking/) via its REST API.

Use the bgpranking_web library for querying (available on PyPI, just add it to requirements.txt). The function cache_get_daily_rank will probably be needed (full API documentation: http://circl.github.io/bgpranking-redis-api/code/API.html).

The function querying and setting the rank will be registered on asn:!NEW and asn:!every1d events (i.e. the rank will be queried when a new ASN is added and then refreshed every day). It will simply set attribute circl_bgprank to the value received from API.

Add confidence to event-type tags

Currently, tags marking prevalent event types (e.g. Scanner, Login attempts) are binary: when the number of events of a given type exceeds a threshold, the tag is assigned, otherwise it isn't. Since we already have support for "confidence" in our tagging scheme, it would be nice to use it here as well.

So, a tag should be assigned even for a single occurrence of some event, but with a low confidence. With an increasing number of events, the confidence will grow, reaching 1.0 at some configured threshold. Also, it would be nice if the threshold (and the rate at which confidence grows) could be configured differently for different types of attacks.
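One possible shape of such a function is a linear ramp, sketched below. The linear growth, the function name `event_tag_confidence`, and the per-type thresholds in `THRESHOLDS` are all assumptions for illustration; the issue leaves the exact growth rate and configuration open.

```python
# Hypothetical per-event-type thresholds at which confidence reaches 1.0.
THRESHOLDS = {"scanner": 20, "login_attempts": 5}


def event_tag_confidence(event_count: int, threshold: int) -> float:
    """Confidence grows linearly with the number of events and is capped
    at 1.0 once event_count reaches the threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if event_count <= 0:
        return 0.0
    return min(1.0, event_count / threshold)


# A single scan event yields a low-confidence tag.
single_scan = event_tag_confidence(1, THRESHOLDS["scanner"])
```

A non-linear curve (e.g. logarithmic) could be swapped in without changing the interface if the linear ramp turns out to rise too slowly for rare event types.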

Details about configuration and implementation are up to you, or we can discuss them later.

Bulk API endpoint returns 500 when requested without trailing slash

The bulk IP address query endpoint must be accessed with a trailing slash: <base_url>/ip/bulk/, as documented in the wiki. When invoking the endpoint without the trailing slash, the response is always {"err_n": 500, "error": "Internal Server Error"}.

Can be reproduced with:

curl -H "Authorization: TOKEN" -H "Content-Type: application/octet-stream" https://nerd.cesnet.cz/nerd/api/v1/ip/bulk --data "ZZZZ"

This is most certainly incorrect behaviour. The response should at least be a 404, perhaps 400. Though, I would personally consider "bulk" to be the resource that is being accessed; that would mean that not having a trailing slash would be the right way to access it. Practically, to ensure compatibility, I think both variants should work equally...

Get data from DShield API

The DShield project from SANS ISC has an API for querying information about IPs. It returns the number of reports and the number of unique targets reported to DShield. No API rate limits are documented (which doesn't mean there are none; we need to test it). These data can be queried for each IP address in NERD.

For each newly added IP address in NERD, and then every day (i.e. on !NEW and !every1d events), request IP info via the DShield API, parse it, and store it in the NERD record as attributes dshield.*. The following fields should be stored: count (store as reports), attacks (store as targets), mindate, maxdate.

API documentation is at https://isc.sans.edu/api/#ip

Compare API output, e.g. https://isc.sans.edu/api/ip/5.188.10.180, with HTML version, https://isc.sans.edu/ipinfo.html?ip=5.188.10.180, to better understand meaning of fields in API output.

Consider using JSON output instead of the default XML.
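The field mapping described above could be sketched as below. This is an assumption-laden illustration: the function name `dshield_attributes` is hypothetical, the code assumes the JSON output wraps the fields in a top-level "ip" object (check this against real API output), and the sample values are made up. It performs no network request.

```python
def dshield_attributes(api_response: dict) -> dict:
    """Map a parsed DShield JSON response to NERD attributes dshield.*."""
    # Assumption: fields sit under a top-level "ip" key; fall back to the
    # response itself if that wrapper is absent.
    info = api_response.get("ip", api_response)

    def to_int(x):
        # The API may return numbers as strings or leave fields empty.
        return None if x in (None, "") else int(x)

    return {
        "dshield.reports": to_int(info.get("count")),
        "dshield.targets": to_int(info.get("attacks")),
        "dshield.mindate": info.get("mindate"),
        "dshield.maxdate": info.get("maxdate"),
    }


# Made-up sample response, for illustration only.
sample = {"ip": {"count": "12", "attacks": "7",
                 "mindate": "2020-01-01", "maxdate": "2020-02-01"}}
attrs = dshield_attributes(sample)
```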

Frontend (IP detail) should be updated to show new info in some "nice" way. There should also be a link to more details on DShield pages (https://isc.sans.edu/ipinfo.html?ip=5.188.10.180 or https://isc.sans.edu/ipdetails.html?ip=5.188.10.180 or both).

Module for counting various events

For better performance monitoring and debugging, we need to be able to count the number of certain events (such as updates, queries to external sources, etc.) per specified time interval. Currently, some modules do such counting on their own (e.g. the DNSBL module counts the number of requests made per day, UpdateManager stores some statistics to files for Munin), but it's needed in other places as well and it should be unified.

Therefore, a special module/class EventCountLogger should be created.

It should keep a counter for each event ID. Event IDs are registered dynamically by other modules and then incremented by calling a logEvent method. Event IDs are organized into groups. Each group has a defined time interval and a filename. At the end of each time interval, the current values of the counters in the group are dumped to the file and reset to 0.

Methods:

  • registerEvents(group, log_frequency, [event_ids])
    • group - group of events, used as a name of file
    • log_frequency - how often to log the counters to file (in seconds)
    • event_ids - list of strings (all these events will be logged together to the given file)
    • All files are stored in a directory set in configuration.
    • Creates a group containing a counter (initialized to 0) for each id in event_ids
  • logEvent(event_id, count=1)
    • Add count to the counter(s) of event_id

EventIDs do NOT have to be unique among groups - if one id is present in more groups, all corresponding counters are incremented.
This can be used e.g. to count one event both every hour and every day. Call:

registerEvents("mygroup_1h", 3600, ["myevent1", "myevent2"])
registerEvents("mygroup_24h", 24*3600, ["myevent1", "myevent2"])

Then, just call logEvent("myevent1") and numbers of such calls will be logged hourly in "mygroup_1h" file and daily in "mygroup_24h" file.


For time scheduling, I recommend using APScheduler (which is already used by other parts of NERD).

Time intervals should be "aligned", so for example, if the interval is 60 seconds, counters should be written at the start of every minute, i.e. at mm:ss = 00:00, 01:00, 02:00, etc. (basically, when unix_timestamp % interval == 0).
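The alignment rule can be computed directly from the Unix timestamp. A minimal sketch, with the hypothetical helper name `next_aligned_timestamp`; how the result is passed to the scheduler is left open:

```python
import time
from typing import Optional


def next_aligned_timestamp(interval: int, now: Optional[float] = None) -> int:
    """Return the next Unix timestamp strictly after `now` for which
    timestamp % interval == 0."""
    if now is None:
        now = time.time()
    return (int(now) // interval + 1) * interval
```

For example, with a 60-second interval and a current timestamp of 125, the next dump would be scheduled at 180.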

It would also be nice (not necessary in the first version) if counters could survive a NERDd restart. That is, counters should be dumped when NERDd stops; and when it starts and event groups are registered, it should look for existing counter files and their time of last modification, and if the interval hasn't elapsed yet, initialize counters from the file.


Format of the file:

#YYYY-MM-DDTHH:MM:SS
eventid count
eventid count
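To make the interface and file format concrete, here is a minimal sketch of the class without the scheduling part. It is not the final design: the `dump_group` method name, the constructor argument `log_dir`, and the use of UTC for the timestamp line are assumptions, and in NERD the periodic call of `dump_group` would be done by the scheduler.

```python
import os
import threading
from datetime import datetime, timezone


class EventCountLogger:
    def __init__(self, log_dir):
        self.log_dir = log_dir  # directory for counter files (from config)
        self._lock = threading.Lock()
        self._groups = {}  # group name -> {"interval": int, "counters": dict}

    def registerEvents(self, group, log_frequency, event_ids):
        """Create a group with one zeroed counter per event ID."""
        with self._lock:
            self._groups[group] = {
                "interval": log_frequency,
                "counters": {eid: 0 for eid in event_ids},
            }

    def logEvent(self, event_id, count=1):
        """Add count to the counter of event_id in every group containing it."""
        with self._lock:
            for grp in self._groups.values():
                if event_id in grp["counters"]:
                    grp["counters"][event_id] += count

    def dump_group(self, group, now=None):
        """Write the group's counters to its file and reset them to 0."""
        now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
        with self._lock:
            counters = self._groups[group]["counters"]
            lines = [now.strftime("#%Y-%m-%dT%H:%M:%S")]
            lines += [f"{eid} {cnt}" for eid, cnt in counters.items()]
            for eid in counters:
                counters[eid] = 0
        path = os.path.join(self.log_dir, group)  # group name is the filename
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")
        return path
```

The lock is there because logEvent may be called from worker threads while the scheduler dumps a group; the file is written outside the lock so that slow disk I/O does not block counting.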

The module will be part of NERD "core" (i.e. in core directory). An instance of EventCountLogger will be created in nerdd.py during startup and stored to g ("globals" module) so it's accessible throughout the application.
