
richkit's Introduction

Richkit

Richkit is a Python 3 package that takes a domain name as input and returns additional information on that domain. The information can be an analysis of the domain itself, looked up from databases, retrieved from other services, or some combination thereof.

The purpose of richkit is to provide a reusable library of domain name-related analysis, lookup, and retrieval functions that are shared within the Network Security research group at Aalborg University, and also available to the public for reuse and modification.

Documentation can be found at https://richkit.readthedocs.io/en/latest/.

Requirements

  • Python >= 3.5

Installation

In order to install richkit, just type pip install richkit in the terminal.

Usage

The following code snippets retrieve the effective TLD and the URL category, respectively.

  • Retrieving the effective top-level domain of a given URL:

    >>> from richkit.analyse import tld
    >>> urls = ["www.aau.dk","www.github.com","www.google.com"]
    >>>
    >>> for url in urls:
    ...     print(tld(url))
    dk
    com
    com
  • Retrieving the category of a given URL:

    >>> from richkit.retrieve.symantec import fetch_from_internet
    >>> from richkit.retrieve.symantec import LocalCategoryDB
    >>>
    >>> urls = ["www.aau.dk","www.github.com","www.google.com"]
    >>>
    >>> local_db = LocalCategoryDB()
    >>> for url in urls:
    ...     url_category=local_db.get_category(url)
    ...     if url_category=='':
    ...         url_category=fetch_from_internet(url)
    ...     print(url_category)
    Education
    Technology/Internet
    Search Engines/Portals

Modules

Richkit defines a set of functions grouped into the following modules:

  • richkit.analyse: This module provides functions that can be applied to a domain name. Similarly to richkit.lookup, and in contrast to richkit.retrieve, this is done without disclosing the domain name to third parties and breaching confidentiality.

  • richkit.lookup: This module provides the ability to look up domain names in local resources, i.e. the domain name cannot be sent off to third parties. The module might fetch resources, such as lists or databases, but this must be done in a way that keeps the domain name confidential. Contrast this with richkit.retrieve.

  • richkit.retrieve: This module provides the ability to retrieve data on domain names of any sort. It comes without the "confidentiality contract" of richkit.lookup.

Run Tests on Docker

To prevent environment-related problems, we provide a Dockerfile.test file, which builds a Docker image that runs Richkit's tests.

  • The only thing to add is your MAXMIND_LICENSE_KEY in .github/local-test/run-test.sh at line 3. It is required to pass the test cases for the lookup module.

Commands to run the tests in the Docker environment:

  • docker build -t richkit-test -f Dockerfile.test . : Builds the image required to run the test cases

  • docker run -e MAXMIND_LICENSE_KEY="<licence-key>" richkit-test : Runs the run-test.sh file in the Docker image.

Contributing

Contributions are most welcome.

We use the gitflow branching strategy, so if you plan to push a branch to this repository, please follow that. Note that we test branch names with .githooks/check-branch-name.py. The git pre-commit hook can be used to check this automatically on commit. An example for Linux that can be used directly is available, and can be enabled like this (assuming python>=3.6 and bash):

ln -s $(pwd)/.githooks/pre-commit.linux.sample $(pwd)/.git/hooks/pre-commit

Credits

richkit's People

Contributors

gianmarcomennecozzi, kidmose, mrtrkmn


richkit's Issues

Document the assumed data model for domain names

At a meeting with kh, atu, gmm and egk on 19 Sept., kh described a model for how to refer to different sorts of domain names. This is currently captured in the readme with:

Todo: Describe the data model of FQDN > APEX Domain > Public Suffix > TLD

This needs to be done.
This will contribute to solving #1.
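
For reference, the model could be illustrated with a small sketch (purely illustrative; decompose and the toy suffix set below are not part of richkit):

```python
def decompose(fqdn, public_suffixes):
    """Split an FQDN into (tld, public_suffix, apex_domain).

    public_suffixes is a toy stand-in for the Public Suffix List.
    """
    labels = fqdn.split(".")
    tld = labels[-1]
    # Longest (most labels) public suffix that matches the end of the FQDN.
    suffix = max(
        (s for s in public_suffixes if fqdn.endswith("." + s)),
        key=lambda s: s.count("."),
    )
    suffix_labels = suffix.count(".") + 1
    # The apex domain is the public suffix plus one more label.
    apex = ".".join(labels[-(suffix_labels + 1):])
    return tld, suffix, apex

print(decompose("www.example.co.uk", {"uk", "co.uk"}))
# ('uk', 'co.uk', 'example.co.uk')
```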

As a reviewer I'd like adherence to pep8 to make code easier to read

A tool like flake8 can be used to weed out code that deviates from the pep8 coding style.
We currently have 506 errors/warnings:

me@machine:dat$ pip install flake8
me@machine:dat$ flake8 dat/ test/ | wc -l
506

Adhering to pep8 would make my life easier when reviewing PRs, because I have an easier time reading the code and understanding what has changed.
When I code myself I already try to adhere, so I think the overhead is negligible.

I suggest including flake8 in the CI/CD pipeline.
As a starting point, fail builds on errors and warnings.

Opinions @gianmarcomennecozzi , @mrturkmen06 and anyone?

remove URLVoid code because it is dead

richkit/retrieve/util.py contains code for fetching data from URLVoid service, and the test of it fails.
As it is not currently exposed as methods under retrieve.*, it seems to be incomplete, and should be removed from master until it has been completed and passes the relevant tests.

This seems like it will also solve #52

Fix example from README.md: retrieve.symantec.LocalCategoryDB

$ ipython
Python 3.7.0 (default, Feb  4 2020, 14:16:38) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.12.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: >>> from richkit.retrieve.symantec import fetch_from_internet 
   ...: >>> from richkit.retrieve.symantec import LocalCategoryDB 
   ...: >>> 
   ...: >>> urls = ["www.aau.dk","www.github.com","www.google.com"] 
   ...: >>> 
   ...: >>> local_db = LocalCategoryDB() 
   ...: >>> for url in urls: 
   ...: ...     url_category=local_db.get_category(url) 
   ...: ...     if url_category=='': 
   ...: ...         url_category=fetch_from_internet(url) 
   ...: ...     print(url_category) 
   ...:                                                                                                               
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-4c50769c2b82> in <module>
      4 urls = ["www.aau.dk","www.github.com","www.google.com"]
      5 
----> 6 local_db = LocalCategoryDB()
      7 for url in urls:
      8     url_category=local_db.get_category(url)

~/git-reps/richkit/richkit/retrieve/symantec.py in __init__(self)
     49     def __init__(self):
     50 
---> 51         self.url_to_category = read_categorized_file()
     52 
     53     def get_category(self, url):

~/git-reps/richkit/richkit/retrieve/symantec.py in read_categorized_file()
    149     url_to_category = dict()
    150     if not os.path.exists(categorized_urls_file):
--> 151         open(categorized_urls_file,'w').close()
    152     else:
    153         with open(categorized_urls_file, "r") as ins:

FileNotFoundError: [Errno 2] No such file or directory: 'dat/retrieve/data/categorized_urls.txt'
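
A likely direction for the fix (an assumption based on the traceback, which shows the data file path resolved relative to the current working directory): resolve the file relative to the package and create missing parent directories before opening. A minimal sketch with a hypothetical helper:

```python
import os

def ensure_file(path):
    """Create the file and any missing parent directories, then return the path.

    ensure_file is a hypothetical helper; richkit's read_categorized_file
    currently calls open(path, 'w') without creating parent directories,
    which raises FileNotFoundError when the directory is absent.
    """
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        open(path, "w").close()
    return path
```

Anchoring the path to the module, e.g. via os.path.dirname(__file__), would additionally make it independent of the working directory.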

define git hooks for better git branch names

There should be some naming conventions in order to get out of a situation where everything is messed up. This issue could be closed by creating customised git hooks, which might run on the server and client side.

  • New branches should be created from the master branch.
  • If there is no issue describing your dev intention, first create an issue and define your intention.
  • There should not be branch-to-branch (other than master) pull requests, at least for the time being.
  • Branch names should match the issues that the project has.
  • Branch names should contain the issue number at the end, otherwise they should NOT be accepted.
  • A proper branch name length is between 18 and 25 characters.
  • Example:
    - For issue #4, the ideal branch name could be add-more-sources-#4
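
The conventions above could be checked mechanically; a hedged sketch (the real check lives in .githooks/check-branch-name.py and may differ):

```python
import re

def branch_name_ok(name):
    """Check the conventions above: lowercase words joined by hyphens,
    an issue number suffix, and a total length between 18 and 25."""
    matches = bool(re.fullmatch(r"[a-z0-9-]+-#\d+", name))
    return matches and 18 <= len(name) <= 25

print(branch_name_ok("add-more-sources-#4"))  # True: 19 chars, issue no. at end
```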

proposal for a Makefile

A Makefile could be useful to:

  • create a virtualenv for Python
  • clean (remove cache files generated in various cases)
  • lint
  • build
  • run test cases

In my opinion, it would be useful to have, for easier interaction with the project. What do you think about it @kidmose and @gianmarcomennecozzi?

clean up dependencies for http library

We are currently using multiple libraries for HTTP capabilities, which is unwanted complexity and dependency.

requests is believed to cover all our needs, be the easiest to use, and therefore the way to go.

Success criteria: We have removed wget and urllib* from requirements, and rewritten existing code to use requests (https://pypi.org/project/requests/)

Inconsistent test cases

It is quite weird to see failing and passing runs for the same commit id. I tried to merge the update_docs branch into the develop branch; although it only contained docs updates, it failed. So I restored the commit which passed before, but when I restored it, GitHub Actions started to fail. There is something wrong either in GitHub Actions or in our test cases. It needs to be investigated. Here is a screenshot of what I am trying to say:

Screenshot 2020-03-15 at 21 19 17

Does anyone have any idea about it?

Only download maxmind databases if the current one is outdated

In richkit.lookup.{country|asn} the databases are downloaded from the web if missing, as intended.

However, it also seems to me that the databases are downloaded again every time richkit.lookup.util is loaded, regardless of whether the files were downloaded just recently, even if the current ones are still up to date with the ones available from MaxMind.

According to the docs, tempfile.mkdtemp() will ensure that a new, empty temp folder is used on every load of the module, thus requiring a new download:

temp_directory = tempfile.mkdtemp()
:

This causes the following code, which is intended to reuse a local file if it is new enough, to have no effect (it looks in a new, empty tmp dir every time):

# check if the database is updated
if (int(calendar.timegm(time.gmtime()))
        - int(os.path.getctime(MaxMind_CC_DB.get_db_path(self)))) > self.three_weeks:
    shutil.rmtree(self.path_db)
    os.mkdir(self.path_db)
    MaxMind_CC_DB.get_db()

Goal: avoid downloading a database if one that is up to date already has been downloaded.
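
The goal could be sketched as follows (needs_refresh is a hypothetical name, not the richkit API): persist the database in a fixed cache directory instead of a fresh mkdtemp(), and download only when the file is missing or stale:

```python
import os
import time

THREE_WEEKS = 3 * 7 * 24 * 60 * 60  # seconds; mirrors self.three_weeks above

def needs_refresh(db_path, max_age=THREE_WEEKS, now=None):
    """Return True if the database file is missing or older than max_age.

    With a stable db_path (e.g. under ~/.cache), this makes the staleness
    check above effective, because the file survives module reloads.
    """
    if not os.path.exists(db_path):
        return True
    now = time.time() if now is None else now
    return (now - os.path.getctime(db_path)) > max_age
```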

Incorporate features found in pydomain

We have an existing code repository named pydomain, including calculations of 16 features.
These are useful, but the existing code is hard to reuse.

We need to:

  1. take the code for each of these features,
  2. reimplement it in this repo,
  3. add relevant unit tests,
  4. and document the features in docstring.

The old code is found here:
https://github.com/aau-network-security/pydomain/blob/master/pydomain/pydomain.py#L698

Please state in the comments when you start working on one of them, and note that leaving TODO's for e.g. docstrings is ok when it is not obvious how to interpret the feature.

Expected fail !

From time to time, the Google DNS servers (8.8.8.8, 8.8.4.4) change their ASN, which causes failures in our tests; ideally, those tests could be skipped.
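
One way to do that with unittest (AsnTestCase and lookup_asn are hypothetical names for illustration, not richkit code):

```python
import unittest

def lookup_asn(ip):
    """Hypothetical stand-in for the richkit ASN lookup."""
    raise NotImplementedError

class AsnTestCase(unittest.TestCase):
    @unittest.skip("Google DNS ASNs drift over time; see this issue")
    def test_google_dns_asn(self):
        # Assumed expected value; the real test compares against live data.
        self.assertIn(lookup_asn("8.8.8.8"), {"AS15169"})
```

An alternative to skipping outright is comparing against a set of plausible ASNs, which keeps some coverage while tolerating drift.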

integrate logging

Removing print statements from the package and integrating a logging system would be a nice way to inform the user.
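
The standard pattern for libraries (a sketch, not richkit's current code) is a per-module logger plus a NullHandler on the package root, leaving verbosity entirely to the application:

```python
import logging

# Library side: module-level logger; the NullHandler on the package root
# means importing the library emits nothing unless the app adds handlers.
logging.getLogger("richkit").addHandler(logging.NullHandler())
logger = logging.getLogger("richkit.analyse.segment")

# Application side: opt in to the desired verbosity.
logging.basicConfig(level=logging.WARNING)
logger.info("Fetching one gram file ...")       # suppressed at WARNING level
logger.warning("something the user must know")  # shown
```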

Remove log message clutter

Currently, importing the three submodules produces multiple log messages that are not relevant (See bottom)
This is in conflict with good design principles, as Raymond states in "The Art of UNIX Programming":

Don't clutter output with extraneous information

Goal: When using richkit without having taken any steps to increase log verbosity, the only log messages printed are warnings, when there is something the user really needs to know, and errors, when a given operation fails.

user@host:~/git-reps/richkit$ ipython
Python 3.7.0 (default, Feb  4 2020, 14:16:38) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.12.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from richkit import lookup, analyse, retrieve                                                                                                         
02-05 13:03 urllib3.connectionpool DEBUG    Starting new HTTPS connection (1): publicsuffix.org:443
02-05 13:03 urllib3.connectionpool DEBUG    https://publicsuffix.org:443 "GET /list/effective_tld_names.dat HTTP/1.1" 200 None
02-05 13:03 richkit.analyse.segment INFO     Fetching one gram file from gist ...
02-05 13:03 urllib3.connectionpool DEBUG    Starting new HTTPS connection (1): gist.githubusercontent.com:443
02-05 13:03 urllib3.connectionpool DEBUG    https://gist.githubusercontent.com:443 "GET /mrturkmen06/d9d5f8bc35be8efd81c447f70ca99fbf/raw/cfa317d7bce53ba55ca8f9bf27aa3170038f99cf/one-grams.txt HTTP/1.1" 200 4956240

In [2]:  

Use unittest.TestCase.assert* in TCs

We currently have some test cases under ./richkit/test/ that use the assert statement, which is intended for debugging and not for use in the unittest framework, which we use.

These should be changed to self.assert* (Where self is a unittest.TestCase).
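
A minimal before/after sketch (TldTestCase is a hypothetical test case): a bare assert raises a plain AssertionError and is stripped under python -O, while the TestCase methods survive -O and report the expected and actual values on failure.

```python
import unittest

class TldTestCase(unittest.TestCase):
    def test_tld(self):
        result = "dk"
        # Before: assert result == "dk"
        self.assertEqual(result, "dk")  # After: reports both values on failure
```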

AttributeError: module 'dat.retrieve.symantec' has no attribute 'refetch_from_internet'

I see the following error when trying to use dat.retrieve.symantec_category:

(dat) egk@egk-ThinkPad-T450s:~/git-reps/dat$ git branch -v
  develop                    94e4a0d Ideas in comments moved to Issue #4
  feature/docstring-refactor 870f1a9 refactor dat.retrieve.symantex to docstring + expose simple function
* master                     e8365c2 Merge pull request #13 from aau-network-security/clean-up-requirements-#11
(dat) egk@egk-ThinkPad-T450s:~/git-reps/dat$ python
Python 3.6.5 (default, Sep 19 2019, 13:56:05) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dat.retrieve
>>> dat.retrieve.symantec_category('google.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/egk/git-reps/dat/dat/retrieve/__init__.py", line 12, in symantec_category
    return symantec.refetch_from_internet(domain)
AttributeError: module 'dat.retrieve.symantec' has no attribute 'refetch_from_internet'

We need 1) a test case to catch this and 2) a fix.

https://richkit.readthedocs.io/en/latest/ not available

I was hoping to access the richkit documentation at https://richkit.readthedocs.io/en/latest/ but I get an error:

    \          SORRY            /
     \                         /
      \    This page does     /
       ]   not exist yet.    [    ,'|
       ]                     [   /  |
       ]___               ___[ ,'   |
       ]  ]\             /[  [ |:   |
       ]  ] \           / [  [ |:   |
       ]  ]  ]         [  [  [ |:   |
       ]  ]  ]__     __[  [  [ |:   |
       ]  ]  ] ]\ _ /[ [  [  [ |:   |
       ]  ]  ] ] (#) [ [  [  [ :===='
       ]  ]  ]_].nHn.[_[  [  [
       ]  ]  ]  HHHHH. [  [  [
       ]  ] /   `HH("N  \ [  [
       ]__]/     HHH  "  \[__[
       ]         NNN         [
       ]         N/"         [
       ]         N H         [
      /          N            \
     /           q,            \
    /                           \

Could this be related to the need to do a release, as mentioned in #6?

Also the build seems to have failed: https://readthedocs.org/projects/richkit/builds/

Linting before commit

It is annoying to see linting errors after changes have been made; integrating linting into the existing githook would be a nice way to prevent it, I think.

There are skipped tests

Get an overview of why we have 4 tests that are skipped, create relevant issues, and address them.

richkit/test/test_analyse.py ...ss................                       [ 61%]
richkit/test/test_lookup.py ..                                           [ 67%]
richkit/test/test_retrieve.py s...s                                      [ 82%]
richkit/test/test_util.py ......                                         [100%]

Handle URLVoid changing number of blacklists

URLVoid appears to have changed the number of blacklists, leading to a failed test:

https://github.com/aau-network-security/richkit/pull/71/checks?check_run_id=452081551#step:6:23

____________________ URLVoidTestCase.test_blacklist_status _____________________

self = <richkit.test.retrieve.test_urlvoid.URLVoidTestCase testMethod=test_blacklist_status>

    def test_blacklist_status(self):
        for k, v in self.test_urls.items():
            instance = URLVoid(k)
>           assert instance.blacklist_status() == v["blacklist_status"]
E           AssertionError: assert '0/34' == '0/36'
E             - 0/34
E             ?    ^
E             + 0/36
E             ?    ^

richkit/test/retrieve/test_urlvoid.py:79: AssertionError

We need to make the code and tests robust to such changes, as we have seen them introduce noise before (e.g. #70).
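
One option is to assert on the format and invariants of the value instead of pinning the exact number of blacklists (check_blacklist_status is an illustrative helper, not existing richkit code):

```python
import re

def check_blacklist_status(status):
    """Validate an 'N/M' blacklist status without pinning the total M,
    which URLVoid changes over time."""
    m = re.fullmatch(r"(\d+)/(\d+)", status)
    assert m, f"unexpected format: {status!r}"
    detections, total = int(m.group(1)), int(m.group(2))
    assert detections <= total
    return detections, total

print(check_blacklist_status("0/34"))  # (0, 34)
```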

Get an overview of missing tests

It seems that we haven't implemented tests for all the methods found under lookup, analyse and retrieve submodules, with analyse.n_grams_alexa being one example I encountered.

Task:

  1. Identify all methods in the __init__.py files of each of the three submodules,
  2. Get an overview of which ones aren't tested
  3. Create individual issues on github for each function that is not currently tested
In [1]: from richkit import lookup, analyse, retrieve                                                                                                                                                                                         

In [2]: analyse.n_grams_alexa('example.com')                                                                                                                                                                                                  
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-8d8491ba670c> in <module>
----> 1 analyse.n_grams_alexa('example.com')

~/git-reps/richkit/richkit/analyse/__init__.py in n_grams_alexa(domain)
    225 
    226     """
--> 227     return analyse.get_grams_alexa_2ld(domain)
    228 
    229 

~/git-reps/richkit/richkit/analyse/analyse.py in get_grams_alexa_2ld(domain, analyzer, ngram_range)
    153         :return: grams of second level domain
    154 	"""
--> 155         alexa_slds = load_alexa()
    156 	alexa_vc = CountVectorizer(analyzer=analyzer,
    157                                                            ngram_range=ngram_range,

~/git-reps/richkit/richkit/analyse/util.py in load_alexa(limit)
     64     alexa_domains = set()
     65     path = "top-1m.csv"
---> 66     with open(path) as f:
     67         for line in f:
     68             line = line.strip()

FileNotFoundError: [Errno 2] No such file or directory: 'top-1m.csv'

Make whois independent of linux and external binary

As per requirements.txt we are currently using the whois module;

Python wrapper for Linux “whois” command

This is expected to fail when whois is not installed/available on $PATH, and also when running on other OSs. I think the first case is what we see here:
https://github.com/aau-network-security/richkit/runs/531202016#step:5:191

In order to remove the tie to a specific OS and avoid being dependent on an external binary I suggest we move to a python implementation of whois.

I've previously worked with python-whois and found it to work nicely and be extensible (I added parsing for .dk whois), so I suggest that, but other alternatives might exist.

Goal: Unskip richkit.test.retrieve.test_whois.WhoisTestCase and make sure it passes the current tests.

Release error

Error on releasing a new version of the Python package

Screenshot 2020-03-17 at 23 07 38

I am checking it...

analyse.tld does not scale linearly

The runtime complexity of some of the functions may be prohibitive. Consider that of richkit.analyse.tld() as an example. Is this really because TLDs are intrinsically difficult to compute (e.g., by accounting for examples such as *.co.uk), or could this be streamlined?

The output of the attached code (see below) is as follows:

1 domains processed:
split(): 0.0009176731109619141 s
Richkit: 0.017614364624023438 s

10 domains processed:
split(): 0.0008223056793212891 s
Richkit: 0.1530759334564209 s

100 domains processed:
split(): 0.0008115768432617188 s
Richkit: 1.5235605239868164 s

1000 domains processed:
split(): 0.001218557357788086 s
Richkit: 15.542202234268188 s
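
A plausible cause (an assumption, not verified against the richkit source) is that the Public Suffix List is fetched or re-parsed on every call; parsing it once at import time and caching per-domain results would restore near-constant per-domain cost, as in this toy sketch:

```python
from functools import lru_cache

# Parsed once at import time; a toy subset standing in for the full
# Public Suffix List.
SUFFIXES = {"com", "dk", "uk", "co.uk"}

@lru_cache(maxsize=None)
def tld(domain):
    """Longest-match lookup against the preloaded suffix set."""
    labels = domain.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            return candidate
    return labels[-1]

print(tld("www.example.co.uk"))  # co.uk
```

Note the longest-match scan starts from the full name, so *.co.uk-style suffixes are handled before the bare uk entry.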

HTTP CT logs features

@kdhageman has done some work in the area of HTTPS Certificate Transparency logs, and also has a script that extracts some features for a domain.

We want those reimplemented in richkit (as a first iteration) such that richkit has a function for each feature that, given a domain name, will return the value for that feature.

The script is likely based on an API/data source at Censys, which is very batch-oriented (along the lines that a batch, whether for 1 or for 1000 domains, has a fixed price). It seems likely that https://crt.sh/?q=example.com is a better candidate for richkit for now.

Not knowing the state or nature of the script, it might be necessary to analyse it to understand each feature and reimplement it from scratch here, but I'm sure Kaspar can provide some advice.

This is done when richkit has a method for each of the features, with the documentation and testing to go with it.

@kdhageman: If you don't get around to pushing the script to a repo, then perhaps you can share the current version here?

Select license

We need to select which license this is to be published under.

I see the goals of this as enabling adopters and contributors to easily:

  • Know how they can use this tool, including rights for derivative work.
  • Know how they can contribute, what rights they retain on their contribution, and what rights they must be ready to waive.

My initial idea is GNU GPL, but I'm also interested in inputs.

Streamline Maxmind licensing

When MaxMind (the company providing the resources backing some of the functions in lookup) changed their API from anonymous HTTP download to an HTTP download where a (free) license key is required, a quick fix was implemented so that the key is read from the MAXMIND_LICENSE_KEY environment variable.

This is lacking at least the following:

  • Documentation on where and how to obtain the license
  • Documentation on how to configure the license
  • Handling of a missing license, e.g. such that all other parts of richkit, in particular richkit.lookup, are still functional when the key is not configured
  • Relevant test cases for the step above

And there might be more to add.
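
The missing-key handling could look roughly like this (MissingLicenseKey and get_license_key are hypothetical names, not the current richkit code):

```python
import os

class MissingLicenseKey(Exception):
    """Raised when MAXMIND_LICENSE_KEY is not configured."""

def get_license_key():
    """Read the key from the environment, failing with a pointer to the docs
    instead of an obscure download error deep inside lookup."""
    key = os.environ.get("MAXMIND_LICENSE_KEY")
    if not key:
        raise MissingLicenseKey(
            "Set MAXMIND_LICENSE_KEY; a free key is available from MaxMind."
        )
    return key
```

Callers in richkit.lookup could then catch MissingLicenseKey and degrade gracefully, keeping the rest of the module usable.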

Clean up requirements.txt

Currently requirements.txt seems to include more entries than necessary to use the library.

Please remove all unnecessary entries from the file.
