cs229_ai_fight_financial_crime

CS229 Final Project: An AI approach to fighting financial crime using the UK companies register

  1. Corporate information is downloaded from Companies House, the UK companies registrar.

  2. The company data is of poor quality and required corrections: (i) country names are not standardized, and city names or postcodes are sometimes entered instead; we use NLP to recognize country names and clean up the database. (ii) Identical names of individuals and corporate entities are wrongly treated as separate entities in the Companies House database, so multiple searches are required to correct the issue.

  3. For each company, we convert the raw model inputs (the nationality and the corporate/individual status of the officers and beneficial owners) into a vector of size 492 using a bag-of-words approach. The size is (245 countries + 1 unknown) * 2, where the factor of 2 distinguishes individual officers from corporate officers.
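A minimal sketch of this encoding, under stated assumptions: the country list is truncated to four hypothetical entries, the function name `encode_company` is illustrative, and each slot stores a count (the text does not say whether counts or binary flags were used).

```python
# Hypothetical, truncated vocabulary; the project uses 245 countries plus "unknown".
COUNTRIES = ["united kingdom", "france", "belize", "unknown"]
KINDS = ["individual", "corporate"]

# One slot per (kind, country) pair. With 246 country entries this
# layout would have 246 * 2 = 492 slots, matching the size in the text.
INDEX = {
    (kind, country): k * len(COUNTRIES) + c
    for k, kind in enumerate(KINDS)
    for c, country in enumerate(COUNTRIES)
}


def encode_company(officers):
    """Encode (country, kind) pairs for one company as a count vector."""
    vector = [0] * len(INDEX)
    for country, kind in officers:
        name = country.strip().lower()
        if name not in COUNTRIES:
            name = "unknown"  # unrecognized entries fall into the extra slot
        vector[INDEX[(kind, name)]] += 1
    return vector


example = encode_company([
    ("France", "individual"),
    ("Belize", "corporate"),
    ("Atlantis", "individual"),  # not in the vocabulary
    ("France", "individual"),
])
print(example)  # [0, 2, 0, 1, 0, 0, 1, 0]
```

The first half of the vector holds individual officers and the second half corporate officers, so the same country contributes to different slots depending on legal personality.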

Given the number of countries, we also created two other vectors: (i) a vector of regions that also includes the European Union list of tax havens (the "Blacklist": American Samoa, Belize, Dominica, Fiji, Guam, Marshall Islands, Oman, Samoa, Trinidad and Tobago, United Arab Emirates, Vanuatu, US Virgin Islands) as at 17 May 2019: https://ec.europa.eu/taxation_customs/sites/taxation/files/eu_list_update_17_05_2019_en.pdf (ii) a vector mixing regions and individual countries, which aggregates Africa, Antarctica, North America, South America, Asia, other Oceania and the Middle East, and keeps detailed countries for Europe, Eastern Europe and Central Asia, and the Caribbean.
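The region-level vector (i) could be built by first mapping each country to a coarser bucket. The sketch below is illustrative only: the Blacklist comes from the text above, while `REGION_OF` is a hypothetical, heavily truncated lookup, and the choice to let the Blacklist take precedence over the geographic region is an assumption.

```python
# EU list of tax havens as at 17 May 2019, as quoted in the text.
EU_BLACKLIST = {
    "american samoa", "belize", "dominica", "fiji", "guam",
    "marshall islands", "oman", "samoa", "trinidad and tobago",
    "united arab emirates", "vanuatu", "us virgin islands",
}

# Hypothetical, truncated country-to-region lookup for illustration.
REGION_OF = {
    "kenya": "Africa",
    "japan": "Asia",
    "france": "Europe",
    "canada": "North America",
}


def to_bucket(country):
    """Map a country name to "Blacklist", a region, or "unknown"."""
    name = country.strip().lower()
    if name in EU_BLACKLIST:  # assumption: Blacklist overrides the region
        return "Blacklist"
    return REGION_OF.get(name, "unknown")


print(to_bucket("Belize"))   # Blacklist
print(to_bucket(" Japan "))  # Asia
print(to_bucket("Narnia"))   # unknown
```

The resulting buckets can then be encoded with the same bag-of-words layout as the country-level vector, only with far fewer slots.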

  4. Using Snorkel, we created four labeling functions and generated the final label for each data point.

  5. The dataset was split into train (90%), dev (5%), and test (5%) sets.

  6. The following models were applied to the train and dev datasets:
     - SVM: sigmoid, Gaussian, and 5th- and 10th-degree polynomial kernels
     - Logistic regression
     - NN: fully connected layers only, with 2, 5 and 10 hidden layers
     - CNN: LeNet-5 variation for 1D data

  7. The most promising models were the CNN, the NN with 2 hidden layers, and SVM/logistic regression.

  8. Discussion and conclusion: we could show that a model can detect patterns in the data indicating which companies and limited partnerships are more likely to be involved in money laundering, using the residence and legal personality (individual or corporate) of their officers and beneficial owners.
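The 90% / 5% / 5% train/dev/test split mentioned in the steps above could be reproduced along these lines. This is a sketch under assumptions: the function name `split_dataset`, the fixed seed, and the use of a plain random shuffle are illustrative, since the text does not say how the split was drawn.

```python
import random


def split_dataset(rows, train=0.90, dev=0.05, seed=0):
    """Shuffle rows and cut them into train, dev, and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    n_train = round(len(rows) * train)
    n_dev = round(len(rows) * dev)
    return (
        rows[:n_train],
        rows[n_train:n_train + n_dev],
        rows[n_train + n_dev:],  # the remainder becomes the test set
    )


train_set, dev_set, test_set = split_dataset(range(1000))
print(len(train_set), len(dev_set), len(test_set))  # 900 50 50
```

Giving the remainder to the test set ensures every row lands in exactly one partition even when the dataset size is not a multiple of 20.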

Contributors

chesnay, sebastianhurubaru
