
koobzaar / newcomb


[📊] Statistical analysis of more than one million lines of death-toll data released by countries around the world. We applied our algorithm to these data and compared the results with the Benford's Law curve to detect under-reporting and falsified death data.

Home Page: https://koobzaar.github.io/Newcomb/

CSS 0.05% HTML 0.08% JavaScript 99.88%
benfords-law benford benford-analysis benford-compliant covid-19 covid19-data covid-analysis cov-bnf


Newcomb, a.k.a. Benford Research

An initiative dedicated to the statistical analysis of public data obtained from various health ministries. Its aim is to identify distortions in reported mortality figures through the rigorous application of Benford's Law. Public data are collected and processed using data-mining techniques. The results are then analyzed statistically to infer relevant information about distortions in mortality indices.

Benford's Law

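Benford's Law is a probability distribution that describes the relative frequency of leading (first significant) digits in many real-world numerical data sets. It rests on a logarithmic scale property. Data spanning several orders of magnitude tend to be spread evenly on a logarithmic scale, so the share of values whose leading digit is d equals the width of the interval from log10(d) to log10(d + 1). The probability distribution of Benford's Law is given by the following equation: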

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9$$

Here d is the leading digit under consideration (1 to 9). The equation states that the probability of d appearing as the first digit of a number equals the base-10 logarithm of (1 + 1/d). For example, the digit 1 appears as the first digit with probability log10(2) ≈ 30.1%. The digit 9 does so with probability log10(10/9) ≈ 4.6%.
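The sketch below checks the figures above by evaluating the formula for every possible leading digit. It is written in Python for illustration only; the project itself is implemented in JavaScript. The function name `benford_probability` is hypothetical and does not come from the project's code.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that a number's first significant digit is d (Benford's Law)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Expected first-digit distribution for d = 1..9
expected = {d: benford_probability(d) for d in range(1, 10)}
for d, p in expected.items():
    print(f"{d}: {p:.1%}")  # 1: 30.1% ... 9: 4.6%
```

The nine probabilities sum to exactly 1. Their sum is the logarithm of the product (2/1)·(3/2)·…·(10/9), which telescopes to log10(10) = 1.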

Applying Benford's Law to a data set means evaluating how closely its leading-digit distribution matches the expected probability distribution. This is done with a distance measure, such as the similarity coefficient.
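Before any comparison, each value must be reduced to its leading digit and the digits must be tallied. Below is a minimal Python sketch of that step. It assumes the input is a sequence of integer death counts and that zero counts are skipped, since zero has no significant digit. `leading_digit` and `first_digit_frequencies` are hypothetical helpers, not the project's API. Powers of two serve only as illustrative input, because they are a textbook example of a Benford-conforming sequence.

```python
from collections import Counter
from typing import Iterable, Optional


def leading_digit(n: int) -> Optional[int]:
    """First significant digit of an integer count, or None for 0."""
    n = abs(n)
    if n == 0:
        return None  # zero has no significant digit
    return int(str(n)[0])


def first_digit_frequencies(values: Iterable[int]) -> dict[int, float]:
    """Relative frequency of each leading digit 1..9 among non-zero values."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


# Illustrative input: powers of two, a classic Benford-conforming sequence
observed = first_digit_frequencies(2**n for n in range(1000))
for d, freq in observed.items():
    print(f"{d}: {freq:.3f}")
```

For this input, digit 1 accounts for roughly 30% of the values, close to the Benford expectation of log10(2).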

Similarity Coefficient

The Benford distance (also known as the similarity coefficient) compares the digit distribution of a data set with the distribution expected under Benford's Law. The law states that among the possible leading digits (1 to 9), smaller ones occur more frequently in the first significant position of many real-world numbers.

The equation to calculate our coefficient is:

[Equation image: definition of the Benford distance D]

Where:

  • D is the calculated Benford distance
  • d(i) is the relative frequency of digit i (i.e. the number of times i occurs as a leading digit divided by the total number of leading digits counted) in the data set
  • log10(1 + 1/i) is the expected frequency of digit i according to Benford's Law
  • The Σ symbol indicates that the sum is taken over all digits (i = 1 to 9)

Despite its name, the similarity coefficient behaves as a distance between the two digit distributions: the smaller the value of D, the more closely they agree. Values of D close to zero indicate that the data set's digit distribution is consistent with Benford's Law. Larger values indicate a possible distortion in the data.
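The definitions above name the terms of D but not how they are combined. This sketch therefore assumes the common sum-of-absolute-deviations form, D = Σ |d(i) − log10(1 + 1/i)| over i = 1 to 9. The project's actual formula may differ; it could, for example, use squared deviations. The sketch contrasts Benford-conforming data with a data set whose leading digits are uniformly distributed. All function names and inputs are hypothetical and purely illustrative; they are not real mortality data.

```python
import math
from collections import Counter
from typing import Iterable


def benford_expected(i: int) -> float:
    """Expected frequency of leading digit i under Benford's Law."""
    return math.log10(1 + 1 / i)


def first_digit_frequencies(values: Iterable[int]) -> dict[int, float]:
    """Relative frequency d(i) of each leading digit 1..9 (zeros skipped)."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    return {i: counts[i] / len(digits) for i in range(1, 10)}


def benford_distance(values: Iterable[int]) -> float:
    """ASSUMED form: D = sum over i = 1..9 of |d(i) - log10(1 + 1/i)|."""
    observed = first_digit_frequencies(values)
    return sum(abs(observed[i] - benford_expected(i)) for i in range(1, 10))


# Illustrative inputs only (not real mortality data)
conforming = [2**n for n in range(1000)]                # follows Benford closely
uniform_digits = [d * 100 for d in range(1, 10)] * 100  # each leading digit equally often

d_conf = benford_distance(conforming)
d_unif = benford_distance(uniform_digits)
print(f"D (powers of two):          {d_conf:.3f}")
print(f"D (uniform leading digits): {d_unif:.3f}")
```

Under this assumed form, the uniform case evaluates to exactly 2·(log10 4 − 1/3) ≈ 0.537. Both distributions sum to 1, and the uniform one falls short of Benford only for digits 1 to 3, so the total deviation is twice that shortfall. The conforming data stay close to zero. Switching to a squared or Euclidean variant would only change the last line of `benford_distance`.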

The Benford distance is therefore used to check the consistency of the data. It also flags possible distortions, such as fraud or human error in data collection.

Gallery

The images below illustrate the results of applying the Benford distance (similarity coefficient) analysis. They give a graphical representation of the data collected from the health ministries of various countries, making mortality figures and possible distortions easier to interpret. These images were generated from public data. They serve only as an auxiliary tool for analyzing the results and should not be taken as conclusive evidence.

[Images: Introduction; Results (three panels)]

Author

The author of this project is Bruno Trigueiro, currently affiliated with the São Paulo State Technological College (FATEC). At the time of this project, the author was affiliated with the Federal University of Rio de Janeiro (UFRJ). The author can be contacted at [email protected] or [email protected]. This project received no funding, and no other authors were involved. The information and results presented in this project are academic in nature and should not be interpreted as proven scientific facts.

Contributors

koobzaar, rodrigx16
