filingsdb's Introduction

FilingsDB: Financial Statement and Notes Data Sets from sec.gov

The Financial Statement and Notes Data Sets provide the text and detailed numeric information from all financial statements and their notes. This data is extracted from exhibits to corporate financial reports filed with the SEC using eXtensible Business Reporting Language (XBRL).

Description

A small Go program that downloads financial statements and notes from sec.gov into a local SQLite database.

Requirements

  • Go & Go modules
  • SQLite

Installation

Clone this repository, then run

$ go install

to compile the source to an executable in the bin/ directory.

Usage

$ ./bin/filingsdb
Usage: filingsdb <year>

Give it a year and the program will download the data and store it in a local filings_$YEAR.db SQLite database. This can take a while, as there is a lot of data to ingest (the 2019 database is about 16 GB).
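To illustrate the naming convention, here is a minimal sketch of how a <year> argument could map to the database file name. The function names dbFileName and parseYear are hypothetical, and the 2009 lower bound is an assumption (the SEC data sets are built from XBRL filings, which start around then); the actual tool may validate its input differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// dbFileName follows the filings_$YEAR.db naming described above.
func dbFileName(year int) string {
	return fmt.Sprintf("filings_%d.db", year)
}

// parseYear converts the <year> argument to an int.
// Assumption: years before 2009 are rejected; the real tool may differ.
func parseYear(arg string) (int, error) {
	year, err := strconv.Atoi(arg)
	if err != nil {
		return 0, fmt.Errorf("year must be a number, got %q", arg)
	}
	if year < 2009 {
		return 0, fmt.Errorf("no data expected before 2009, got %d", year)
	}
	return year, nil
}

func main() {
	year, err := parseYear("2019")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dbFileName(year))
}
```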

Database schema

The DB schema (tables, columns and types) follows the structure outlined in the data set's official PDF documentation. The program also builds a ticker <> cik table so that queries can join on it. Use this table with caution: it is a snapshot of today's data, and in the past a given ticker may have mapped to a different cik.

The submissions table (data_subs) contains one row per submission. A filing's Accession Number (adsh) is the main identifier used to join the other fact tables.
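Since adsh is the join key, it can help to validate its shape before using it in a query. An SEC accession number has the form NNNNNNNNNN-YY-SSSSSS: a 10-digit filer (or filing agent) ID, a 2-digit year and a 6-digit sequence number. The sketch below is illustrative only; the Accession type and parseAdsh function are hypothetical and not part of filingsdb.

```go
package main

import (
	"fmt"
	"regexp"
)

// adshPattern matches the dashed accession number format, e.g. 0001326801-19-000009.
var adshPattern = regexp.MustCompile(`^(\d{10})-(\d{2})-(\d{6})$`)

// Accession holds the three parts of an accession number (hypothetical type).
type Accession struct {
	FilerID  string // 10-digit ID of the filer or filing agent
	Year     string // 2-digit filing year
	Sequence string // 6-digit sequence number within that year
}

// parseAdsh splits an accession number into its parts or reports a format error.
func parseAdsh(adsh string) (Accession, error) {
	m := adshPattern.FindStringSubmatch(adsh)
	if m == nil {
		return Accession{}, fmt.Errorf("malformed accession number %q", adsh)
	}
	return Accession{FilerID: m[1], Year: m[2], Sequence: m[3]}, nil
}

func main() {
	a, err := parseAdsh("0001326801-19-000009")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", a)
}
```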

Sample queries

Facebook's 10-K Accession Number

select data_tickers.*, data_subs.adsh, data_subs.form, data_subs.accepted
from data_subs
join data_tickers on data_subs.cik = data_tickers.cik
where data_tickers.ticker = 'FB'
and data_subs.form = '10-K';

Numbers from the filing

select *
from data_subs
left join data_nums on data_nums.adsh = data_subs.adsh
where data_subs.adsh = '0001326801-19-000009';

Identify companies by SIC code (here, biotech-related)

select data_subs.adsh, data_subs.accepted, data_tickers.ticker, data_txts.tag,
       data_subs.form, data_subs.sic, data_subs.cik, data_txts.ddate, data_txts.value
from data_txts
join data_subs on data_txts.adsh = data_subs.adsh
left join data_tickers on data_tickers.cik = data_subs.cik
where data_txts.tag in (
  'NatureOfOperations',
  'BusinessDescriptionAndBasisOfPresentationTextBlock',
  'BusinessDescriptionAndAccountingPoliciesTextBlock',
  'OrganizationConsolidationAndPresentationOfFinancialStatementsDisclosureTextBlock',
  'OrganizationConsolidationAndPresentationOfFinancialStatementsDisclosureAndSignificantAccountingPoliciesTextBlock',
  'OrganizationConsolidationBasisOfPresentationBusinessDescriptionAndAccountingPoliciesTextBlock'
)
and data_subs.sic in ('2834', '2835', '2836', '8071', '8731')
and data_subs.form = '10-K'
order by data_subs.accepted;

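Welcome to the messy nature of XBRL. Different filers sometimes use different tags for the same thing, which is why the query above has to match several tags to find each company's business description.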
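One way to cope with this after querying is to collapse the equivalent tags into a single label. The sketch below uses the six tags from the query above; the label "BusinessDescription" and the normalizeTag function are made up for illustration and are not part of filingsdb or the XBRL taxonomy.

```go
package main

import "fmt"

// businessDescriptionTags lists the tags from the query above that filers
// use for their business description.
var businessDescriptionTags = []string{
	"NatureOfOperations",
	"BusinessDescriptionAndBasisOfPresentationTextBlock",
	"BusinessDescriptionAndAccountingPoliciesTextBlock",
	"OrganizationConsolidationAndPresentationOfFinancialStatementsDisclosureTextBlock",
	"OrganizationConsolidationAndPresentationOfFinancialStatementsDisclosureAndSignificantAccountingPoliciesTextBlock",
	"OrganizationConsolidationBasisOfPresentationBusinessDescriptionAndAccountingPoliciesTextBlock",
}

// canonical maps each known tag to an illustrative common label.
var canonical = func() map[string]string {
	m := make(map[string]string, len(businessDescriptionTags))
	for _, tag := range businessDescriptionTags {
		m[tag] = "BusinessDescription" // hypothetical label
	}
	return m
}()

// normalizeTag returns the common label for a known tag and leaves
// every other tag unchanged.
func normalizeTag(tag string) string {
	if label, ok := canonical[tag]; ok {
		return label
	}
	return tag
}

func main() {
	fmt.Println(normalizeTag("NatureOfOperations"))
	fmt.Println(normalizeTag("Revenues"))
}
```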

License

MIT

filingsdb's People

Contributors

edouardswiac

filingsdb's Issues

Histogram of holdings

Hi,
Nice code. What I want to do is initialize a database with the latest quarter of 13F-HR filings and then update it on a rolling basis once a week. In the end I want a histogram of the holdings, to see what most hedge funds hold and which holdings are less popular. How easy would it be to hack your code to do this?

The eventual data might look like this, from 500 sample filings:

AAPL 400
MSFE 490
SPY 390
...
GE 100
etc
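
The counting step of such a histogram can be sketched independently of how the filings are fetched. This is only an illustration under stated assumptions: each filing is assumed to be already reduced to a list of ticker symbols (13F-HR information tables identify holdings by issuer name and CUSIP, so a mapping step would be needed first), and holdingsHistogram and tickerCount are hypothetical names, not part of filingsdb.

```go
package main

import (
	"fmt"
	"sort"
)

// tickerCount records how many filings hold a given ticker.
type tickerCount struct {
	Ticker  string
	Filings int
}

// holdingsHistogram counts, for each ticker, the number of filings that
// list it at least once, sorted from most to least widely held.
func holdingsHistogram(filings [][]string) []tickerCount {
	counts := make(map[string]int)
	for _, holdings := range filings {
		seen := make(map[string]bool) // count each ticker once per filing
		for _, ticker := range holdings {
			if !seen[ticker] {
				seen[ticker] = true
				counts[ticker]++
			}
		}
	}
	out := make([]tickerCount, 0, len(counts))
	for ticker, n := range counts {
		out = append(out, tickerCount{Ticker: ticker, Filings: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Filings != out[j].Filings {
			return out[i].Filings > out[j].Filings
		}
		return out[i].Ticker < out[j].Ticker // stable order for ties
	})
	return out
}

func main() {
	// Made-up sample input: three filings and the tickers each one holds.
	filings := [][]string{
		{"AAPL", "SPY", "GE"},
		{"AAPL", "SPY", "AAPL"},
		{"AAPL"},
	}
	for _, c := range holdingsHistogram(filings) {
		fmt.Println(c.Ticker, c.Filings)
	}
}
```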
