The regulatorycomplexity from cogeorg

Title by title

Make sure experiment appears correctly for various screen sizes

Bootstrap should take care of that, but we are not sure it really does. Would be great if you could test this somehow.

Fix the txt files with the proper string operands.

Add the missing words due to nested operands in the txt files

Dash Error

For the Halstead Approach, the dashes that were causing issues (words immediately before a dash could not be classified) have now been removed, but those words still cannot be classified.

There is a bunch of text in DODDFRANK.txt that I don't think should be there. Take for example the string:
anorris on DSK5R6SHH1PROD with PUBLIC LAWS SEC. 327. IMPLEMENTATION PLAN AND REPORTS. Consultation. 12 USC 5437.
and
21:17 Aug 02, 2010 Jkt 089139 PO 00203 Frm 00163 Fmt 6580 Sfmt 6581 E:\PUBLAW\PUBL203.111 PUBL203
which I think come from page breaks. They are not exactly identical for every page break, which makes it tricky to find them. But perhaps you can find a good regexp to get rid of them.
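
A possible starting point for such a regexp, sketched in Python. Both patterns are assumptions generalized from the two sample strings above (an operator stamp ending in "with PUBLIC LAWS" and a print-run footer with Jkt/PO/Frm/Fmt/Sfmt fields). They have not been checked against the full DODDFRANK.txt, and the function name strip_page_breaks is hypothetical.

```python
import re

# Operator stamp, e.g. "anorris on DSK5R6SHH1PROD with PUBLIC LAWS".
# Assumption: the machine id always starts with "DSK".
OPERATOR_STAMP = re.compile(r"\b\w+ on DSK\w+ with PUBLIC LAWS\b")

# Print-run footer, e.g. "21:17 Aug 02, 2010 Jkt 089139 PO 00203 Frm 00163 ...".
# Assumption: the footer always ends with two whitespace-free tokens
# (a file path and a job name).
FOOTER = re.compile(
    r"\d{1,2}:\d{2} \w{3} \d{2}, \d{4} "
    r"Jkt \d+ PO \d+ Frm \d+ Fmt \d+ Sfmt \d+ \S+ \S+"
)


def strip_page_breaks(text):
    """Remove page-break residue and collapse the leftover runs of spaces."""
    text = OPERATOR_STAMP.sub(" ", text)
    text = FOOTER.sub(" ", text)
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```

Note that the first sample string also contains real content ("SEC. 327. IMPLEMENTATION PLAN AND REPORTS."), so the sketch removes only the stamp and keeps the rest of the line.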

Enter variations of already categorized words

For instance "imply, implies, implying, implied" or "bank, banks", "agency, agencies" etc.

Is there a way to automate this? Thesaurus crawling?
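
One way this could be automated without a thesaurus, as a rough sketch: generate naive inflected candidates for each categorized word and keep only those that actually occur in the regulation text. The suffix rules below are a simplification (they do not handle words ending in s, x, ch or sh, consonant doubling, or irregular forms), and the function names are hypothetical. A stemmer or lemmatizer from an NLP library would likely be more robust.

```python
def candidate_forms(word):
    """Generate naive inflected candidates for an English word."""
    # consonant + y: imply -> implies, implied, implying
    if word.endswith("y") and len(word) > 2 and word[-2] not in "aeiou":
        stem = word[:-1]
        return {stem + "ies", stem + "ied", word + "ing"}
    # final e: trade -> trades, traded, trading
    if word.endswith("e"):
        return {word + "s", word + "d", word[:-1] + "ing"}
    # default: bank -> banks, banked, banking
    return {word + "s", word + "ed", word + "ing"}


def expand(word, vocabulary):
    """Keep only the candidates that actually appear in the text vocabulary."""
    return sorted(candidate_forms(word) & set(vocabulary))
```

Filtering against the vocabulary of the text discards non-words that the rules overgenerate, so only forms that really appear would be added to a category file.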

Improve 030_create_visuals.py

The current ./100_code/python/030_create_visuals.py creates a simple visualization of the Dodd-Frank act using a single Title and a set of files which contain different types of words.

Improve the visualization so that the structure of the regulation is kept (e.g. using your xml script). This should be displayed in .html
Then, allow people to add words to different categories (the files in 020_auxiliary_data/Sections/619/). Ideally, this would be possible by someone selecting a word and then right-clicking on it to get a menu with each file name as entry so that the word can directly be added (the script should add the word to the respective file).

Download and prepare raw data

Download the raw data for various regulatory texts from:
https://sites.google.com/site/unsharedtask2014/
and add it to the Dropbox in ./001_raw_data/ in a structured way. Manually inspect the texts, starting with the Dodd-Frank act, to get an idea of what regulation documents can look like.

Create code to highlight words that are already categorized

One color for each category

Apply to Dodd-Frank

Comparison of Basel III and CRD IV

Delete the reference to excel template

From landing page and /experiment page

Refactor xlm_parser

Rename it properly to xml_parser
Have the file_name be passed as a command-line argument
Write a .sh bash script that passes the respective command-line arguments
Have the output file be passed as a command-line argument
Turn the Readme.rtf into a proper .txt file. There are a lot of special characters in the file right now.
Make sure to use 'html' throughout, not 'htm'
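
A minimal sketch of the command-line handling, using argparse from the standard library. The argument names input_file and output_file and the function names are assumptions for illustration; the actual parsing step is left out.

```python
import argparse


def build_arg_parser():
    """Build the command-line interface: one input path, one output path."""
    parser = argparse.ArgumentParser(description="Parse a regulation file.")
    parser.add_argument("input_file", help="path of the file to parse")
    parser.add_argument("output_file", help="path of the file to write")
    return parser


def main(argv=None):
    """Read the two paths; argv=None makes argparse fall back to sys.argv."""
    args = build_arg_parser().parse_args(argv)
    # The existing parsing logic would be called here with both paths.
    return args.input_file, args.output_file
```

The .sh script would then only need to call the Python file with the two paths as positional arguments.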

Proper identification of string operands

Identify and highlight the proper string for each operand. There are no nested operands.
For example:

Advisers Act of 1940 - Legal Operand, True
Advisers Act -of- Grammar operand 1940 - Legal Operand, False

Instead of "Enter answer" write "Please enter the bank's total risk weighted assets for this regulation"

Complete the word list for Basel I/II/III

Balance sheet fixes

Get rid of extra space on the right;
Use a larger font;
Use the same font everywhere;
Write below balance sheet "EUR is the domestic currency and USD is a foreign currency"
Delete "(national currency)" and "(foreign currency)" everywhere
Show subcategories (see excel I forwarded, this is how it should look on the website as well, including horizontal line to better differentiate the various balance sheet positions)

Halstead measures applied to Dodd-Frank

Use the Halstead (1976) measures of complexity and apply them to the Dodd-Frank regulation text.

Read introductory literature

Halstead (1976)
McCabe (1977)
Haldane (2012)
Li and Azar (2015)
Celerier and Vallee (2015)

Complete the word list for Dodd-Frank

Fix parser shell script

The shell script is not working since you forgot to update the link. Always run code once before submitting it to make sure it works.

c836f8f

Standardize strings

Delete all the punctuation, convert new line markers into spaces
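
A short sketch of this step in Python. It assumes that "punctuation" means the ASCII punctuation in string.punctuation; typographic quotes or dashes in the source text would need to be added to the table. The function name standardize is hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def standardize(text):
    """Turn line breaks into spaces, drop punctuation, collapse whitespace."""
    text = text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
    text = text.translate(_PUNCT_TABLE)
    return " ".join(text.split())
```

Because hyphens are deleted as well, "Dodd-Frank" becomes "DoddFrank"; replacing hyphens with spaces first would be an alternative if that is not wanted.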

Apply to Dodd-Frank

Compute Halstead measures, several variations

Operands can be: "economic operands" / everything that is not an operator / the terms with a separate definition, as identified by Ali
Operators can be function words / regulatory operators / legal operators / logical operators / combinations of the last three categories
We can compute volume, length, level, repetition of operands, unnecessary operators, difficulty, effort
Apply to Dodd-Frank, section by section. Correlation matrix between the different variations on operators and operands.
To be done once the counting of all word categories is done.
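
For reference, a sketch of the core measures using the standard Halstead definitions: with n1 distinct operators, n2 distinct operands, N1 operator occurrences and N2 operand occurrences, length is N1 + N2, volume is length times log2(n1 + n2), difficulty is (n1 / 2) * (N2 / n2), level is 1 / difficulty, and effort is difficulty times volume. The function name and the dictionary keys are hypothetical, and repetition of operands and unnecessary operators are not covered here. Each variation listed above would simply supply a different pair of token lists.

```python
import math


def halstead(operators, operands):
    """Compute basic Halstead measures from two lists of token occurrences."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    level = 1 / difficulty if difficulty else 0.0
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "level": level,
        "effort": effort,
    }
```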

Count words in different categories in Dodd-Frank

For each section of Dodd-Frank, count the number of words/expressions in each category.
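
A sketch of the counting for one section, assuming each category is available as a list of words or multi-word expressions (for example, read from the category files). The function name and the data layout are hypothetical. Matching is case-insensitive and on word boundaries, and an expression nested inside a longer one is counted in both, which may or may not be what is wanted.

```python
import re


def count_categories(section_text, categories):
    """Count occurrences of each category's expressions in one section."""
    counts = {}
    for name, expressions in categories.items():
        total = 0
        for expr in expressions:
            # Word boundaries prevent "bank" from matching inside "banking".
            pattern = r"\b" + re.escape(expr) + r"\b"
            total += len(re.findall(pattern, section_text, flags=re.IGNORECASE))
        counts[name] = total
    return counts
```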

Get all the Legal References from Dodd-Frank

Create a Python program to get all the Legal References from the Dodd-Frank XML file.
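
A possible first pass, sketched under two assumptions: the references of interest look like U.S. Code citations such as "12 USC 5437", and they appear in the text content of the XML rather than in dedicated tags. The structure of the actual Dodd-Frank XML file is not known here, and other citation styles (Public Law numbers, named Acts) would need their own patterns.

```python
import re
import xml.etree.ElementTree as ET

# Matches "12 USC 5437" as well as "15 U.S.C. 78c" (assumed citation styles).
USC_CITATION = re.compile(r"\b\d+\s+U\.?S\.?C\.?\s+\d+\w*")


def legal_references(xml_string):
    """Return all U.S. Code citations found in the text of an XML document."""
    root = ET.fromstring(xml_string)
    text = " ".join(root.itertext())
    return USC_CITATION.findall(text)
```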

Recognizing logical structures

Once we have categorized the different words, could we go further and define and recognize patterns in the text?

For instance sequences such as: "economic operand" - "regulation operator" - "economic operand" - "attribute/operand" - "logical operator" - "economic operand" etc.

e.g.: "a bank" - "should not" - "engage in" - " proprietary trading" - "unless" - "the regulator" - "is okay with it"

More like a long-run thing.
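
As a very rough illustration of the idea, assuming the text has already been turned into a list of (expression, category) pairs: a pattern is then just a sequence of category names, and recognizing it means finding that sequence in the list of tags. The function name and the tagging shown in the usage example are hypothetical.

```python
def find_pattern(tagged, pattern):
    """Return every run of tagged expressions whose categories equal pattern."""
    tags = [tag for _, tag in tagged]
    wanted = list(pattern)
    n = len(wanted)
    return [
        tagged[i:i + n]
        for i in range(len(tags) - n + 1)
        if tags[i:i + n] == wanted
    ]


# Usage with a made-up tagging of the example sentence:
sentence = [
    ("a bank", "economic operand"),
    ("should not", "regulation operator"),
    ("proprietary trading", "economic operand"),
    ("unless", "logical operator"),
    ("the regulator", "economic operand"),
]
matches = find_pattern(sentence, ["logical operator", "economic operand"])
```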

Give feedback at the end

Give feedback at the end: number of correct answers, total time taken, how many people did better
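
A small sketch of how the three numbers could be computed once the results are available. The record layout (dicts with "correct" and "seconds") is an assumption, as is the definition of "did better": more correct answers, or the same number of correct answers in less time.

```python
def feedback(user, others):
    """Summarize one user's result and count how many other users did better."""
    def rank_key(result):
        # More correct answers first; ties are broken by shorter time.
        return (-result["correct"], result["seconds"])

    did_better = sum(1 for r in others if rank_key(r) < rank_key(user))
    return {
        "correct": user["correct"],
        "seconds": user["seconds"],
        "did_better": did_better,
    }
```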

Questions on code

In RegulatoryComplexity/100_code/python:

What is the difference between 010_split_sections-DF.py and 011_analyze_sections-DF.py?
What happens in 020_compute_statistics_sections-DF.py?

In RegulatoryComplexity/100_code/shells:

Why does 003_parser_xml.sh use HTML input?

In RegulatoryComplexity/050_results/DoddFrank/Visuals/VIsualizer_Versions/V8_visualizer/app:

What does tabledef.py create? Database with registered users?

Generally:

Please use more comments and documentation.

Comparison of different versions of Dodd-Frank

Requires retrieving the different versions, obviously.

Write a clear definition of the different word categories

Expressions within existing expressions

At the moment, if someone classifies e.g. 'of', all instances of 'The Banking Act of 1956' etc. are changed in the visualization. This can be quite confusing. Perhaps the visualizer could be changed such that only unclassified words can be changed in an update. This would also solve the issue of longer versus shorter classified words.
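
One way the visualizer could behave like this, as a sketch: classify the longest expressions first and let a shorter expression claim only text that no earlier match has claimed. The function name and the lexicon layout (expression mapped to category) are hypothetical and do not describe how the current visualizer works.

```python
import re


def classify_spans(text, lexicon):
    """Return (start, end, category) spans, giving longer expressions priority."""
    spans = []
    for expr in sorted(lexicon, key=len, reverse=True):
        for match in re.finditer(r"\b" + re.escape(expr) + r"\b", text):
            start, end = match.span()
            # Accept the match only if it overlaps no span claimed earlier.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, lexicon[expr]))
    return sorted(spans)
```

With 'The Banking Act of 1956' in the lexicon, classifying 'of' afterwards would then only affect occurrences of 'of' outside that expression.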

Comparison of Basel I/II/III

Also by section, e.g. compare Basel I to capital regulation in Basel III

Include a hall of fame

Create a simple hall of fame where user times and numbers of correct answers are shown. This is not super trivial, I think, and we will likely have to have a call about it. For now, it would be fine if you can simply figure out exactly where the results from individual users are kept and prepare a very basic hall of fame mockup that eventually should read this data.

Comparison of original Acts and acts once amended by DF

Requires retrieving the original acts.

Ask subjects to solve a first example before they start.

If they don't come up with the right solution, explain the likely problem and let them compute again until they find the right solution. You're not expected to create the example, of course, but we need one /experiment page that is slightly different from the others in that it shows a specific example, which for now can be any of the examples that are being loaded. We want users to demonstrate that they understand what they need to do, so the page should check whether a user has provided the correct answer (e.g. "42") and only advance them to the experiment stage if they got it right.

Comparison of Dodd-Frank titles

Register regulatorycomplexity.net

Standardize Literature

Ensure all literature in the Dropbox folder ./500_literature/ is named in the same way, i.e. Author1Author2YYYY-ShortTitle-journal.pdf . You can use abbreviations, such as JF (Journal of Finance), AER (American Economic Review), ECTA (Econometrica), etc.
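
A quick check for this convention could look like the sketch below. The pattern is an interpretation of Author1Author2YYYY-ShortTitle-journal.pdf and assumes ASCII letters only, no spaces, and exactly two hyphens; accented author names would need a wider character class. The file names in the usage note are made up.

```python
import re

# Authors (starting with a capital), four-digit year, short title, journal.
NAME_PATTERN = re.compile(r"[A-Z][A-Za-z]+\d{4}-[A-Za-z0-9]+-[A-Za-z0-9]+\.pdf")


def follows_convention(filename):
    """Return True if the whole file name matches the naming convention."""
    return NAME_PATTERN.fullmatch(filename) is not None
```

For example, follows_convention("SmithJones2015-ShortTitle-JF.pdf") is True, while a name such as "smith 2015 paper.pdf" is rejected.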

Different types of word classifiers

We should have different types of word classifiers. One is a 'protected' list: the approved list with which a user starts a new session. The other is the user-specific list. Users should be able to switch between these two lists, e.g. by selecting a switch somewhere on the right.

Find other regulation documents

Starting with the US, identify key regulation documents and find their sources. Download them and convert them into .txt or .xml files.

Certain words not classifying

In the Halstead Approach, particular words such as 'financial', 'term', 'dealer', 'providing' cannot be classified. This was observed in Title 8, but the problem exists for all titles. When we classify one of these words it gets highlighted, but once the document is updated the word is no longer classified.

Also when the document is updated, the number of classified words seems to change (i.e. sometimes a particular word will be classified and other times not).
