lorae / roundup

Web scraper which aggregates pre-print academic economics papers from 20+ sources; presents titles, abstracts, authors and hyperlinks on an online dashboard. Auto-updates daily.

Home Page: https://roundup.streamlit.app/

License: MIT License

Python 100.00%
economics streamlit macroeconomics microeconomics api-scraping html-scraping selenium streamlit-dashboard streamlit-webapp web-scraping

roundup's Issues

Improve `append_data_to_historic` exception handling

The function append_data_to_historic within the HistoricDataComparer class contains a try-except block. The except clause catches only FileNotFoundError, so if the file is found but does not conform to the data standards assumed in the try block, the resulting error goes unhandled.

Create more graceful, generic error handling, plus a specific error path for the case where the file exists but does not conform to the data standards. Once this is complete, remove the debugging print statements (e.g. "existing_df read", "column order mapped", etc).
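One way the requested split could look is sketched below. It is a minimal illustration, not the project's code: it uses the standard-library csv module rather than the DataFrame the real function appears to read, and the names append_rows_to_historic, HistoricDataFormatError, and EXPECTED_COLUMNS are all hypothetical.

```python
import csv
from pathlib import Path


class HistoricDataFormatError(Exception):
    """Raised when the historic file exists but does not match the expected schema."""


# Hypothetical column set; the real schema is whatever HistoricDataComparer assumes.
EXPECTED_COLUMNS = ["Title", "Author", "Abstract", "Link"]


def append_rows_to_historic(path, new_rows):
    """Append new_rows (a list of dicts) to the CSV at path, creating it if absent.

    Returns the total number of data rows written.
    """
    path = Path(path)
    try:
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            header = reader.fieldnames
            existing = list(reader)
    except FileNotFoundError:
        # First run: there is no historic file yet, so start from nothing.
        existing = []
    except (csv.Error, UnicodeDecodeError) as exc:
        # The file exists but cannot be parsed as CSV text at all.
        raise HistoricDataFormatError(f"{path} could not be parsed") from exc
    else:
        # The file was read: verify the schema before rewriting anything.
        if header is None or set(header) != set(EXPECTED_COLUMNS):
            raise HistoricDataFormatError(
                f"{path} has columns {header}, expected {EXPECTED_COLUMNS}"
            )

    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=EXPECTED_COLUMNS)
        writer.writeheader()
        writer.writerows(existing + new_rows)
    return len(existing) + len(new_rows)
```

Raising a dedicated exception lets the caller tell "no history yet" apart from "history is malformed" and decide whether to stop the run or skip the source.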

repair Fed_Boston.py

Fed_Boston.py cannot navigate to 2024 publications. Adjust the web scraper accordingly and explore API options.

Repair ECB 'abstract' data collection

The ECB 'Abstract' field currently contains keyword data rather than abstract text, as shown in a screenshot attached to the original issue.

Ensure the ECBScraper class gathers the correct Abstract field, and correct the entries already collected in the existing database.

repair IMF.py

The XPath index for the abstract data is not working. Revisit this script and consider using an API instead.

Investigate Streamlit handling of text strings with dollar signs

The following text appeared incorrectly on the Streamlit app:

"Two in five Americans have medical debt, nearly half of whom owe at least $2,500. Concerned by this burden, governments and private donors have undertaken large, high-profile efforts to relieve medical debt. We partnered with RIP Medical Debt to conduct two randomized experiments that relieved medical debt with a face value of $169 million for 83,401 people between 2018 and 2020. We track outcomes using credit reports, collections account data, and a multimodal survey. There are three sets of results. First, we find no impact of debt relief on credit access, utilization, and financial distress on average. Second, we estimate that debt relief causes a moderate but statistically significant reduction in payment of existing medical bills. Third, we find no effect of medical debt relief on mental health on average, with detrimental effects for some groups in pre-registered heterogeneity analysis."

A screenshot attached to the original issue shows part of this text rendered in italics. Something is causing the unintended italicization.
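A likely cause, stated here as an assumption rather than a confirmed diagnosis, is that Streamlit's Markdown rendering treats text between two dollar signs as inline LaTeX, which would set the span between "$2,500" and "$169" in math italics. If so, escaping the dollar signs before display should avoid it. The helper name escape_dollar_signs below is hypothetical.

```python
import re


def escape_dollar_signs(text: str) -> str:
    """Backslash-escape every '$' that is not already escaped.

    The negative lookbehind skips dollar signs already preceded by a
    backslash, so calling the function twice does not double-escape.
    """
    return re.sub(r"(?<!\\)\$", r"\\$", text)


sample = "nearly half of whom owe at least $2,500 ... a face value of $169 million"
print(escape_dollar_signs(sample))
```

In the app, the escaped string would then be passed to the display call, for example st.markdown(escape_dollar_signs(abstract)), where abstract stands for whichever variable holds the text.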

Create Jupyter web scraper module debugging tutorial

Add the notebook to a new directory called "docs" or something similar. It should walk through the process of resolving a broken script and describe common issues.

It would be especially cool if the notebook could show code from specific scripts within the project without copy-pasting the code (e.g. dynamically pulling code from a different file). I am not sure if this concept has an already built-out method, but it would be nice, especially when producing call-outs to explain how the GenericScraper ABC relates to the concrete scraper classes.
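One possible approach to pulling code dynamically is to read a script's text and extract a single class or function with the standard-library ast module. The sketch below assumes nothing about the project's layout: extract_definition is a hypothetical helper, and the example source, including the ExampleScraper class and its fetch_data method, is a stand-in for a real script.

```python
import ast


def extract_definition(source_text: str, name: str) -> str:
    """Return the source code of the top-level class or function called name."""
    tree = ast.parse(source_text)
    wanted = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, wanted) and node.name == name:
            segment = ast.get_source_segment(source_text, node)
            if segment is not None:
                return segment
    raise LookupError(f"no top-level definition named {name!r}")


# Stand-in for the text of a project script. In a notebook, this string would
# instead come from reading the script file, e.g. with pathlib.Path(...).read_text().
example_source = '''
from abc import ABC, abstractmethod


class GenericScraper(ABC):
    @abstractmethod
    def fetch_data(self):
        ...


class ExampleScraper(GenericScraper):
    def fetch_data(self):
        return []
'''

print(extract_definition(example_source, "ExampleScraper"))
```

Because the snippet is read from the file each time the cell runs, the notebook stays in step with the script without any copy-pasting.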

create global list of scraper IDs to be used in compare.py, streamlit_app.py, and runall.py

Currently, there are three separate lists of scraper IDs, one each in compare.py, streamlit_app.py, and runall.py. Each looks like the following:

source_order = ['NBER', 'FED-BOARD', 'FED-BOARD-NOTES', 'FED-ATLANTA', 'FED-BOSTON', 'FED-CHICAGO', 'FED-CLEVELAND', 'FED-DALLAS', 'FED-KANSASCITY', 'FED-NEWYORK', 'FED-PHILADELPHIA', 'FED-RICHMOND', 'FED-SANFRANCISCO', 'FED-STLOUIS', 'BEA', 'BFI', 'BIS', 'BOE', 'ECB', 'IMF']

Integrating a new scraper module into the project requires updating all three lists, which is unintuitive and error-prone.

Resolving this issue would involve creating a single global source order that all three scripts can import.
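A minimal sketch of such a shared definition follows. The module name sources.py, the constant name SOURCE_ORDER, and the helper sort_by_source_order are assumptions for illustration; only the list contents come from the issue.

```python
# sources.py (hypothetical module name): the single place where scraper IDs are listed.
SOURCE_ORDER = (
    'NBER', 'FED-BOARD', 'FED-BOARD-NOTES', 'FED-ATLANTA', 'FED-BOSTON',
    'FED-CHICAGO', 'FED-CLEVELAND', 'FED-DALLAS', 'FED-KANSASCITY',
    'FED-NEWYORK', 'FED-PHILADELPHIA', 'FED-RICHMOND', 'FED-SANFRANCISCO',
    'FED-STLOUIS', 'BEA', 'BFI', 'BIS', 'BOE', 'ECB', 'IMF',
)


def sort_by_source_order(source_ids):
    """Return source_ids sorted into the global order; reject unknown IDs."""
    rank = {source: position for position, source in enumerate(SOURCE_ORDER)}
    unknown = [s for s in source_ids if s not in rank]
    if unknown:
        raise ValueError(f"unknown scraper IDs: {unknown}")
    return sorted(source_ids, key=rank.__getitem__)


print(sort_by_source_order(['IMF', 'NBER', 'ECB']))
```

Each of the three scripts would then import the constant (for instance, from sources import SOURCE_ORDER) instead of defining its own copy, so adding a scraper means editing one tuple. A tuple is used here so the order cannot be mutated by accident at runtime.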
