
requiam_csv's Introduction

About Us

Research Engagement (RE) helps UA faculty, researchers, and students with every stage of their research. We have three functional units:

  1. Research Incubator
  2. Data Cooperative
  3. Scholarly Communications

Members of the Data Cooperative:

| Name | Role/Title |
| --- | --- |
| Jeff Oliver | Data Science Specialist |
| Fernando Rios | Data Management Specialist |
| Kiri Carini | GIS Specialist |
| Jonathan Ratliff | Research Data Repository Assistant |

Experience

| Specialty | Tools/Resources |
| --- | --- |
| Data Science | R lang, Python, Software Carpentry |
| Data Management & Publishing | Data Management (DMPTool), Open Science Framework, Figshare |
| GIS | GIS, ESRI |

requiam_csv's People

Contributors

astrochun, damian-romero, yhan818, zoidy


requiam_csv's Issues

Org Code bug (must be four letters)

The original UA Org Codes sheet drops the leading 0 from three-digit codes (e.g., 0410 shows up as 410). This breaks things on the patron side, as an EDS query for 410 returns nothing.
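One possible fix is to zero-pad org codes to four characters when reading the sheet. A minimal sketch (the helper name and the point where it would be applied are assumptions):

```python
def pad_org_code(code) -> str:
    """Zero-pad an org code to four characters (e.g., 410 -> '0410').

    Hypothetical helper; the real sheet-loading code may apply this
    per-column when the spreadsheet is read in.
    """
    return str(code).zfill(4)
```

Applied to the example above, `pad_org_code(410)` yields `"0410"`, so the EDS query matches again.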

Minor and major changes for v0.9.0

Minor

  1. Add the Google Sheet to the README.md file [d1d8124]
  2. Describe in README.md the different versions of the CSV file [916ffed]
  3. Update version number to 0.8.0 (mistake earlier) [ced16b8]
  4. Update version number to 0.9.0 [2c50277]
  5. Fix typo in README for # reference (#installation-instructions) [efc9fe0]
  6. Update workflow description (still needs a run to improve) [5c9e190]

Major

  1. Commit research_themes.csv [2362f13]

Logging to stdout and file

A logging-based object that handles the messaging to stdout and a logfile

Logs should be placed in a logs/ folder relative to path of execution. The logfile filename should include an ISO-formatted date.

The logfile prefix should be specified in default.ini
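A minimal sketch of the proposed logger, assuming the prefix is read from default.ini (the function and logger names are illustrative):

```python
import logging
from datetime import date
from pathlib import Path


def get_logger(logfile_prefix: str = "requiam_csv") -> logging.Logger:
    """Log to stdout and to logs/<prefix>.<ISO date>.log."""
    log_dir = Path("logs")  # relative to path of execution
    log_dir.mkdir(exist_ok=True)
    logfile = log_dir / f"{logfile_prefix}.{date.today().isoformat()}.log"

    logger = logging.getLogger(logfile_prefix)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(logfile)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```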

PEP8 compliance and documentation

General issue for best practices. This ticket will remain open indefinitely.

Update README to include:

  1. Link to Google Sheet
  2. Link to output CSV file

Dry-run option

A command-line option to generate the CSV file without overwriting the default existing CSV file ('data/research_themes.csv')

Changes are implemented in feature/dry_run branch.

Update README.md to describe dry run and full execution.

Update default.ini to use a 'data/' path for output

Note: Fast-forward version to 0.7.0
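The option could be sketched with argparse (flag name and description are assumptions based on the branch name):

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line options; --dry_run avoids overwriting the default CSV."""
    parser = argparse.ArgumentParser(description="Generate research themes CSV")
    parser.add_argument(
        "--dry_run", action="store_true",
        help="Write output elsewhere instead of data/research_themes.csv")
    return parser.parse_args(argv)
```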

Use GitHub Actions for CI

Summary (REQUIRED)
With CI build tests already in place with Travis CI, it's relatively straightforward to migrate to GitHub Actions.

Motivation (REQUIRED)
Because of Travis CI's change in pricing model for open-source software, it becomes cost-prohibitive to continue using Travis CI once credits run out.

Objectives (REQUIRED)
CI build passes much like how it has occurred for Travis CI.

Proposal (REQUIRED)

This doc page and the GitHub Action YAML is a starting point:
https://docs.github.com/en/free-pro-team@latest/actions/guides/building-and-testing-python

We can start with this template but not include Python 2.7. We will need to include other dependencies, such as pytest-cov, and make a best effort to resemble the existing .travis.yml.

We should skip builds when the commit message contains "ci skip" or "skip ci", and we should also look at excluding certain files, such as README.md, from triggering builds.

Testing notes (Optional)

This might be useful for testing purposes before committing the github actions:
https://github.com/nektos/act

Action items:

  • Create a python-package GitHub Actions workflow
  • Add option to skip CI with "ci skip" or "skip ci". See this
  • Disable certain files
  • GitHub Actions badge in README.md
  • Handle paths for output file for CI build

Implemented in: feature/gh_actions_build_test
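The action items above could be sketched as a workflow file like the following (job names, Python versions, and the exact skip condition are assumptions, not the merged workflow):

```yaml
# .github/workflows/python-package.yml (sketch; names are illustrative)
name: Python package

on:
  push:
    paths-ignore:
      - 'README.md'   # don't trigger builds for README-only changes

jobs:
  build:
    # Skip when the commit message asks for it
    if: "!contains(github.event.head_commit.message, 'ci skip') && !contains(github.event.head_commit.message, 'skip ci')"
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.7', '3.8']
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - run: |
          pip install -r requirements.txt pytest pytest-cov
          pytest --cov
```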

Update scripts to use f-strings for clarity

Currently, scripts use .format() for strings.

While this is OK, the Pythonic way prefers f-strings for their readability
and several other reasons (see Item 4 in Effective Python).

Example:

log.info("   {} : {}".format(dept.loc[bb], bb + off))

Change to:

log.info(f"   {dept.loc[bb]} : {bb + off}")

This can be a good first issue implemented as a feature for the next release.

Move UAL departments into University Libraries portal

Summary

The University Libraries portal was created for purchased datasets. Through discussion, we decided to have all University Libraries employees under this portal. Below summarizes the steps needed to create a main portal or any portal (e.g., a sub-portal).

Objectives

Grouper group membership updated for UAL members

Proposal

Here are the steps to implement the change:

Figshare/ReDATA:

  • Create portal on ReDATA (done some time ago)
  • Update Figshare settings for this group to use libraries as the UserAssociationCriteria (done some time ago). I might have set ual instead, but since this is already set, it should be fine.

Google Sheet:

  • Update "Main Portals" spreadsheet to include "University Libraries". The portal name is libraries
  • Update "Arizona Research Themes" spreadsheet to include "University Libraries" section and include UAL org codes

ReQUIAM:

  • Run ReQUIAM's add_grouper_group script with --main_portal and --add to create the new group on both figstest and figshare stems
    • Confirm on Grouper UI that the Grouper groups were created, portal:libraries

ReQUIAM_csv:

  • Check out develop and run the script (dry run) to inspect the output
  • Do a complete run, commit the changes, and create the PR
  • Review the changes before merging into master

Deployment:

  • Re-run run_script using the bash alias with --portal and --sync. This should pick up the new group.

Testing notes

  • After the change is implemented, this should apply to ReDATA team members. Log out and log back in to confirm (confirmation needed under the Users data panel of the Figshare UI).

Additional notes

Implemented in: TBD

PyPI packaging

Configure the package for easy installation through PyPI.

This requires:

  • MANIFEST.in
  • Changes to setup.py
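A MANIFEST.in sketch, assuming the config and data files should ship with the source distribution (the exact file list is an assumption):

```
# MANIFEST.in (sketch; included files are assumptions)
include requirements.txt
include default.ini
recursive-include data *.csv
```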

Comparing Org Code spreadsheets

A nice feature would be a tool that compares two files providing the University's organization codes.

The idea would be to identify differences for maintainers to understand.

Ultimately it would be nice to update the Google Sheet with just the changes through Google Sheet API.
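The comparison could be sketched with the stdlib alone, assuming each spreadsheet is first parsed into an {org_code: name} mapping (e.g., with csv.DictReader); the function name and structure are illustrative:

```python
def diff_org_codes(old: dict, new: dict) -> dict:
    """Compare two {org_code: name} mappings and report differences.

    Hypothetical helper: returns codes added, removed, or renamed
    between the two spreadsheets.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "changed": changed}
```

The resulting summary is exactly what a maintainer would need to review, and could later feed updates back through the Google Sheets API.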

CSV updates

This issue will remain open indefinitely. It is intended to track all changes to research_themes.csv.

Automated script

A primary script called script_run to execute via the command line:
python script_run

The script should:

  1. Read in the default.ini configuration file
  2. Execute create_csv

For step no. 2, setup.py is needed so the script can import the absolute package, DataRepository_research_themes
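The two steps could be sketched as follows (the setting names read from default.ini are assumptions):

```python
#!/usr/bin/env python
# Hypothetical sketch of script_run; setting names are assumptions.
from configparser import ConfigParser


def load_settings(config_file: str = "default.ini") -> dict:
    """Step 1: read the [DEFAULT] section of the configuration file."""
    config = ConfigParser()
    config.read(config_file)
    return dict(config.defaults())

# Step 2 would then hand the settings to create_csv, e.g.:
#     settings = load_settings()
#     create_csv(settings["url"], settings["outfile"])
```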

Enhancement: Add CI testing for Python 3.9

Summary
For future enhancements including typing checks, we should include CI for Python 3.9.

Objectives

Eventually take advantage of Python 3.9 typing features

Proposal

Modify python-package.yml to include Python 3.9.

Implemented in: TBD

KeyError: 'Overall Themes'

After cleaning up the Google Sheet so it would run effectively, the following error came up:

Traceback (most recent call last):
  File "DataRepository_research_themes/script_run", line 29, in <module>
    create_csv(url, outfile, log_dir, logfile)
  File "/Users/cly/codes/UALibraries/DataRepository_research_themes/DataRepository_research_themes/create_csv.py", line 91, in create_csv
    no_org_code = no_org_code_index(df_new)
  File "/Users/cly/codes/UALibraries/DataRepository_research_themes/DataRepository_research_themes/commons.py", line 15, in no_org_code_index
    df['Overall Themes'].isna())[0]

The issue is that df_new no longer contains the 'Overall Themes' column, as it was dropped.

This should be an easy hotfix.
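One possible shape for the hotfix (a sketch, not the committed fix): guard no_org_code_index against the dropped column.

```python
import numpy as np
import pandas as pd


def no_org_code_index(df: pd.DataFrame) -> np.ndarray:
    """Return row indices with a missing 'Overall Themes' entry.

    Hypothetical hotfix sketch: fall back gracefully if the column
    was dropped upstream instead of raising KeyError.
    """
    if "Overall Themes" not in df.columns:
        return np.array([], dtype=int)
    return np.where(df["Overall Themes"].isna())[0]
```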

Sphinx documentation with RTDs

We are ready to provide documentation through ReadTheDocs.

This tutorial is a good starter. A couple of key points:

  1. Docs will be in a docs/ folder relative to the repo parent
  2. We will keep source and build separate, so answer yes to that question
  3. We will build with python 3.7
  4. We will transfer README.md over and have separate sections (separate .rst files)
  5. We will use the sphinx_rtd_theme HTML theme. You will need to include the sphinx_rtd_theme extension

Action item:

  • Clean up all docstrings
  • Include requirements.txt under docs/ for Sphinx compilation
  • Include a GitHub Action script
  • Add shields.io badges. See this example
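The theme setup in docs/source/conf.py could look like this sketch (the extension list beyond sphinx_rtd_theme is an assumption):

```python
# docs/source/conf.py excerpt (sketch; extension list is an assumption)
project = "ReQUIAM_csv"
extensions = [
    "sphinx.ext.autodoc",   # pull in the cleaned-up docstrings
    "sphinx_rtd_theme",
]
html_theme = "sphinx_rtd_theme"
```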

Rename package

The change will be from "DataRepository_research_themes" to "ReQUIAM_csv"

Updates needed:

  • setup.py
  • README.md
  • Rename "DataRepository_research_themes" package folder -> "requiam_csv"
  • script_run
  • create_csv.py
  • Update GitHub repo name
  • Add ReQUIAM_csv logo to README

Docs: Update README.md to Avoid Duplicates with Docs

Describe the hotfix
README.md contains duplicate sections from our Read the Docs documentation. The task is to simplify the maintenance of README.md by one of three methods:

  1. Using the external library m2r2 to generate ../../README.md from source/. Drawback: it hasn't been maintained for a few months now.

  2. Using the external library MyST parser as in (1). This library seems more up-to-date.

  3. Simply stripping sections that need constant maintenance, such as code and Python versions. Then, update the TOC with links to our RTDs, for instance:

- [Getting Started](https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html)

Issue: if the links to RTDs are implemented before merging #66 into master, we will not be able to properly test them because they use the "/latest/" path.

Version information

  • ReQUIAM_csv version: [0.11.1]

Plan of action

  1. Strip the following sections from README.md:
- Getting Started
    - Requirements
    - Installation Instructions
    - Configuration Settings
    - Testing Installation
- Execution
    - Workflow
- Versioning
  2. Add RTDs links to each of those sections in the TOC, for instance:
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html#requirements
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html#installation-instructions
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html#configuration-settings
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html#testing-installation
https://requiam_csv.readthedocs.io/en/latest/Execution.html
https://requiam_csv.readthedocs.io/en/latest/Execution.html#workflow
...etc
  3. Add a message before the TOC to show where the full documentation lives:
## Developers' note:

Full documentation for this code is available on
[Read the Docs](https://requiam_csv.readthedocs.io/en/latest/). The table of contents below links to the appropriate sections within our Read the Docs documentation.

Additional context

@astrochun: This can be implemented as a simple PR that will be merged but without any bumping

Implemented in: >= v0.12.4

Generate git commit command for updates to CSV

When script_run is executed, conduct a comparison to identify differences and use that output to generate a git commit containing details of the changes. The commit message will consist of individual lines, one per entry.
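A sketch of the message-building half of this idea (the diff input and message layout are assumptions; the actual git invocation is shown commented out):

```python
import subprocess


def build_commit_message(changes: list) -> str:
    """Build a commit message with a summary header and one line per entry.

    Hypothetical helper: `changes` would come from comparing the newly
    generated CSV against the committed research_themes.csv.
    """
    header = f"Update research_themes.csv ({len(changes)} change(s))"
    return "\n".join([header, ""] + [f"- {c}" for c in changes])

# The generated message could then be handed to git, e.g.:
#     subprocess.run(["git", "commit", "-m", build_commit_message(changes),
#                     "data/research_themes.csv"], check=True)
```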

CI with Travis

This is the main issue thread for Travis CI integration.

Steps include:

  • Creating a .travis.yml file
  • Creating a pytest.ini file
  • Building pytests - additional tests needed, but that will be a separate issue
  • Creating a .coveragerc file for the coverage report
  • Updating README.md to include the build status

This is implemented in the feature/travis_ci branch

Enhancement: Migrate repository to UAL-ODIS

Summary
It's been decided to migrate from ualibraries to UAL-ODIS.

Objectives
Full transfer with branches, issues, PRs, and project boards. Settings may need to be manually transferred over

Proposal

  • Move forked copy from UAL-ODIS to a personal copy. I suggest mine
  • Perform transfer and confirm transfer was successful
  • Update local dev environments
    • Chun's
    • Damian's
  • Update ReQUIAM for new deployment path (Issue no. 137)

Testing notes
Already tested transfer here

No branch deployment here.

Enhancement: Update code to reflect Python 3.9 updates

Summary
After completing redata-commons UAL-RE/redata-commons#23 and ReQUIAM Python version upgrades UAL-RE/ReQUIAM#170, this repository shall be updated to be consistent with other ReDATA software.

Objectives
Migrate to Python 3.9

Proposal

  • Update requirements.txt (redata==0.5.0)
  • Remove .travis.yml (no longer in use)
  • Update setup.py
  • Update /requiam_csv/__init__.py (version = "0.13.0")
  • Update .github/workflows/python-package.yml

Testing notes
Need to verify that the current versions of Python, numpy, and pandas work as usual. Testing shall go through script_run, create_csv.py, and inspect_csv.py under /requiam_csv

  • Check the difference between dry_run.csv and the previous research_themes.csv

Additional notes

  • There should be no difference in the CSV after the upgrade, so no need to update the production environment.

redata >= 0.4.2 should work. However, for maintainability, we try to use the same set of dependencies (redata, numpy, pandas, sphinx)

Implemented in: TBD
