
requiam_csv's Introduction

About Us

Research Engagement (RE) helps UA faculty, researchers, and students with every stage of their research. We have three functional units:

  1. Research Incubator
  2. Data Cooperative
  3. Scholarly Communications

Members of the Data Cooperative:

| Name | Role/Title |
| --- | --- |
| Jeff Oliver | Data Science Specialist |
| Fernando Rios | Data Management Specialist |
| Kiri Carini | GIS Specialist |
| Jonathan Ratliff | Research Data Repository Assistant |

Experience

| Specialty | Tools/Resources |
| --- | --- |
| Data Science | R lang, Python, Software Carpentry |
| Data Management & Publishing | Data Management (DMPTool), Open Science Framework, Figshare |
| GIS | GIS, ESRI |

requiam_csv's People

Contributors

astrochun, damian-romero, yhan818, zoidy


requiam_csv's Issues

Org Code bug (must be four letters)

The original UA Org Codes sheet drops the leading 0 from three-digit codes (e.g., 0410 shows up as 410). This breaks things on the patron side, as an EDS query for 410 returns nothing.
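One possible fix is to zero-pad org codes to four characters when reading the sheet. A minimal sketch (the helper name and the point where it would be applied are assumptions):

```python
def pad_org_code(code) -> str:
    """Zero-pad an org code to four characters (e.g., 410 -> '0410').

    Hypothetical helper; the real sheet-loading code may apply this
    per-column when the spreadsheet is read in.
    """
    return str(code).zfill(4)
```

Applied to the example above, `pad_org_code(410)` yields `"0410"`, so the EDS query matches again.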

Minor and major changes for v0.9.0

Minor

  1. Add the Google Sheet to the README.md file [d1d8124]
  2. Describe in README.md the different versions of the CSV file [916ffed]
  3. Update version number to 0.8.0 (mistake earlier) [ced16b8]
  4. Update version number to 0.9.0 [2c50277]
  5. Fix typo in README for # reference (#installation-instructions) [efc9fe0]
  6. Update workflow description (still needs a run to improve) [5c9e190]

Major

  1. Commit research_themes.csv [2362f13]

Logging to stdout and file

A logging-based object that handles the messaging to stdout and a logfile

Logs should be placed in a logs/ folder relative to path of execution. The logfile filename should include an ISO-formatted date.

The logfile prefix should be specified in default.ini
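A minimal sketch of the proposed logger, assuming the prefix is read from default.ini (the function and logger names are illustrative):

```python
import logging
from datetime import date
from pathlib import Path


def get_logger(logfile_prefix: str = "requiam_csv") -> logging.Logger:
    """Log to stdout and to logs/<prefix>.<ISO date>.log."""
    log_dir = Path("logs")  # relative to path of execution
    log_dir.mkdir(exist_ok=True)
    logfile = log_dir / f"{logfile_prefix}.{date.today().isoformat()}.log"

    logger = logging.getLogger(logfile_prefix)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(logfile)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```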

PEP8 compliance and documentation

General issue for best practices. This ticket will remain open indefinitely.

Update README to include:

  1. Link to Google Sheet
  2. Link to output CSV file

Dry-run option

A command-line option to generate the CSV file without overwriting the default existing CSV file ('data/research_themes.csv')

Changes are implemented in feature/dry_run branch.

Update README.md to describe dry run and full execution.

Update default.ini to use a 'data/' path for output

Note: Fast-forward version to 0.7.0
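The option could be sketched with argparse (flag name and description are assumptions based on the branch name):

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line options; --dry_run avoids overwriting the default CSV."""
    parser = argparse.ArgumentParser(description="Generate research themes CSV")
    parser.add_argument(
        "--dry_run", action="store_true",
        help="Write output elsewhere instead of data/research_themes.csv")
    return parser.parse_args(argv)
```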

Use GitHub Actions for CI

Summary (REQUIRED)
With CI build tests already in place with Travis CI, it's relatively straightforward to migrate to GitHub Actions.

Motivation (REQUIRED)
Because of Travis CI's change in pricing model for open-source software, it becomes cost-prohibitive to continue using Travis CI once credits run out.

Objectives (REQUIRED)
CI build passes much like how it has occurred for Travis CI.

Proposal (REQUIRED)

This doc page and the GitHub Action YAML is a starting point:
https://docs.github.com/en/free-pro-team@latest/actions/guides/building-and-testing-python

We can start with this template but not include Python 2.7. We will need to include other dependencies, such as pytest-cov, and make a best effort to resemble the existing .travis.yml.

We should skip builds when the commit message contains "ci skip" or "skip ci", and we should also look at excluding certain files, such as README.md, from triggering builds.

Testing notes (Optional)

This might be useful for testing purposes before committing the github actions:
https://github.com/nektos/act

Action items:

  • Create a python-package GitHub Actions workflow
  • Add option to skip CI with "ci skip" or "skip ci". See this
  • Disable certain files
  • GitHub Actions badge in README.md
  • Handle paths for output file for CI build

Implemented in: feature/gh_actions_build_test
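The action items above could be sketched as a workflow file like the following (job names, Python versions, and the exact skip condition are assumptions, not the merged workflow):

```yaml
# .github/workflows/python-package.yml (sketch; names are illustrative)
name: Python package

on:
  push:
    paths-ignore:
      - 'README.md'   # don't trigger builds for README-only changes

jobs:
  build:
    # Skip when the commit message asks for it
    if: "!contains(github.event.head_commit.message, 'ci skip') && !contains(github.event.head_commit.message, 'skip ci')"
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.7', '3.8']
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - run: |
          pip install -r requirements.txt pytest pytest-cov
          pytest --cov
```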

Update scripts to use f-strings for clarity

Currently, scripts use .format() for strings.

While this is OK, the Pythonic way prefers f-strings for their readability
and several other reasons (see Item 4 in Effective Python).

Example:

log.info("   {} : {}".format(dept.loc[bb], bb + off))

Change to:

log.info(f"   {dept.loc[bb]} : {bb + off}")

This can be a good first issue implemented as a feature for the next release.

Move UAL departments into University Libraries portal

Summary

The University Libraries portal was created for purchased datasets. Through discussion, we decided to have all University Libraries employees under this portal. Below summarizes the steps needed to create a main portal or any portal (e.g., a sub-portal).

Objectives

Grouper group membership updated for UAL members

Proposal

Here are the steps to implement the change:

Figshare/ReDATA:

  • Create portal on ReDATA (done some time ago)
  • Update Figshare settings for this group to use libraries as the UserAssociationCriteria (done some time ago). I might have set ual instead, but since this is already set, it should be fine.

Google Sheet:

  • Update "Main Portals" spreadsheet to include "University Libraries". The portal name is libraries
  • Update "Arizona Research Themes" spreadsheet to include "University Libraries" section and include UAL org codes

ReQUIAM:

  • Run ReQUIAM's add_grouper_group script with --main_portal and --add to create the new group on both figstest and figshare stems
    • Confirm on Grouper UI that the Grouper groups were created, portal:libraries

ReQUIAM_csv:

  • Check out develop and run the script (dry run) to inspect the output
  • Do a complete run, commit the changes, and create the PR
  • Review the changes before merging into master

Deployment:

  • Re-run run_script using the bash alias with --portal and --sync. This should pick up the new group.

Testing notes

  • After the change is implemented, this should apply to ReDATA team members. Log out and log back in to confirm (confirmation needed under the Users data panel of the Figshare UI).

Additional notes

Implemented in: TBD

PyPI packaging

Configure the package for easy installation through PyPI.

This requires:

  • MANIFEST.in
  • Changes to setup.py
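A MANIFEST.in sketch, assuming the config and data files should ship with the source distribution (the exact file list is an assumption):

```
# MANIFEST.in (sketch; included files are assumptions)
include requirements.txt
include default.ini
recursive-include data *.csv
```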

Comparing Org Code spreadsheets

A nice feature would be a tool that compares two files providing the University's organization codes.

The idea would be to identify differences for maintainers to understand.

Ultimately it would be nice to update the Google Sheet with just the changes through Google Sheet API.
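The comparison could be sketched with the stdlib alone, assuming each spreadsheet is first parsed into an {org_code: name} mapping (e.g., with csv.DictReader); the function name and structure are illustrative:

```python
def diff_org_codes(old: dict, new: dict) -> dict:
    """Compare two {org_code: name} mappings and report differences.

    Hypothetical helper: returns codes added, removed, or renamed
    between the two spreadsheets.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "changed": changed}
```

The resulting summary is exactly what a maintainer would need to review, and could later feed updates back through the Google Sheets API.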

CSV updates

This issue will remain open indefinitely. It is intended to track all changes to research_themes.csv.

Automated script

A primary script called script_run to execute via the command line:
python script_run

The script should:

  1. Read in the default.ini configuration file
  2. Execute create_csv

For step no. 2, setup.py is needed so the script can import the absolute package, DataRepository_research_themes
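The two steps could be sketched as follows (the setting names read from default.ini are assumptions):

```python
#!/usr/bin/env python
# Hypothetical sketch of script_run; setting names are assumptions.
from configparser import ConfigParser


def load_settings(config_file: str = "default.ini") -> dict:
    """Step 1: read the [DEFAULT] section of the configuration file."""
    config = ConfigParser()
    config.read(config_file)
    return dict(config.defaults())

# Step 2 would then hand the settings to create_csv, e.g.:
#     settings = load_settings()
#     create_csv(settings["url"], settings["outfile"])
```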

Enhancement: Add CI testing for Python 3.9

Summary
For future enhancements including typing checks, we should include CI for Python 3.9.

Objectives

Eventually take advantage of Python 3.9 typing features

Proposal

Modify python-package.yml to include Python 3.9.

Implemented in: TBD

KeyError: 'Overall Themes'

After cleaning up the Google Sheet so it would run effectively, the following error came up:

Traceback (most recent call last):
  File "DataRepository_research_themes/script_run", line 29, in <module>
    create_csv(url, outfile, log_dir, logfile)
  File "/Users/cly/codes/UALibraries/DataRepository_research_themes/DataRepository_research_themes/create_csv.py", line 91, in create_csv
    no_org_code = no_org_code_index(df_new)
  File "/Users/cly/codes/UALibraries/DataRepository_research_themes/DataRepository_research_themes/commons.py", line 15, in no_org_code_index
    df['Overall Themes'].isna())[0]

The issue is that df_new no longer contains the 'Overall Themes' column, as it was dropped.

This should be an easy hotfix.
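One possible shape for the hotfix (a sketch, not the committed fix): guard no_org_code_index against the dropped column.

```python
import numpy as np
import pandas as pd


def no_org_code_index(df: pd.DataFrame) -> np.ndarray:
    """Return row indices with a missing 'Overall Themes' entry.

    Hypothetical hotfix sketch: fall back gracefully if the column
    was dropped upstream instead of raising KeyError.
    """
    if "Overall Themes" not in df.columns:
        return np.array([], dtype=int)
    return np.where(df["Overall Themes"].isna())[0]
```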

Sphinx documentation with RTDs

We are ready to provide documentation through ReadTheDocs.

This tutorial is a good starter. A couple of key points:

  1. Docs will be in a docs/ folder relative to the repo parent
  2. We will keep source and build separate, so answer yes to that question
  3. We will build with python 3.7
  4. We will transfer README.md over and have separate sections (separate .rst files)
  5. We will use the sphinx_rtd_theme HTML theme. You will need to include the sphinx_rtd_theme extension

Action item:

  • Clean up all docstrings
  • Include requirements.txt under docs/ for Sphinx compilation
  • Include a GitHub Action script
  • Add shields.io badges. See this example
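The theme setup in docs/source/conf.py could look like this sketch (the extension list beyond sphinx_rtd_theme is an assumption):

```python
# docs/source/conf.py excerpt (sketch; extension list is an assumption)
project = "ReQUIAM_csv"
extensions = [
    "sphinx.ext.autodoc",   # pull in the cleaned-up docstrings
    "sphinx_rtd_theme",
]
html_theme = "sphinx_rtd_theme"
```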

Rename package

The change will be from "DataRepository_research_themes" to "ReQUIAM_csv"

Updates needed:

  • setup.py
  • README.md
  • Rename "DataRepository_research_themes" package folder -> "requiam_csv"
  • script_run
  • create_csv.py
  • Update GitHub repo name
  • Add ReQUIAM_csv logo to README

Docs: Update README.md to Avoid Duplicates with Docs

Describe the hotfix
README.md contains duplicate sections from our Read the Docs documentation. The task is to simplify the maintenance of README.md by one of three methods:

  1. Using the external library m2r2 to generate ../../README.md from source/. Drawback: it hasn't been maintained for a few months now.

  2. Using the external library MyST parser as in (1). This library seems more up-to-date.

  3. Simply stripping sections that need constant maintenance, such as code and Python versions. Then, update the TOC with links to our RTDs, for instance:

- [Getting Started](https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html)

Issue: if the links to RTDs are implemented before merging #66 into master, we will not be able to properly test them because they use the "/latest/" path.

Version information

  • ReQUIAM_csv version: [0.11.1]

Plan of action

  1. Strip the following sections from README.md:
- Getting Started
    - Requirements
    - Installation Instructions
    - Configuration Settings
    - Testing Installation
- Execution
    - Workflow
- Versioning
  2. Add RTDs links to each of those sections in the TOC, for instance:
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html#requirements
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html#installation-instructions
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html#configuration-settings
https://requiam_csv.readthedocs.io/en/latest/GettingStarted.html#testing-installation
https://requiam_csv.readthedocs.io/en/latest/Execution.html
https://requiam_csv.readthedocs.io/en/latest/Execution.html#workflow
...etc
  3. Add a message before the TOC to show where the full documentation lives:
## Developers' note:

Full documentation for this code is available on
[Read the Docs](https://requiam_csv.readthedocs.io/en/latest/). The table of contents below links to the appropriate sections within our Read the Docs documentation.

Additional context

@astrochun: This can be implemented as a simple PR that will be merged but without any bumping

Implemented in: >= v0.12.4

Generate git commit command for updates to CSV

When script_run is executed, conduct a comparison to identify differences and use that output to generate a git commit containing details of the changes. The commit message will consist of individual lines, one per entry.
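A sketch of the message-building half of this idea (the diff input and message layout are assumptions; the actual git invocation is shown commented out):

```python
import subprocess


def build_commit_message(changes: list) -> str:
    """Build a commit message with a summary header and one line per entry.

    Hypothetical helper: `changes` would come from comparing the newly
    generated CSV against the committed research_themes.csv.
    """
    header = f"Update research_themes.csv ({len(changes)} change(s))"
    return "\n".join([header, ""] + [f"- {c}" for c in changes])

# The generated message could then be handed to git, e.g.:
#     subprocess.run(["git", "commit", "-m", build_commit_message(changes),
#                     "data/research_themes.csv"], check=True)
```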

CI with Travis

This is the main issue thread for Travis CI integration.

Steps include:

  • Creating a .travis.yml file
  • Creating a pytest.ini file
  • Building pytests - additional tests needed, but that will be a separate issue
  • Creating a .coveragerc file for the coverage report
  • Updating README.md to include the build status

This is implemented in the feature/travis_ci branch

Enhancement: Migrate repository to UAL-ODIS

Summary
It's been decided to migrate from ualibraries to UAL-ODIS.

Objectives
Full transfer with branches, issues, PRs, and project boards. Settings may need to be manually transferred over

Proposal

  • Move forked copy from UAL-ODIS to a personal copy. I suggest mine
  • Perform transfer and confirm transfer was successful
  • Update local dev environments
    • Chun's
    • Damian's
  • Update ReQUIAM for new deployment path (Issue no. 137)

Testing notes
Already tested transfer here

No branch deployment here.

Enhancement: Update code to reflect Python 3.9 updates

Summary
After completing redata-commons UAL-RE/redata-commons#23 and ReQUIAM Python version upgrades UAL-RE/ReQUIAM#170, this repository shall be updated to be consistent with other ReDATA software.

Objectives
Migrate to Python 3.9

Proposal

  • Update requirements.txt (redata==0.5.0)
  • Remove .travis.yml (no longer in use)
  • Update setup.py
  • Update /requiam_csv/__init__.py (version = "0.13.0")
  • Update .github/workflows/python-package.yml

Testing notes
Need to verify that the current versions of Python, numpy, and pandas work as usual. Testing shall go through script_run, create_csv.py, and inspect_csv.py under /requiam_csv

  • Check the difference between dry_run.csv and the previous research_themes.csv

Additional notes

  • There should be no difference in the CSV after the upgrade, so no need to update the production environment.

redata >= 0.4.2 should work. However, for maintainability, we try to use the same set of dependencies (redata, numpy, pandas, sphinx)

Implemented in: TBD
