oeg-upm / soca

Software Catalog Creator. Given an organization URL, it creates a software catalog for browsing all of the organization's repositories.

License: Apache License 2.0

Python 33.53% HTML 30.65% CSS 13.08% JavaScript 22.34% Dockerfile 0.16% Shell 0.24%

software software-engineering software-metadata

soca's Introduction

Software Catalog Creator (soca)

A Python package that, given an organization or user name, creates a software catalog for browsing all of its repositories, or renders a single repository as a minimalist card.

Sample result

Click here to see an interactive example generated by using the oeg-upm organization as input for SOCA.

Click here to see an interactive example generated by using the KnowledgeCaptureAndDiscovery and mintproject organizations as input for SOCA.

Click here to see an interactive example generated by using the LinkedEarth organization as input for SOCA.

Commands used:

soca fetch -i oeg-upm --org -o oeg-upm_repos -na
soca extract -i oeg-upm_repos -o oeg-upm_metadata -i4p
soca portal -i oeg-upm_metadata -o oeg-upm_portal

This is an example of a single card using the command:

soca card -i https://github.com/oeg-upm/soca --png

Requirements

Git
Python 3.10

Install from GitHub

git clone https://github.com/oeg-upm/soca
cd soca
pip install -e .

Highly recommended steps:

somef configure

Alternatively, you may run the installer.sh file, which will also configure SOMEF. Just edit it to suit your needs.

And you will be asked to provide the following:

A GitHub authentication token [optional, leave blank if not used], which SOMEF uses to retrieve metadata from GitHub. If you don't include an authentication token, you can still use SOMEF, but you may be limited to a small number of requests per hour. For more information, see https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line
The path to the trained classifiers (pickle files). If you have your own classifiers, you can provide them here. Otherwise, you can leave it blank

InfluxDB setup

For SOCA-Dash to work you will need a working installation of InfluxDB 2.x as well as Grafana on your machine. SOCA-Dash needs two data sources and requires tokens to access the InfluxDB data sources. For more information please visit: https://docs.influxdata.com/influxdb/cloud/reference/cli/influx/auth/create/

To generate a token:

influx auth create -o [organization name] --all-access

SOCA-Dash requires influxQL datasource connection within grafana. To ensure that influx 2.+ allows influxQL queries execute the following:

influx v1 dbrp create --db [Bucket Name] -rp 0 --bucket-id [Bucket-id]

You also need to create a v1 authentication:

influx v1 auth create \
  --read-bucket [Bucket-id] \
  --write-bucket [Bucket-id] \
  --username admin

Once InfluxDB has been set up and the token created, make sure SOCA is using that token. Now is a good time to run the soca configure command, or to edit the ./installer.sh file to your needs and execute the script.

Install from Dockerfile

git clone https://github.com/oeg-upm/soca
cd soca

SOCA comes with an installer.sh file which automatically runs the SOCA and SOMEF configure commands. Please edit it according to your needs. The installer.sh file is required for the Docker installation process.

docker compose up

docker compose up starts Grafana and InfluxDB, each within its own container. It also creates its own network, "socaNet". You may want to list your containers, whether running or not:

docker ps -a

If you wish to access the influx container to generate a token you will first need to enter the container:

docker exec -it [influx container id] /bin/bash

This starts a bash shell inside the container. Remember, the container must be running when you execute this command.

Once inside the container you will need to generate an InfluxDB token. The following command generates a token; you may change the token flags to suit your needs. Once the command returns a token, copy it into the installer.sh file as "databaseToken". For more information please visit: https://docs.influxdata.com/influxdb/cloud/reference/cli/influx/auth/create/

To generate a token:

influx auth create -o [organization name] --all-access

SOCA-Dash requires influxQL datasource connection within grafana. To ensure that influx 2.+ allows influxQL queries execute the following:

influx v1 dbrp create --db [Bucket Name] -rp 0 --bucket-id [Bucket-id]

You also need to create a v1 authentication:

influx v1 auth create \
  --read-bucket [Bucket-id] \
  --write-bucket [Bucket-id] \
  --username admin

Once InfluxDB has been set up and the token copied to installer.sh, you can exit the container.

Now we need to build the SOCA container. Remember that container_run.sh will create a summary for the oeg-upm group; modify it to suit your needs. More information can be found in the Usage section. Make sure you are inside the cloned repository directory when executing this command:

docker build -t [INSERT_NAME] .

Once the container has been built you may execute the SOCA container by running the following:

docker run -it --network [network influx is running on] [container name]

SOCA-Dash

Once the grafana, influx and soca have been set up correctly you can create a grafana dashboard by importing SOCA-Dash.json. This will allow you to visualise the Summary being uploaded to the influxDB.

You will need to have created two InfluxDB data sources, one for Flux queries and another for InfluxQL. The following are two examples of how to do so.

For the token use the one previously created.

For the influxQL follow the example provided below.

Here you can see that you must create a custom header, with the key being "Authorization" and the value being the same token used for the Flux data source.

For the login please use the login created during the influx v1 auth create. For the rest add your org_name and bucket name. If you have used the SOCA defaults you can just copy the image

Usage

Usage: soca [OPTIONS] COMMAND [ARGS]...

  SOCA (Software Catalog Creator)

  Automatically generates a searchable portal for every repository of an
  organization/s or user/s, which is easy to host.

  Usage:
  
  0. (Configure) Create configuration file for database etc
  1. (fetch) Fetch all repos from the desired organization/s
  2. (extract) Extract all metadata for every repo
  3. (portal) Generate a searchable portal for all the retrieved data
  4. (summary) Create a summary from the portal information

Options:
  -h, --help  Show this message and exit.

Commands:
  card        Create a stand-alone card ready to be embedded in a website
  configure   This creates a ~/.soca/configure.ini file
  extract     Fetch and save metadata from introduced repos
  portal      Build a portal with a minimalist design
  fetch       Retrieve all organization/s or user/s repositories
  summary     Create a summary of good practices from portal card data

To use SOCA, follow these steps:

1 - Fetch

The first thing to do is gather pointers to all the repositories we want to use. The fetch command makes this task easier.

  -i, --input <name-or-path>  Organization or user name  [required]
  -o, --output <path>         Repository list output file  [default: repos]
  --org                       Extracting from a organization  [default: True]
  --user                      Extracting from a user  [default: False]
  -na, --not_archived         Fetch only repos that are not archived
                              [default: False]
  -nf, --not_forked           Fetch only repos that are not forked  [default:
                              False]
  -nd, --not_disabled         Fetch only repos that are not disabled
                              [default: False]
  -h, --help                  Show this message and exit.

It is important to state whether the name belongs to a user or an organization by using the --user or --org flag. Additionally, you can specify an output path with the -o flag.

Example:

soca fetch -i dakixr --user
soca fetch -i oeg-upm --org -o oeg-upm_repos --not_archived

This command also accepts a file as input (names separated by a new-line) for ingesting multiple names at a time.

Example:
soca fetch -i multiple-users.csv --user -o multiple-users_repos
soca fetch -i multiple-orgs.csv --org -o multiple-orgs_repos --not_archived

The output of this command is a csv file with all the repos of the selected users/orgs. This is a good moment to clean the file (remove any repos that you don't want to use). Note: you can also add any other repository manually.
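Cleaning the list can also be scripted. The sketch below assumes the file holds one repository URL per line, which this README does not spell out, so check your own output first. The helper name drop_repos is illustrative and not part of SOCA.

```python
def drop_repos(lines, unwanted):
    """Return the repository lines that are not listed in `unwanted`.

    Assumption: one repository URL per line. Blank lines are dropped, and
    trailing slashes and letter case are ignored when comparing URLs.
    """
    def normalize(url):
        return url.strip().rstrip("/").lower()

    blocked = {normalize(u) for u in unwanted}
    kept = []
    for line in lines:
        if line.strip() and normalize(line) not in blocked:
            kept.append(line.strip())
    return kept


repos = [
    "https://github.com/oeg-upm/soca",
    "https://github.com/oeg-upm/webODE/",
    "",
]
cleaned = drop_repos(repos, ["https://github.com/oeg-upm/webode"])
```

To apply it to a real file, read the lines of the fetch output, pass them through drop_repos, and write the result back before running extract.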

2 - Extract

Then we use the extract command to extract all the metadata required from each repository. If you want a more in-depth analysis on Python repositories use the flag -i4p or --inspect4py.

  -i, --input <csv-repos>  Pointers to the repositories in csv format
                           [required]
  -o, --output <path>      Dir where repositories metadata will be saved
  -i4p, --inspect4py       Use inspect4py to extract additional metadata from
                           Python repositories
  -h, --help               Show this message and exit.

Example:
soca extract -i oeg-upm_repos -o oeg-upm_metadata

3 - Portal

This step produces the portal itself. Use the portal command, which takes as input the directory created by the extract command.

  -i, --input <dir-json-metadata>
                                  Dir repositories metadata in json format
                                  [required]
  -o, --output <path>             Dir where Software Catalog Portal will be
                                  saved  [default: portal]
  -t, --title <title>             Portal's title  [default: Software Catalog]
  -fi, --favicon <path-icon.ico>  Portal's favicon  [default: img/soca-
                                  logo.ico]
  -h, --help                      Show this message and exit.

Example:
soca portal -i oeg-upm_metadata -o dir_portal --title "[Portal's title]"

If everything worked, a new directory should now exist with all the assets and code needed to deploy the portal.
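The three steps above can be chained from a script. This is a minimal sketch that only uses flags shown in this README; the function names and the "<org>_repos", "<org>_metadata" and "<org>_portal" naming scheme are illustrative choices, not something SOCA requires.

```python
import subprocess


def pipeline_commands(org):
    """Build the fetch, extract and portal commands for one organization."""
    repos = f"{org}_repos"
    metadata = f"{org}_metadata"
    portal = f"{org}_portal"
    return [
        ["soca", "fetch", "-i", org, "--org", "-o", repos, "--not_archived"],
        ["soca", "extract", "-i", repos, "-o", metadata],
        ["soca", "portal", "-i", metadata, "-o", portal],
    ]


def run_pipeline(org):
    """Run each step in order and stop at the first one that fails."""
    for command in pipeline_commands(org):
        subprocess.run(command, check=True)
```

Calling run_pipeline("oeg-upm") requires soca to be installed and on the PATH. Because check=True raises on a non-zero exit code, a failed fetch will not be followed by extract or portal.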

4 - Summary

SOCA can also produce a summary JSON from the cards_data.json file created by the previous portal step. You decide whether to upload the summary to InfluxDB (off by default) or to write it to a JSON file. To build the summary, use the summary command.

  -i, --input <dir-json-metadata>
                                  Dir repositories metadata in json format
                                  [required]
  -o, --output <path>             Dir where Software Catalog Portal will be
                                  saved  [default: summary]
  -U, --upload                    Will upload file to influxdb

Example:
soca summary -i cards_data.json -o test

Create a stand-alone card

SOCA also gives the option to create a single card in one of two different formats:

HTML
PNG

  -i, --input <url>    Repository URL  [required]
  -o, --output <path>  Output file where the html will be saved  [default:
                       card]
  --html               Save card as html  [default: True]
  --png                Save card as a png  [default: False]
  -h, --help           Show this message and exit.

As input you will need a github repository url and use one of the flags: --html or --png.
Note: if no flag is used the default is html.

Example:
soca card -i https://github.com/oeg-upm/soca --html
soca card -i https://github.com/oeg-upm/soca --png

Styling the portal

If you want to change the default style of the portal, SOCA decouples the .css files from the code base. The resulting portal directory contains two .css files that you can edit to suit your needs.

soca's People

Contributors

Stargazers

Watchers

Forkers

dakixr str3am786 stankovskia debkantap

soca's Issues

SCC should infer whether the repo is a web page, ontology or other

At the OEG organization we have many repos that are ontologies, or HTML pages. I think we can distinguish these, and maybe filter them in a separate category.

Usage is not available for some python repositories

I wonder why for some repos, no information is shown. For example, for ner4soft (a service), no usage is shown. Similarly, for https://github.com/oeg-upm/drugs4covid19-kg (a repo run with a script) the information is not shown either.

Repo looks incorrect

The repo below looks like it is derived from the pits of hell:

Application crashes when it doesn't find a repo

The command scc extract crashes when it doesn't find a repo:

FileNotFoundError: [Errno 2] No such file or directory: '/home/XXXXXX/oeg-upm_metadata/Jarsomatic'
Extracting metadata from https://github.com/oeg-upm/Jarsomatic

Previous to this error, another exception occurred:
Traceback (most recent call last):
File "/home/egonzalez/scc/src/scc/commands/extract_metadata.py", line 49, in fetch
metadata["inspect4py"]["run"] = ins4py["software_invocation"][0]["run"]
KeyError: 'run'

The problem shows up when we run scc with the option ins4py.
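One way to avoid the KeyError is to read the field defensively. This is only a sketch: the dictionary shape is inferred from the traceback above, and first_run_command is a hypothetical helper, not existing scc code.

```python
def first_run_command(ins4py):
    """Return the first "run" entry, or None when it is missing.

    Assumption: `ins4py` is a dict whose "software_invocation" value is a
    list of dicts, as the traceback suggests.
    """
    invocations = ins4py.get("software_invocation") or []
    if not invocations:
        return None
    return invocations[0].get("run")
```

The caller can then skip the assignment to metadata["inspect4py"]["run"] when the result is None instead of crashing.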

Re-apply somef (logos and other newer fields are still not appearing)

The portal states that the latest generation was today (4/3/2022), but there are no logos on any of the repos. Many repos have logos, and somef (the latest version) should detect them.

Order and filters

We need to make sure the repos shown have some sort of order, and that the order is explained.

If there are multiple citations, prioritize citation.cff

Citation files have the preferred citation from the authors.

If somef detects bibtex contents and a citation file, we should prioritize only the citation preferred by the authors.
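The rule could look like the sketch below. The data shape is an assumption made for illustration: each citation is a dict with a hypothetical "source" key naming the file it came from, which is not necessarily how somef reports it.

```python
def preferred_citations(citations):
    """Keep only CITATION.cff entries when at least one is present.

    Assumption: every entry is a dict with a hypothetical "source" key
    holding the name of the file the citation was found in.
    """
    from_cff = [
        c for c in citations
        if c.get("source", "").lower().endswith("citation.cff")
    ]
    # Fall back to everything that was found when there is no citation file.
    return from_cff or list(citations)
```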

Inconsistent "last updated"

For example, SCC web shows that for https://github.com/oeg-upm/webODE the last update is 11/2021, but it has not been contributed to in the last 4 years.

Error: lists in usage are not properly represented

For morph-graphql, I see:

Translating mappings online for Javascript and a set of CSV files (assuming that you have npm and node or docker installed)

    https://github.com/oeg-upm/morph-graphql/tree/master/examples/starwars

in usage. However, somef returns a list with 2 elements. The second element should be shown.

Logs for each repo are not shown

It is unclear what the status of each repository is.

Somef produces a log, but we don't see it or know where it is in scc.

The log should be accessible

Button to copy citation

I would like to have a button in the window that shows up to easily copy the citation text.

Add "About" page

We need an about page explaining the legend and what each card means.
Also, info about the authors who created SCC.

Mismatch between repos and fetch command

The documentation indicates that we should run scc fetch bla blah, but fetch is not a command. Instead there is a repos command.

The repos command should be renamed to fetch

If there are multiple tags, show latest version number

This would help in the card to know what is the latest version of that software component
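A possible way to pick the tag is sketched below. It reads "latest" as the highest dotted numeric version, which is an assumption (the most recently created tag would be another reading), and latest_version is a hypothetical helper.

```python
def latest_version(tags):
    """Return the tag with the highest dotted numeric version, or None."""
    def parse(tag):
        # Accept an optional leading "v", as in "v1.2.0".
        text = tag[1:] if tag[:1] in ("v", "V") else tag
        parts = text.split(".")
        if not all(p.isdecimal() for p in parts):
            return None  # not a plain numeric version, e.g. "beta"
        return tuple(int(p) for p in parts)

    numeric = [(parse(t), t) for t in tags if parse(t) is not None]
    if not numeric:
        return None
    # Tuples compare element by element, so (0, 10, 2) beats (0, 9).
    return max(numeric)[1]
```

Tags that are not purely numeric, such as pre-release labels, are ignored by this sketch.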

workflow for pypi

See the workflow for somef in order to automate the package deployment upon release.

Full description?

Some repos do not show the full description. Show it behind a "read more" link so that a modal appears with the full description.

New badge based on status

We now capture the repostatus with somef, which is a badge not shown in scc

Readme links are broken

You construct them like https://github.com/oeg-upm/gtfs-bench/README.md but they should be like https://github.com/oeg-upm/gtfs-bench/blob/master/README.md
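The corrected form can be built as in this sketch. The branch default of "master" simply mirrors the example above; many repositories use another default branch, so the real value should come from the repository metadata. readme_url is an illustrative name.

```python
def readme_url(repo_url, branch="master", filename="README.md"):
    """Build the GitHub URL that renders a file on a given branch."""
    return f"{repo_url.rstrip('/')}/blob/{branch}/{filename}"


url = readme_url("https://github.com/oeg-upm/gtfs-bench")
```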

Header in SCC

@dakixr I don't really like how it's now. It's hard to read, and the text is clumped up.

Is it possible to have a small box around the filters? Or some sort of separation
Have "Filter by" and "Sort by" instead of "Only with" (it does not sound right in English).

Logo shown in page

In the tab, SCC shows the react default logo.

I think we should remove it. And the name of the tab should not be SCC, but Software Catalog.

I would like the title to be configurable, i.e., OEG Software Catalog

Why are some logos not detected?

For example, https://github.com/oeg-upm/Morph-OME has a logo. I tested with SOMEF and it was extracted fine. But I don't see it in the portal

Documentation: what does "updated recently" mean?

I know this is something like less than 3 months. Is this correct?

Double header if no usage instructions are found, but it's a Python package

See the screenshot below

In this case, it should only show "How to use it"

Configuration for bringing repos from an organization

I would like an option for ignoring archived repos.

Improve status descriptions

The descriptions are not great. They don't render very well. It would be nice to have some post-processing.

State which version has been used for generating the portal

We should have the scc version and the somef version used somewhere. It can be in the about page, or in the footer.

Error in repo

In Morph-OME, somef detects the logo:
https://raw.githubusercontent.com/oeg-upm/Morph-OME/master/static/logo-min.png

But SCC does not show it. Why?

Add tests

We need tests to assess the functionality of the package.

Different type of tests should assess the different available commands in the package, i.e., the ability to create lists of sample repos from one or multiple organizations, the ability to apply somef, and the ability to generate a portal from the final JSON.

Missing badges

I see there is no "usage" badge. Why?

Documentation field is not shown in some cases

For example in morph-kgc, there is a readthedocs documentation (should be highest priority). However, it only shows the file in GitHub where the docs are.

SOMEF right now returns "documentation" and "hasDocumentation". It looks like only one of these fields is used in scc. We will fix this in somef, but in the meantime, I would like the info in those fields used. If there is a doc with type readthedocs, then use that link as primary.

Codemeta and TTL download

When we hit "download" it would be great to show the option to download TTL/Codemeta results after applying somef

Paths from "usage" have absolute paths

For example, for morph OME, I get:

Usage
How to use it

python /Users/dakixr/Desktop/github/scc/tmp-data/metadata/oeg-upm/Morph-OME/app.py

This is almost correct, but I would like the path to be:

python oeg-upm/Morph-OME/app.py

I.e., start from the organization name
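The trimming could be done as in the sketch below, which is only one possible approach. path_from_org is a hypothetical helper, and it assumes POSIX-style paths like the one in the example.

```python
from pathlib import PurePosixPath


def path_from_org(path, org):
    """Trim `path` so that it starts at the directory named `org`.

    Returns the path unchanged when `org` is not one of its components.
    """
    parts = PurePosixPath(path).parts
    if org not in parts:
        return path
    # Use the last match so an earlier directory with the same name is skipped.
    start = len(parts) - 1 - parts[::-1].index(org)
    return str(PurePosixPath(*parts[start:]))


shown = path_from_org(
    "/Users/dakixr/Desktop/github/scc/tmp-data/metadata/oeg-upm/Morph-OME/app.py",
    "oeg-upm",
)
```

Matching on the last occurrence gives the wrong result when a repository is named exactly like its organization, so a real fix would be safer computing the path relative to the known metadata directory.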

Copy feature in citations

When I click on the citation, I would like to have the ability to copy the citation text.
Otherwise I have to do this manually

Update and do release in pypi

We need a stable version of the repository for other projects (ya2ro)

@dakixr please do a pull request to this repository and we will do a release in pypi. We need installation and usage instructions

scc URL is incorrect

Instead of referring to https://github.com/Dakixr/scc/ (footer), the correct URL should be https://github.com/oeg-upm/scc/

Better management of pop-up modals

Right now, if you open (for example) a popup of the citation or the requirements, it will be only closed if the "x" button is clicked. The usual behavior of modern websites is that if you click outside the pop-up, it automatically closes as well

Filter by "Ontology" or "Website" ?

I am not sure if these should be a different category; or just additional filters.

Make explicit the organization a repo belongs to

Right now scc can generate description from repos belonging to more than 1 organization. We should be able to show the organization the repo belongs to

Ontology logo uses the wrong ico

Please use one of the icons from the last table in: https://www.w3.org/RDF/icons/
(You can edit them to fit the visual style of the website)

Example http://www.w3.org/RDF/icons/rdf_flyer.64

There is no version in package

We need to know which is the version of the library.

This should be under a command named scc version

Wrong logo in repo

In Wot hive (https://github.com/oeg-upm/wot-hive), the logo is shown incorrectly. For some reason, it shows the European Union logo.

I checked and SOMEF extracts the logo correctly.

Add logo

We need a logo for the project!

Ability to filter repos that are forked or deprecated in an organization

The fetch command should support filtering this, otherwise it brings in too many repos.

It should be a parameter

Ontology repos should show Ontology URI

Right now they only show whether the repo is an ontology or not. However, SOMEF returns all the URIs that could be loaded in the repo. I think we should provide that info.

Create a proper log from scc

As discussed in our meeting today, it would be great to produce logs of the output generated by SOMEF, and respective errors, in case there is one.

Add logo to readme

We have a logo! :)
Thanks to Laura Camacho.

@dakixr can you please add it to the readme?

SCC does not process all repos of an organization

I was looking at morph-kgc and it does not appear
@dakixr, any explanation why?

Distinguish between types of repositories

At the moment in the OEG organization we have mainly:

Software components: regular pieces of code and tools.
Ontology repositories.
Website repositories.
Helper repositories (some sort of ROs for certain projects)

We should be able to extend SCC to recognize ontologies and websites so as to update vocab.linkeddata and automatically collect the information from our web page sources.

Run command for python repos

Inspect4py returns how to run each repository (for python repos in particular). However, this information is not shown right now. We should show it!

Some "Ontology" Pages are not ontologies.

For example:

https://github.com/oeg-upm/r2rml-editor does not have anything, yet it's classified as an ontology.
https://github.com/oeg-upm/docker-geokettle-x3geo

Please double check these
