
climate's Introduction

petermr repositories

Many of these repos are widely used in collaborative projects and include:

  • code
  • data
  • projects

This special repo is to coordinate navigation and discussion

discussion lists

The "Discussions" for this repo (https://github.com/petermr/petermr/discussions) include discussions for the other repos and are indicated by their names. They may replace our (private) Slack for all public-facing material (private project management will remain on Slack).

active repos

active Python projects:

For context: we have 4 packages (if that's the right word). They are largely standalone but can share useful library routines. They all use a common data structure on disk (simply named directories). This means that state is less important and is often held on the filesystem. It also means that data can be further manipulated by Unix tools and other utilities. This is very fluid as we are constantly adding new data substructures. (I developed much of this in Java - https://github.com/petermr/ami3/blob/master/README.md). The top directory is a CProject and its document children are called CTrees, as they are usefully split into many subdirectory trees.
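The on-disk convention can be sketched as follows; the layout and function names here are illustrative only, not the actual py4ami API:

```python
from pathlib import Path
import tempfile

# A CProject is just a directory; each child directory is a CTree
# holding the files for one document (illustrative layout).
def make_demo_cproject(root: Path) -> Path:
    cproject = root / "climate_project"
    for pmcid in ["PMC5264177", "PMC5299408"]:
        ctree = cproject / pmcid
        ctree.mkdir(parents=True)
        (ctree / "fulltext.xml").write_text("<article/>")
    return cproject

def list_ctrees(cproject: Path):
    # State lives on the filesystem: a CTree is any subdirectory.
    return sorted(p.name for p in cproject.iterdir() if p.is_dir())

root = Path(tempfile.mkdtemp())
cproject = make_demo_cproject(root)
print(list_ctrees(cproject))  # ['PMC5264177', 'PMC5299408']
```

Because the state is just directories and files, `ls`, `grep`, `find` and friends work on a CProject as well as any of the Python tools.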

Each package has a maintainer. These are all volunteers; their Python is self-taught. There are also interns - a mixture of compsci/engineering/plant-science people who have a 3-month stay. They test the tools, develop resources, and explore text-mining, NLP, image analysis, machine learning, etc. They are encouraged to use the packages and link them into Python scripts or Notebooks, but don't have time for serious development. (They might add readers or exporters.)

  • pygetpapers, Ayush Garg. https://github.com/petermr/pygetpapers . Searches and downloads articles from repositories. Standalone, but the results may be used by docanalysis or possibly imageanalysis. Can be called from other tools.

  • docanalysis. Shweata Hegde. https://github.com/petermr/docanalysis . Ingests CProjects and carries out text-analysis of documents, including sectioning, NLP/text-mining, vocabulary generation. Uses NLTK and other Python tools for many operations, and spaCy, scispaCy for annotation of entities. Outputs summary data, correlations, word-dictionaries. Links entities to Wikidata.

  • pyamiimage, Anuv Chakroborty + PMR. https://github.com/petermr/pyamiimage . Ingests Figures/images, applies many image-processing techniques (erode-dilate, colour quantization, skeletons, etc.), extracts words (Tesseract), extracts lines and symbols (uses sknw/NetworkX), and recreates semantic diagrams (not finished).

  • py4ami, PMR. https://github.com/petermr/pyami . Translation of ami3 (Java) to Python. Processes CProjects to extract and combine primitives into semantic objects. Some functionality overlaps with docanalysis and imageanalysis. Includes libraries (e.g. for Wikimedia), a prototype GUI in tkinter, and a complex structure of word-dictionaries covering science and related disciplines. (Note: the project is called pyami locally, but there is already a PyAMI project, so elsewhere it is called py4ami.)

All packages aim to have a common command-line approach, use config files, and generate and process CProjects (e.g. iterating over CTrees and applying filters, transformers, map/reduce, etc.). All 4 packages have been uploaded to PyPI.
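The shared iteration pattern - walk the CTrees in a CProject, filter them, apply a transformer - can be sketched like this (all names are hypothetical, not the API of any of the four packages):

```python
from pathlib import Path
import tempfile

# Hypothetical sketch of the shared pattern: iterate CTrees in a
# CProject, filter them, and map a transformer over the survivors.
def process_cproject(cproject: Path, ctree_filter, transformer):
    results = {}
    for ctree in sorted(p for p in cproject.iterdir() if p.is_dir()):
        if ctree_filter(ctree):
            results[ctree.name] = transformer(ctree)
    return results

# Demo CProject with two CTrees, only one containing fulltext.xml.
root = Path(tempfile.mkdtemp())
for name, has_xml in [("PMC111", True), ("PMC222", False)]:
    ctree = root / name
    ctree.mkdir(parents=True)
    if has_xml:
        (ctree / "fulltext.xml").write_text("<article>climate</article>")

out = process_cproject(
    root,
    ctree_filter=lambda t: (t / "fulltext.xml").exists(),
    transformer=lambda t: len((t / "fulltext.xml").read_text()),
)
print(out)  # {'PMC111': 26}
```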

basicTest

Checks that the Python environment works (independently of the applications) https://github.com/petermr/basicTest/blob/main/README.md

presentations

Some presentations about the software, many from collaborators/interns

pygetpapers

notebook

docanalysis

wikidata

climate's People

Contributors

katrinleinweber, mrchristian, petermr


climate's Issues

Organising project comms

My suggestion for my next contribution to the Open Climate Knowledge project is to work out a plan for the next steps in communicating the project, in an agile, bite-size way - so just a few next steps, then rinse and repeat.

I'll write out some ideas here https://demo.codimd.org/VrRq3-_QQ2eNVQiVAYNgbA?view

In no particular order:

  • Create website with Jekyll GitHub pages
  • Short project description: about, attribution
  • Populate repositories with standard open project docs: CoC, Contribute, licence, etc.
  • Short presentation
  • Guide of how to use OpenNotebook: which also means understanding the OpenNotebook functionality, outputs, etc. :-)
  • Guide for groups wanting to use OCK to add to or create new dictionaries
  • Roadmap
  • Define and document a process for a dictionary/search so we have a solid base to work from
  • Confirming our initial short term goals for this stage: build project comms; dictionary on 'runaway climate change'; invite others to use.
  • Produce a paper

That's more than enough...

Easier software install Qs

Make CM software easier to install, e.g. have all the Java tools on Maven Central, etc.

If this is felt necessary it would be good to look at what needs to be done.

PDF processing

Can you point me to the part of ContentMine, or the instructions, for processing and extracting PDF parts? Also, is there an example of a source document and its outputs?

I am asking as some colleagues have a PDF document set that they need to extract and enrich components from.

Force11 WG setup

I'm setting up the Force11 WG ready to make an announcement, either today or tomorrow, depending on how simple or complicated things get.

I'm going to follow how the Software Citation Principles WG have done things as they have been successful with their WGs and have experience running a couple. See: https://github.com/force11/force11-sciwg

Like them I will create an information WG repo, which seems like a good idea as document updates will get in the way of the software code repo. And of course I will do this over on the OCKProject area and move to Issue tracking there at some point.

download crossref metadata

General idea: download Crossref metadata and see how much of the climate literature is open.

$ getpapers -q "climate change" --api crossref -k 100 --outdir crossref
info: Searching using crossref API
info: Found 689472 results
info: limiting hits
info: Saving result metadata
info: Full CrossRef result metadata written to crossref_results.json
info: Individual CrossRef result metadata records written
MacBook-Pro-3:climate pm286$ tree crossref/
crossref/
├── 10.1002_9781118279380.ch2
│   └── crossref_result.json
├── 10.1002_9781119974178.ch3
│   └── crossref_result.json
├── 10.1002_wcc.158
│   └── crossref_result.json

The metadata is added by publishers and is highly variable
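One way to see how variable it is: walk the getpapers output above and count how many records carry any licensing metadata at all. Crossref work records *may* include a `license` list, but publisher coverage is patchy, so absence means "unknown" rather than "closed". A minimal sketch:

```python
import json
import tempfile
from pathlib import Path

# Sketch: walk getpapers' crossref/ output and count how many records
# carry any license metadata. Treat missing fields as "unknown".
def count_licensed(crossref_dir: Path):
    licensed = total = 0
    for meta in crossref_dir.glob("*/crossref_result.json"):
        record = json.loads(meta.read_text())
        total += 1
        if record.get("license"):
            licensed += 1
    return licensed, total

# Tiny demo tree mimicking getpapers' layout.
root = Path(tempfile.mkdtemp()) / "crossref"
for doi, rec in [
    ("10.1002_wcc.158",
     {"license": [{"URL": "http://creativecommons.org/licenses/by/4.0/"}]}),
    ("10.1002_9781118279380.ch2", {}),
]:
    d = root / doi
    d.mkdir(parents=True)
    (d / "crossref_result.json").write_text(json.dumps(rec))

print(count_licensed(root))  # (1, 2)
```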

Ideas for help that OCK needs to recruit

As mentioned in the accompanying Issues #29 'OCK next steps tech' I want to list out areas where OCK needs help as it may be that TIB colleagues have suggestions or ideas about how to plug the gaps.

This is my list of help needed, in no particular order, please add, etc:

  • Software support documentation
  • Climate Change specialists as advisors on use of Content Mine, issues in their field, uses of ContentMine OCK, etc
  • Data science software developers, users to carry out searches, experiments
  • Members to join a Force11 working group for OCK, contribute on research, papers, WG duties, OA stats, contribute to recommendations and plans for transition to open research/OA
  • OA experts to help on informing OCK on existing research on OA rates, stats and how OCK can deal with speeding up OA in Climate Change
  • Wikidata wranglers
  • knowledge graph expertise
  • content curation and repository building
  • RDM
  • community development and open project strategy implementation

That's all folks

S

Rendering JATS/XML as HTML5

You want to have some JATS/XML rendered as HTML5 for the Oxford XML Summer School. Can you point me to the type of source content that needs rendering, or an example, so I can try some things out? Preferably the GitHub Pages Jekyll framework could just use the JATS as-is, but we will have to see.

I take it we would want either to concatenate a series of papers from directories into one big HTML output, or to create a mini website linking to papers?
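The "one big HTML output" option can be sketched as below. This is only a shape-of-the-pipeline illustration: real JATS needs a proper transform (e.g. the NISO JATS-to-HTML stylesheets), and the element paths used here cover only a toy subset.

```python
import xml.etree.ElementTree as ET

# Pull the title and body paragraphs out of each JATS file and stack
# them into one HTML5 page (toy subset of JATS only).
def jats_to_section(jats_xml: str) -> str:
    article = ET.fromstring(jats_xml)
    title = article.findtext(".//article-title", default="(untitled)")
    body = article.find("body")
    paras = body.findall(".//p") if body is not None else []
    text = " ".join(p.text or "" for p in paras)
    return f"<article><h1>{title}</h1><p>{text}</p></article>"

def concatenate(papers) -> str:
    sections = "\n".join(jats_to_section(p) for p in papers)
    return f"<!DOCTYPE html>\n<html><body>\n{sections}\n</body></html>"

demo = """<article><front><article-meta>
<title-group><article-title>Tipping points</article-title></title-group>
</article-meta></front><body><p>Runaway feedback.</p></body></article>"""
html = concatenate([demo])
print(html)
```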

help with lists and knowledgebase

To have a speedier list- and knowledgebase-building process to feed into ContentMine use, I have made a page on the GenR repository for collecting contributions. This can also help connect into the larger project of climate-change OA liberation. See https://github.com/Gen-R/open-climate/

These pages can then be merged in this repository.

Frequently Asked Questions (FAQs)

Introduction

This is a growing assembly of questions that newcomers to the project and its software might ask. If you join the project you can ask or answer more.

Themes

purpose of the project

how to run the software

raw material

dictionaries

people

GitHub Organisation for OCK

To facilitate the new Force11 Working Group we need to set up a GitHub organisation; this way we can have members and other group functions.

If you, @petermr, could set up an organisation and add @mrchristian as an admin, then I will fork the climate repo there and we can use the forked repo as the new working place for the WG.

The GitHub organisation should be an individual account and be named 'Open Climate Knowledge'

created 200 scoping set for runaway climate

A quick search to see how many papers relate to "runaway" or "tipping".

MacBook-Pro-3:climate pm286$ getpapers -q "((climate change) AND ((runaway) OR (feedback) OR (tipping)))" -k 500 -x -o runaway500
info: Searching using eupmc API
info: Found 9650 open access results
warn: This version of getpapers wasn't built with this version of the EuPMC api in mind
warn: getpapers EuPMCVersion: 5.3.2 vs. 6.1 reported by api
info: Limiting to 500 hits
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: Duplicate records found: 998 unique results identified
info: limiting hits
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
info: Got XML URLs for 500 out of 500 results
info: Downloading fulltext XML files
Downloading files [==============----------------] 46% (232/500) [77.8s elapsed, eta 89.9]^C 

stopped after 40%, got 222
ami-search

MacBook-Pro-3:climate pm286$ ami-search -p runaway222/ --dictionary compound species country funders 

Generic values (AMISearchTool)
================================
-v to see generic values
oldstyle            true

Specific values (AMISearchTool)
================================
oldstyle             true
strip numbers        false
wordCountRange       (20,1000000)
wordLengthRange      (1,20)

dictionaryList       [compound, species, country, funders]
dictionaryTop        null
dictionarySuffix     [xml]

0    [main] DEBUG org.contentmine.ami.tools.AbstractAMISearchTool  - old style search command); change
cProject: runaway222
legacy cmd> word(frequencies)xpath:@count>20~w.stopwords:pmcstop.txt_stopwords.txt
legacy cmd> search(compound)
legacy cmd> species(binomial)
legacy cmd> search(country)
legacy cmd> search(funders)
!PMC5264177 .!PMC5299408 !PMC5459990 !PMC5472773 !PMC5551099 !PMC5577139 !PMC5578963 !PMC5593823 !PMC5595922 !PMC5651905 !PMC5678106 .!PMC5719437 !PMC5734744 !PMC5770443 PMC5789925 !PMC5795745 !PMC5798756 !PMC5820313 ...
PMC6536552 PMC6538627 !PMC6539176 .PMC6539203 !PMC6540656 PMC6540663 !PMC6541288 PMC6541573 !PMC6541581 PMC6541717 !PMC6542552 !PMC6542844 !PMC6543642 .PMC6544233 PMC6545051 PMC6545231 UNKNOWN nlm tag: city
UNKNOWN nlm tag: city
!PMC6547168 !PMC6549952 PMC6550257 PMC6553685 !PMC6555712 PMC6556101 UNKNOWN nlm tag: city
UNKNOWN nlm tag: city
UNKNOWN nlm tag: version
UNKNOWN nlm tag: version
UNKNOWN nlm tag: version
!PMC6556939 .PMC6558283 !PMC6559081 !PMC6559268 !PMC6559292 !PMC6561295 !PMC6562896 !PMC6563524 PMC6565653 !PMC6566821 PMC6566967 .PMC65679
...
PMC6723259 PMC6724111 !PMC6724177 !PMC6724306 PMC6724339 !PMC6726645 !PMC6727426 PMC5264177 97035 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
.PMC5299408 97036 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
(PMR: there seem to be a lot of these)
PMC5459990 97036 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
...

.PMC6706196 PMC6706372 PMC6706434 PMC6708170 PMC6708426 PMC6709546 105060 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6709957 105060 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6710573 105060 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6711539 PMC6712833 105149 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
.PMC6712961 105149 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6714084 105149 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6714099 PMC6716414 PMC6716840 PMC6717165 PMC6717645 105225 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6718425 105225 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6718993 PMC6720849 .PMC6721090 PMC6721118 105351 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6723259 PMC6724111 PMC6724177 105406 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6724306 105406 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6724339 PMC6726645 105438 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
PMC6727426 105438 [main] DEBUG org.contentmine.ami.plugins.word.WordCollectionFactory  - no words found to extract
....................................................................................................cannot run command: search([compound])[]; cannot process argument: --sr.search (RuntimeException: cannot read inputStream for dictionary: /org/contentmine/ami/plugins/dictionary/compound.xml)
SP: runaway222..................................................................................................................................................................................................................................................................................................................................................................................................................................................................
create data tables
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrMacBook-Pro-3:climate pm286$ 

SECTIONS

New tool to find sections. Its value depends on publisher consistency.

MacBook-Pro-3:climate pm286$ ami-section -p runaway222/ --sections ALL

Generic values (AMISectionTool)
================================
-v to see generic values
oldstyle            true

Specific values (AMISectionTool)
================================
sectionList             [ABBREVIATION, ABSTRACT, ACK_FUND, APPENDIX, ARTICLE_META, ARTICLE_TITLE, CONTRIB, AUTH_CONT, BACK, BODY, CASE, CONCL, COMP_INT, DISCUSS, FINANCIAL, FIG, FRONT, INTRO, JOURNAL_META, JOURNAL_TITLE, PUBLISHER_NAME, KEYWORD, METHODS, OTHER, PMCID, REF, RESULTS, SUPPL, TABLE, SUBTITLE, TITLE]
write                   true

AMISectionTool cTree: PMC5264177
AMISectionTool cTree: PMC5299408
AMISectionTool cTree: PMC5459990
AMISectionTool cTree: PMC5472773
...

creates a section/ dir for each CTree

This is new...
The title of each section depends on the subtitles from the publisher.

Comments useful.
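A quick way to check publisher consistency is to tally how many section files each CTree gained. The directory and file names below are illustrative; the real layout depends on the ami version:

```python
from pathlib import Path
import tempfile

# Sketch: after ami-section runs, tally section files per CTree.
# Names ("sections", numbered subdirs) are illustrative only.
def section_counts(cproject: Path):
    counts = {}
    for ctree in sorted(p for p in cproject.iterdir() if p.is_dir()):
        sections = ctree / "sections"
        counts[ctree.name] = (
            sum(1 for _ in sections.rglob("*.xml")) if sections.is_dir() else 0
        )
    return counts

# Demo: one CTree with two extracted sections, one with none.
root = Path(tempfile.mkdtemp())
front = root / "PMC5264177" / "sections" / "0_front"
front.mkdir(parents=True)
(front / "0_article-meta.xml").write_text("<front/>")
(front / "1_journal-meta.xml").write_text("<front/>")
(root / "PMC5299408").mkdir()

print(section_counts(root))  # {'PMC5264177': 2, 'PMC5299408': 0}
```

CTrees with a count of 0 are candidates for "publisher markup the sectioner doesn't yet recognise".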

Energy Modeling Search

Meeting with Ludwig Hülk @Ludee on Monday at the Reiner Lemoine Institut https://github.com/rl-institut in Berlin to talk about Open Energy Modeling.

I discussed creating a dictionary for a search on Energy Modeling with Ludwig and his colleague, who are experts in the field.

There are two resources we can build this dictionary from: first, a Glossary from the Open Energy Modeling Initiative https://wiki.openmod-initiative.org/wiki/Category:Glossary and, second, an ontology that RLI have made https://github.com/OpenEnergyPlatform/ontology

I'll consult with Ludwig and co. about how we can collate useful terms from these sources, connect them to Wikidata, and arrange a handover and then refinement as we carry out searches.
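Turning collated terms into a dictionary could look like this. The `<dictionary>`/`<entry>` shape follows published ContentMine dictionaries, but the attribute names (e.g. `wikidataID`) should be checked against the current schema, and the Wikidata IDs below are placeholders:

```python
import xml.etree.ElementTree as ET

# Sketch: build an ami-style dictionary from a {term: wikidata_id}
# mapping. Attribute names approximate published ContentMine
# dictionaries; verify against the current schema before use.
def make_dictionary(title: str, terms: dict) -> str:
    root = ET.Element("dictionary", title=title)
    for term, wikidata_id in terms.items():
        ET.SubElement(root, "entry", term=term, name=term,
                      wikidataID=wikidata_id)
    return ET.tostring(root, encoding="unicode")

# Placeholder Wikidata IDs - look up the real items before publishing.
xml = make_dictionary(
    "energy_modeling",
    {"load profile": "Q000001", "capacity factor": "Q000002"},
)
print(xml)
```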

Scholarly HTML

Do the scholarly HTML files use this Scholarly HTML markup? I know it sounds like a silly question, but I just need a reality check. The W3C group seems inactive, though; see here https://vivliostyle.github.io/vivliostyle_doc/samples/scholarly/index.html

I'm seeing if I can render the HTML outputs from a 'mining session' using the paginated CSS setup from the lovely Vivliostyle people https://vivliostyle.org/

Like so https://vivliostyle.github.io/vivliostyle.js/viewer/vivliostyle-viewer.html#x=https://vivliostyle.github.io/vivliostyle_doc/samples/scholarly/index.html

I should be able to make an inventory of the papers somehow, then add some custom CSS, and it might work :-) Outputting as a standalone website vs. MD for internal GitHub viewing will be different.

technologies for OCK next steps

Hi,

I have a presentation for TIB colleagues tomorrow, 29th Oct, and I need to ask a couple of questions about the thinking on technology routes for OCK's proposed next steps - and whether there are existing systems in place, choices already made, routes contemplated, or explorations being made.

  • Building an open metadata and/or document repository - how could we present the data collated by OCK to the public in a usable form?
  • Knowledge Graph creation?

I would like to be able to present the current view on these two parts of the project; if we need input, I can see what people can offer or have ideas about.

Thanks

Simon

Pull request waiting for merge

Pull request with some lists information is waiting for merge. #8

It covers text files from contributing and lists files in a new directory 'lists'.

Some file changes seem to have appeared in the directory 'clim107'; this wasn't intentional and I'm not sure how or why it happened.
