
grobid-superconductors's Introduction

Hi there 👋

Artificial intelligence specialist with 10+ years of experience in software engineering. I have expertise in Text and Data Mining, Natural Language Processing, and Data Science. I'm interested in the development of specialized processes for scientific text, in particular document parsing and structuring.

I like almost anything related to the outdoors and travel.

grobid-superconductors's People

Contributors

dependabot[bot], lfoppiano, t29mato

grobid-superconductors's Issues

Output sentence offsets

When we process paragraphs and the output is sentence-based, the sentence offsets need to be included in the output.

With the sentence offsets we can easily map the span offsets back to the original paragraphs.

For example, the spaces between sentences are usually lost during sentence splitting, so span offsets cannot be recovered from the sentence lengths alone.
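A minimal sketch of the remapping this issue asks for, assuming each span carries sentence-relative character offsets. The function names and the `offset_start`/`offset_end` keys are hypothetical and chosen for illustration; they are not taken from the grobid-superconductors output format.

```python
def locate_sentences(paragraph, sentences):
    """Return the (start, end) character offsets of each sentence in the paragraph."""
    offsets = []
    cursor = 0
    for sentence in sentences:
        # str.index raises ValueError if the sentence is not found after the cursor.
        start = paragraph.index(sentence, cursor)
        end = start + len(sentence)
        offsets.append((start, end))
        cursor = end
    return offsets


def to_paragraph_offsets(span, sentence_offset):
    """Shift a sentence-relative span so that it is relative to the paragraph."""
    sentence_start, _ = sentence_offset
    return {
        **span,
        "offset_start": span["offset_start"] + sentence_start,
        "offset_end": span["offset_end"] + sentence_start,
    }


# Two spaces separate the sentences; the splitter drops them.
paragraph = "Superconductivity has been observed.  The maximum Tc is 32 K."
sentences = ["Superconductivity has been observed.", "The maximum Tc is 32 K."]
sentence_offsets = locate_sentences(paragraph, sentences)

# A span found in the second sentence, with offsets relative to that sentence.
local_start = sentences[1].index("32 K")
span = {"text": "32 K", "offset_start": local_start, "offset_end": local_start + 4}
remapped = to_paragraph_offsets(span, sentence_offsets[1])
```

With the sentence offsets available in the output, the first function is unnecessary on the client side: only the shift in `to_paragraph_offsets` is needed.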

Error case for space groups

Example:
LaOFeAs, a member of the group of quaternary oxypnictides LnOTMPn, has a layered structure belonging to the tetragonal P 4/ nmm space group with lattice constants a 5 0.403552(8) nm and c 5 0.87393(2) nm.

The space group P 4/ nmm is not extracted.

Re-organise urls

Here is the current URL API:

    POST    /service/annotations/feedback (org.grobid.service.controller.AnnotationController)
    GET     /service/health (org.grobid.service.controller.HealthCheck)
    GET     /service/material/parse (org.grobid.service.controller.MaterialController)
    POST    /service/material/parse (org.grobid.service.controller.MaterialController)
    POST    /service/process/pdf (org.grobid.service.controller.AnnotationController)
    POST    /service/process/text (org.grobid.service.controller.AnnotationController)

Proposed change: /material/parse -> /material/process, although "parse" looks more correct...

Fix variable extraction with intervals

Variables with intervals are not parsed correctly:

Superconductivity has been observed in all samples with x ⩾ 0.05 and the maximum critical temperature (T c ) ≈ 32 K has been obtained in samples with 0.1 ⩽ x ⩽ 0.2 from electronic resistivity measurement.

This results in the extraction of 0.1 ⩽ x ⩽ 0.2 linked to 32 K. However, the material variable 0.1 ⩽ x ⩽ 0.2 is wrongly parsed as x = 0.1.
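One possible way to recognise such intervals is a regular expression that captures both bounds instead of only the first value. This is a sketch only, not the parser used by grobid-superconductors: the name `parse_interval` is hypothetical, the variable is assumed to be a single letter, and strict and non-strict inequalities are treated alike.

```python
import re

# Accept ASCII and Unicode "less than (or equal)" symbols seen in extracted text.
_LE = r"(?:<=|<|≤|⩽)"
_NUM = r"\d+(?:\.\d+)?"
_INTERVAL = re.compile(
    rf"(?P<lower>{_NUM})\s*{_LE}\s*(?P<var>[A-Za-z])\s*{_LE}\s*(?P<upper>{_NUM})"
)


def parse_interval(expression):
    """Return (variable, lower, upper) for 'a <= x <= b', or None if there is no interval."""
    match = _INTERVAL.search(expression)
    if match is None:
        return None
    return (
        match.group("var"),
        float(match.group("lower")),
        float(match.group("upper")),
    )


interval = parse_interval("0.1 ⩽ x ⩽ 0.2")
single_value = parse_interval("x = 0.1")
```

Here `interval` keeps both bounds of the variable, while `single_value` is `None` because the input is not an interval.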

Tc classification breaking use case

Apparently this document (document2.pdf), processed with the superconductors model using SciBERT, breaks the Tc classification:

Jul 11 12:47:07 falcon docker[11065]: Traceback (most recent call last):
Jul 11 12:47:07 falcon docker[11065]: File "/opt/service/venv/lib/python3.7/site-packages/bottle.py", line 870, in _handle
Jul 11 12:47:07 falcon docker[11065]: return route.call(**args)
Jul 11 12:47:07 falcon docker[11065]: File "/opt/service/venv/lib/python3.7/site-packages/bottle.py", line 1750, in wrapper
Jul 11 12:47:07 falcon docker[11065]: rv = callback(*a, **ka)
Jul 11 12:47:07 falcon docker[11065]: File "/opt/service/grobid_superconductors/service.py", line 118, in process_link
Jul 11 12:47:07 falcon docker[11065]: result.append(self.process_single_sentence(sentence_input, link_types_as_list, skip_classification))
Jul 11 12:47:07 falcon docker[11065]: File "/opt/service/grobid_superconductors/service.py", line 143, in process_single_sentence
Jul 11 12:47:07 falcon docker[11065]: marked_tc_paragraph = self.temperature_classifier.mark_temperatures_paragraph(paragraph_input)
Jul 11 12:47:07 falcon docker[11065]: File "/opt/service/grobid_superconductors/linking/linking_module.py", line 561, in mark_temperatures_paragraph
Jul 11 12:47:07 falcon docker[11065]: return self.mark_temperatures(text_, tokens_, spans_)
Jul 11 12:47:07 falcon docker[11065]: File "/opt/service/grobid_superconductors/linking/linking_module.py", line 543, in mark_temperatures
Jul 11 12:47:07 falcon docker[11065]: doc = self.init_doc(words, spaces, spans_remapped)
Jul 11 12:47:07 falcon docker[11065]: File "/opt/service/grobid_superconductors/linking/linking_module.py", line 68, in init_doc
Jul 11 12:47:07 falcon docker[11065]: span = Span(doc=doc, start=s['token_start'], end=s['token_end'], label=s['type'])
Jul 11 12:47:07 falcon docker[11065]: File "spacy/tokens/span.pyx", line 99, in spacy.tokens.span.Span.__cinit__
Jul 11 12:47:07 falcon docker[11065]: IndexError: [E035] Error creating span with start 9 and end 6 for Doc of length 24.
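The last frame shows a span whose start token index (9) is greater than its end index (6), which spaCy rejects. A defensive check before building the spans could look like the sketch below. It does not address why the remapped indices are inverted in the first place. The function name `split_valid_spans` and the label values are hypothetical; the `token_start`, `token_end` and `type` keys are the ones visible in the traceback.

```python
def split_valid_spans(spans, doc_length):
    """Separate spans that satisfy 0 <= start <= end <= doc_length from those that do not."""
    valid, rejected = [], []
    for span in spans:
        start, end = span["token_start"], span["token_end"]
        if 0 <= start <= end <= doc_length:
            valid.append(span)
        else:
            # Keep the rejected spans so they can be logged and investigated.
            rejected.append(span)
    return valid, rejected


spans = [
    {"token_start": 9, "token_end": 6, "type": "tcValue"},   # inverted, as in the traceback
    {"token_start": 2, "token_end": 4, "type": "material"},  # well-formed
]
valid, rejected = split_valid_spans(spans, doc_length=24)
```

Only the spans in `valid` would then be passed on to create `Span` objects, and the `rejected` list gives a starting point for debugging the remapping.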
