
eucy's People

Contributors

ghxm

eucy's Issues

Create `EuDoc` class

  • could provide easy access to all euCy metrics
  • could hold a reference to the nlp objects used, so that functions that require re-running the pipeline (such as modify) can reuse them
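A minimal sketch of what such a class could look like, assuming the pipeline is any callable that turns text into a document (a spaCy `Language` object is one such callable). The names `EuDoc`, `get_metrics`, `modify` and the metric registry are hypothetical illustrations, not euCy's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class EuDoc:
    """Hypothetical wrapper bundling a processed document with the pipeline that produced it."""

    text: str
    nlp: Callable[[str], Any]  # pipeline kept so the text can be re-processed later
    metrics: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)
    doc: Any = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Process the text once on construction
        self.doc = self.nlp(self.text)

    def get_metrics(self) -> Dict[str, Any]:
        # Evaluate every registered metric function on the current document
        return {name: fn(self.doc) for name, fn in self.metrics.items()}

    def modify(self, new_text: str) -> "EuDoc":
        # Re-run the stored pipeline on changed text, keeping the same metrics
        return EuDoc(new_text, self.nlp, dict(self.metrics))
```

For example, with `str.split` standing in for the pipeline, `EuDoc("Article 1 applies", str.split, {"n_tokens": len}).get_metrics()` returns `{'n_tokens': 3}`. Keeping the pipeline on the object is what lets `modify` re-run it without the caller passing `nlp` again.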

How to handle amending acts

See e.g. #9, among others.

Options are to either

  • parse the outer structure and account for amending acts by setting a specific flag (or similar) in the annotation, or
  • parse the inner structure, e.g. an article inserted within an article counts as an extra article (messy)
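The first option could start from the outer structure only: article headings are detected at the start of a line, so an article quoted inside an amending provision (e.g. one beginning with ‘Article 5a) is not counted separately, and each outer article receives an `is_amending` flag. The heading regex and the trigger phrases are assumptions for illustration that would need validating against real EUR-Lex texts; `annotate_articles` is a hypothetical helper.

```python
import re
from typing import Dict, List, Union

# Assumed heading format: "Article <number>" alone on a line. Quoted articles inserted by an
# amending provision start with a quotation mark, so they do not match.
ARTICLE_HEADING = re.compile(r"^Article[ \t]+(\d+\w*)[ \t]*$", re.MULTILINE)

# Assumed trigger phrases marking an amending provision (illustrative, not exhaustive)
AMENDING = re.compile(
    r"\b(?:is|are)\s+(?:hereby\s+)?(?:amended|replaced|inserted|deleted)\b", re.IGNORECASE
)


def annotate_articles(text: str) -> List[Dict[str, Union[str, int, bool]]]:
    """Split text into outer articles and flag those that amend another act."""
    headings = list(ARTICLE_HEADING.finditer(text))
    articles = []
    for i, m in enumerate(headings):
        # An outer article runs until the next outer heading (or the end of the text)
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        body = text[m.end():end]
        articles.append({
            "number": m.group(1),
            "start": m.start(),
            "end": end,
            "is_amending": bool(AMENDING.search(body)),
        })
    return articles
```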

Measure `WORK_CITES_WORK` in notices for final texts

  • euplexer version:
  • Python version:
  • Operating System:

Description

Describe what you were trying to get done.
Tell us what happened, what went wrong, and what you expected to happen.

What I Did

Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.

Validation

  • Proposals, final texts
  • Time periods
  • Non-amending, amending acts
  • Regulations, Directives, Decisions
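One way to make the validation cover all of these dimensions is to treat every combination as a stratum and draw a fixed number of documents from each. The dimension values (in particular the placeholder period labels), the metadata keys on each document and the helper names are assumptions for illustration.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

# Dimensions from the list above; the period labels are placeholders (assumption)
DIMENSIONS: Dict[str, List[str]] = {
    "stage": ["proposal", "final"],
    "period": ["period_1", "period_2"],
    "amending": ["non-amending", "amending"],
    "doc_type": ["regulation", "directive", "decision"],
}


def validation_strata(dimensions: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    # Every combination of dimension values is one stratum to validate
    return list(product(*dimensions.values()))


def stratified_sample(
    docs: List[dict], dimensions: Dict[str, List[str]], n: int, seed: int = 0
) -> Dict[Tuple[str, ...], List[dict]]:
    """Draw up to n documents per stratum; docs carry one metadata key per dimension."""
    rng = random.Random(seed)
    keys = list(dimensions)
    sample = {}
    for stratum in validation_strata(dimensions):
        members = [d for d in docs if tuple(d[k] for k in keys) == stratum]
        sample[stratum] = rng.sample(members, min(n, len(members)))
    return sample
```

With the dimensions above this gives 2 × 2 × 2 × 3 = 24 strata; empty strata come back as empty lists, which makes gaps in the corpus visible.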

Recital parsing captures random text snippet

  • euCy version:
  • Python version:
  • Operating System:

Description

Recital 2 is not matched correctly (it seems the text of an article is picked up).

What I Did

Proposal 2018_232.txt

Possible solution

Make sure the individual matched elements are more or less adjacent to each other.
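A sketch of the adjacency idea: collect every `(n)` candidate at the start of a line, then accept recital n only if it lies after recital n-1 and within `max_gap` characters of its start, so a `(2)` far away in an article body is rejected. The regex, the `max_gap` default and the `match_recitals` name are assumptions, not euCy's current implementation.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed recital format: "(n) " at the start of a line
RECITAL = re.compile(r"^\((\d+)\)\s", re.MULTILINE)


def match_recitals(text: str, max_gap: int = 2000) -> List[Tuple[int, int]]:
    """Return (number, start) for recitals 1, 2, ... as long as each follows the previous one closely."""
    candidates: Dict[int, List[int]] = {}
    for m in RECITAL.finditer(text):
        candidates.setdefault(int(m.group(1)), []).append(m.start())

    chain: List[Tuple[int, int]] = []
    prev: Optional[int] = None
    n = 1
    while n in candidates:
        # Accept only candidates after the previous recital and within max_gap characters of it
        nearby = [p for p in candidates[n] if prev is None or 0 < p - prev <= max_gap]
        if not nearby:
            break  # next recital missing or too far away: stop instead of grabbing article text
        prev = min(nearby)
        chain.append((n, prev))
        n += 1
    return chain
```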

Improve sentence parsing

  • Blackstone sentencizer?
  • What about footnotes? ('Having regard to the proposal from the Commission [4],')
  • Convert the _get_sentences function (with min length etc.) into a custom Sentencizer that can be used and adjusted/replaced over time?
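A plain-Python sketch of a boundary rule that such a custom component could wrap: a sentence ends at `.`, `!` or `?` (optionally followed by a footnote marker such as `[5]`), a bracketed footnote before a comma never ends a sentence, and candidates shorter than `min_length` are extended to the next boundary. The regex, the default `min_length` and the function name are assumptions; in spaCy the resulting boundaries would be set via `Token.is_sent_start` inside a component registered with `@Language.component`.

```python
import re
from typing import List

# Assumed boundary: '.', '!' or '?', optionally followed by a footnote marker like "[5]",
# then whitespace or the end of the text. A marker such as "Commission [4]," has no
# preceding sentence-final punctuation and therefore never ends a sentence.
BOUNDARY = re.compile(r"[.!?](?:\s*\[\d+\])?(?=\s|$)")


def split_sentences(text: str, min_length: int = 20) -> List[str]:
    """Split text into sentences, merging fragments shorter than min_length into a neighbour."""
    sentences: List[str] = []
    start = 0
    for m in BOUNDARY.finditer(text):
        candidate = text[start:m.end()].strip()
        if len(candidate) < min_length:
            # Too short to stand alone: keep it open and extend it to the next boundary
            continue
        sentences.append(candidate)
        start = m.end()
    rest = text[start:].strip()
    if rest:
        # Attach a short trailing fragment to the last sentence rather than emitting it alone
        if sentences and len(rest) < min_length:
            sentences[-1] += " " + rest
        else:
            sentences.append(rest)
    return sentences
```

The footnote marker stays attached to the sentence it belongs to, which keeps it available for later footnote handling.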

Ability to modify laws

Objective

Individual parts of laws (e.g. spans in the .spans SpanGroups) should be editable.

Aims

  • Obtain a spaCy document with updated text while keeping all annotations/elements

Solutions

A. Markup text and export marked text

  • Store the replacement text in the ._.replacement_text attribute
  • Import into spaCy/euCy with a special markup reader for element detection

Problem: When re-reading into spaCy, the markup will become part of the document.
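A sketch of the special markup reader: the element markup is stripped before the text reaches spaCy and kept as character offsets, so it never becomes part of the document. The `<el label="...">` markup, `export_markup` and `read_markup` are hypothetical; the sketch assumes non-nested, non-overlapping elements sorted by start and text containing no literal `<el` sequences. The returned offsets could then be turned into spans on a freshly processed doc with `Doc.char_span(start, end, label=...)`.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start_char, end_char)

# Hypothetical markup for one element; assumes elements are not nested
ELEMENT = re.compile(r'<el label="([^"]+)">(.*?)</el>', re.DOTALL)


def export_markup(text: str, spans: List[Span], replacements: Dict[int, str]) -> str:
    """Wrap each span in markup, using replacements[i] instead of the original text of span i.

    Assumes spans are sorted by start and do not overlap.
    """
    out: List[str] = []
    last = 0
    for i, (label, start, end) in enumerate(spans):
        out.append(text[last:start])
        content = replacements.get(i, text[start:end])
        out.append(f'<el label="{label}">{content}</el>')
        last = end
    out.append(text[last:])
    return "".join(out)


def read_markup(marked: str) -> Tuple[str, List[Span]]:
    """Strip the markup again, returning plain text and recomputed character offsets."""
    parts: List[str] = []
    spans: List[Span] = []
    length = 0  # length of the plain text produced so far
    last = 0  # position in the marked-up input
    for m in ELEMENT.finditer(marked):
        before = marked[last:m.start()]
        inner = m.group(2)
        parts.extend([before, inner])
        start = length + len(before)
        spans.append((m.group(1), start, start + len(inner)))
        length = start + len(inner)
        last = m.end()
    parts.append(marked[last:])
    return "".join(parts), spans
```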

B. Export text and annotation separately

  • Store the replacement text in the ._.replacement_text attribute
  • Export the annotation in the standard spaCy JSON format

Problems:

  • how to export the text, and
  • how to update the spans (when a previous span changes, the start of the next one also changes)
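For the offset problem, one approach is to apply one replacement at a time and shift every other span by the change in length. The sketch uses `(label, start_char, end_char)` tuples as a stand-in for spaCy spans and allows nested spans (e.g. an article span enclosing a paragraph span); `replace_span` is a hypothetical helper. Spans inside or partially overlapping the replaced region are dropped, because their new boundaries cannot be inferred.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start_char, end_char); stand-in for a spaCy span


def replace_span(text: str, spans: List[Span], target: Span, new_content: str) -> Tuple[str, List[Span]]:
    """Replace the characters of `target` and update the offsets of every span (nesting allowed)."""
    _, t_start, t_end = target
    delta = len(new_content) - (t_end - t_start)
    new_text = text[:t_start] + new_content + text[t_end:]
    updated: List[Span] = []
    for label, start, end in spans:
        if end <= t_start:
            # Entirely before the edit: unchanged
            updated.append((label, start, end))
        elif start >= t_end:
            # Entirely after the edit: shift by the length difference
            updated.append((label, start + delta, end + delta))
        elif start <= t_start and end >= t_end:
            # Encloses the edit (including the target itself): only the end moves
            updated.append((label, start, end + delta))
        # Spans nested inside or partially overlapping the edited region are dropped,
        # since their boundaries in the new text are unknown
    return new_text, updated
```

Applying several replacements means calling this repeatedly and looking up the next target in the returned span list, since its offsets may have shifted.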

C. Split the text and recreate the doc at every changed span to obtain text + annotation (can then continue with e.g. 2.)

→ https://stackoverflow.com/a/75300856/5565500

Works at token level?

Problem: might be computationally heavy.
