phileas's People

Contributors

dependabot[bot], jzonthemtn

Forkers

zhutony

phileas's Issues

Incorporate zip code database

The goal is to reduce zip code false positives by including a look up when text matches a potential zip code. Because zip codes change, the lookup should not be definitive but should be an additional factor when determining if it is a true positive.
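One way to treat the lookup as an additional factor rather than a verdict is to nudge the confidence of a candidate up or down. The sketch below is only an illustration: the class name, the three sample zip codes, and the adjustment values are assumptions, not part of Phileas.

```java
import java.util.Set;

public class ZipCodeConfidence {

    // Hypothetical sample of known zip codes; a real lookup would load a full database.
    private static final Set<String> KNOWN_ZIP_CODES = Set.of("90210", "10001", "60601");

    // Hypothetical adjustment values, chosen only for illustration.
    private static final double KNOWN_BOOST = 0.2;
    private static final double UNKNOWN_PENALTY = 0.2;

    // Adjusts the confidence of a candidate instead of accepting or rejecting it outright,
    // because the database may be out of date.
    public static double adjust(String candidate, double baseConfidence) {
        // Compare on the five-digit prefix so ZIP+4 values such as 90210-1234 still match.
        String fiveDigit = candidate.length() >= 5 ? candidate.substring(0, 5) : candidate;
        double adjusted = KNOWN_ZIP_CODES.contains(fiveDigit)
                ? baseConfidence + KNOWN_BOOST
                : baseConfidence - UNKNOWN_PENALTY;
        // Keep the result within the range [0, 1].
        return Math.max(0.0, Math.min(1.0, adjusted));
    }
}
```

An unknown zip code is penalized but never forced to zero by the lookup alone, so a recently created zip code can still be reported when other evidence is strong.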

POS post filter does not handle multi-word tokens

Given the input:

"George Washington was president and his ssn was 123-45-6789 and he lived at 90210."

The POS filter fails because the tokens are "George" and "Washington" individually and not "George Washington." The filter needs to be changed to allow for multi-word tokens.

Failing tests on OSX M2

The failures are caused by ONNX Runtime on the M2: the native onnxruntime library cannot be loaded.

[ERROR] Errors:
[ERROR]   PersonsV2FilterTest.filter1:64 » UnsatisfiedLink no onnxruntime in java.librar...
[ERROR]   PersonsV2FilterTest.filter2:96 » NoClassDefFound Could not initialize class ai...
[ERROR]   PersonsV2FilterTest.filter3:135 » NoClassDefFound Could not initialize class a...
[ERROR]   PersonsV2FilterTest.filter4:172 » NoClassDefFound Could not initialize class a...
[ERROR]   PersonsV2FilterTest.filter5:205 » NoClassDefFound Could not initialize class a...
[ERROR]   PersonsV2FilterTest.filter6:240 » NoClassDefFound Could not initialize class a...

Use stop words to shorten physician names

Use stop words to shorten physician names. Instead of taking the entire n-gram, see if we can use stop words to shorten the span by cutting it based on the location of the stop words.

Look at each token in the physician name span, working inward from both ends, to see if it is a stop word. If it is, condense the span to exclude it.
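The trimming described above could be sketched as follows. The class name and the stop word list are assumptions for illustration only; the span is represented simply as a list of tokens.

```java
import java.util.List;
import java.util.Set;

public class StopWordTrimmer {

    // Hypothetical stop word list, for illustration only.
    private static final Set<String> STOP_WORDS = Set.of("the", "was", "by", "and", "of", "seen");

    // Drops stop words from both ends of the span; interior tokens are left untouched.
    public static List<String> trim(List<String> tokens) {
        int start = 0;
        int end = tokens.size();
        // Walk inward from the left edge.
        while (start < end && STOP_WORDS.contains(tokens.get(start).toLowerCase())) {
            start++;
        }
        // Walk inward from the right edge.
        while (end > start && STOP_WORDS.contains(tokens.get(end - 1).toLowerCase())) {
            end--;
        }
        return tokens.subList(start, end);
    }
}
```

Because only the outer tokens are examined, a stop word inside the name is preserved. A span made up entirely of stop words condenses to an empty list.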

How to launch Phileas?

Hi,

I am not very knowledgeable about Java, but much to my surprise I did manage to write a simple client using your instructions and get it to compile and run using Maven. However, I have not been able to figure out how to launch the Phileas service it expects at https://127.0.0.1:8080. I was wondering how to do that?

Cheers,
Andrew

Support non-USD currencies

Support non-USD currencies. Need to add options to the filter strategy to designate the type of currency (or none for all types).
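As a rough sketch of what such an option could look like, the pattern below is built from a list of currency symbols, and an empty list means all types. The class name, the default symbol set, and the amount format are assumptions. Amounts written with the symbol after the number or with a comma as the decimal separator are not handled.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CurrencyPatterns {

    // Hypothetical default set: dollar, euro (U+20AC), pound (U+00A3), yen (U+00A5).
    private static final String[] ALL_SYMBOLS = {"$", "\u20AC", "\u00A3", "\u00A5"};

    // Builds a pattern for the given symbols; passing no symbols selects every default symbol.
    public static Pattern forSymbols(String... symbols) {
        String[] selected = symbols.length == 0 ? ALL_SYMBOLS : symbols;
        // Quote each symbol so that "$" is matched literally rather than as an anchor.
        String alternation = Arrays.stream(selected)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Symbol, optional space, digits with optional thousands separators and decimals.
        return Pattern.compile("(?:" + alternation + ")\\s?\\d+(?:,\\d{3})*(?:\\.\\d{1,2})?");
    }
}
```

Calling forSymbols("\u20AC") restricts matching to euro amounts, while forSymbols() matches any of the default symbols.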

Ignore cities in court names

Ignore cities when they appear as part of a court name, e.g. District Court of Baltimore City.

This requires consideration about where to implement the feature. If we are looking for city names then it seems to be a function of the CITY filter. So that would require a flag in the CITY filter strategy to ignore the city if it is given as part of a court name.

Court names seem to be either:

… Court of … - Supreme Court of West Virginia
… Court of the … - Supreme Court of the United States
… Court - Wisconsin Supreme Court
… Court for the … - United States District Court for the Eastern District of Wisconsin

The Restriction class could probably be used as a means of doing a lookup.
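Independent of where the feature ends up living, the four court name shapes listed above can be approximated with a single regular expression. The sketch below is an assumption-laden illustration: the class, the method, and the pattern are hypothetical and do not use the Restriction class or any Phileas API. It checks whether a city span falls inside a court name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CourtNameCheck {

    // A capitalized word such as "Baltimore" or "U.S."
    private static final String WORD = "[A-Z][\\w.]*";

    // Optional capitalized words, then "Court", then an optional "of/for [the] Name [of Name]" tail.
    private static final Pattern COURT_NAME = Pattern.compile(
            "(?:" + WORD + "\\s+)*Court"
            + "(?:\\s+(?:of|for)(?:\\s+the)?(?:\\s+" + WORD + ")+"
            + "(?:\\s+of(?:\\s+" + WORD + ")+)?)?");

    // Returns true when the character range [start, end) lies within a matched court name.
    public static boolean isInsideCourtName(String text, int start, int end) {
        Matcher m = COURT_NAME.matcher(text);
        while (m.find()) {
            if (m.start() <= start && end <= m.end()) {
                return true;
            }
        }
        return false;
    }
}
```

A CITY filter strategy flag could then skip any span for which this check returns true. The pattern relies on capitalization, so it would miss court names in lowercase or all-caps text.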

Allow individual filter regex to be enabled/disabled

Allow individual filter regex to be enabled/disabled. The purpose is to allow only a set of regexes to be enabled.

There could be magic environment variables that can be set/unset to enable/disable the regex patterns. (Or some other method.)

Add a priority to each filter

Consider adding a priority to each filter. In the event that two spans are completely identical, the priority would be used to determine which span is selected.

This needs to be tested well. The tests will need to cover:

  • getFiltersForFilterProfile - to ensure the filters are in the order given by the priorities (high to low).
  • Identical spans found by different filters only return the span having the highest priority.

This was coded in 1.10.0 but has not been tested or added to the documentation.
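The selection rule for identical spans could be exercised with something like the sketch below. The Span class here is a simplified, hypothetical stand-in and not the Phileas Span class; it carries only the character range, the filter name, and the filter's priority.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpanPriority {

    // Hypothetical, simplified span: character range, filter name, and filter priority.
    public static final class Span {
        public final int start;
        public final int end;
        public final String filter;
        public final int priority;

        public Span(int start, int end, String filter, int priority) {
            this.start = start;
            this.end = end;
            this.filter = filter;
            this.priority = priority;
        }
    }

    // For spans covering exactly the same range, keeps only the one with the highest priority.
    public static List<Span> dropLowerPriorityDuplicates(List<Span> spans) {
        Map<String, Span> best = new LinkedHashMap<>();
        for (Span span : spans) {
            String key = span.start + ":" + span.end;
            // On a tie in priority, the span seen first is kept.
            best.merge(key, span, (kept, candidate) -> candidate.priority > kept.priority ? candidate : kept);
        }
        return new ArrayList<>(best.values());
    }
}
```

Only spans with identical start and end positions are compared, so overlapping spans that are not identical pass through unchanged.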

Disable dependency logging

Disable this logging:

Jan 16 15:31:14 ip-10-0-2-32.ec2.internal bash[3348]: 2021-01-16 15:31:14.544 ERROR 3363 --- [nio-8080-exec-6] c.m.p.s.validators.DateSpanValidator : Text '3/2018' could not be parsed: Unable to obtain LocalDate from TemporalAccessor: {MonthOfYear=3, Year=2018},ISO of type java.time.format.Parsed

Add OR boolean operator to grammar

Add OR boolean operator to grammar.

Currently, OR can be accomplished to some degree by using multiple filter strategies.

It would be ideal to allow expressions like:

context == 'test' and confidence > 1.0 or token == 'asdf'

Add options to make first names and surnames be adjacent

Add an optional parameter to the FirstName filter that requires a Surname immediately after.

Likewise, add an optional parameter to the Surname filter that requires a FirstName immediately preceding it.

Both options can be set independently, and both should default to false.

When either option is set to true, that filter should only report a span when it is preceded/followed by a span from the other filter.
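A minimal sketch of the FirstName side of this behavior follows, assuming a simplified Span holding only character offsets and a hypothetical requireSurname flag. Neither is taken from the Phileas code. The Surname side would mirror it with the arguments reversed.

```java
import java.util.ArrayList;
import java.util.List;

public class AdjacentNames {

    // Hypothetical, simplified span holding only character offsets.
    public static final class Span {
        public final int start;
        public final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // True when `second` begins after `first` ends with nothing but whitespace in between.
    static boolean isFollowedBy(String text, Span first, Span second) {
        return second.start >= first.end
                && text.substring(first.end, second.start).trim().isEmpty();
    }

    // With the flag off (the default), every first name span is reported unchanged.
    public static List<Span> filterFirstNames(String text, List<Span> firstNames,
                                              List<Span> surnames, boolean requireSurname) {
        if (!requireSurname) {
            return firstNames;
        }
        List<Span> kept = new ArrayList<>();
        for (Span firstName : firstNames) {
            for (Span surname : surnames) {
                if (isFollowedBy(text, firstName, surname)) {
                    kept.add(firstName);
                    break;
                }
            }
        }
        return kept;
    }
}
```

For the text "John Smith met Mary today", with "John" and "Mary" found as first names and "Smith" as a surname, enabling the flag keeps only "John".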

Condition should be a list of strings instead of just a string

Condition should be a list of strings instead of just a string. As written now, there is a one-to-one relationship between condition and filter strategy.

A list would allow multiple conditions for a given filter strategy. This is how it was done in Philter Studio before it was discovered that "condition" is just a single string in the filter condition.

Not finding name with apostrophe

In PhileasFilterServiceTest.endToEnd15(), the name “David O’Brien” is not being identified. “David O '” is being found, but the space between the O and the apostrophe causes findByRegex to return -1.

Allow filter profiles to be written in YAML

Allow filter profiles to be written in YAML.

  • What format will the API return when retrieving filter profiles?
  • When saving filter profiles through the API, how to set the format? Content-type header?
  • The .json extension is used extensively through the filter profile services to find filter profiles on disk.
