
lexpredict-lexnlp's Introduction


LexNLP by LexPredict

Information retrieval and extraction for real, unstructured legal text

LexNLP is a library for working with real, unstructured legal text, including contracts, plans, policies, procedures, and other material.

LexNLP provides functionality such as:

  • Segmentation and tokenization, such as
    • A sentence parser that is aware of common legal abbreviations like LLC. or F.3d.
    • Pre-trained segmentation models for legal concepts such as pages or sections.
  • Pre-trained word embedding and topic models, broadly and for specific practice areas
  • Pre-trained classifiers for document type and clause type
  • Broad range of fact extraction (see the sketch after this list), such as:
    • Monetary amounts, non-monetary amounts, percentages, ratios
    • Conditional statements and constraints, like "less than" or "later than"
    • Dates, recurring dates, and durations
    • Courts, regulations, and citations
  • Tools for building new clustering and classification methods
  • Hundreds of unit tests from real legal documents
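
For example, dates can be pulled from text in a couple of lines (a minimal sketch; the get_dates call mirrors usage shown in the issues further below):

import lexnlp.extract.en.dates

text = "This agreement shall terminate on the 15th day of March, 2020."
# get_dates yields the dates found in the text as datetime.date objects
print(list(lexnlp.extract.en.dates.get_dates(text)))
# [datetime.date(2020, 3, 15)]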


Licensing

LexNLP is available under a dual-licensing model. By default, this library can be used under AGPLv3 terms as detailed in the repository LICENSE file; however, organizations can request a release from the AGPL terms or a non-GPL evaluation license by contacting ContraxSuite Licensing at <[email protected]>.

Requirements

  • Python 3.8
  • pipenv
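
For example, a fresh environment can typically be set up as follows (a sketch; lexnlp is the package name on PyPI):

pipenv --python 3.8
pipenv install lexnlp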

Releases

  • 2.3.0: November 30, 2022 - Twenty-sixth scheduled public release
  • 2.2.1.0: August 10, 2022 - Twenty-fifth scheduled public release
  • 2.2.0: July 7, 2022 - Twenty-fourth scheduled public release
  • 2.1.0: September 16, 2021 - Twenty-third scheduled public release
  • 2.0.0: May 10, 2021 - Twenty-second scheduled public release
  • 1.8.0: December 2, 2020 - Twenty-first scheduled public release
  • 1.7.0: August 27, 2020 - Twentieth scheduled public release
  • 1.6.0: May 27, 2020 - Nineteenth scheduled public release
  • 1.4.0: December 20, 2019 - Eighteenth scheduled public release
  • 1.3.0: November 1, 2019 - Seventeenth scheduled public release
  • 0.2.7: August 1, 2019 - Sixteenth scheduled public release
  • 0.2.6: June 12, 2019 - Fifteenth scheduled public release
  • 0.2.5: March 1, 2019 - Fourteenth scheduled public release
  • 0.2.4: February 1, 2019 - Thirteenth scheduled public release
  • 0.2.3: January 10, 2019 - Twelfth scheduled public release
  • 0.2.2: September 30, 2018 - Eleventh scheduled public release
  • 0.2.1: August 24, 2018 - Tenth scheduled public release
  • 0.2.0: August 1, 2018 - Ninth scheduled public release
  • 0.1.9: July 1, 2018 - Ninth scheduled public release
  • 0.1.8: May 1, 2018 - Eighth scheduled public release
  • 0.1.7: April 1, 2018 - Seventh scheduled public release
  • 0.1.6: March 1, 2018 - Sixth scheduled public release
  • 0.1.5: February 1, 2018 - Fifth scheduled public release
  • 0.1.4: January 1, 2018 - Fourth scheduled public release
  • 0.1.3: December 1, 2017 - Third scheduled public release
  • 0.1.2: November 1, 2017 - Second scheduled public release
  • 0.1.1: October 2, 2017 - Bug fix release for 0.1.0
  • 0.1.0: September 30, 2017 - First public release

lexpredict-lexnlp's People

Contributors

afparsons, andreycorelli, kpmarsh, mjbommar, mviktorov, pchestek, reddalexx, warrenagin


lexpredict-lexnlp's Issues

How to use the ML models

I can see from the code that I can use the ML classifier to identify definitions.
File - lexnlp.extract.en.definitions

def get_definitions(text: str,
                    return_sources=False,
                    decode_unicode=True,
                    return_coords=False,
                    locator_type: AnnotationLocatorType = AnnotationLocatorType.RegexpBased) -> Generator:
    """
    Find possible definitions in natural language in text.
    The text will be split to sentences first.
    :param return_coords: returns a (x, y) tuple in each record. x - definition's text start, y - definition's text end
    :param decode_unicode:
    :param return_sources: returns a tuple with the extracted term and the source sentence
    :param text: the input text
    :param locator_type: use default (Regexp-based) or ML-based locator
    :return: Generator[name] or Generator[name, text] or Generator[name, text, coords]
    """

So I tried passing locator_type=AnnotationLocatorType.MlWordVectorBased to the get_definitions() function, and I'm getting this error:

"parser_ml_classifier" object should be initialized (call load_compressed method)

I've gone through the definitions file, and I can see this on line 43:
parser_ml_classifier = LayeredDefinitionDetector()

I tried to call the load_compressed method on the LayeredDefinitionDetector(), but it asks for a file path and I don't understand which file path should be given; the sketch below shows roughly what I attempted. Am I missing something? Could anyone guide me on how to use the ML models for definitions? Thanks!
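
Roughly, this is the call I attempted (a sketch; the file_path argument name is my guess from the error message):

from lexnlp.extract.en.definitions import parser_ml_classifier

# load_compressed wants a path -- but to which model file?
parser_ml_classifier.load_compressed(file_path=...)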

Documentation for case 3 and case 1 in definition extraction methods

Documentation for case 3 and case 1 in the definition extraction methods is the same (ref:

# Case 3. Term is without quotes, is preceded by word|term|phrase or :,.^

); however, case #3 seems to only work for Title Case or UPPER CASE words followed by something from the strong trigger list, and case #1 must necessarily have the words "word(s)|term(s)|phrase(s)" preceding the term.
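
Hypothetical sample sentences illustrating the two cases as I read them (a sketch; I have not verified the outputs):

from lexnlp.extract.en.definitions import get_definitions

# case 1: the term is preceded by the literal word "term"/"word"/"phrase"
case_1 = 'The term Confidential Information shall mean all non-public information.'
# case 3: an unquoted Title Case term followed by a strong trigger such as "means"
case_3 = 'Confidential Information means all non-public information disclosed by a party.'
print(list(get_definitions(case_1)))
print(list(get_definitions(case_3)))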

ETA for new GitHub?

I posted this in a more relevant issue, but it's now closed - so just in case:

I wonder if you have a general idea of when the new GitHub repository will be available? I'm really, really interested in playing with LexNLP, but I'd rather wait for the most recent version.

Additionally, if the new version could also contain a tutorial (preferably in a Jupyter notebook), that would be great!

Problem installing with poetry

Info

Python 3.8.5 or Python 3.6.12
Poetry 1.0.10

Installation Attempt

poetry new nlp
cd nlp
poetry add lexnlp
...
[SolverProblemError]
Because datefinder-lexpredict (0.6.2) depends on regex (2017.9.23)
 and no versions of datefinder-lexpredict match >0.6.2,<0.7.0, datefinder-lexpredict (>=0.6.2,<0.7.0) requires regex (2017.9.23).
So, because nlp depends on both regex (^2020.7.14) and datefinder-lexpredict (^0.6.2), version solving failed.

Question

Has anyone else experienced these dependency version issues? I've also tried pipenv and pip, which both return an error similar to the one above. Other libraries install fine with either tool.

Missing dates, fine-tuning dates model.

Hello, and thanks for making a really cool tool.
I've run into a number of missing dates as well as false positives.
Like:

  • n\nMaster Transportation Agreement, April 2020.\n\n22777 Springwoods Village - Aprill 2020 - the date is not recognized because the next sentence starts with digits.
  • record of positive Sept 1st, 2005 earnings - also not recognized when the day carries the '-st' suffix; your model only takes '-th' into account.
    This is strange, as even get_raw_dates does not recognize it. But if I use the modified datefinder forked from your repo, it works: datetime.datetime(2005, 9, 1, 0, 0), 'Sept 1st, 2005').
  • It could be 12.1987 or in 01.1988 - the date is not recognized when the month is in numeric format.

Is there an easy way for users to fine-tune your model?
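
For the second bullet, a minimal reproduction looks like this (a sketch using get_raw_dates, as mentioned above):

import lexnlp.extract.en.dates

# expected a hit for 'Sept 1st, 2005'; observed an empty list
print(list(lexnlp.extract.en.dates.get_raw_dates("record of positive Sept 1st, 2005 earnings")))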

Not installable with pipenv

As there are already unresolved issues regarding installation, here is another one. I am using Python 3.6.6 and pipenv for installation.
It would be nice if it worked; the project seems highly interesting.
Can you tell me what the problem is? Maybe I can help solve it?


Response:

[pipenv.exceptions.ResolutionFailure]:   File "/usr/local/Cellar/pipenv/2018.11.26_2/libexec/lib/python3.7/site-packages/pipenv/resolver.py", line 69, in resolve
[pipenv.exceptions.ResolutionFailure]:       req_dir=requirements_dir
[pipenv.exceptions.ResolutionFailure]:   File "/usr/local/Cellar/pipenv/2018.11.26_2/libexec/lib/python3.7/site-packages/pipenv/utils.py", line 726, in resolve_deps
[pipenv.exceptions.ResolutionFailure]:       req_dir=req_dir,
[pipenv.exceptions.ResolutionFailure]:   File "/usr/local/Cellar/pipenv/2018.11.26_2/libexec/lib/python3.7/site-packages/pipenv/utils.py", line 480, in actually_resolve_deps
[pipenv.exceptions.ResolutionFailure]:       resolved_tree = resolver.resolve()
[pipenv.exceptions.ResolutionFailure]:   File "/usr/local/Cellar/pipenv/2018.11.26_2/libexec/lib/python3.7/site-packages/pipenv/utils.py", line 395, in resolve
[pipenv.exceptions.ResolutionFailure]:       raise ResolutionFailure(message=str(e))
[pipenv.exceptions.ResolutionFailure]:       pipenv.exceptions.ResolutionFailure: ERROR: ERROR: Could not find a version that matches urllib
[pipenv.exceptions.ResolutionFailure]:       No versions found
[pipenv.exceptions.ResolutionFailure]: Warning: Your dependencies could not be resolved. You likely have a mismatch in your sub-dependencies.
  First try clearing your dependency cache with $ pipenv lock --clear, then try the original command again.
 Alternatively, you can use $ pipenv install --skip-lock to bypass this mechanism, then run $ pipenv graph to inspect the situation.
  Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: ERROR: Could not find a version that matches urllib
No versions found
Was https://pypi.org/simple reachable?

Working Examples for title extraction

Hello,

Using the get_titles function, I haven't been able to extract titles (the output list is empty).
Do you have some working examples (such as the ones provided for the extraction tasks)?
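
For reference, this is roughly what I ran (a sketch; the get_titles import path follows lexnlp.nlp.en.segments.titles, and the input file is a placeholder):

from lexnlp.nlp.en.segments.titles import get_titles

text = open("agreement.txt").read()  # placeholder input document
print(list(get_titles(text)))  # prints []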

Bigram and trigram collection files are not as listed

It appears that all collocation_bigrams_*.pickle files are the same; they are smaller than reported, and all contain the exact same list.

In [1]: import pickle
In [2]: BIGRAM_COLLOCATIONS_100 = pickle.load(open("collocation_bigrams_100.pickle", 'rb'))
In [3]: BIGRAM_COLLOCATIONS_1000 = pickle.load(open("collocation_bigrams_1000.pickle", 'rb'))
In [4]: BIGRAM_COLLOCATIONS_10000 = pickle.load(open("collocation_bigrams_10000.pickle", 'rb'))
In [5]: len(BIGRAM_COLLOCATIONS_100)
Out[5]: 46

In [6]: len(BIGRAM_COLLOCATIONS_1000)
Out[6]: 46

In [7]: len(BIGRAM_COLLOCATIONS_10000)
Out[7]: 46

There's a similar issue with trigrams:

In [9]: TRIGRAM_COLLOCATIONS_100 = pickle.load(open("collocation_trigrams_100.pickle", 'rb'))
In [10]: TRIGRAM_COLLOCATIONS_1000 = pickle.load(open("collocation_trigrams_1000.pickle", 'rb'))
In [11]: TRIGRAM_COLLOCATIONS_10000 = pickle.load(open("collocation_trigrams_10000.pickle", 'rb'))
In [12]: len(TRIGRAM_COLLOCATIONS_100)
Out[12]: 100

In [13]: len(TRIGRAM_COLLOCATIONS_1000)
Out[13]: 431

In [14]: len(TRIGRAM_COLLOCATIONS_10000)
Out[14]: 431

Doc or Sample request for Segmentation

I'm test-driving your NLP library, and I'm interested in segment recognition of paragraphs. I see you stubbed your docs for Segmentation and related methods for real-world text. May I nudge you to produce a code sample (along the lines of the sketch after the list below)? How should I detect that 1.2.2.2 is "Subparagraph 2 of Section one subsection 2 paragraph 2"?

My scenario is text of bulleted lists

1 (1) Section one subsection 1
    (2) Section one subsection 2
        (a) Paragraph 1 of Section one subsection 2
        (b) Paragraph 2 of Section one subsection 2
            (i) Subparagraph 1 of Section one subsection 2 paragraph 2
            (ii) Subparagraph 2 of Section one subsection 2 paragraph 2
        (c) Paragraph 3 of Section one subsection 2
    (3) Section one subsection 3
2 (1) Section two subsection 1
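
For example, something along these lines is what I'd hope to write (a sketch; get_sections per lexnlp.nlp.en.segments.sections, with a placeholder input file containing the outline above):

from lexnlp.nlp.en.segments.sections import get_sections

text = open("statute.txt").read()  # placeholder document containing the nested list above
for section in get_sections(text):
    print(section)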

Import Error: lexnlp.extract.en.dates

I have installed lexnlp 1.4.0 and am receiving import errors when trying to import the above module.

Error attached


Please note that I have installed all of the requirements, including dateparser.

Unable to extract address from text or string

I wrote this:

from lexnlp.extract.en.addresses import address_features

text = "Vistra Corporate Services Centre Wickhams Cay II Road Town Tortola VG1110 British Virgin Islands"
print("address:", list(address_features.get_word_features(text, part_of_speech="NP")))

but I am getting results like this: address: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Can you help with this? How do I extract addresses from strings or text files? What do I need to pass in place of part_of_speech? The result I am getting is binary, and I can't understand it, because I am a beginner at NLP. I'm waiting for your positive response.
(Sorry for my English.)

Cannot install Version 2.1.0

Installation of version 2.1.0 fails due to a dependency conflict:

The conflict is caused by:
    lexnlp 2.1.0 depends on regex==2020.11.13
    datefinder-lexpredict 0.6.2.1 depends on regex==2020.7.14


The same problem holds for version 2.0.0.

Python=3.6
OS = Ubuntu 18.04.6

Installation problem on Windows

Hi LexPredict,

I am trying to install lexnlp on Windows, and I'm unable to resolve the error listed below when running python setup.py install. I have installed numpy and scipy. How can I solve this error?

error: Setup script exited with error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\eucje\Anaconda3\lib\site-packages\numpy\core\include -IC:\Users\eucje\Anaconda3\lib\site-packages\numpy\core\include -IC:\Users\eucje\Anaconda3\include -IC:\Users\eucje\Anaconda3\include -I"C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\include" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpsklearn\cluster\_dbscan_inner.cpp /Fobuild\temp.win-amd64-3.7\Release\sklearn\cluster\_dbscan_inner.obj /Zm1000" failed with exit status 2

This is the full output.

running bdist_egg
running egg_info
writing lexnlp.egg-info\PKG-INFO
writing dependency_links to lexnlp.egg-info\dependency_links.txt
writing requirements to lexnlp.egg-info\requires.txt
writing top-level names to lexnlp.egg-info\top_level.txt
reading manifest file 'lexnlp.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.xml' under directory 'lexnlp\extract\en\addresses'
writing manifest file 'lexnlp.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
creating build\bdist.win-amd64\egg
creating build\bdist.win-amd64\egg\lexnlp
creating build\bdist.win-amd64\egg\lexnlp\config
creating build\bdist.win-amd64\egg\lexnlp\config\en
copying build\lib\lexnlp\config\en\company_types.csv -> build\bdist.win-amd64\egg\lexnlp\config\en
copying build\lib\lexnlp\config\en\company_types.py -> build\bdist.win-amd64\egg\lexnlp\config\en
copying build\lib\lexnlp\config\en\geoentities_config.py -> build\bdist.win-amd64\egg\lexnlp\config\en
copying build\lib\lexnlp\config\en\__init__.py -> build\bdist.win-amd64\egg\lexnlp\config\en
copying build\lib\lexnlp\config\stanford.py -> build\bdist.win-amd64\egg\lexnlp\config
copying build\lib\lexnlp\config\__init__.py -> build\bdist.win-amd64\egg\lexnlp\config
creating build\bdist.win-amd64\egg\lexnlp\extract
creating build\bdist.win-amd64\egg\lexnlp\extract\en
creating build\bdist.win-amd64\egg\lexnlp\extract\en\addresses
copying build\lib\lexnlp\extract\en\addresses\addresses.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses
copying build\lib\lexnlp\extract\en\addresses\addresses_clf.pickle -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses
copying build\lib\lexnlp\extract\en\addresses\address_features.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses
creating build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\data
copying build\lib\lexnlp\extract\en\addresses\data\building_suffixes.csv -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\data
copying build\lib\lexnlp\extract\en\addresses\data\nltk_pos_tag_indexes.json -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\data
copying build\lib\lexnlp\extract\en\addresses\data\provinces.txt -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\data
copying build\lib\lexnlp\extract\en\addresses\data\street_directions.csv -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\data
copying build\lib\lexnlp\extract\en\addresses\data\street_suffixes.csv -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\data
copying build\lib\lexnlp\extract\en\addresses\__init__.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\addresses
copying build\lib\lexnlp\extract\en\amounts.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\citations.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\conditions.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\constraints.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
creating build\bdist.win-amd64\egg\lexnlp\extract\en\contracts
creating build\bdist.win-amd64\egg\lexnlp\extract\en\contracts\data
copying build\lib\lexnlp\extract\en\contracts\data\d2v_all_size100_window10.model.part.aa -> build\bdist.win-amd64\egg\lexnlp\extract\en\contracts\data
copying build\lib\lexnlp\extract\en\contracts\data\d2v_all_size100_window10.model.part.ab -> build\bdist.win-amd64\egg\lexnlp\extract\en\contracts\data
copying build\lib\lexnlp\extract\en\contracts\data\is_contract_classifier.pickle -> build\bdist.win-amd64\egg\lexnlp\extract\en\contracts\data
copying build\lib\lexnlp\extract\en\contracts\detector.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\contracts
copying build\lib\lexnlp\extract\en\contracts\__init__.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\contracts
copying build\lib\lexnlp\extract\en\copyright.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\courts.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\dates.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\date_model.pickle -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\definitions.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\dict_entities.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\distances.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\durations.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
creating build\bdist.win-amd64\egg\lexnlp\extract\en\entities
copying build\lib\lexnlp\extract\en\entities\nltk_maxent.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\entities
copying build\lib\lexnlp\extract\en\entities\nltk_re.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\entities
copying build\lib\lexnlp\extract\en\entities\stanford_ner.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\entities
copying build\lib\lexnlp\extract\en\entities\__init__.py -> build\bdist.win-amd64\egg\lexnlp\extract\en\entities
copying build\lib\lexnlp\extract\en\geoentities.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\money.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\percents.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\pii.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\ratios.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\regulations.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\trademarks.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\urls.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\utils.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\en\__init__.py -> build\bdist.win-amd64\egg\lexnlp\extract\en
copying build\lib\lexnlp\extract\__init__.py -> build\bdist.win-amd64\egg\lexnlp\extract
creating build\bdist.win-amd64\egg\lexnlp\nlp
creating build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_bigrams_100.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_bigrams_1000.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_bigrams_10000.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_bigrams_100000.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_bigrams_50000.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_trigrams_100.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_trigrams_1000.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_trigrams_10000.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_trigrams_100000.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\collocation_trigrams_50000.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
creating build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\pages.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\page_segmenter.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\paragraphs.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\paragraph_segmenter.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\sections.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\section_segmenter.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\sentences.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\sentence_segmenter.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\titles.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\title_locator.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\utils.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\segments\__init__.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\segments
copying build\lib\lexnlp\nlp\en\stanford.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\stopwords.pickle -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\en\tokens.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en
creating build\bdist.win-amd64\egg\lexnlp\nlp\en\transforms
copying build\lib\lexnlp\nlp\en\transforms\characters.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\transforms
copying build\lib\lexnlp\nlp\en\transforms\tokens.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\transforms
copying build\lib\lexnlp\nlp\en\transforms\__init__.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en\transforms
copying build\lib\lexnlp\nlp\en\__init__.py -> build\bdist.win-amd64\egg\lexnlp\nlp\en
copying build\lib\lexnlp\nlp\__init__.py -> build\bdist.win-amd64\egg\lexnlp\nlp
creating build\bdist.win-amd64\egg\lexnlp\utils
copying build\lib\lexnlp\utils\decorators.py -> build\bdist.win-amd64\egg\lexnlp\utils
creating build\bdist.win-amd64\egg\lexnlp\utils\unicode
copying build\lib\lexnlp\utils\unicode\unicode_character_categories.pickle -> build\bdist.win-amd64\egg\lexnlp\utils\unicode
copying build\lib\lexnlp\utils\unicode\unicode_character_category_mapping.pickle -> build\bdist.win-amd64\egg\lexnlp\utils\unicode
copying build\lib\lexnlp\utils\unicode\unicode_character_top_category_mapping.pickle -> build\bdist.win-amd64\egg\lexnlp\utils\unicode
copying build\lib\lexnlp\utils\unicode\unicode_lookup.py -> build\bdist.win-amd64\egg\lexnlp\utils\unicode
copying build\lib\lexnlp\utils\unicode\__init__.py -> build\bdist.win-amd64\egg\lexnlp\utils\unicode
copying build\lib\lexnlp\utils\__init__.py -> build\bdist.win-amd64\egg\lexnlp\utils
copying build\lib\lexnlp\__init__.py -> build\bdist.win-amd64\egg\lexnlp
byte-compiling build\bdist.win-amd64\egg\lexnlp\config\en\company_types.py to company_types.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\config\en\geoentities_config.py to geoentities_config.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\config\en\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\config\stanford.py to stanford.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\config\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\addresses.py to addresses.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\address_features.py to address_features.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\addresses\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\amounts.py to amounts.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\citations.py to citations.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\conditions.py to conditions.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\constraints.py to constraints.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\contracts\detector.py to detector.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\contracts\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\copyright.py to copyright.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\courts.py to courts.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\dates.py to dates.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\definitions.py to definitions.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\dict_entities.py to dict_entities.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\distances.py to distances.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\durations.py to durations.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\entities\nltk_maxent.py to nltk_maxent.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\entities\nltk_re.py to nltk_re.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\entities\stanford_ner.py to stanford_ner.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\entities\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\geoentities.py to geoentities.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\money.py to money.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\percents.py to percents.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\pii.py to pii.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\ratios.py to ratios.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\regulations.py to regulations.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\trademarks.py to trademarks.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\urls.py to urls.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\utils.py to utils.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\en\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\extract\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\segments\pages.py to pages.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\segments\paragraphs.py to paragraphs.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\segments\sections.py to sections.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\segments\sentences.py to sentences.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\segments\titles.py to titles.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\segments\utils.py to utils.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\segments\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\stanford.py to stanford.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\tokens.py to tokens.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\transforms\characters.py to characters.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\transforms\tokens.py to tokens.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\transforms\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\en\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\nlp\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\utils\decorators.py to decorators.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\utils\unicode\unicode_lookup.py to unicode_lookup.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\utils\unicode\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\utils\__init__.py to __init__.cpython-37.pyc
byte-compiling build\bdist.win-amd64\egg\lexnlp\__init__.py to __init__.cpython-37.pyc
creating build\bdist.win-amd64\egg\EGG-INFO
copying lexnlp.egg-info\PKG-INFO -> build\bdist.win-amd64\egg\EGG-INFO
copying lexnlp.egg-info\SOURCES.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying lexnlp.egg-info\dependency_links.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying lexnlp.egg-info\requires.txt -> build\bdist.win-amd64\egg\EGG-INFO
copying lexnlp.egg-info\top_level.txt -> build\bdist.win-amd64\egg\EGG-INFO
zip_safe flag not set; analyzing archive contents...
lexnlp.__pycache__.__init__.cpython-37: module references __file__
lexnlp.config.en.__pycache__.company_types.cpython-37: module references __file__
lexnlp.extract.en.__pycache__.dates.cpython-37: module references __file__
lexnlp.extract.en.addresses.__pycache__.address_features.cpython-37: module references __file__
lexnlp.extract.en.addresses.__pycache__.addresses.cpython-37: module references __file__
lexnlp.extract.en.contracts.__pycache__.detector.cpython-37: module references __file__
lexnlp.nlp.en.__pycache__.tokens.cpython-37: module references __file__
lexnlp.nlp.en.segments.__pycache__.pages.cpython-37: module references __file__
lexnlp.nlp.en.segments.__pycache__.paragraphs.cpython-37: module references __file__
lexnlp.nlp.en.segments.__pycache__.sections.cpython-37: module references __file__
lexnlp.nlp.en.segments.__pycache__.sentences.cpython-37: module references __file__
lexnlp.nlp.en.segments.__pycache__.titles.cpython-37: module references __file__
lexnlp.nlp.en.transforms.__pycache__.characters.cpython-37: module references __file__
lexnlp.nlp.en.transforms.__pycache__.tokens.cpython-37: module references __file__
lexnlp.utils.unicode.__pycache__.unicode_lookup.cpython-37: module references __file__
creating 'dist\lexnlp-0.2.2-py3.7.egg' and adding 'build\bdist.win-amd64\egg' to it
removing 'build\bdist.win-amd64\egg' (and everything under it)
Processing lexnlp-0.2.2-py3.7.egg
removing 'c:\users\eucje\anaconda3\lib\site-packages\lexnlp-0.2.2-py3.7.egg' (and everything under it)
creating c:\users\eucje\anaconda3\lib\site-packages\lexnlp-0.2.2-py3.7.egg
Extracting lexnlp-0.2.2-py3.7.egg to c:\users\eucje\anaconda3\lib\site-packages
lexnlp 0.2.2 is already the active version in easy-install.pth

Installed c:\users\eucje\anaconda3\lib\site-packages\lexnlp-0.2.2-py3.7.egg
Processing dependencies for lexnlp==0.2.2
Searching for scikit-learn==0.19.1
Reading https://pypi.org/simple/scikit-learn/
Downloading https://files.pythonhosted.org/packages/f5/2c/5edf2488897cad4fb8c4ace86369833552615bf264460ae4ef6e1f258982/scikit-learn-0.19.1.tar.gz#sha256=5ca0ad32ee04abe0d4ba02c8d89d501b4e5e0304bdf4d45c2e9875a735b323a0
Best match: scikit-learn 0.19.1
Processing scikit-learn-0.19.1.tar.gz
Writing C:\Users\eucje\AppData\Local\Temp\easy_install-ngq_579w\scikit-learn-0.19.1\setup.cfg
Running scikit-learn-0.19.1\setup.py -q bdist_egg --dist-dir C:\Users\eucje\AppData\Local\Temp\easy_install-ngq_579w\scikit-learn-0.19.1\egg-dist-tmp-yuvo953l
Partial import of sklearn during the build process.
Could not locate executable g77
Could not locate executable f77
Could not locate executable ifort
Could not locate executable ifl
Could not locate executable f90
Could not locate executable DF
Could not locate executable efl
C:\Users\eucje\Anaconda3\lib\site-packages\numpy\distutils\system_info.py:625: UserWarning:
    Atlas (http://math-atlas.sourceforge.net/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [atlas]) or by setting
    the ATLAS environment variable.
  self.calc_info()
C:\Users\eucje\Anaconda3\lib\site-packages\numpy\distutils\system_info.py:625: UserWarning:
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.
  self.calc_info()
C:\Users\eucje\Anaconda3\lib\site-packages\numpy\distutils\system_info.py:625: UserWarning:
    Blas (http://www.netlib.org/blas/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [blas_src]) or by setting
    the BLAS_SRC environment variable.
  self.calc_info()
sklearn\setup.py:72: UserWarning:
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.
  warnings.warn(BlasNotFoundError.__doc__)
Missing compiler_cxx fix for MSVCCompiler
Missing compiler_cxx fix for MSVCCompiler
_dbscan_inner.cpp
c:\users\eucje\anaconda3\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(12) : Warning Msg: Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
sklearn\cluster\_dbscan_inner.cpp(5960): error C2039: 'exc_type': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(5961): error C2039: 'exc_value': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(5962): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(5969): error C2039: 'exc_type': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(5970): error C2039: 'exc_value': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(5971): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(5972): error C2039: 'exc_type': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(5973): error C2039: 'exc_value': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(5974): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(6029): error C2039: 'exc_type': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(6030): error C2039: 'exc_value': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(6031): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(6032): error C2039: 'exc_type': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(6033): error C2039: 'exc_value': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
sklearn\cluster\_dbscan_inner.cpp(6034): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\eucje\anaconda3\include\pystate.h(209): note: see declaration of '_ts'
error: Setup script exited with error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\eucje\Anaconda3\lib\site-packages\numpy\core\include -IC:\Users\eucje\Anaconda3\lib\site-packages\numpy\core\include -IC:\Users\eucje\Anaconda3\include -IC:\Users\eucje\Anaconda3\include -I"C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\include" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpsklearn\cluster\_dbscan_inner.cpp /Fobuild\temp.win-amd64-3.7\Release\sklearn\cluster\_dbscan_inner.obj /Zm1000" failed with exit status 2


Which API should I use to identify clause type?

Hi,
In the README it says "Pre-trained classifiers for document type and clause type". Can anyone from LexNLP help me find which API to use, and also what kinds of clauses it will find in a legal document (file)?

get_companies not working

Two problems:

  1. running the example in your docs returns faulty results:
import lexnlp.extract.en.entities.nltk_re
text = "This is Deutsche Bank Securities Inc."
print(list(lexnlp.extract.en.entities.nltk_re.get_companies(text)))
# [('This is Deutsche Bank Securities', 'Inc', 'Bank')]

I didn't expect "This is" to be included in the result. That's of no value at all.

  2. running the example with different input doesn't work at all:
import lexnlp.extract.en.entities.nltk_re
text = "Google acquired Tableau"
print(list(lexnlp.extract.en.entities.nltk_re.get_companies(text)))
# [ ]

The result is empty. I expected to see Google and Tableau.

Link to faulty example: https://lexpredict-lexnlp.readthedocs.io/en/latest/modules/extract/en/companies.html#extract-en-companies

ValueError - sections.get_sections()

Encountering a ValueError when running get_sections()

Specifically:

Number of features of the model must match the input

Number of features in test: 313
Number of features in model: 369

It seems that not all of the same features are being captured within build_section_break_features()?

Paragraph segmentation does not work

Hi, I tried converting a long string into paragraphs, but it fails even when changing parameters.

Here is the test sample.

import lexnlp.nlp.en.segments.paragraphs as p
paras = p.get_paragraph_list("We will host your documentation for free, forever. There are no tricks. We help over 100,000 open source projects share their docs, including a custom domain and theme. Whenever you push code to your favorite version control service, whether that is GitHub, BitBucket, or GitLab, we will automatically build your docs so your code and documentation are never out of sync. We build and host your docs for the web, but they are also viewable as PDFs, as single page HTML, and for eReaders. No additional configuration is required. We can host and build multiple versions of your docs so having a 1.0 version of your docs and a 2.0 version of your docs is as easy as having a separate branch or tag in your version control system. Read the Docs simplifies software documentation by automating building, versioning, and hosting of your docs for you. We fund our operations through advertising, corporate-hosted documentation with Read the Docs for Business, donations, and we are supported by a number of generous sponsors. Read the Docs is open source and community supported. It depends on users like you to contribute to development, support, and operations. You can learn more about how to contribute in our docs. Thanks so much to our wonderful team who helps us run the site. Read the Docs wouldn't be possible without them.")

for para in paras:
    print(para)

and the output is a single string.

We will host your documentation for free, forever. There are no tricks. We help over 100,000 open source projects share their docs, including a custom domain and theme. Whenever you push code to your favorite version control service, whether that is GitHub, BitBucket, or GitLab, we will automatically build your docs so your code and documentation are never out of sync. We build and host your docs for the web, but they are also viewable as PDFs, as single page HTML, and for eReaders. No additional configuration is required. We can host and build multiple versions of your docs so having a 1.0 version of your docs and a 2.0 version of your docs is as easy as having a separate branch or tag in your version control system. Read the Docs simplifies software documentation by automating building, versioning, and hosting of your docs for you. We fund our operations through advertising, corporate-hosted documentation with Read the Docs for Business, donations, and we are supported by a number of generous sponsors. Read the Docs is open source and community supported. It depends on users like you to contribute to development, support, and operations. You can learn more about how to contribute in our docs. Thanks so much to our wonderful team who helps us run the site. Read the Docs wouldn't be possible without them.

Am I missing something here? I tried different parameter values:

score_threshold=0.1
score_threshold=0.2
score_threshold=0.3
score_threshold=0.5
score_threshold=0.7
# window
window_pre=3
window_pre=2
window_pre=1
window_pre=5
window_pre=7
window_post=1
window_post=2
window_post=3
window_post=4
window_post=5
window_post=7

The result is the same.

Is there a collection of n-grams/common legal phrases, with capitalisation and punctuation intact?

Hi there,

I'm creating an app to accelerate legal document writing by utilising predictive lookahead. See here for an example: http://bit.ly/2LRQ5Nc

I'm using data from the trigram collocations pickle file to help pre-populate suggested phrases (loaded as in the sketch after this list). It integrates really nicely, but from a pragmatic viewpoint there are a couple of issues.

  • It appears these trigrams have (understandably) been normalised, e.g. a sample string from the array is united states dollars. This is problematic, as you'd obviously want the app to autocomplete this phrase as United States dollars.
    The only way I could rectify this is by manually inspecting all 100k trigrams; otherwise, hopefully you have a file somewhere with non-normalised legal phrases you'd be willing to share? 😄

  • While the trigrams are a good start, 4-, 5-, or 6-grams that can hold some legal-specific context would be ideal... Any suggestions on where these could be obtained?
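
For reference, I'm pulling the phrases in like this (a sketch; the pickle file name matches the files shipped in lexnlp/nlp/en):

import pickle

with open("collocation_trigrams_100000.pickle", "rb") as f:
    trigram_phrases = pickle.load(f)  # normalised phrases such as 'united states dollars'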

Btw, much respect for the work you guys have done. And if anyone's interested in the source code for my app, I'll make a git repo for it 👍

Upgrade to Python 3.8

Hello,
Can we update requirements.txt to support Python 3.8? Some of the new features in newer packages could help this project a lot.

Help with datasets & retraining models

Hi there,
I would love to know how LexNLP was trained and which datasets were used, so we can understand the whole project and contribute properly to further improvements. Any help would be much appreciated.

word2vec legal model

As stated in the paper, the pre-trained models were to be available from October 2017, but I could not find any of them in the repo. Can someone help me locate them?

Pickle Error

Receiving the following error wherever any pickle file is loaded: _pickle.UnpicklingError: invalid load key, '\x02'

e.g., line 35 of sections.py

AttributeError: type object 'sklearn.tree._criterion.array' has no attribute '__reduce_cython__'

Hello,
I am evaluating your API and trying to use the function get_sections, but I get the error below simply on importing. Would you please advise?

from lexnlp.nlp.en.segments.sections import get_sections

Same issue
from lexnlp.nlp.en.segments.titles import get_titles

I've installed dependencies according to: https://github.com/LexPredict/lexpredict-lexnlp/blob/master/python-requirements-full.txt

Error
~/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/tree/tree.py in
37 from ..utils.validation import check_is_fitted
38
---> 39 from ._criterion import Criterion
40 from ._splitter import Splitter
41 from ._tree import DepthFirstTreeBuilder

~/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/tree/_criterion.cpython-36m-darwin.so in init sklearn.tree._criterion()

AttributeError: type object 'sklearn.tree._criterion.array' has no attribute '__reduce_cython__'

Dependencies
deps.txt

Extract only month or year but not full date

Hi
If the text contains a word like OCTOBER, it is converted into datetime.date(2022, 10, 1); that is, the current year and the first day of the month are filled in, which I do not want. If the text contains only a month, then it should return only the month name. How can I achieve this?

Importing lexnlp.extract.en.entities.stanford_ner

Executing the above line produces an error:

ImportError: cannot import name 'StanfordTokenizer'

which seems related to this issue: epfml/sent2vec#25

The solution is probably to modify the following line in nlp\en\stanford.py:

from nltk.tokenize import StanfordTokenizer

to the following:

from nltk.tokenize.stanford import StanfordTokenizer

Thank you!

Not able to Extract "multiple" dates using get_date

>>>import lexnlp.extract.en.dates
>>> text = "This agreement is dated on 15th july 2018. This agreement shall terminate on the 15th day of March, 2020. "
>>> print(list(lexnlp.extract.en.dates.get_dates(text)))
[datetime.date(2020, 3, 15)]

Currently, the get_dates and get_raw_date_list methods give me only the last date occurrence. In the above text, I expected 15th July 2018 along with 15th March 2020.

Is there a way to grab all dates from a text/sentence?

Edit:
Probably the issue is that the first date in my text was not recognized and hence not extracted. Here is an example:

>>> text = "AUTO XX IF SSR TKNA/E OR FA NOT RCVD BY RJ BY 29MAY19 1350 DOH LT,REF IATA PRVD PAX"
>>> list(lexnlp.extract.en.dates.get_raw_dates(text))
[]

"Failed building wheel for pandas" while installing lexnlp with pip

Python version - 3.8.6
lexnlp version - 1.8.0
Windows Server 2016

I created a fresh virtual environment and ran pip install lexnlp with the -vvv flag for detailed information about the installation process. Please find the complete verbose output in the text file below.
https://drive.google.com/file/d/1v9S7C9Ht0lwxffyiSTrcgncPASLWv2x6/view?usp=sharing

Initially, I realized that my server doesn't have Microsoft Visual C++ 14.0 or later, so I installed Visual Studio Build Tools 2019. That made the original error go away but produced a new one; please refer to the text file at the link provided.

Install failure for 2.1.0 and 2.2.0 on Win10/Python 3.9 due to sklearn==0.23.1

Attempting to pip install lexnlp currently pulls 2.1.0 from PyPI. This fails to install on Win10/Python 3.9 and apparently on M1 MacBooks. Downloading the current master and installing from the zip encounters similar issues.

The issue is scikit-learn version 0.23.1 failing to install due to changes made in numpy, resulting in the error below even when a sufficient numpy is installed.

Importing the numpy c-extensions failed.
[...]
      ImportError: numpy is not installed.
      scikit-learn requires numpy >= 1.13.3.
      Installation instructions are available on the scikit-learn website: http://scikit-learn.org/stable/install.html

I was able to work around this and run two test examples from the docs (though I haven't fully tested) by installing the current master with requirements set to the following in setup.py:

...
python_requires='>=3.6',
...
        'cloudpickle==2.1.0',
        'dateparser==1.1.1',
        'gensim==4.1.2',
        'joblib==1.1.0',
        'nltk==3.7',
        'num2words==0.5.10',
        'numpy>=1.13.1',
        'pandas>=1.1.5',
        'pycountry==22.3.5',
        'regex==2022.3.2',
        'reporters-db==3.2.18',
        'requests==2.27.1',
        'scipy==1.8.1',
        'scikit-learn==0.24.2',
        'tzlocal==2.1',
        'tqdm>=4.36.0',
        'Unidecode==1.3.4',
        'us==2.0.2',
        'zahlwort2num==0.3.0'

Can I suggest using less rigid requirements? This package is often going to be used as part of a workflow, and rigid pinning not only causes install issues when those deps start to age (sklearn 0.23.1 is two years old) but also unnecessarily forces your package to be the driver of install requirements for the system it's a part of.

EDIT: This doesn't work, as there are breaking changes from sklearn 0.23.1 -> 0.24; in particular, when loading the pickle from addresses.py, sklearn 0.24 throws the error:
ModuleNotFoundError: No module named 'sklearn.tree.tree'

Install on MacBook Pro M1 consistently fails at the same point

Here is the traceback. The machine is a less-than-one-year-old MacBook Pro M1. I have tried uninstalling and reinstalling scipy, numpy, and other packages, with no luck. Apparently, I cannot use LexNLP on a Mac M1 then?

Error message below:

Collecting scipy==1.5.1
Using cached scipy-1.5.1.tar.gz (25.6 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... error
ERROR: Command errored out with exit status 1:
command: /Users/johntaylor/Programming/PycharmProjects/Python_code_Examples/ORC_Scrape/virtual/bin/python3 /Users/johntaylor/Programming/PycharmProjects/Python_code_Examples/ORC_Scrape/virtual/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/tmpeedpvclp
cwd: /private/var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/pip-install-dfxm71d1/scipy
Complete output (37 lines):
setup.py:460: UserWarning: Unrecognized setuptools command ('dist_info --egg-base /private/var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/pip-modern-metadata-p8tcxpn6'), proceeding with generating Cython sources and expanding templates
warnings.warn("Unrecognized setuptools command ('{}'), proceeding with "
Running from SciPy source directory.
/private/var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/pip-build-env-6k0_ynk3/overlay/lib/python3.9/site-packages/numpy/distutils/system_info.py:1712: UserWarning:
Lapack (http://www.netlib.org/lapack/) libraries not found.
Directories to search for the libraries can be specified in the
numpy/distutils/site.cfg file (section [lapack]) or by setting
the LAPACK environment variable.
if getattr(self, '_calc_info_{}'.format(lapack))():
/private/var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/pip-build-env-6k0_ynk3/overlay/lib/python3.9/site-packages/numpy/distutils/system_info.py:1712: UserWarning:
Lapack (http://www.netlib.org/lapack/) sources not found.
Directories to search for the sources can be specified in the
numpy/distutils/site.cfg file (section [lapack_src]) or by setting
the LAPACK_SRC environment variable.
if getattr(self, '_calc_info_{}'.format(lapack))():
Traceback (most recent call last):
File "/Users/johntaylor/Programming/PycharmProjects/Python_code_Examples/ORC_Scrape/virtual/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in
main()
File "/Users/johntaylor/Programming/PycharmProjects/Python_code_Examples/ORC_Scrape/virtual/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Users/johntaylor/Programming/PycharmProjects/Python_code_Examples/ORC_Scrape/virtual/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py", line 133, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/private/var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/pip-build-env-6k0_ynk3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 174, in prepare_metadata_for_build_wheel
self.run_setup()
File "/private/var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/pip-build-env-6k0_ynk3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 267, in run_setup
super(_BuildMetaLegacyBackend,
File "/private/var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/pip-build-env-6k0_ynk3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 158, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 583, in
setup_package()
File "setup.py", line 579, in setup_package
setup(**metadata)
File "/private/var/folders/lk/0t8r88js023ddfc2z8s74v8m0000gn/T/pip-build-env-6k0_ynk3/overlay/lib/python3.9/site-packages/numpy/distutils/core.py", line 137, in setup
config = configuration()
File "setup.py", line 477, in configuration
raise NotFoundError(msg)
numpy.distutils.system_info.NotFoundError: No lapack/blas resources found. Note: Accelerate is no longer supported.
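
As a diagnostic (not a fix), it can help to check which BLAS/LAPACK the active numpy is linked against before attempting a scipy source build:

    import numpy

    # Prints the BLAS/LAPACK libraries numpy was built with; on Apple
    # Silicon, a missing Accelerate/OpenBLAS entry here usually explains
    # the "No lapack/blas resources found" failure above.
    numpy.__config__.show()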

Stopwords

Hello,

I'm doing an NLP project and trying to use your tool, which seems very interesting.

I was analysing the stopwords (stopwords.pickle) in the nlp folder here and couldn't find a difference between them and the stopwords in NLTK. Am I looking in the wrong folder?

Thank you!
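
One quick way to check is to diff the two sets directly; a sketch, assuming stopwords.pickle holds an iterable of strings:

    import pickle
    from nltk.corpus import stopwords  # requires nltk.download('stopwords')

    # Compare LexNLP's pickled stopword list against NLTK's English list.
    with open('stopwords.pickle', 'rb') as f:  # path to the repo's pickle
        lex_stopwords = set(pickle.load(f))
    nltk_stopwords = set(stopwords.words('english'))
    print(lex_stopwords - nltk_stopwords)  # terms unique to LexNLP, if any
    print(nltk_stopwords - lex_stopwords)  # terms unique to NLTK, if any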

get_dates expecting DateFinder to have attribute EXTRA_TOKENS_PATTERN

I was running some tests to familiarize myself with the library, and when trying dates I was getting an error. In dates.py, the line:

for extra_token in date_finder.EXTRA_TOKENS_PATTERN.split('|'):

generates the following error in the traceback:

'DateFinder' object has no attribute 'EXTRA_TOKENS_PATTERN'

Looking in more detail at the installed files, I notice that pip list shows the following:

  • datefinder 0.7.0
  • datefinder-lexpredict 0.6.2

Upon inspecting the code, I see that the hyphenated version is a modified fork. Given that hyphenated names cannot be imported directly, can I simply edit the code, or are there dependencies on the hyphenated name?
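
A quick way to see which of the two installed distributions actually wins the import (a diagnostic sketch):

    import datefinder

    # The module path shows whether the stock datefinder or the lexpredict
    # fork is being imported; only the fork is expected to carry the
    # EXTRA_TOKENS_PATTERN attribute that dates.py relies on.
    print(datefinder.__file__)
    print(hasattr(datefinder.DateFinder, 'EXTRA_TOKENS_PATTERN'))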

Clarity on dataset used for pre-training

According to the published paper LexNLP: Natural language processing and information extraction for legal and regulatory texts, the abstract states that LexNLP includes pre-trained models based on real documents from the SEC EDGAR database. I want to clarify: does this mean that LexNLP captures entities such as Acts, Regulations, and Citations based on knowledge from that pre-training? I want to use LexNLP to extract these entities from documents belonging to categories like Abortion, Bankruptcy, Sentencing, Environmental Law, etc., but knowing that the SEC EDGAR database emphasizes data related to investment, finance, and capitalization, I am skeptical whether LexNLP can extract legal entities from the off-domain categories mentioned above.

Not able to import lexnlp packages in Eclipse

Hi, I am able to install lexnlp as mentioned in #6 for Python 3.6.8 and can run the sample code in the Python interpreter, but the imports fail when I try to use the same packages in Eclipse. Under Eclipse Preferences -> PyDev -> Python Interpreter, I added ~/.pyenv/versions/lexnlp/lib/python3.6/site-packages to the library path, but none of the lexnlp packages can be imported. Can someone please help?
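
To rule out a path problem, one can check from inside the Eclipse/PyDev console whether the interpreter actually sees that site-packages directory; a sketch using the path from the report:

    import os
    import sys

    # Confirm the pyenv site-packages directory is on the import path,
    # then append it manually as a test (the path is the reporter's own).
    site_dir = os.path.expanduser(
        '~/.pyenv/versions/lexnlp/lib/python3.6/site-packages')
    print(site_dir in sys.path)
    sys.path.append(site_dir)
    import lexnlp  # should succeed if the directory is correct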

Lemmatization example

I am not able to locate the lemmatization code in the repository; can someone kindly let me know which folder it is in?
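
If it helps while searching, my assumption (worth verifying against the repo) is that token-level helpers such as lemmatization live under lexnlp/nlp/en/, along the lines of:

    # Hypothetical usage; the module path and function name are assumptions
    # based on the repository layout, not confirmed API.
    from lexnlp.nlp.en.tokens import get_lemmas

    for lemma in get_lemmas("The parties were signing the agreements"):
        print(lemma)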

lexnlp.extract.en.geoentities.get_geoentity_annotations returning the wrong location indexes

>>> import lexnlp.extract.en.geoentities
>>> text = "This Contract (“Contract”) is entered into by and between the City of Detroit, a Michigan municipal corporation"
>>> for geoentity in lexnlp.extract.en.geoentities.get_geoentity_annotations(text, _CONFIG):
...     print(geoentity)
Michigan [geoentity] at (86..95), loc: en

Currently, get_geoentity_annotations is returning the wrong location indexes, as shown in the example above; the right result should be Michigan [geoentity] at (82..91), loc: en. I noticed that this behavior appears when the text contains punctuation signs: each time the parser encounters a punctuation sign (e.g. ,, (, ), “), the location index is incremented by +2. As a result, any geoentity that occurs before all punctuation signs gets the right location indexes, while those that occur after them get the wrong ones.
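
The expected span can be computed directly from the string (a sketch; note that the curly quotes are single characters in a Python 3 str):

    # Recompute the expected character span of "Michigan" in the sample text.
    text = ("This Contract (“Contract”) is entered into by and between "
            "the City of Detroit, a Michigan municipal corporation")
    start = text.index("Michigan")
    print(start, start + len("Michigan"))  # compare with the parser's span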

How to install lexnlp in python 3.8

numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

Rolling back uninstall of scipy
Moving to /home/user/python_projects/test/venv/lib/python3.8/site-packages/scipy-1.4.1.dist-info/
from /home/user/python_projects/test/venv/lib/python3.8/site-packages/~cipy-1.4.1.dist-info
Moving to /home/user/python_projects/test/venv/lib/python3.8/site-packages/scipy/
from /home/user/python_projects/test/venv/lib/python3.8/site-packages/~cipy

Command "/home/user/python_projects/test/venv/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-install-fcs98xlz/scipy/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-record-ara2_jj3/install-record.txt --single-version-externally-managed --compile --install-headers /home/user/python_projects/test/venv/include/site/python3.8/scipy" failed with error code 1 in /tmp/pip-install-fcs98xlz/scipy/

lexpredict-lexnlp install failing

I am running pip install -r python-requirements.txt but it throws the following error message: 16 errors generated. error: Command "gcc ..." failed with exit status 1. I tried on both OS X and Ubuntu 16.04. Any pointers to fixing this issue?
