philologic5's People

Contributors

clovis, pajusmar, pleonard212, rwhaling, vincent-ferotin

philologic5's Issues

Difficulties installing the latest version

I am building a Docker image from the PhiloLogic Dockerfile (ubuntu:22.04), fetching the source with wget https://github.com/ARTFL-Project/PhiloLogic5/archive/refs/heads/main.zip

Configuration:
Python 3.10.12

Package     Version
----------  -------
gyp         0.1
pip         22.0.2
setuptools  59.6.0
wheel       0.37.1

I get this error:
bash install.sh
\n## INSTALLING PYTHON LIBRARY ##
Processing /tmp/PhiloLogic5-main/python
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [43 lines of output]
/usr/lib/python3/dist-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
warnings.warn(
/tmp/PhiloLogic5-main/python/.eggs/setuptools_scm-8.0.4-py3.10.egg/setuptools_scm/_integration/setuptools.py:30: RuntimeWarning:
ERROR: setuptools==59.6.0 is used in combination with setuptools_scm>=8.x

  Your build configuration is incomplete and previously worked by accident!
  setuptools_scm requires setuptools>=61
  
  Suggested workaround if applicable:
   - migrating from the deprecated setup_requires mechanism to pep517/518
     and using a pyproject.toml to declare build dependencies
     which are reliably pre-installed before running the build tools
  
    warnings.warn(
  WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/PhiloLogic5-main/python/setup.py", line 9, in <module>
      setup(
    File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
      return distutils.core.setup(**attrs)
    File "/usr/lib/python3.10/distutils/core.py", line 108, in setup
      _setup_distribution = dist = klass(attrs)
    File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 459, in __init__
      _Distribution.__init__(
    File "/usr/lib/python3.10/distutils/dist.py", line 292, in __init__
      self.finalize_options()
    File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 837, in finalize_options
      ep(self)
    File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 858, in _finalize_setup_keywords
      ep.load()(self, ep.name, value)
    File "/tmp/PhiloLogic5-main/python/.eggs/setuptools_scm-8.0.4-py3.10.egg/setuptools_scm/_integration/setuptools.py", line 101, in version_keyword
      _assign_version(dist, config)
    File "/tmp/PhiloLogic5-main/python/.eggs/setuptools_scm-8.0.4-py3.10.egg/setuptools_scm/_integration/setuptools.py", line 56, in _assign_version
      _version_missing(config)
    File "/tmp/PhiloLogic5-main/python/.eggs/setuptools_scm-8.0.4-py3.10.egg/setuptools_scm/_get_version_impl.py", line 112, in _version_missing
      raise LookupError(
  LookupError: setuptools-scm was unable to detect version for /tmp/PhiloLogic5-main.
  
  Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.
  
  For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.

Any idea how to fix this error?
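
The log itself names two likely causes: setuptools_scm cannot detect a version because the GitHub zip archive carries no git metadata, and setuptools 59.6.0 is older than the setuptools>=61 that setuptools_scm 8.x asks for. A minimal sketch of a pre-install check for both conditions follows. The function name `diagnose_build` and its messages are invented for illustration and are not part of PhiloLogic.

```python
from pathlib import Path


def diagnose_build(repo_root, setuptools_version):
    """Return a list of likely problems, based only on the two messages in the log.

    repo_root is the unpacked source tree (hypothetical example: /tmp/PhiloLogic5-main).
    setuptools_version is a version string such as "59.6.0".
    """
    problems = []
    # setuptools_scm derives the version from git metadata, which zip archives lack.
    if not (Path(repo_root) / ".git").exists():
        problems.append("no .git directory: setuptools_scm cannot detect a version")
    # The RuntimeWarning in the log states that setuptools_scm>=8 needs setuptools>=61.
    major = int(setuptools_version.split(".")[0])
    if major < 61:
        problems.append(f"setuptools {setuptools_version} is older than 61")
    return problems
```

In the Dockerfile this would probably mean replacing the wget of main.zip with a git clone of https://github.com/ARTFL-Project/PhiloLogic5.git (the log recommends a fully intact git repository) and upgrading setuptools to 61 or later before running install.sh. Both steps are suggestions drawn from the log, not a confirmed fix.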

Collocates filtered by word attribute

This entails:

  • List all word attributes and their values present in the database in web_config.
  • Make those attributes and values accessible in the web_config object returned at web app startup.
  • Add a filtering mechanism to the search form for collocations.
  • Add filtering to the collocation report.
  • Construct the correct query when a collocate link is clicked in the web app.
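
A rough sketch of the first and fourth items, assuming each token is available as a dict with a `word` and an `attributes` mapping. That token shape and both function names are hypothetical; the actual storage format is not described here.

```python
from collections import Counter, defaultdict


def list_word_attributes(tokens):
    """Collect every attribute name and the sorted values seen for it."""
    values = defaultdict(set)
    for token in tokens:
        for name, value in token["attributes"].items():
            values[name].add(value)
    return {name: sorted(seen) for name, seen in values.items()}


def filter_collocates(tokens, attribute, value):
    """Count only the collocates whose attribute matches the requested value."""
    return Counter(
        token["word"]
        for token in tokens
        if token["attributes"].get(attribute) == value
    )


# Invented sample data, for illustration only.
tokens = [
    {"word": "amour", "attributes": {"pos": "NOUN"}},
    {"word": "aimer", "attributes": {"pos": "VERB"}},
    {"word": "amour", "attributes": {"pos": "NOUN"}},
]
```

The output of `list_word_attributes` is the kind of mapping that could be exposed through web_config to populate the filter in the search form.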

Add spaCy as a load filter

Create a load filter that runs spaCy against individual sentences to add word-level properties such as lemma and part of speech.
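
A minimal sketch of the per-sentence step, assuming `nlp` is a loaded spaCy pipeline. The function name `annotate_sentence` and the returned dict keys are hypothetical, and how the result would be wired into the PhiloLogic load-filter chain is left open.

```python
def annotate_sentence(nlp, sentence):
    """Run a spaCy pipeline on one sentence and return word-level properties.

    nlp is expected to be a spaCy Language object; each token it yields
    exposes .text, .lemma_ and .pos_.
    """
    doc = nlp(sentence)
    return [
        {"token": token.text, "lemma": token.lemma_, "pos": token.pos_}
        for token in doc
    ]


# Typical use, assuming spaCy and a model are installed:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   annotate_sentence(nlp, "The cats were sleeping.")
```

One open question with this approach is that spaCy's tokenization may not line up with the tokens already produced by the parser, so the properties would need to be aligned back to existing words.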

Add end byte to philo id

This would extend philo ids from nine to ten 32-bit integers. It shouldn't be difficult to implement, but it requires many small changes across the Python code base. That is too risky for 5.0, so delay it to 5.1.

Advantages:

  • Easier highlighting
  • Tokens stored in the index can be multi-word tokens (e.g. named entities)

Downsides:

  • 4 bytes of extra disk space per token
  • Tiny impact on disk reads (probably not measurable on SSD/NVME drives)
  • Potential for bugs if we miss sections of the code that assume nine 32-bit ints for philo ids.
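
To make the size difference concrete, here is a sketch using Python's `struct` module. The little-endian byte order, the position of the end byte as the last integer, and the function name `pack_philo_id` are all assumptions made for illustration, not the actual index layout.

```python
import struct

OLD_FORMAT = "<9I"   # current layout: nine unsigned 32-bit integers
NEW_FORMAT = "<10I"  # proposed layout: one extra integer holding the end byte


def pack_philo_id(fields, end_byte=None):
    """Pack a philo id; appending end_byte switches to the ten-integer layout."""
    if len(fields) != 9:
        raise ValueError("expected nine integers")
    if end_byte is None:
        return struct.pack(OLD_FORMAT, *fields)
    # Assumption: the end byte is stored after the nine existing integers.
    return struct.pack(NEW_FORMAT, *fields, end_byte)


# The extra integer costs exactly four bytes per token.
extra_bytes = struct.calcsize(NEW_FORMAT) - struct.calcsize(OLD_FORMAT)
```

Any code that unpacks with a hard-coded nine-integer format would fail on the longer records, which is the kind of bug the last downside refers to.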

Add a reverse lookup table for lemma facets

To enable lemma frequencies on word facets, we need a reverse lookup table where the keys are philo_ids expressed as packed 32-bit ints (as in the standard inverted index) and the values are lemmas in the form "lemma:word".
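
A sketch of what such a table could look like, with a plain dict standing in for whatever on-disk key-value store is eventually used. The little-endian packing and all function names are assumptions for illustration.

```python
import struct
from collections import Counter


def philo_id_key(philo_id):
    """Pack a nine-integer philo_id into bytes usable as a lookup key."""
    return struct.pack("<9I", *philo_id)


def build_reverse_lookup(occurrences):
    """Build the table from (philo_id, lemma, word) tuples."""
    return {
        philo_id_key(philo_id): f"{lemma}:{word}"
        for philo_id, lemma, word in occurrences
    }


def lemma_frequencies(table, philo_ids):
    """Count lemmas for a set of hits by looking each philo_id up in the table."""
    counts = Counter()
    for philo_id in philo_ids:
        value = table.get(philo_id_key(philo_id))
        if value is not None:
            lemma, _, _word = value.partition(":")
            counts[lemma] += 1
    return counts
```

With the hits of a query as input, `lemma_frequencies` yields the counts a lemma facet would display.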

Cosine similarity of sub-corpus collocates

When viewing collocation results, compare the current set of collocates to the collocate sets of all other values of a particular metadata field.

Use case:
I'm looking at collocates of "sentiment" in Rousseau, and I would like to find which authors have the most similar collocate distribution.

We should only be able to compare against one metadata field at a time, e.g. author or title. The fields available should be based on the default object level set in db.locals. The way it would work is that you retrieve all values for the selected metadata field, and then grab the collocates for each value.
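
A minimal sketch of the comparison step, assuming collocates are available as word-to-count mappings. The function names and the sample counts are invented for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two collocate count mappings."""
    dot = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty collocate set is similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(reference, collocates_by_value):
    """Rank each metadata value by similarity to the reference collocates."""
    scores = {
        value: cosine_similarity(reference, counts)
        for value, counts in collocates_by_value.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented counts, for illustration only.
reference = Counter({"coeur": 4, "nature": 3})
others = {
    "Author A": Counter({"coeur": 4, "nature": 3}),
    "Author B": Counter({"raison": 5}),
}
ranking = rank_by_similarity(reference, others)
```

Raw counts favor frequent words, so some weighting or normalization of the counts may be worth considering before the comparison.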

Compare collocations over time

Create a time series for collocations. We could do this in different ways:

  • Get collocations across all periods as a comparison point, then compare that to the collocations of each period. We can then show in a bar chart which period diverges the most from the comparison point.
  • Grab collocations for each period, then compare each period to the previous one to get a sense of the shift between periods. We then graph that.
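
Both options can be sketched as follows. Cosine distance as the measure, summing the per-period counts to form the comparison point, and the function names are all assumptions; no measure has been decided on here.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 minus the cosine similarity of two collocate count mappings."""
    dot = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def divergence_from_baseline(collocates_by_period):
    """Option 1: distance of each period from the collocates of all periods."""
    baseline = Counter()
    for counts in collocates_by_period.values():
        baseline.update(counts)
    return {
        period: cosine_distance(baseline, counts)
        for period, counts in collocates_by_period.items()
    }


def shift_between_periods(collocates_by_period):
    """Option 2: distance of each period from the one just before it."""
    periods = sorted(collocates_by_period)
    return {
        later: cosine_distance(
            collocates_by_period[earlier], collocates_by_period[later]
        )
        for earlier, later in zip(periods, periods[1:])
    }
```

The first function produces one value per period for the bar chart; the second produces one value per transition, so the earliest period has no entry.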

Detect exact phrase search in web UI

This should happen before the query is sent to the server:

  • Check whether the query has two or more words.
  • If so, check that the string contains exactly two double quotes.
  • If so, check that the double quotes are at the start and end of the string.
  • If so, set the query type to exact_phrase.
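
The checks above can be sketched as a single function. It is written in Python to show the logic only; the real check would live in the web UI code. The function name and the `"default"` fallback value are hypothetical, while `exact_phrase` comes from the list above.

```python
def detect_query_type(query, default="default"):
    """Return "exact_phrase" when the whole multi-word query is double-quoted."""
    # Count words with the quotes removed, so a lone pair of quotes has no words.
    words = query.replace('"', "").split()
    if len(words) < 2:
        return default
    # Exactly two double quotes, no more and no fewer.
    if query.count('"') != 2:
        return default
    stripped = query.strip()
    # Both quotes must enclose the whole query.
    if stripped.startswith('"') and stripped.endswith('"'):
        return "exact_phrase"
    return default
```

A query such as `"tout le monde"` (with the quotes) is classified as exact_phrase, while `"tout" le monde` keeps the default type because the closing quote is not at the end of the string.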
