
cozy-data-indexer's Introduction

Description

A small API that provides indexing and full-text search features to the Cozy Cloud Platform. It is based on Whoosh, a Python indexing library.

Install / Hack

Get build dependencies:

sudo apt-get install python python-pip python-dev libxml2-dev libxslt1-dev

Set up your virtual environment:

sudo pip install virtualenv
virtualenv virtualenv
. virtualenv/bin/activate

Install dependencies:

pip install -r requirements/common.txt

Start the server:

python server.py

Contribution

  • Bring Whoosh features to the REST API.
  • Pick and solve an issue

Tests


Install development dependencies:

pip install -r requirements/dev.txt

Run tests:

lettuce tests

License

Cozy Data Indexer is developed by Cozy Cloud and distributed under the AGPL v3 license.

What is Cozy?


Cozy is a platform that brings all your web services into the same private space. With it, your web apps and your devices can share data easily, providing you with a new experience. You can install Cozy on your own hardware, where no one profiles you.

Community

You can reach the Cozy Community by:

  • Chatting with us on IRC #cozycloud on irc.freenode.net
  • Posting on our Forum
  • Posting issues on the GitHub repos
  • Mentioning us on Twitter

cozy-data-indexer's People

Contributors

aenario, frankrousseau, jsilvestre, kloadut, krichtof, nledez, nono, obigroup, seeker89

cozy-data-indexer's Issues

Improvements

After a couple of days spent in the Whoosh documentation, I suggest the following:

Indexation

  • marking tags as lowercase=True, because I can already feel we are going to use tag search and be mad about case sensitivity (http://pythonhosted.org/Whoosh/api/fields.html#whoosh.fields.KEYWORD)
  • also marking tags as scorable=True so we can search them later on (according to the doc)
  • adding tags as proper keywords. Tags are passed as-is to the indexer, and the doc says "[the keyword] type is designed for space- or comma-separated keywords."

  • adding the doctype to the tags list
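
To make the last two points concrete, here is a minimal sketch of how the tag list could be turned into the comma-separated string a KEYWORD field expects, with the doctype appended as an extra tag. The helper name `build_tags_field` is hypothetical and this is not the indexer's actual code. The lowercasing is done here only for illustration, since lowercase=True would let Whoosh handle it.

```python
def build_tags_field(tags, doctype=None):
    """Join tags into one comma-separated, lowercased string.

    Hypothetical helper: the name and the choice to append the doctype
    as an extra tag are illustrative assumptions.
    """
    values = list(tags or [])
    if doctype:
        values.append(doctype)
    cleaned = []
    for value in values:
        tag = value.strip().lower()
        # Skip empty entries and duplicates while keeping the original order.
        if tag and tag not in cleaned:
            cleaned.append(tag)
    return ",".join(cleaned)


print(build_tags_field(["Bank", " Urgent ", "bank"], doctype="Note"))
# prints: bank,urgent,note
```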

Search

Search result pagination

s.search_page(query, 5, pagelen=20)

Where 5 is the requested page and pagelen is the number of results per page. page would default to 1 and pagelen would default to 10 (which is already its default in Whoosh).
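
A rough sketch of how these defaults could be applied when reading the request. The parameter names "page" and "pagelen" and the helper `parse_pagination` are assumptions for illustration, not the existing API.

```python
DEFAULT_PAGE = 1
DEFAULT_PAGELEN = 10  # same default as Whoosh's search_page


def parse_pagination(params):
    """Extract (page, pagelen) from a request body, applying the defaults.

    `params` is assumed to be a dict decoded from the request; the key
    names "page" and "pagelen" are assumptions for this sketch.
    """
    def positive_int(value, default):
        try:
            number = int(value)
        except (TypeError, ValueError):
            return default
        return number if number >= 1 else default

    page = positive_int(params.get("page"), DEFAULT_PAGE)
    pagelen = positive_int(params.get("pagelen"), DEFAULT_PAGELEN)
    return page, pagelen


print(parse_pagination({}))                            # (1, 10)
print(parse_pagination({"page": "5", "pagelen": 20}))  # (5, 20)
print(parse_pagination({"page": 0, "pagelen": "x"}))   # (1, 10)
```

The resulting values could then be passed on as s.search_page(query, page, pagelen=pagelen).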

Doctype restriction

Enable tag searching

  • searches currently don't look at tags

Security

  • adding basic authentication to the server so only the Data System can query it (not implemented for now because it is a breaking change)
  • hiding bank credentials in the logs when a request fails!
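
For the log-masking point, something along these lines might be enough: a copy of the request body is made with sensitive values replaced before it is logged. The key names and the sample document are assumptions. The real fields that carry credentials would need to be checked.

```python
# Key names are assumptions; adjust to the fields that actually carry secrets.
SENSITIVE_KEYS = frozenset({"password", "login", "secret"})


def mask_sensitive(data, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of `data` where sensitive values are replaced by '***'."""
    if isinstance(data, dict):
        return {
            key: "***" if str(key).lower() in sensitive_keys
            else mask_sensitive(value, sensitive_keys)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [mask_sensitive(item, sensitive_keys) for item in data]
    return data


# Hypothetical request body, for illustration only.
request_body = {
    "docType": "BankAccess",
    "fields": {"login": "1234", "password": "hunter2"},
    "tags": ["bank"],
}
print(mask_sensitive(request_body))
```

The original dict is left untouched, so the masked copy can be logged while the request is still processed normally.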

Some changes are tied to the Data System, so I am also considering changing the Data System and the jugglingdb-cozy-adapter.
All the changes I have made or am going to make are fully backwards compatible.

Other suggested improvements (feedback needed)

  • there is no way to say "exclude the documents of those doctypes from the results", i.e. an inverted filter (called a mask in Whoosh). I'm not sure of the best way to do it: using the same "docType" field but with a "!" at the beginning to mean "not", assuming no doctype name will ever start with "!"? Or using another field?
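
To help the discussion, here is a sketch of the first option, where a leading "!" marks a doctype to exclude. The helper name `split_doctypes` is hypothetical. In Whoosh, the two resulting lists would presumably feed the filter and mask arguments of Searcher.search, but that wiring is not shown here.

```python
def split_doctypes(doctypes):
    """Split doctype names into (included, excluded) lists.

    A leading "!" marks a doctype to exclude. This is only one of the
    two options under discussion, sketched with a hypothetical helper.
    """
    included, excluded = [], []
    for name in doctypes or []:
        name = name.strip()
        if name.startswith("!"):
            # Ignore a bare "!" with no doctype name after it.
            if len(name) > 1:
                excluded.append(name[1:])
        elif name:
            included.append(name)
    return included, excluded


print(split_doctypes(["Note", "!BankOperation"]))
# prints: (['Note'], ['BankOperation'])
```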

Auto python packages upgrade

A few days ago I noticed that the weboob import wasn't working anymore. I was about to submit an issue asking "please update weboob-modules", but I saw you had actually updated them in your personal weboob-modules repo. So I manually did the following:

  • log in as cozy (sudo su cozy -p)
  • update requirements in the virtualenv (. ./virtualenv/bin/activate && pip install -r requirements/common.txt -U)
  • update repo (git pull)

Two questions here:

  • Are all these steps done automatically during an update? If not, it would be nice if they were.
  • Could cozy-data-indexer be added to the list of system applications (i.e. next to Data-System / Home / Proxy / Controller in the "manage your apps" tab), so it gets update notifications and auto-updates?

Failure when installing indexer for the second time

I managed to install this module with the command:

fab -H user@ip install_indexer

but if I run this command a second time, I get an error:

IOError: invalid Python installation: unable to open /usr/cozy-indexer/cozy-data-indexer/virtualenv/lib/python2.7/config/Makefile (No such file or directory)

But the file exists.

Here is the complete stack trace:

sudo: pip install --use-mirrors -r /usr/local/cozy-indexer/cozy-data-indexer/requirements/common.txt
out: sudo password:
out: Traceback (most recent call last):
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/bin/pip", line 9, in <module>
out:     load_entry_point('pip==1.4', 'console_scripts', 'pip')()
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pkg_resources.py", line 378, in load_entry_point
out:     return get_distribution(dist).load_entry_point(group, name)
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pkg_resources.py", line 2566, in load_entry_point
out:     return ep.load()
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pkg_resources.py", line 2260, in load
out:     entry = __import__(self.module_name, globals(),globals(), ['__name__'])
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pip/__init__.py", line 11, in <module>
out:     from pip.vcs import git, mercurial, subversion, bazaar  # noqa
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pip/vcs/subversion.py", line 4, in <module>
out:     from pip.index import Link
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pip/index.py", line 32, in <module>
out:     from pip.wheel import Wheel, wheel_ext, wheel_setuptools_support, setuptools_requirement
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pip/wheel.py", line 18, in <module>
out:     from pip import pep425tags
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pip/pep425tags.py", line 98, in <module>
out:     supported_tags = get_supported()
out:   File "/usr/local/cozy-indexer/cozy-data-indexer/virtualenv/local/lib/python2.7/site-packages/pip/pep425tags.py", line 61, in get_supported
out:     soabi = sysconfig.get_config_var('SOABI')
out:   File "/usr/lib/python2.7/sysconfig.py", line 577, in get_config_var
out:     return get_config_vars().get(name)
out:   File "/usr/lib/python2.7/sysconfig.py", line 476, in get_config_vars
out:     _init_posix(_CONFIG_VARS)
out:   File "/usr/lib/python2.7/sysconfig.py", line 344, in _init_posix
out:     raise IOError(msg)
out: IOError: invalid Python installation: unable to open /usr/cozy-indexer/cozy-data-indexer/virtualenv/lib/python2.7/config/Makefile (No such file or directory)
out:

Update weboob

Hi,

TL;DR

Please update weboob to a more recent version.

The Longer One

Weboob got some updates that make it safer. For instance, I hit an issue where my bank account couldn't be checked. Digging further, I found that weboob used to check that the SSL certificate presented by a known website (in this case, my bank's website) had the same hash as the one hardcoded in the weboob module for that bank. That's pretty bad, because it means that every time the bank website updates its SSL certificate (which happened a few days ago, because of Heartbleed), a new version of the weboob module has to be written. They apparently removed that restriction in the most recent versions of weboob, so it would be nice if the version in use could be updated too.

Cheers!

Limitations

Here is a list of the limitations I've encountered:

  • you can't search a specific field (because of the indexer's data model) [this one is hard, https://pythonhosted.org/Whoosh/schema.html#dynamic-fields could be the solution]
  • only string fields can be indexed (I think it's a bug actually)
  • apps can't know how many relevant results there are, so they can't paginate properly (returning the count should be optional to avoid a BC break)
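
For the last point, one way to expose the total without breaking existing clients could be an opt-in field in the response. The sketch below assumes the current response is a bare {"ids": [...]} body. Both that shape and the "numResults" key are assumptions for illustration.

```python
def build_search_response(ids, total, include_total=False):
    """Build the search response body.

    By default the body keeps a bare {"ids": [...]} shape (assumed here to
    be the current format); the total is only added when asked for.
    """
    response = {"ids": list(ids)}
    if include_total:
        # "numResults" is a hypothetical key name.
        response["numResults"] = total
    return response


print(build_search_response(["a1", "b2"], total=42))
# prints: {'ids': ['a1', 'b2']}
print(build_search_response(["a1", "b2"], total=42, include_total=True))
# prints: {'ids': ['a1', 'b2'], 'numResults': 42}
```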

Add basic authentication

We should add basic authentication to the server so only the Data System can query it. It is not implemented for now because it is a breaking change.
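
As a starting point, the check itself could look like the sketch below, which validates an HTTP Basic Authorization header with the standard library only. The function name and the credentials are hypothetical. How the check would be hooked into the server's request handlers is left open.

```python
import base64
import hmac


def is_authorized(authorization_header, expected_user, expected_password):
    """Check an HTTP Basic `Authorization` header against expected credentials."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    user, separator, password = decoded.partition(":")
    if not separator:
        return False
    # compare_digest avoids leaking information through comparison timing.
    user_ok = hmac.compare_digest(user.encode("utf-8"), expected_user.encode("utf-8"))
    password_ok = hmac.compare_digest(
        password.encode("utf-8"), expected_password.encode("utf-8")
    )
    return user_ok and password_ok


# Hypothetical credentials, for illustration only.
header = "Basic " + base64.b64encode(b"data-system:s3cret").decode("ascii")
print(is_authorized(header, "data-system", "s3cret"))  # True
print(is_authorized(header, "data-system", "wrong"))   # False
```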
