
text-sherlock's Issues

Excluding directories?

I've got a bunch of git repositories, and I'd like to set up a code search engine to, well, search them. Text Sherlock looks like a good candidate, but unfortunately I can't figure out how to exclude the .git folder from the index, which is slowing everything down and cluttering the search results.
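A minimal sketch of the usual Python technique for skipping directories while walking a tree: remove them from the dirnames list that os.walk yields, in place, so the walk never descends into them. EXCLUDED_DIRS and iter_indexable_files are hypothetical names; this is not text-sherlock's own setting or indexer code.

```python
import os

# Hypothetical exclusion list; ".git" is the directory from the issue.
EXCLUDED_DIRS = {".git"}


def iter_indexable_files(root):
    """Yield every file path under root, skipping excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] (in place) tells os.walk (top-down, the
        # default) not to descend into the removed directories at all.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Pruning during the walk, rather than filtering paths afterwards, means the files under .git are never even listed, which is what helps with indexing speed.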

Allow for partial text searching

Currently, only near-exact string matches are found. For example, if a file contains the string veryLongString and I search for verylong, no results are returned; results appear only when I search for veryLongString.
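A minimal sketch of the matching rule the issue asks for: a case-insensitive substring test, so "verylong" matches "veryLongString". This is a brute-force line scan for illustration only; an index-backed search would instead need terms tokenized differently (for example lowercased, split on camelCase, or indexed as n-grams), which is an assumption about design, not how text-sherlock currently works. find_partial_matches is a hypothetical helper.

```python
def find_partial_matches(text, query):
    """Return (line_number, line) pairs whose line contains query as a
    case-insensitive substring, so "verylong" matches "veryLongString"."""
    needle = query.lower()
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


source = "int count;\nString veryLongString;\n"
print(find_partial_matches(source, "verylong"))  # [(2, 'String veryLongString;')]
```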

Update search logging

  • Place the search times in the log
  • Implement a setting to allow search logs to be sent to a database and not just to stdout.
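Both items can be sketched with the standard logging module: time each search and log the elapsed milliseconds, and attach an extra handler that writes records to a database when a setting is present. Everything here is hypothetical (SQLiteLogHandler, build_search_logger, timed_search, the search_log table, and db_path standing in for a config setting); SQLite is just one convenient choice of database.

```python
import logging
import sqlite3
import sys
import time


class SQLiteLogHandler(logging.Handler):
    """Hypothetical handler that stores formatted log records in SQLite."""

    def __init__(self, db_path):
        super().__init__()
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS search_log (created REAL, message TEXT)"
        )

    def emit(self, record):
        self.conn.execute(
            "INSERT INTO search_log (created, message) VALUES (?, ?)",
            (record.created, self.format(record)),
        )
        self.conn.commit()


def build_search_logger(db_path=None):
    """Always log to stdout; also log to SQLite when db_path is set.

    db_path stands in for a hypothetical config setting.
    """
    logger = logging.getLogger("sherlock.search")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(sys.stdout))
    if db_path is not None:
        logger.addHandler(SQLiteLogHandler(db_path))
    return logger


def timed_search(logger, search_fn, query):
    """Run search_fn(query) and log the query, hit count and elapsed time."""
    start = time.perf_counter()
    results = search_fn(query)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("q=%r hits=%d time=%.1fms", query, len(results), elapsed_ms)
    return results
```

Keeping the database as just another handler means stdout logging is unchanged and the database sink can be switched on or off from configuration alone.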

error while updating index

After installing, I ran
python main.py --index update
and got

Exception: settings.INCLUDE_FILE_SUFFIX must be a tuple or None, found: <type 'list'>

Is this a known issue?
Thanks
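The message suggests the YAML config supplied a list (YAML sequences load as Python lists) while the code insists on a tuple. A plausible reason, assuming the setting is eventually passed to str.endswith: that method accepts a string or a tuple of strings but raises TypeError for a list. A minimal sketch of normalizing the value after loading; normalize_suffixes is a hypothetical helper, not part of text-sherlock.

```python
def normalize_suffixes(value):
    """Coerce a suffix setting into a form str.endswith accepts.

    None means "no filter"; a single string becomes a 1-tuple; any other
    iterable (e.g. a list loaded from YAML) is converted to a tuple.
    """
    if value is None:
        return None
    if isinstance(value, str):
        return (value,)
    return tuple(value)


suffixes = normalize_suffixes([".py", ".c"])  # what a YAML list would give
print(suffixes)                               # ('.py', '.c')
print("main.py".endswith(suffixes))           # True
```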

DocNotFoundError during index update

While running main.py --index update, I got this error:

Cleaning index
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    run()
  File "main.py", line 108, in run
    indexer.index_path(path)
  File "/opt/text-sherlock/core/sherlock/indexer.py", line 39, in index_path
    idxr.clean_index()
  File "/opt/text-sherlock/core/sherlock/indexer.py", line 99, in clean_index
    self._index.clean_index()
  File "/opt/text-sherlock/core/sherlock/backends/xapian_backend.py", line 104, in clean_index
    self.index.delete_document(record.id)
xapian.DocNotFoundError: Can't delete non-existent document #242

commit 166 new error :/

shalini@shalini-Iappy:~/text1$ source sherlock_env/bin/activate
(sherlock_env)shalini@shalini-Iappy:~/text1$ python main.py --index update
No yaml config
Setup the local_settings.yml config.
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    from webapp import server
  File "/home/shalini/text1/webapp/server.py", line 2, in <module>
    from core import flask
  File "/home/shalini/text1/core/__init__.py", line 22, in <module>
    import settings
  File "/home/shalini/text1/settings.py", line 36, in <module>
    config = yaml.load(open(yaml_path, 'r'))
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 115, in check_token
    while self.need_more_tokens():
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 149, in need_more_tokens
    self.stale_possible_simple_keys()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 289, in stale_possible_simple_keys
    "could not found expected ':'", self.get_mark())
yaml.scanner.ScannerError: while scanning a simple key
  in "/home/shalini/text1/local_settings.yml", line 23, column 1
could not found expected ':'
  in "/home/shalini/text1/local_settings.yml", line 26, column 2

patch for current version

The indexer fails to start indexing (I guess the peewee API changed a bit). Here's a patch that resolves the issue:

diff --git a/core/sherlock/db.py b/core/sherlock/db.py
index ccdc596..e3bf3cd 100644
--- a/core/sherlock/db.py
+++ b/core/sherlock/db.py
@@ -65,7 +65,7 @@ def is_file_updated(filepath, check_file_exists=False, update_db=False):

     # get db record
     record = None
-    query = IndexerMeta.select().where(path=filepath)
+    query = IndexerMeta.select().where(IndexerMeta.path == filepath)
     if query.exists():
         # get the one record
         record = [q for q in query][0]
diff --git a/main.py b/main.py
index f79653d..267f9db 100755
--- a/main.py
+++ b/main.py
@@ -74,7 +74,7 @@ def run():
         tests.run_all()
     elif options.show_version:
         pyver = sys.version_info
-        print '  Python: v%d.%d.%d' % (pyver.major, pyver.minor, pyver.micro)
+        print '  Python: v%d.%d.%d' % (pyver[0], pyver[1], pyver[2])
         print 'Sherlock: v' + get_version_info('sherlock')
         print '   Flask: v' + get_version_info('flask')
         print 'Pygments: v' + get_version_info('pygments')

I removed the .major etc. because named access on sys.version_info is new in Python 2.7. I tested on Python 2.6, which doesn't support it.
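An equivalent, slightly more compact form of the version line in the patch, slicing sys.version_info instead of indexing each field. Slicing works on Python 2.6's plain tuple and on the named structure 2.7+ returns. python_version_string is a hypothetical helper.

```python
import sys


def python_version_string(info=sys.version_info):
    """Return "major.minor.micro" using index/slice access only.

    This works on Python 2.6's plain tuple as well as the named struct that
    2.7+ returns, unlike info.major / info.minor / info.micro.
    """
    return "%d.%d.%d" % tuple(info[:3])


print("  Python: v" + python_version_string())
```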

Right now I can't figure out how to force Sherlock to index recursively (the setting is set to True, of course ;)), but I hope I'll be able to figure that out.

Best regards,
toudi.

TypeError: coercing to Unicode: need string or buffer, NoneType found

python main.py -r
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    from webapp import server
  File "/home/user/text-sherlock/webapp/server.py", line 2, in <module>
    from core import flask
  File "/home/user/text-sherlock/core/__init__.py", line 22, in <module>
    import settings
  File "/home/user/text-sherlock/settings.py", line 22, in <module>
    if not os.path.isfile(yaml_path):
  File "/usr/lib/python2.7/genericpath.py", line 29, in isfile
    st = os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found
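The last frames show settings.py calling os.path.isfile(yaml_path) with yaml_path set to None; in the Python 2.7 shown, os.stat(None) raises this TypeError instead of isfile returning False. How yaml_path is computed isn't shown here, so the sketch below only illustrates checking for None before the file test. resolve_config_path and its default are hypothetical.

```python
import os


def resolve_config_path(candidate, default="local_settings.yml"):
    """Return an existing config file path, or None if there is none.

    candidate stands in for settings.py's yaml_path, which may be None;
    checking for None first avoids passing it to os.path.isfile.
    """
    path = default if candidate is None else candidate
    return path if os.path.isfile(path) else None
```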

"python main.py --test" error if "log_path" configured

Contents of local_settings.yml:

log_path: '/home/borisov/log/'

Error:

(sherlock_env) borisov@deli:~/src/text-sherlock $ python main.py --test
Loaded Sherlock config settings from /home/borisov/src/text-sherlock/local_settings.yml
Xapian backend support unavailable
Running sherlock...
Traceback (most recent call last):
  File "main.py", line 93, in <module>
    run()
  File "main.py", line 77, in run
    import tests
  File "/home/borisov/src/text-sherlock/tests/__init__.py", line 17, in <module>
    hdlr = logging.FileHandler(os.path.join(settings.LOG_PATH, filename, __name__))
  File "/usr/lib/python2.7/logging/__init__.py", line 905, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python2.7/logging/__init__.py", line 935, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/home/borisov/log/sherlock.tests.log.txt/tests'
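The failing path comes from os.path.join(settings.LOG_PATH, filename, __name__): every argument becomes a path component, so the module name tests is appended after sherlock.tests.log.txt, and FileHandler then tries to open a file inside a directory that doesn't exist. A minimal sketch, assuming the intent is a single log file directly inside log_path; make_file_handler is a hypothetical helper.

```python
import logging
import os
import posixpath

# Reproduces the path from the traceback (POSIX separators).
buggy = posixpath.join("/home/borisov/log/", "sherlock.tests.log.txt", "tests")
print(buggy)  # /home/borisov/log/sherlock.tests.log.txt/tests


def make_file_handler(log_dir, filename):
    """Return a FileHandler writing to log_dir/filename.

    Only the directory and the file name are joined; the directory is
    created first so FileHandler can open the file.
    """
    os.makedirs(log_dir, exist_ok=True)
    return logging.FileHandler(os.path.join(log_dir, filename))
```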

Error while trying to reindex

I'm using Python 2.7 with these versions:

Xapian backend support unavailable
Running sherlock...
  Python: v2.7.2
Sherlock: v0.7.2
   Flask: v0.9
Pygments: v1.5
  Whoosh: v2.4.1
CherryPy: v3.2.2
sherlock done.

And I'm getting the following error:

(sherlock_env)[code@rainbowdash text-sherlock]$ python main.py --index rebuild
Xapian backend support unavailable
Running sherlock...
Indexing path: /home/code/src/
Reindexing everything!
Waiting 5s for interrupt...
creating index at /home/code/text-sherlock/data/indexes/main
Checking directory: /home/code/src/
Traceback (most recent call last):
  File "main.py", line 122, in <module>
    run()
  File "main.py", line 112, in run
    indexer.index_path(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 37, in index_path
    idxr.index_text(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 133, in index_text
    self.__index_path(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 141, in __index_path
    self.__index_dir(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 202, in __index_dir
    self.__index_file(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 208, in __index_file
    has_file_changed, db_record = self._index.has_file_updated(filepath)
  File "/home/code/text-sherlock/core/sherlock/backends/base.py", line 66, in has_file_updated
    return db.is_file_updated(filepath, update_db=True)
  File "/home/code/text-sherlock/core/sherlock/db.py", line 68, in is_file_updated
    query = IndexerMeta.select().where(path=filepath)
  File "/home/code/text-sherlock/sherlock_env/lib/python2.7/site-packages/peewee.py", line 1090, in inner
    func(clone, *args, **kwargs)
TypeError: where() got an unexpected keyword argument 'path'

How can I resolve this problem?

Performance issue

Hi good people,

I've set up a Sherlock instance on quite a large codebase at [1]. The indexes took several days to build, which worked well, but now the interface is so slow that it can hardly be used [2].

Is this considered normal, or can I do anything to improve the speed of the UI? Is it just the host's performance (a VM with 8 processors and 16 GB of RAM)?

I could not find anything about performance on the wiki, and I cannot find a mailing list to ask questions on, hence this request for information. Thanks in advance, and have a wonderful end of year! :-)

[1] http://ci3.castalia.camp:7777/
[2] http://ci3.castalia.camp:7777/search?q=IFrame

--
boris

error while trying to run the server

Traceback (most recent call last):
  File "main.py", line 12, in <module>
    from webapp import server
  File "/Users/Chris/text-sherlock/webapp/server.py", line 2, in <module>
    from core import flask
  File "/Users/Chris/text-sherlock/core/__init__.py", line 13, in <module>
    from cherrypy import wsgiserver as cherrypy_wsgiserver
ImportError: cannot import name wsgiserver

Any solutions?
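This most likely means a newer CherryPy is installed: CherryPy moved its bundled WSGI server into the separate cheroot package, so cherrypy.wsgiserver can no longer be imported. Assuming that is the cause here, pinning an older CherryPy is one option; another is a fallback import like the sketch below. WSGIServer is a hypothetical local alias, and the rest of core/__init__.py (not shown) would also need to use it.

```python
# Try the old bundled location first, then the cheroot package that newer
# CherryPy releases depend on. WSGIServer is a local alias (hypothetical name).
try:
    from cherrypy import wsgiserver as _legacy_wsgiserver
    WSGIServer = _legacy_wsgiserver.CherryPyWSGIServer
except (ImportError, AttributeError):
    try:
        from cheroot import wsgi as _cheroot_wsgi
        WSGIServer = _cheroot_wsgi.Server
    except ImportError:
        WSGIServer = None  # neither package provides a WSGI server

if WSGIServer is None:
    print("Install cheroot or an older CherryPy to get a WSGI server.")
```

Both classes take the bind address and the WSGI application as their first two arguments, e.g. WSGIServer(("0.0.0.0", 8080), app).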

Integrate indexing/searching of PDFs and other textual docs

Features overview:

  • Search results should provide the file and page (if available).
  • Users should be able to open or download the file from the search results.
  • Organize docs by project or folder.
  • Manage indexed files via a backend web interface.
  • Store the username (if available) for any changes.

Install instructions unclear

Turns out I had to do the following:

cd setup
sh virtualenv-setup.sh

because the script depends on the current directory being the setup directory, and the shell script isn't executable.

"Total documents indexed: 0" issue

After copying example.local_settings.yml to local_settings.yml and running "python main.py --index rebuild",
it seems that no files were indexed:

(sherlock_env)zhangclb@zhangclb2:~/sandbox/sherlock/opt/text-sherlock$ python main.py --index rebuild
Loaded Sherlock config settings from /home/zhangclb/sandbox/sherlock/opt/text-sherlock/local_settings.yml
Xapian backend support unavailable
Running sherlock...
Indexing path: /home/zhangclb/sandbox/sherlock/opt/text-sherlock/tests/text/
Reindexing everything!
Waiting 5s for interrupt...
Indexing started.
Available indexer backends: whoosh
Available searcher backends: whoosh
Current backend: whoosh
Total documents indexed: 0
Index Database: /home/zhangclb/sandbox/sherlock/opt/text-sherlock/data/indexes/main-index.db
Indexing done.

I searched for "stringBuffer" in Sherlock and got nothing, even though it appears in tests/text/example.c.

Allow ranges to be entered for highlighting multiple lines

In addition to accepting a delimited list of line numbers, it would be nice if the delimited values could also include ranges of line numbers, instead of just individual lines.

For example:

...&hl=5,6,7,8,9,11,13,14,15

Could be re-written with ranges like this:

...&hl=5-9,11,13-15
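A minimal sketch of expanding such an hl value into individual line numbers, accepting both the current comma-separated form and the proposed ranges. parse_hl is a hypothetical helper; reversed ranges like 9-5 are accepted, and malformed parts raise ValueError.

```python
def parse_hl(value):
    """Expand a highlight spec like "5-9,11,13-15" into sorted line numbers.

    Single numbers and inclusive ranges can be mixed; duplicates are merged.
    Malformed parts (e.g. "5-" or "abc") raise ValueError via int().
    """
    lines = set()
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "5,,6"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                start, end = end, start  # accept "9-5" as "5-9"
            lines.update(range(start, end + 1))
        else:
            lines.add(int(part))
    return sorted(lines)


print(parse_hl("5-9,11,13-15"))  # [5, 6, 7, 8, 9, 11, 13, 14, 15]
```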
