text-sherlock's Introduction

Text Sherlock (or Sherlock)

Text Sherlock is a fast, easy-to-install, and simple-to-use search engine for text, optimized for source code. An alternative, OpenGrok, is more feature-rich but requires too much time to install (though it may be worth it for some). Text Sherlock gives you a much easier setup, a text indexer, and a web app interface for searching, all with very little effort.

Soli Deo Gloria

Basic Setup

Instructions:

  1. Download Sherlock source from GitHub.
  2. Extract/place the Sherlock source code in the desired (install) directory. This will be where Sherlock lives.
  3. Run sh setup/virtualenv-setup.sh to set up an isolated environment and download core packages.
  4. Configure settings. The defaults in settings.py provide documentation for each setting.
    • Copy example.local_settings.yml to local_settings.yml.
    • Override/copy any setting from settings.py to local_settings.yml (change the values as needed). All YAML keys/options must be lowercase.
  5. Run source sherlock_env/bin/activate to enter the virtual environment.
  6. Run python main.py --index update or --index rebuild to index the path specified in the settings. Watch indexing output.
  7. Run python main.py --runserver to start the web server.
  8. Go to http://localhost:7777 to access the web interface. The interface uses the Bootstrap toolkit for its UI.

You may need to install some packages before an Ubuntu installation will run without error.

  • Install curl: sudo apt-get install curl
  • Install uuid libs: sudo apt-get install uuid-dev
  • Install python dev: sudo apt-get install python-dev

Includes:

  • Settings/Configuration
  • Setup script (read contents of script for more information)
  • Main controller script
    • Run main.py -h for more information.
  • End-to-end interface
    • Indexing and searching text (source code). Built-in support for whoosh (fast searching) or xapian (much faster searching).
      • Easily extend indexing or searching via custom backends.
    • Front end web app served using werkzeug or cheroot.
      • werkzeug is suited to development and low-traffic use.
      • cheroot is the high-performance, pure-Python HTTP server used by CherryPy.
    • Settings and configuration using Python.

Web Interface

Features:

Append to document URL.

  • To highlight lines, append to URL: &hl=3,7,12-14,21
  • To jump to a line, append to end of URL: #line-3
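
The hl value format above (single lines and inclusive ranges, comma separated) can be illustrated with a small parser. This is only a sketch of how such a value could be expanded; parse_highlight_spec is a hypothetical name and is not taken from the Sherlock source.

```python
def parse_highlight_spec(spec):
    """Expand a spec such as '3,7,12-14,21' into a sorted list of line numbers."""
    lines = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty items such as a trailing comma
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
            if start > end:
                start, end = end, start  # accept reversed ranges like 14-12
            lines.update(range(start, end + 1))
        else:
            lines.add(int(part))
    return sorted(lines)


print(parse_highlight_spec("3,7,12-14,21"))  # [3, 7, 12, 13, 14, 21]
```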

Using other backends

In settings.py:

  • Change the default_indexer and default_searcher values to match the name given to the backend.
    • Possible values:
      • whoosh the default, no extra work needed.
      • xapian must be installed separately using the included setup/install-xapian.sh setup script.

Using other web servers

Text Sherlock has built-in support for werkzeug and cheroot WSGI compliant servers.

In settings.py:

  • Change the server_type value to one of the available server types.
    • Possible values:
      • default, werkzeug web server (default).
      • cheroot, production ready web server.

Core packages

Requires Python 3.5+

Other References

Project Goals

  1. Provide an easy to setup, fast, and adequate text search engine solution.
  2. Be a respectable alternative to OpenGrok.
  3. Influence the OpenGrok contributors to provide a simpler setup process.
    • I successfully set up two installations of OpenGrok, on CentOS and Ubuntu 11.x. Each time it took more than two hours. Text Sherlock setup takes less than 5 minutes (excluding package download time).

Contributors

text-sherlock's People

Contributors

boidolr, cbess, dependabot[bot], estebank, ihgann, rpavlik, stmuk, zhangclb

text-sherlock's Issues

error while trying to run the server

Traceback (most recent call last):
  File "main.py", line 12, in <module>
    from webapp import server
  File "/Users/Chris/text-sherlock/webapp/server.py", line 2, in <module>
    from core import flask
  File "/Users/Chris/text-sherlock/core/__init__.py", line 13, in <module>
    from cherrypy import wsgiserver as cherrypy_wsgiserver
ImportError: cannot import name wsgiserver

any solutions ?

error while updating index

After installing, I ran
python main.py --index update
and got

Exception: settings.INCLUDE_FILE_SUFFIX must be a tuple or None, found: <type 'list'>

Is this a known issue?
Thanks
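
The message suggests that the value arrived as a list while the indexer expects a tuple; YAML sequences normally load as Python lists. A minimal sketch of a coercion step follows. It assumes the lowercase YAML key is include_file_suffix, and as_suffix_tuple is a hypothetical helper, not part of Sherlock.

```python
def as_suffix_tuple(value):
    """Return None unchanged, otherwise coerce a YAML-style value to a tuple."""
    if value is None:
        return None
    if isinstance(value, str):
        return (value,)  # a single suffix written as a scalar
    return tuple(value)


# A YAML sequence typically loads as a Python list (key name is an assumption).
loaded = {"include_file_suffix": [".py", ".c", ".h"]}
suffixes = as_suffix_tuple(loaded["include_file_suffix"])
print(suffixes)  # ('.py', '.c', '.h')
```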

Allow for partial text searching

Currently, you can only search for nearly-exact string matches. For example, if I have a file with the string veryLongString, and I search verylong, it will show up as if there are no results. Only if I search for veryLongString will any results show up.

Excluding directories?

I've got a bunch of git repositories, and I'd like to set up a code search engine to, well, search them. Text Sherlock looks like a good candidate, but unfortunately I can't figure out how to exclude the .git folder from the index - as it's slowing everything down and cluttering the search results.
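
One general way to keep a directory such as .git out of a file walk is to prune it during traversal. The sketch below shows the technique with os.walk only; EXCLUDED_DIRS and iter_indexable_files are hypothetical names, and this is not a description of an existing Sherlock setting.

```python
import os

EXCLUDED_DIRS = {".git", ".hg", ".svn"}  # hypothetical exclusion list


def iter_indexable_files(root, excluded=EXCLUDED_DIRS):
    """Yield file paths under root, skipping excluded directory names."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning dirnames in place avoids reading the excluded trees at all, which matters for large .git directories.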

DocNotFoundError during index update

During main.py --index update I got this error:

Cleaning index
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    run()
  File "main.py", line 108, in run
    indexer.index_path(path)
  File "/opt/text-sherlock/core/sherlock/indexer.py", line 39, in index_path
    idxr.clean_index()
  File "/opt/text-sherlock/core/sherlock/indexer.py", line 99, in clean_index
    self._index.clean_index()
  File "/opt/text-sherlock/core/sherlock/backends/xapian_backend.py", line 104, in clean_index
    self.index.delete_document(record.id)
xapian.DocNotFoundError: Can't delete non-existent document #242

TypeError: coercing to Unicode: need string or buffer, NoneType found

python main.py -r
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    from webapp import server
  File "/home/user/text-sherlock/webapp/server.py", line 2, in <module>
    from core import flask
  File "/home/user/text-sherlock/core/__init__.py", line 22, in <module>
    import settings
  File "/home/user/text-sherlock/settings.py", line 22, in <module>
    if not os.path.isfile(yaml_path):
  File "/usr/lib/python2.7/genericpath.py", line 29, in isfile
    st = os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found
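
The traceback shows os.path.isfile being called with a path that is None. A guard of the following kind would avoid the TypeError; this is a sketch only, and config_file_exists is a hypothetical helper rather than code from settings.py.

```python
import os


def config_file_exists(yaml_path):
    """Return True only when yaml_path is set and points to an existing file."""
    if yaml_path is None:
        return False  # no config path was resolved; treat as "no config"
    return os.path.isfile(yaml_path)
```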

patch for current version

The indexer fails to start indexing (I guess the peewee API changed a bit). Here's a patch that resolves this issue:

diff --git a/core/sherlock/db.py b/core/sherlock/db.py
index ccdc596..e3bf3cd 100644
--- a/core/sherlock/db.py
+++ b/core/sherlock/db.py
@@ -65,7 +65,7 @@ def is_file_updated(filepath, check_file_exists=False, update_db=False):

     # get db record
     record = None
-    query = IndexerMeta.select().where(path=filepath)
+    query = IndexerMeta.select().where(IndexerMeta.path == filepath)
     if query.exists():
         # get the one record
         record = [q for q in query][0]
diff --git a/main.py b/main.py
index f79653d..267f9db 100755
--- a/main.py
+++ b/main.py
@@ -74,7 +74,7 @@ def run():
         tests.run_all()
     elif options.show_version:
         pyver = sys.version_info
-        print '  Python: v%d.%d.%d' % (pyver.major, pyver.minor, pyver.micro)
+        print '  Python: v%d.%d.%d' % (pyver[0], pyver[1], pyver[2])
         print 'Sherlock: v' + get_version_info('sherlock')
         print '   Flask: v' + get_version_info('flask')
         print 'Pygments: v' + get_version_info('pygments')

I removed the .major etc. because named access is new in Python 2.7. I tested it on Python 2.6, which doesn't have that named access.

Right now I can't figure out how to force Sherlock to index recursively (the setting is set to True, of course ;)), but I hope I'll be able to figure that out.

best regards,
toudi.

"Total documents indexed: 0" issue

After I "Copy example.local_settings.yml to local_settings.yml" and run "python main.py --index rebuild",
it seems that no files were indexed:

(sherlock_env)zhangclb@zhangclb2:~/sandbox/sherlock/opt/text-sherlock$ python main.py --index rebuild
Loaded Sherlock config settings from /home/zhangclb/sandbox/sherlock/opt/text-sherlock/local_settings.yml
Xapian backend support unavailable
Running sherlock...
Indexing path: /home/zhangclb/sandbox/sherlock/opt/text-sherlock/tests/text/
Reindexing everything!
Waiting 5s for interrupt...
Indexing started.
Available indexer backends: whoosh
Available searcher backends: whoosh
Current backend: whoosh
Total documents indexed: 0
Index Database: /home/zhangclb/sandbox/sherlock/opt/text-sherlock/data/indexes/main-index.db
Indexing done.

I searched for "stringBuffer" in Sherlock and got nothing, even though it is contained in the file tests/text/example.c.

Integrate indexing/searching of PDFs and other textual docs

Features overview:

  • Search results should provide the file and page (if available).
  • User should be able to open or download the file from the search results.
  • Organize docs by projects or folders.
  • Manage indexed files via a backend web interface.
  • Store username (if available) for any changes.

Performance issue

Hi good people,

I've set up a Sherlock instance on quite a large codebase at [1]. The indexes took several days to build and that worked well, but now the interface is so slow that it can hardly be used [2].

Is this considered normal, or can I do anything to improve the speed of the UI? Is it just the host's performance (a VM with 8 procs and 16 GB of RAM)?

I could not find anything about performance on the wiki, and could not find a mailing list to ask questions on, hence this information request. Thanks in advance, and have a wonderful end of year! :-)

[1] http://ci3.castalia.camp:7777/
[2] http://ci3.castalia.camp:7777/search?q=IFrame

--
boris

Allow ranges to be entered for highlighting multiple lines

In addition to being able to provide a delimited list of line numbers, a nice feature would be to allow those delimited values to allow for ranges of line numbers to be selected instead of just individual lines.

For example:

...&hl=5,6,7,8,9,11,13,14,15

Could be re-written with ranges like this:

...&hl=5-9,11,13-15
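
The rewriting shown in this example can be sketched as a small function that collapses consecutive line numbers into ranges. compress_to_ranges is a hypothetical name used for illustration only.

```python
def compress_to_ranges(numbers):
    """Turn [5, 6, 7, 8, 9, 11, 13, 14, 15] into '5-9,11,13-15'."""
    ordered = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the run while the next number is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else "{}-{}".format(start, end))
        i += 1
    return ",".join(parts)


print(compress_to_ranges([5, 6, 7, 8, 9, 11, 13, 14, 15]))  # 5-9,11,13-15
```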

commit 166 new error :/

shalini@shalini-Iappy:~/text1$ source sherlock_env/bin/activate
(sherlock_env)shalini@shalini-Iappy:~/text1$ python main.py --index update
No yaml config
Setup the local_settings.yml config.
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    from webapp import server
  File "/home/shalini/text1/webapp/server.py", line 2, in <module>
    from core import flask
  File "/home/shalini/text1/core/__init__.py", line 22, in <module>
    import settings
  File "/home/shalini/text1/settings.py", line 36, in <module>
    config = yaml.load(open(yaml_path, 'r'))
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
    while not self.check_event(MappingEndEvent):
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
    if self.check_token(KeyToken):
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 115, in check_token
    while self.need_more_tokens():
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 149, in need_more_tokens
    self.stale_possible_simple_keys()
  File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 289, in stale_possible_simple_keys
    "could not found expected ':'", self.get_mark())
yaml.scanner.ScannerError: while scanning a simple key
  in "/home/shalini/text1/local_settings.yml", line 23, column 1
could not found expected ':'
  in "/home/shalini/text1/local_settings.yml", line 26, column 2

"python main.py --test" error if "log_path" configured

contents of local_settings.yml:

log_path: '/home/borisov/log/'

error:

(sherlock_env) borisov@deli:~/src/text-sherlock $ python main.py --test
Loaded Sherlock config settings from /home/borisov/src/text-sherlock/local_settings.yml
Xapian backend support unavailable
Running sherlock...
Traceback (most recent call last):
  File "main.py", line 93, in <module>
    run()
  File "main.py", line 77, in run
    import tests
  File "/home/borisov/src/text-sherlock/tests/__init__.py", line 17, in <module>
    hdlr = logging.FileHandler(os.path.join(settings.LOG_PATH, filename, __name__))
  File "/usr/lib/python2.7/logging/__init__.py", line 905, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python2.7/logging/__init__.py", line 935, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/home/borisov/log/sherlock.tests.log.txt/tests'
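
The failing path ends in sherlock.tests.log.txt/tests, which is what joining three components (directory, file name, module name) produces: the log file name is treated as a directory. The sketch below shows that effect and one possible alternative that joins only the directory and a single file name. build_log_file_path and make_file_handler are hypothetical helpers, not the project's code.

```python
import logging
import os
import posixpath


def build_log_file_path(log_dir, filename):
    """Join the log directory with a single file name, creating the directory."""
    os.makedirs(log_dir, exist_ok=True)
    return os.path.join(log_dir, filename)


def make_file_handler(log_dir, filename):
    """Return a logging.FileHandler writing to log_dir/filename."""
    return logging.FileHandler(build_log_file_path(log_dir, filename))


# Joining three components reproduces the path from the traceback:
# the log file name ends up being treated as a directory.
broken = posixpath.join("/home/borisov/log/", "sherlock.tests.log.txt", "tests")
print(broken)  # /home/borisov/log/sherlock.tests.log.txt/tests
```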

Update search logging

  • Place the search times in the log
  • Implement a setting to allow search logs to be sent to a database and not just to stdout.
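
As an illustration of the first item, search times could be recorded by wrapping the search call. This is a sketch under assumptions: timed_search and the logger name sherlock.search are hypothetical, and search_fn stands in for whatever callable performs the search.

```python
import logging
import time

logger = logging.getLogger("sherlock.search")  # hypothetical logger name


def timed_search(search_fn, query):
    """Run search_fn(query), log how long it took, and return its results."""
    started = time.perf_counter()
    results = search_fn(query)
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    logger.info("search q=%r took %.1f ms", query, elapsed_ms)
    return results
```

Because the timing goes through the logging module, sending it somewhere other than stdout becomes a matter of attaching a different handler.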

Install instructions unclear

Turns out I had to do the following:

cd setup
sh virtualenv-setup.sh

because the script depends on the current directory being the setup directory, and the shell script isn't executable.

Error while trying to reindex

I'm using python 2.7 with versions:

Xapian backend support unavailable
Running sherlock...
  Python: v2.7.2
Sherlock: v0.7.2
   Flask: v0.9
Pygments: v1.5
  Whoosh: v2.4.1
CherryPy: v3.2.2
sherlock done.

And I'm getting the following error:

(sherlock_env)[code@rainbowdash text-sherlock]$ python main.py --index rebuild
Xapian backend support unavailable
Running sherlock...
Indexing path: /home/code/src/
Reindexing everything!
Waiting 5s for interrupt...
creating index at /home/code/text-sherlock/data/indexes/main
Checking directory: /home/code/src/
Traceback (most recent call last):
  File "main.py", line 122, in <module>
    run()
  File "main.py", line 112, in run
    indexer.index_path(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 37, in index_path
    idxr.index_text(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 133, in index_text
    self.__index_path(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 141, in __index_path
    self.__index_dir(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 202, in __index_dir
    self.__index_file(path)
  File "/home/code/text-sherlock/core/sherlock/indexer.py", line 208, in __index_file
    has_file_changed, db_record = self._index.has_file_updated(filepath)
  File "/home/code/text-sherlock/core/sherlock/backends/base.py", line 66, in has_file_updated
    return db.is_file_updated(filepath, update_db=True)
  File "/home/code/text-sherlock/core/sherlock/db.py", line 68, in is_file_updated
    query = IndexerMeta.select().where(path=filepath)
  File "/home/code/text-sherlock/sherlock_env/lib/python2.7/site-packages/peewee.py", line 1090, in inner
    func(clone, *args, **kwargs)
TypeError: where() got an unexpected keyword argument 'path'

How can I resolve this problem?
