cbess / text-sherlock
Text (source code) search engine with indexer and a front end web interface to search. Uses Python 3.
License: Other
Change the INDEX_PATH setting to a tuple/list to allow multiple paths to be used.
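One way this request could be handled is to normalize the setting before indexing, so that a single string and a tuple/list are treated the same way. The sketch below is only an illustration; `normalize_index_paths` is a hypothetical helper and is not part of Text Sherlock.

```python
def normalize_index_paths(value):
    """Return a list of paths from a str, list, tuple, or None setting.

    Hypothetical helper: the name and behavior are assumptions made
    for illustration, not the project's actual code.
    """
    if value is None:
        return []
    if isinstance(value, str):
        # A single path stays a single-item list.
        return [value]
    # Tuples and lists are copied into a new list.
    return list(value)


# Each configured path could then be indexed in turn.
for path in normalize_index_paths(('/srv/code/a', '/srv/code/b')):
    print('would index:', path)
```

With this shape, existing configs that use a plain string keep working unchanged.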
I've got a bunch of git repositories, and I'd like to set up a code search engine to, well, search them. Text Sherlock looks like a good candidate, but unfortunately I can't figure out how to exclude the .git folder from the index, as it's slowing everything down and cluttering the search results.
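A directory walk that skips .git could look roughly like the sketch below. It relies on the standard `os.walk` behavior of honoring in-place changes to `dirnames`. The names `EXCLUDED_DIRS` and `iter_source_files` are hypothetical and do not correspond to known Text Sherlock settings or functions.

```python
import os

# Hypothetical setting name; Text Sherlock may not expose anything like it.
EXCLUDED_DIRS = {'.git'}


def iter_source_files(root, excluded_dirs=EXCLUDED_DIRS):
    """Yield file paths under root, never descending into excluded dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering them.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning during the walk avoids even reading the excluded folders, which matters for large .git object stores.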
In case someone runs the scripts as root or as another user, check the permissions at startup so that permission errors don't confuse users during the setup process.
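A startup check along these lines could be sketched with `os.access`. The function name and the choice of which paths to check are assumptions for illustration only.

```python
import os


def find_permission_problems(readable=(), writable=()):
    """Return human-readable problems for paths the current user cannot use.

    Hypothetical helper: which paths Sherlock would need to read
    (source trees) or write (index and log directories) is an assumption.
    """
    problems = []
    for path in readable:
        if not os.access(path, os.R_OK):
            problems.append('cannot read: %s' % path)
    for path in writable:
        if not os.access(path, os.W_OK):
            problems.append('cannot write: %s' % path)
    return problems
```

Printing the returned list before indexing starts would surface the problem early instead of partway through a run.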
It would be nice if there were the ability to also browse the indexed source tree.
Currently, you can only search for nearly exact string matches. For example, if I have a file with the string veryLongString and I search for verylong, it shows up as if there are no results. Results only show up if I search for veryLongString.
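One possible approach to looser matching is to index lowercase terms derived from each identifier and match queries by prefix. The sketch below uses only the standard library and is not how Text Sherlock or its backends tokenize text; both function names are hypothetical.

```python
import re


def index_terms(identifier):
    """Return lowercase terms for an identifier: the whole name plus
    its camelCase parts (hypothetical helper)."""
    parts = re.findall(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+', identifier)
    terms = {identifier.lower()}
    terms.update(p.lower() for p in parts)
    return terms


def prefix_match(query, terms):
    """True if any indexed term starts with the lowercased query."""
    q = query.lower()
    return any(term.startswith(q) for term in terms)


terms = index_terms('veryLongString')
print(prefix_match('verylong', terms))
```

A real backend would more likely use its own analyzer for this, so the sketch only shows the idea of normalizing case and splitting identifiers.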
Adding a --config option will help the transition to PyPI.
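A minimal sketch of such an option, assuming `argparse` is used and that the option takes the path to a YAML settings file (neither of which the issue states):

```python
import argparse


def build_parser():
    """Build a parser with a --config option.

    The program name, the default, and the help text are assumptions
    for illustration.
    """
    parser = argparse.ArgumentParser(prog='sherlock')
    parser.add_argument(
        '--config',
        metavar='PATH',
        default=None,
        help='path to a settings file (None means use the default lookup)',
    )
    return parser


args = build_parser().parse_args(['--config', '/etc/sherlock/settings.yml'])
print(args.config)
```

An explicit path removes the dependency on the current working directory, which is what an installed package would need.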
Add line numbers to search results. Also, clicking a line should take you to the line or context of the result.
Line numbering example ref: https://github.com/elijahr/lk/blob/master/lk.py#L271
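As a rough illustration of the first half of this request, matching lines can be paired with 1-based line numbers. `numbered_matches` is a hypothetical helper, not code from Text Sherlock or from the referenced lk.py.

```python
def numbered_matches(text, term):
    """Return (line_number, line) pairs for lines containing term.

    Line numbers start at 1, the way editors display them.
    """
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if term in line
    ]


source = 'int main() {\n  char *buf;\n  return 0;\n}'
for number, line in numbered_matches(source, 'buf'):
    print('%d: %s' % (number, line))
```

The line number could then be used to build a link target for the clicked result.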
After installing, I ran
python main.py --index update
and got
Exception: settings.INCLUDE_FILE_SUFFIX must be a tuple or None, found: <type 'list'>
Is this a known issue?
Thanks
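YAML sequences are loaded as Python lists, which may be why a list reaches a check that expects a tuple. A possible workaround is to coerce the value when settings are loaded. The helper below is a sketch under that assumption and is not the project's code.

```python
def as_tuple_or_none(value):
    """Coerce a list/tuple setting to a tuple, passing None through.

    Hypothetical helper; a tuple is useful because str.endswith()
    accepts a tuple of suffixes but not a list.
    """
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        return tuple(value)
    raise TypeError('expected list, tuple, or None, found: %r' % type(value))


suffixes = as_tuple_or_none(['.py', '.c'])
print('example.c'.endswith(suffixes))
```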
Running main.py --index update produced this error:
Cleaning index
Traceback (most recent call last):
File "main.py", line 118, in <module>
run()
File "main.py", line 108, in run
indexer.index_path(path)
File "/opt/text-sherlock/core/sherlock/indexer.py", line 39, in index_path
idxr.clean_index()
File "/opt/text-sherlock/core/sherlock/indexer.py", line 99, in clean_index
self._index.clean_index()
File "/opt/text-sherlock/core/sherlock/backends/xapian_backend.py", line 104, in clean_index
self.index.delete_document(record.id)
xapian.DocNotFoundError: Can't delete non-existent document #242
shalini@shalini-Iappy:~/text1$ source sherlock_env/bin/activate
(sherlock_env)shalini@shalini-Iappy:~/text1$ python main.py --index update
No yaml config
Setup the local_settings.yml config.
Traceback (most recent call last):
File "main.py", line 12, in <module>
from webapp import server
File "/home/shalini/text1/webapp/server.py", line 2, in <module>
from core import flask
File "/home/shalini/text1/core/__init__.py", line 22, in <module>
import settings
File "/home/shalini/text1/settings.py", line 36, in <module>
config = yaml.load(open(yaml_path, 'r'))
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/__init__.py", line 71, in load
return loader.get_single_data()
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/constructor.py", line 37, in get_single_data
node = self.get_single_node()
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/composer.py", line 127, in compose_mapping_node
while not self.check_event(MappingEndEvent):
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/parser.py", line 428, in parse_block_mapping_key
if self.check_token(KeyToken):
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 115, in check_token
while self.need_more_tokens():
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 149, in need_more_tokens
self.stale_possible_simple_keys()
File "/home/shalini/text1/sherlock_env/local/lib/python2.7/site-packages/yaml/scanner.py", line 289, in stale_possible_simple_keys
"could not found expected ':'", self.get_mark())
yaml.scanner.ScannerError: while scanning a simple key
in "/home/shalini/text1/local_settings.yml", line 23, column 1
could not found expected ':'
in "/home/shalini/text1/local_settings.yml", line 26, column 2
The indexer fails to start indexing (I guess the peewee API changed a bit). Here's a patch that resolves this issue:
diff --git a/core/sherlock/db.py b/core/sherlock/db.py
index ccdc596..e3bf3cd 100644
--- a/core/sherlock/db.py
+++ b/core/sherlock/db.py
@@ -65,7 +65,7 @@ def is_file_updated(filepath, check_file_exists=False, update_db=False):
# get db record
record = None
- query = IndexerMeta.select().where(path=filepath)
+ query = IndexerMeta.select().where(IndexerMeta.path == filepath)
if query.exists():
# get the one record
record = [q for q in query][0]
diff --git a/main.py b/main.py
index f79653d..267f9db 100755
--- a/main.py
+++ b/main.py
@@ -74,7 +74,7 @@ def run():
tests.run_all()
elif options.show_version:
pyver = sys.version_info
- print ' Python: v%d.%d.%d' % (pyver.major, pyver.minor, pyver.micro)
+ print ' Python: v%d.%d.%d' % (pyver[0], pyver[1], pyver[2])
print 'Sherlock: v' + get_version_info('sherlock')
print ' Flask: v' + get_version_info('flask')
print 'Pygments: v' + get_version_info('pygments')
I removed .major etc. because named access is new in Python 2.7. I tested on Python 2.6, which doesn't have that named access.
Right now I can't figure out how to force Sherlock to index recursively (the setting is set to True, of course ;)), but I hope I'll be able to figure that out.
Best regards,
toudi.
python main.py -r
Traceback (most recent call last):
File "main.py", line 12, in <module>
from webapp import server
File "/home/user/text-sherlock/webapp/server.py", line 2, in <module>
from core import flask
File "/home/user/text-sherlock/core/__init__.py", line 22, in <module>
import settings
File "/home/user/text-sherlock/settings.py", line 22, in <module>
if not os.path.isfile(yaml_path):
File "/usr/lib/python2.7/genericpath.py", line 29, in isfile
st = os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found
contents of local_settings.yml:
log_path: '/home/borisov/log/'
error:
(sherlock_env) borisov@deli:~/src/text-sherlock $ python main.py --test
Loaded Sherlock config settings from /home/borisov/src/text-sherlock/local_settings.yml
Xapian backend support unavailable
Running sherlock...
Traceback (most recent call last):
File "main.py", line 93, in <module>
run()
File "main.py", line 77, in run
import tests
File "/home/borisov/src/text-sherlock/tests/__init__.py", line 17, in <module>
hdlr = logging.FileHandler(os.path.join(settings.LOG_PATH, filename, __name__))
File "/usr/lib/python2.7/logging/__init__.py", line 905, in __init__
StreamHandler.__init__(self, self._open())
File "/usr/lib/python2.7/logging/__init__.py", line 935, in _open
stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/home/borisov/log/sherlock.tests.log.txt/tests'
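The path in the error ends in '/sherlock.tests.log.txt/tests', which suggests the module name was joined on as an extra path component after the filename. One possible shape for the handler setup, assuming the intent is a single log file inside the configured directory, is sketched below. `make_file_handler` is a hypothetical helper.

```python
import logging
import os


def make_file_handler(log_dir, filename):
    """Create a FileHandler for log_dir/filename, creating log_dir first.

    Hypothetical helper: it assumes the log file should live directly
    inside the configured log directory.
    """
    os.makedirs(log_dir, exist_ok=True)
    return logging.FileHandler(os.path.join(log_dir, filename))
```

Creating the directory up front also covers the case where the configured log path does not exist yet.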
I'm using Python 2.7 with these versions:
Xapian backend support unavailable
Running sherlock...
Python: v2.7.2
Sherlock: v0.7.2
Flask: v0.9
Pygments: v1.5
Whoosh: v2.4.1
CherryPy: v3.2.2
sherlock done.
And I'm getting the following error:
(sherlock_env)[code@rainbowdash text-sherlock]$ python main.py --index rebuild
Xapian backend support unavailable
Running sherlock...
Indexing path: /home/code/src/
Reindexing everything!
Waiting 5s for interrupt...
creating index at /home/code/text-sherlock/data/indexes/main
Checking directory: /home/code/src/
Traceback (most recent call last):
File "main.py", line 122, in <module>
run()
File "main.py", line 112, in run
indexer.index_path(path)
File "/home/code/text-sherlock/core/sherlock/indexer.py", line 37, in index_path
idxr.index_text(path)
File "/home/code/text-sherlock/core/sherlock/indexer.py", line 133, in index_text
self.__index_path(path)
File "/home/code/text-sherlock/core/sherlock/indexer.py", line 141, in __index_path
self.__index_dir(path)
File "/home/code/text-sherlock/core/sherlock/indexer.py", line 202, in __index_dir
self.__index_file(path)
File "/home/code/text-sherlock/core/sherlock/indexer.py", line 208, in __index_file
has_file_changed, db_record = self._index.has_file_updated(filepath)
File "/home/code/text-sherlock/core/sherlock/backends/base.py", line 66, in has_file_updated
return db.is_file_updated(filepath, update_db=True)
File "/home/code/text-sherlock/core/sherlock/db.py", line 68, in is_file_updated
query = IndexerMeta.select().where(path=filepath)
File "/home/code/text-sherlock/sherlock_env/lib/python2.7/site-packages/peewee.py", line 1090, in inner
func(clone, *args, **kwargs)
TypeError: where() got an unexpected keyword argument 'path'
How can I resolve this problem?
Provide option to skip indexing if performing the index would wipe out an existing index, because the directory is now empty.
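A guard for this case could be as simple as checking whether the source directory still contains any files before the index is cleaned. This is a sketch only; `has_indexable_files` is a hypothetical name and the real check might also need to apply the include/exclude settings.

```python
import os


def has_indexable_files(path):
    """Return True if path contains at least one file at any depth."""
    for _dirpath, _dirnames, filenames in os.walk(path):
        if filenames:
            return True
    return False
```

An indexer could skip the run, or ask for confirmation, when this returns False and an index already exists.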
Hi good people,
I've set up a Sherlock instance on a quite large codebase at [1]. The indexes took several days to build and that worked well, but now the interface is so slow that it can hardly be used [2].
Is this considered normal, or can I do anything to improve the speed of the UI? Is it just the host's performance (a VM with 8 processors and 16GB of RAM)?
I could not find anything about performance on the wiki, and I cannot find a mailing list to ask questions on, hence this request for information. Thanks in advance, have a wonderful end of year! :-)
[1] http://ci3.castalia.camp:7777/
[2] http://ci3.castalia.camp:7777/search?q=IFrame
--
boris
Traceback (most recent call last):
File "main.py", line 12, in <module>
from webapp import server
File "/Users/Chris/text-sherlock/webapp/server.py", line 2, in <module>
from core import flask
File "/Users/Chris/text-sherlock/core/__init__.py", line 13, in <module>
from cherrypy import wsgiserver as cherrypy_wsgiserver
ImportError: cannot import name wsgiserver
Any solutions?
Features overview:
Turns out I had to do the following:
cd setup
sh virtualenv-setup.sh
because the script depends on the current directory being the setup directory, and the shell script isn't executable.
After I copied example.local_settings.yml to local_settings.yml and ran "python main.py --index rebuild", it seems that no files were indexed:
(sherlock_env)zhangclb@zhangclb2:~/sandbox/sherlock/opt/text-sherlock$ python main.py --index rebuild
Loaded Sherlock config settings from /home/zhangclb/sandbox/sherlock/opt/text-sherlock/local_settings.yml
Xapian backend support unavailable
Running sherlock...
Indexing path: /home/zhangclb/sandbox/sherlock/opt/text-sherlock/tests/text/
Reindexing everything!
Waiting 5s for interrupt...
Indexing started.
Available indexer backends: whoosh
Available searcher backends: whoosh
Current backend: whoosh
Total documents indexed: 0
Index Database: /home/zhangclb/sandbox/sherlock/opt/text-sherlock/data/indexes/main-index.db
Indexing done.
I searched for "stringBuffer" in Sherlock and got nothing, even though it appears in the file tests/text/example.c.
In addition to being able to provide a delimited list of line numbers, a nice feature would be to allow ranges of line numbers to be selected instead of just individual lines.
For example:
...&hl=5,6,7,8,9,11,13,14,15
Could be re-written with ranges like this:
...&hl=5-9,11,13-15
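Parsing such a value could be sketched as follows. The function name is hypothetical, and details such as accepting reversed ranges or ignoring empty items are assumptions rather than requirements from the request.

```python
def parse_highlight_lines(value):
    """Expand a value like '5-9,11,13-15' into a sorted list of line numbers."""
    lines = set()
    for part in value.split(','):
        part = part.strip()
        if not part:
            continue  # tolerate empty items such as a trailing comma
        if '-' in part:
            start, end = (int(piece) for piece in part.split('-', 1))
            if start > end:
                start, end = end, start  # accept reversed ranges like 9-5
            lines.update(range(start, end + 1))
        else:
            lines.add(int(part))
    return sorted(lines)


print(parse_highlight_lines('5-9,11,13-15'))
# prints [5, 6, 7, 8, 9, 11, 13, 14, 15]
```

Because single numbers are still accepted, existing links that list individual lines would keep working.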