
luceneplusplus's Introduction

Lucene++

Welcome to lucene++ version 3.0.9.

Lucene++ is a C++ port of the popular Java Lucene library, a high-performance, full-featured text search engine.

Lucene++ Components

  • liblucene++ library
  • liblucene++-contrib library
  • lucene++-tester (unit tester)
  • deletefiles (demo)
  • indexfiles (demo)
  • searchfiles (demo)

For information on building the Lucene++ suite, please read doc/BUILDING.md

Useful Resources

  • Official Java Lucene - useful links and documentation relevant to Lucene and lucene++.
  • Lucene in Action by Otis Gospodnetic and Erik Hatcher.

To run unit test suite

lucene++-tester is built using the Google Testing Framework. You can run the test suite on Unix with the following command, run from the repository root:

    $ build/src/test/lucene++-tester

The test suite can also be run from the repository root on NT systems, but the required DLL files must be copied manually into the test binary path before executing; otherwise you will receive errors telling you that required libraries cannot be found.

    $ build/src/test/lucene++-tester

Command options can be discovered by supplying --help.

To run the demos

Start by indexing a directory of files - open a command prompt and run

    ./indexfiles <directory to index> <directory to store index>

Once the indexer has finished, you can query the index using searchfiles

    ./searchfiles -index <directory you stored the index in>

This starts an interactive prompt for entering queries. Type a query, press Enter, and you'll see the results.

Acknowledgements

  • Ben van Klinken and contributors to the CLucene project for inspiring this project.
  • md5 Copyright (C) 1999, 2000, 2002 Aladdin Enterprises
  • Unicode character properties (guniprop) (http://library.gnome.org/devel/glib/) Copyright (C) 1999 Tom Tromey, Copyright (C) 2000 Red Hat, Inc.
  • Cotire (compile time reducer) (https://github.com/sakra/cotire) by Sascha Kratky.

luceneplusplus's People

Contributors

alanw, artob, barracuda156, berolinux, chenyang8094, elfring, fish2000, fishermanzzhang, fwuehr95, gongheng2017, hasselmm, hillwoodroc, josh-stoddard-tanium, kakueeen, kmatheussen, locutusofborg, merwaaan, revl, rsravanreddy, segv01, theonering, vadz, vslavik, xhochy, zsims

luceneplusplus's Issues

Is stemmer required?

Hi again, can you please clarify how and where stemmer is required?

No header files are included. I tried to build against the Debian libstemmer-dev package (trivial patch in src/contrib/CMakeLists.txt: target_link_libraries, plus removing some inclusion of the stemmer subdir) and it built and linked successfully.

I don't see any *stemmer.h inclusion, and the -Wl,--as-needed gcc flag stripped them from the library.
So I think this library is actually not required, am I wrong somewhere?
thanks

(I can provide a patch for letting the user choose the system or the embedded copy of stemmer/snowball library)

Segfault after usage of Lucene::SnapshotDeletionPolicy snapshot / release

Hi,

I have a bug I'm trying to track down.
I can't reproduce it with a simple test yet. What I see is a crash after these operations:
Lucene::SnapshotDeletionPolicy snapshot
Lucene::SnapshotDeletionPolicy release
If I close the writer at this point and reopen it, there is no problem.
But if I don't close it and then delete a document, I hit an exception as soon as I run a writer->commit operation:

It looks as if the commit works on an old list of files that have already been deleted, so it crashes in the deref step.
#0 0x00007f8a5bff73d5 in raise () from /lib64/libc.so.6
#1 0x00007f8a5bff8858 in abort () from /lib64/libc.so.6
#2 0x00007f8a5bff02e2 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f8a5bff0392 in __assert_fail () from /lib64/libc.so.6
#4 0x00007f8a5e63c765 in Lucene::RefCount::DecRef (this=) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexFileDeleter.cpp:444
#5 0x00007f8a5e63e783 in Lucene::IndexFileDeleter::decRef (this=0x7f8a4c031fd0, fileName=...) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexFileDeleter.cpp:341
#6 0x00007f8a5e640ca6 in Lucene::IndexFileDeleter::deleteCommits (this=0x7f8a4c031fd0) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexFileDeleter.cpp:181
#7 0x00007f8a5e642980 in Lucene::IndexFileDeleter::checkpoint (this=0x7f8a4c031fd0, segmentInfos=..., isCommit=) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexFileDeleter.cpp:279
#8 0x00007f8a5e6fc5b1 in Lucene::IndexWriter::finishCommit (this=0x7f8a4c02c7c0) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexWriter.cpp:2216
#9 0x00007f8a5e6f65a7 in Lucene::IndexWriter::commit (this=0x7f8a4c02c7c0, commitUserData=<error reading variable: DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexWriter.cpp:2195
#10 0x00007f8a5e6fae76 in Lucene::IndexWriter::commit (this=0x4d0f) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexWriter.cpp:2170

If I do the same without snapshot / release, there is of course no problem. Any clue?

Thanks !

Segfault when copying a compound file larger than 2 GB

Hi,

Just a TODO that I will track in the future, because I see this often when I leave compound files enabled with a large database.
#6 arrayCopy<uint8_t const*, unsigned char*> (this=0x36427360, b=0x33cb3f00 "\376\377\377\377\017\f\fami:document\001\aami:url\001\005docid\001\004crcs\001\vAMI_IDX_REF\021\bami:text\001\004lang\001\005depthQ\nami:author\021\asuccess\001\005ddateQ\rfacebook_type\021", offset=<value optimized out>, length=40) at /home/albert/albert/LucenePlusPlus/include/MiscUtils.h:84
#7 Lucene::BufferedIndexOutput::writeBytes (this=0x36427360, b=0x33cb3f00 "\376\377\377\377\017\f\fami:document\001\aami:url\001\005docid\001\004crcs\001\vAMI_IDX_REF\021\bami:text\001\004lang\001\005depthQ\nami:author\021\asuccess\001\005ddateQ\rfacebook_type\021", offset=<value optimized out>, length=40) at /home/albert/albert/LucenePlusPlus/src/core/store/BufferedIndexOutput.cpp:39
#8 0x00007fdf40cf8044 in Lucene::CompoundFileWriter::copyFile (this=0x341f25c0, source=..., os=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.) at /home/albert/albert/LucenePlusPlus/src/core/index/CompoundFileWriter.cpp:159
#9 0x00007fdf40cf8aff in Lucene::CompoundFileWriter::close (this=0x341f25c0) at /home/albert/albert/LucenePlusPlus/src/core/index/CompoundFileWriter.cpp:105
#10 0x00007fdf40cd59e6 in Lucene::SegmentMerger::createCompoundFile (this=, fileName=) at /home/albert/albert/LucenePlusPlus/src/core/index/SegmentMerger.cpp:166
#11 0x00007fdf40c71883 in Lucene::IndexWriter::mergeMiddle (this=, merge=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexWriter.cpp:3115
#12 0x00007fdf40c5fff1 in Lucene::IndexWriter::merge (this=0x3316fef0, merge=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexWriter.cpp:2660
#13 0x00007fdf40b95aeb in Lucene::SerialMergeScheduler::merge (this=, writer=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.) at /home/albert/albert/LucenePlusPlus/src/core/index/SerialMergeScheduler.cpp:25
#14 0x00007fdf40c53251 in Lucene::IndexWriter::maybeMerge (this=, maxNumSegmentsOptimize=, optimize=) at /home/albert/albert/LucenePlusPlus/src/core/index/IndexWriter.cpp:1332
#15 0x00007fdf40c62070 in Lucene::IndexWriter::prepareCommit (this=0x3316fef0, commitUserData=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.

lucene++ 3.0.5 claims to be 3.0.3.4

I tried to build poedit which states liblucene++ 3.0.5 as a build dependency, so I built lucene++ from the 3.0.5 release and installed. However poedit still refused to build as it said that lucene only was version 3.0.3.4.

In the CMakeLists.txt for the 3.0.5 release there are indeed these numbers:

SET(LUCENE++_VERSION_MAJOR "3")
SET(LUCENE++_VERSION_MINOR "0")
SET(LUCENE++_VERSION_REVISION "3")
SET(LUCENE++_VERSION_PATCH "4")
...
SET(LUCENE++_VERSION "${LUCENE++_VERSION_MAJOR}.${LUCENE++_VERSION_MINOR}.${LUCENE++_VERSION_REVISION}.${LUCENE++_VERSION_PATCH}")

As far as I can see, this is no longer an issue in git master, so it could be solved either by releasing a 3.0.5.1 that just fixes the version number or by releasing git master as 3.0.6.

Clarify usage for "int64_t"?

How should the following error message be resolved?

elfring@Sonne:~/Projekte/Bau/Lucene++> LANG=C make
[  1%] Built target googletest
[  1%] Building CXX precompiled header src/core/cotire/lucene++_CXX_prefix.hxx.gch
<command-line>:0:7: warning: missing whitespace after the macro name [enabled by default]
In file included from /home/elfring/Projekte/Lucene++/lokal/src/core/include/LuceneInc.h:18:0,
                 from /home/elfring/Projekte/Bau/Lucene++/src/core/cotire/lucene++_CXX_prefix.hxx:4:
/home/elfring/Projekte/Lucene++/lokal/include/Lucene.h:33:14: error: 'int64_t' is already declared in this scope
 using boost::int64_t;
              ^
CMake Error at /home/elfring/Projekte/Lucene++/lokal/cmake/cotire.cmake:1522 (message):
  Error 1 precompiling
  /home/elfring/Projekte/Bau/Lucene++/src/core/cotire/lucene++_CXX_prefix.hxx.
Call Stack (most recent call first):
  /home/elfring/Projekte/Lucene++/lokal/cmake/cotire.cmake:2892 (cotire_precompile_prefix_header)


make[2]: *** [src/core/cotire/lucene++_CXX_prefix.hxx.gch] Error 1
make[1]: *** [src/core/CMakeFiles/lucene++.dir/all] Error 2
make: *** [all] Error 2

Missing StringUtils::toString(genA) in SegmentInfos.cpp

Hi,
I think I found an error in your code which was causing a Segfault for a certain index size, in

LucenePlusPlus/src/core/index/SegmentInfos.cpp:425

is:
segmentInfos->message(L"directory listing genA=" + genA);

probably should be:
segmentInfos->message(L"directory listing genA=" + StringUtils::toString(genA));
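The reason the original line can crash is that adding an integer to a wide string literal is pointer arithmetic, not concatenation. A minimal sketch of the difference follows; `directoryListingMessage` is a hypothetical helper, and `std::to_wstring` stands in for `StringUtils::toString` so that the snippet compiles without Lucene++.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the message text built in SegmentInfos.cpp.
// std::to_wstring replaces StringUtils::toString here (an assumption made
// only so the sketch has no Lucene++ dependency).
std::wstring directoryListingMessage(std::int64_t genA) {
    // Wrong: L"directory listing genA=" + genA
    // That expression offsets the pointer to the literal by genA characters
    // instead of appending the number, and points outside the literal once
    // genA exceeds its length.
    return std::wstring(L"directory listing genA=") + std::to_wstring(genA);
}
```

Calling `directoryListingMessage(42)` yields the text followed by the digits, whatever the size of `genA`.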

BTW, great tool.

Please install doc in a more general directory

Hi, do you have any good rationale for installing lucene in this way?
INSTALL(DIRECTORY "${PROJECT_BINARY_DIR}/doc/html/" DESTINATION share/doc/lucene++-${lucene++_VERSION})

I think something like usr/share/doc/liblucene++-doc/html is more general and appropriate...

3.0.6 cotire and headers issues

When I compile rel3.0.6:

https://gist.github.com/rezso/40dc1591d2826fab2b0d

After compiling and installing, I tried to build poedit 1.6.5:

tm/transmem.cpp:43:20: fatal error: Lucene.h: No such file or directory

So I "installed" the headers from top-level include directory:
cp include/*.h /usr/include/lucene++

After this, I tried to build poedit again:

In file included from tm/transmem.cpp:43:0:
/usr/include/lucene++/Lucene.h:10:20: fatal error: Config.h: No such file or directory

This issue comes from lucene++ 3.0.6, I can compile poedit with lucene++ 3.0.5 without any errors.

MultiLevelSkipListReader doesn't handle large offsets even with cast removed

Hi,

I use a large index with 200 GB of data and 80,000,000 docs. I hit a segfault during phrase searches, even with the latest fix to MultiLevelSkipListReader applied. It sounds like another 32/64-bit limit, but I can't find it. I also tried changing pos to int64_t, but this doesn't resolve my problem. Does anyone have an idea? Thanks!
#0 0x00007ff0ea44d725 in operator at /home/albert/albert/LucenePlusPlus/include/Array.h:116
#1 Lucene::SkipBuffer::readByte (this=0x3175ba40) at /home/albert/albert/LucenePlusPlus/src/core/index/MultiLevelSkipListReader.cpp:222
#2 0x00007ff0ea4b3825 in Lucene::IndexInput::readLong (this=0x3175ba40) at /home/albert/albert/LucenePlusPlus/src/core/store/IndexInput.cpp:54
#3 0x00007ff0ea3249c0 in Lucene::DefaultSkipListReader::readSkipData (this=0x7ff0e67ccb00, level=2, skipStream=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.) at /home/albert/albert/LucenePlusPlus/src/core/index/DefaultSkipListReader.cpp:90
#4 0x00007ff0ea44e317 in Lucene::MultiLevelSkipListReader::loadNextSkip (this=0x3175bf90, level=2) at /home/albert/albert/LucenePlusPlus/src/core/index/MultiLevelSkipListReader.cpp:105
#5 0x00007ff0ea44d8cf in Lucene::MultiLevelSkipListReader::skipTo (this=0x3175bf90, target=20491) at /home/albert/albert/LucenePlusPlus/src/core/index/MultiLevelSkipListReader.cpp:73
#6 0x00007ff0ea478106 in operator-> (this=0x3175abd0, target=20491) at /usr/local/include/boost/smart_ptr/shared_ptr.hpp:418
#7 Lucene::SegmentTermDocs::skipTo (this=0x3175abd0, target=20491) at /home/albert/albert/LucenePlusPlus/src/core/index/SegmentTermDocs.cpp:223
#8 0x00007ff0ea472cc8 in shared_count (this=0x3175ba40, target=2) at /usr/local/include/boost/smart_ptr/detail/shared_count.hpp:223
#9 shared_ptr (this=0x3175ba40, target=2) at /usr/local/include/boost/smart_ptr/shared_ptr.hpp:169
#10 top (this=0x3175ba40, target=2) at /home/albert/albert/LucenePlusPlus/include/PriorityQueue.h:126
#11 Lucene::MultipleTermPositions::skipTo (this=0x3175ba40, target=2) at /home/albert/albert/LucenePlusPlus/src/core/index/MultipleTermPositions.cpp:69
#12 0x00007ff0ea147b93 in Lucene::PhrasePositions::skipTo (this=0x66c9b0, target=2) at /home/albert/albert/LucenePlusPlus/src/core/search/PhrasePositions.cpp:43
#13 0x00007ff0ea145d17 in Lucene::PhraseScorer::doNext (this=0x66c8c0) at /home/albert/albert/LucenePlusPlus/src/core/search/PhraseScorer.cpp:72
#14 0x00007ff0ea1461e8 in Lucene::PhraseScorer::nextDoc (this=0x66c8c0) at /home/albert/albert/LucenePlusPlus/src/core/search/PhraseScorer.cpp:61
#15 0x00007ff0ea1436c0 in Lucene::Scorer::score (this=0x66c8c0, collector=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.) at /home/albert/albert/LucenePlusPlus/src/core/search/Scorer.cpp:31
#16 0x00007ff0ea106c61 in Lucene::IndexSearcher::search (this=0x667210, weight=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.) at /home/albert/albert/LucenePlusPlus/src/core/search/IndexSearcher.cpp:131
#17 0x00007ff0ea10797e in Lucene::IndexSearcher::search (this=0x667210, weight=, filter=, n=) at /home/albert/albert/LucenePlusPlus/src/core/search/IndexSearcher.cpp:106
#18 0x00007ff0ea1adc30 in Lucene::Searcher::search (this=0x667210, query=, filter=, n=100) at /home/albert/albert/LucenePlusPlus/src/core/search/Searcher.cpp:41

Something like 200 GB of data for 70,000,000 docs.

-rw-rw-rw- 1 alian users 148641417159 22 janv. 16:11 _h75q.fdt
-rw-rw-rw- 1 alian users 559054868 22 janv. 16:11 _h75q.fdx
-rw-rw-rw- 1 alian users 136 22 janv. 15:18 _h75q.fnm
-rw-rw-rw- 1 alian users 6262872761 22 janv. 16:56 _h75q.frq
-rw-rw-rw- 1 alian users 8735241 28 janv. 11:24 _h75q_h.del
-rw-rw-rw- 1 alian users 628936726 22 janv. 16:56 _h75q.nrm
-rw-rw-rw- 1 alian users 3391246405 22 janv. 16:56 _h75q.prx
-rw-rw-rw- 1 alian users 36898360 22 janv. 16:56 _h75q.tii
-rw-rw-rw- 1 alian users 3166106102 22 janv. 16:56 _h75q.tis
-rw-rw-rw- 1 alian users 7837 28 janv. 07:39 _h8fg_9.del
-rw-rw-rw- 1 alian users 149718875 22 janv. 21:15 _h8fg.cfs
-rw-rw-rw- 1 alian users 7409 28 janv. 07:39 _hbv1_6.del
-rw-rw-rw- 1 alian users 142550728 23 janv. 11:13 _hbv1.cfs
-rw-rw-rw- 1 alian users 7278 28 janv. 07:39 _hhji_4.del
-rw-rw-rw- 1 alian users 140427386 23 janv. 22:51 _hhji.cfs
-rw-rw-rw- 1 alian users 7425 26 janv. 07:17 _hk3j_4.del
-rw-rw-rw- 1 alian users 142586864 24 janv. 05:13 _hk3j.cfs
-rw-rw-rw- 1 alian users 50 26 janv. 19:06 _hosg_4.del
-rw-rw-rw- 1 alian users 149351158 24 janv. 07:42 _hosg.cfs
-rw-rw-rw- 1 alian users 7344 28 janv. 07:39 _hut8_4.del
-rw-rw-rw- 1 alian users 140612269 25 janv. 16:08 _hut8.cfs
-rw-rw-rw- 1 alian users 7140 28 janv. 07:39 _j619_4.del
-rw-rw-rw- 1 alian users 136371245 25 janv. 18:33 _j619.cfs
-rw-rw-rw- 1 alian users 7331 28 janv. 07:39 _jh0e_3.del
-rw-rw-rw- 1 alian users 144042436 25 janv. 19:05 _jh0e.cfs
-rw-rw-rw- 1 alian users 7055 28 janv. 07:39 _jp6n_4.del
-rw-rw-rw- 1 alian users 135794813 26 janv. 09:21 _jp6n.cfs
-rw-rw-rw- 1 alian users 7160 28 janv. 07:39 _jv9s_1.del
-rw-rw-rw- 1 alian users 134653421 27 janv. 06:36 _jv9s.cfs
-rw-rw-rw- 1 alian users 32 28 janv. 11:24 _jy5d_1.del
-rw-rw-rw- 1 alian users 147401672 28 janv. 07:26 _jy5d.cfs
-rw-rw-rw- 1 alian users 1024353 28 janv. 11:23 _jy6q.cfs
-rw-rw-rw- 1 alian users 9 28 janv. 11:24 _jy6r_1.del
-rw-rw-rw- 1 alian users 2914 28 janv. 11:24 _jy6r.cfs
-rw-rw-rw- 1 alian users 207 28 janv. 17:20 _jy6z.cfs
-rw-rw-rw- 1 alian users 198 29 janv. 10:23 _jy73.cfs
-rw-rw-rw- 1 alian users 20 29 janv. 10:23 segments.gen
-rw-rw-rw- 1 alian users 1955 29 janv. 10:23 segments_nw
-rw-rw-rw- 1 alian users 0 29 janv. 11:24 write.lock
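To make the suspected 32/64-bit limit concrete, here is a small sketch of a range check that could guard any place where a 64-bit file offset is narrowed to 32 bits. `fitsInInt32` is a hypothetical helper, not part of Lucene++, and the sketch does not claim to locate the actual bug.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true when a 64-bit file offset can be stored
// in a 32-bit signed integer without losing information.
inline bool fitsInInt32(std::int64_t offset) {
    return offset >= std::numeric_limits<std::int32_t>::min() &&
           offset <= std::numeric_limits<std::int32_t>::max();
}
```

Several files in the listing above are past that limit: `_h75q.frq` is 6262872761 bytes and `_h75q.prx` is 3391246405 bytes, both larger than 2147483647, so any offset into them that passes through an `int32_t` would be corrupted.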

Don't take dependencies on unnecessary Boost libraries

In order to reduce compiled size, we need to minimize Boost usage.

The general guideline should be: use Boost for X if we use this a lot, Boost is the only reliable way of doing X, or it is easier with Boost and it does not add significant overhead to the project.

In any case, try to reference headers (as opposed to link to binaries), and don't let contribs or addons cause the core to be more dependent than it should be.

Examples of this are boost_regex, which is hardly used but is a strong dependency, and boost_unit_test_framework, which shouldn't be tied to the core, only to the test suite.

/cc @ustramooner

Add remaining contrib language analysers

Bulgarian, Catalan, Danish, English, Spanish, Basque, Finnish, Galician, Hindi, Hungarian, Armenian, Indonesian, Indic, Italian, Norwegian, Portuguese, Romanian, Swedish, Turkish

Remove #ifdef _DEBUG from header files

This could potentially be dangerous: if the definitions used for the compiled library don't match those of the app you link it to, you can have problems.

Build issue using Visual Studio

As mentioned in the build section, I tried to build luceneplusplus using the VS solutions under the src\msvc folder, but I got an error for Config.h. I was then told that I need to use CMake to generate the Config.h file. I tried two methods for that: first the command line, then the CMake GUI. In both cases I can see Config.h.

Now, when I try to build using the VS solutions at the location mentioned above, I get the error "c1xx : fatal error C1083: Cannot open source file: '..\util\nedmalloc\nedmalloc.c': No such file or directory".

I can also see a lucene++.sln file in the root directory, and I tried to use it to build only the lucene++ project, but I got a lot of linker errors for .lib files. I added those to the Input properties of the Visual Studio project, which resolved the linker errors, but I couldn't find a .dll or .lib file for lucene++ at the location given in the Visual Studio project.

Please let me know how to successfully build lucene++ on Windows using the Visual Studio solutions. Let me know if more detail is required.

Slim down the shared library size

Building HEAD (at the time of writing, revision 277b8d1) results in a 130 MB shared library:

arto@ubuntu:/src/LucenePlusPlus$ ls -l bin
total 161008
-rw-rw-r-- 1 arto arto    275976 Sep 12 03:15 liblucene++-c.a
lrwxrwxrwx 1 arto arto        24 Sep 12 03:34 liblucene++-contrib.so -> liblucene++-contrib.so.0
lrwxrwxrwx 1 arto arto        30 Sep 12 03:34 liblucene++-contrib.so.0 -> liblucene++-contrib.so.3.0.3.4
-rwxrwxr-x 1 arto arto  25568778 Sep 12 03:34 liblucene++-contrib.so.3.0.3.4
lrwxrwxrwx 1 arto arto        16 Sep 12 03:31 liblucene++.so -> liblucene++.so.0
lrwxrwxrwx 1 arto arto        22 Sep 12 03:31 liblucene++.so.0 -> liblucene++.so.3.0.3.4
-rwxrwxr-x 1 arto arto 136893535 Sep 12 03:31 liblucene++.so.3.0.3.4

This was built on an Ubuntu 12.10 (x86-64) host with GCC 4.7.2, libstdc++, and Boost 1.50 using cmake . && make.

130 MB seems a tad large, on the order of an XXXL size, for a shared library. Any ideas as to what might be contributing to bloating up the library so much, and how it might be slimmed back down? I'd be happy to contribute patches if there is something to be done about it.

Lucene new release

Hi, I don't know how to run the tests; on Debian, make test doesn't seem to work:

" dh_auto_test -O-- -O--parallel -O--fail-missing
make[1]: Entering directory '/tmp/buildd/lucene++-3.0.6/obj-x86_64-linux-gnu'
Running tests...
/usr/bin/ctest --force-new-ctest-process -j1
Test project /tmp/buildd/lucene++-3.0.6/obj-x86_64-linux-gnu
No tests were found!!!
"

Anyway, there is still an issue with the license that is preventing the package from reaching Debian:
Files: src/core/util/unicode/*
Copyright: 1999 Tom Tromey
2000 Red Hat, Inc.
2009-2011 Alan Wright
License: Apache-2.0 or LGPL-3+ or LGPL-2+

Apache and LGPL are NOT compatible
http://www.apache.org/legal/resolved.html
GNU LGPL
The LGPL is ineligible primarily due to the restrictions it places on larger works, violating the third license criterion. Therefore, LGPL-licensed works must not be included in Apache products.

StandardAnalyzer exception/segfault

There is a bug in the StandardAnalyzer, with a 50/50 chance of causing either an exception or a segfault.
The exception that is thrown is "could not match input". It seems to be triggered by Arabic letters.
To reproduce the error, unpack the .gz and put it alongside the .cpp file linked below.
Verified on Red Hat, Debian and Ubuntu.

http://pastebin.com/bdN5tBtc
https://www.dropbox.com/s/lp0nz4im2v3pvp9/wiki_subset.gz

Core was generated by `./wikitest'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fbe43ddf7d2 in Lucene::StandardTokenizerImpl::getNextToken() () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
(gdb) bt
#0  0x00007fbe43ddf7d2 in Lucene::StandardTokenizerImpl::getNextToken() () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#1  0x00007fbe43ddbb14 in Lucene::StandardTokenizer::incrementToken() () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#2  0x00007fbe43ddcead in Lucene::StandardFilter::incrementToken() () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#3  0x00007fbe43ded809 in Lucene::LowerCaseFilter::incrementToken() () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#4  0x00007fbe43df53da in Lucene::StopFilter::incrementToken() () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#5  0x00007fbe43f3d25c in Lucene::DocInverterPerField::processFields(Lucene::Collection<boost::shared_ptr<Lucene::Fieldable> >, int) () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#6  0x00007fbe43f4364d in Lucene::DocFieldProcessorPerThread::processDocument() () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#7  0x00007fbe43f98ea0 in Lucene::DocumentsWriter::updateDocument(boost::shared_ptr<Lucene::Document> const&, boost::shared_ptr<Lucene::Analyzer> const&, boost::shared_ptr<Lucene::Term> const&) () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#8  0x00007fbe43f9925e in Lucene::DocumentsWriter::addDocument(boost::shared_ptr<Lucene::Document> const&, boost::shared_ptr<Lucene::Analyzer> const&) () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#9  0x00007fbe43e7cd55 in Lucene::IndexWriter::addDocument(boost::shared_ptr<Lucene::Document> const&, boost::shared_ptr<Lucene::Analyzer> const&) () from /free/sdk/lucenepp-3.0.6/lib/liblucene++.so.0
#10 0x0000000000427126 in index(boost::shared_ptr<Lucene::Analyzer>) ()
#11 0x0000000000428031 in wiki() ()
#12 0x00000000004251a2 in main ()

Please add support for debian kFreeBSD

build log

https://buildd.debian.org/status/fetch.php?pkg=lucene%2B%2B&arch=kfreebsd-amd64&ver=3.0.6-1&stamp=1412941730

../core/liblucene++.so.3.0.6: undefined reference to `Lucene::Constants::OS_NAME'

According to https://lists.debian.org/debian-bsd/2006/03/msg00127.html

maybe you just need to add "defined(__FreeBSD_kernel__)"
to your src/core/util/Constants.cpp file
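As a rough sketch of what such a check could look like, the function below picks an OS name from predefined compiler macros and falls back to a generic value rather than leaving the symbol undefined. `osName` and the returned strings are hypothetical; the real Constants.cpp may be organised differently.

```cpp
#include <string>

// Hypothetical helper, not the actual Constants.cpp code.
inline std::wstring osName() {
#if defined(_WIN32)
    return L"Windows";
#elif defined(__linux__)
    return L"Linux";
#elif defined(__APPLE__)
    return L"Mac";
#elif defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
    // __FreeBSD_kernel__ is defined on Debian GNU/kFreeBSD, where
    // __FreeBSD__ is not.
    return L"BSD";
#else
    // Fallback so that an unrecognised platform still gets a definition.
    return L"Unknown";
#endif
}
```

The final `#else` branch covers the first point raised below: an unrecognised platform gets a placeholder name instead of a link error.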

I see two issues:

  1. A failure like this one seems bad; maybe just add "unknown OS"? (It seems to be used just for debug.)
  2. Why not handle this in CMake?
if(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
    set(OS_LINUX 1)
elseif(${CMAKE_SYSTEM_NAME} MATCHES "FreeBSD")
    set(OS_BSD 1)
    set(OS_BSD_FREE 1)
elseif(${CMAKE_SYSTEM_NAME} MATCHES "NetBSD")
    set(OS_BSD 1)
    set(OS_BSD_NET 1)
elseif(${CMAKE_SYSTEM_NAME} MATCHES "OpenBSD")
    set(OS_BSD 1)
    set(OS_BSD_OPEN 1)
elseif(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
    set(OS_DARWIN 1)
elseif(${CMAKE_SYSTEM_NAME} MATCHES "SunOS")
    set(OS_SOLARIS 1)
elseif(${CMAKE_SYSTEM_NAME} MATCHES "GNU")
    set(OS_GNU 1)
elseif(MINGW)
    set(OS_MINGW 1)
    set(OS_WINDOWS 1)
elseif(CYGWIN)
    set(OS_CYGWIN 1)
    set(OS_WINDOWS 1)
else()
    message(FATAL_ERROR "Operating system not supported")
endif()

please look at ettercap source code
https://github.com/Ettercap/ettercap

Please clarify BSD license

./src/contrib/analyzers/common/analysis/ar/ArabicAnalyzer.cpp
./src/contrib/analyzers/common/analysis/fa/PersianAnalyzer.cpp

In order to have lucene++ in debian I would like to know the LICENSE (they both say BSD) of the two files above, and/or how to generate them at build-time.

thanks

Config.h file is missing for Visual Studio build

I downloaded the source and tried to build it with the Visual Studio solution provided with the source, as described in the Windows "How to Build" section. The problem is that I can see a Config.h.cmake file but cannot find a generated Config.h file. Because of this, I am not able to build the Lucene++ DLL or static library using Visual Studio. Please advise how to overcome this issue and build Lucene++ on Windows.

Make demos optional

Please make building demos optional: move add_subdirectory(src/demo) inside if(ENABLE_DEMO)

Master branch missing "nedmalloc" under "utility"

If I download the master branch code, "nedmalloc" is missing under the "utility" folder, which causes the Visual Studio solution build to fail. Also, some folders such as "lib" and "bin" are missing at the root level, so the post-build event in Visual Studio that copies the built .lib, .dll and .exe files fails, and the user has to create lib and bin at the root level. Either this should be documented in the build steps or these folders should be included in the master branch. That may be a minor issue, but the missing "nedmalloc" in the master branch can be treated as a major one. For now I have copied "nedmalloc" from the "dev" branch, which resolves the issue I raised for the Visual Studio build.

Compare error in core\util\UTF8Stream.cpp

inline bool UTF8Decoder::isValidNext(uint32_t& cp)
{
    // Determine the sequence length based on the lead octet
    int32_t length = sequenceLength(cp);
    if (length < 1 && length > 4)    <----- ERROR - must ||
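A value cannot be both below 1 and above 4, so the `&&` condition is never true and the check never rejects anything. A minimal sketch of the intended range test, using a hypothetical helper name:

```cpp
#include <cstdint>

// Hypothetical helper mirroring the check quoted above: a UTF-8 sequence is
// 1 to 4 octets long, so any length outside that range is invalid.
inline bool isValidSequenceLength(std::int32_t length) {
    // "length < 1 && length > 4" can never hold for any value;
    // "||" expresses "outside the range 1..4".
    return !(length < 1 || length > 4);
}
```

With `||`, lengths 0 and 5 are rejected while 1 through 4 are accepted.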

Compile errors with boost_1_51_0

Error listing from VS

Config.h

#define BOOST_FILESYSTEM_VERSION 3


1>.\util\FileUtils.cpp(167) : error C2039: 'wdirectory_iterator' : is not a member of 'boost::filesystem'
1>.\util\FileUtils.cpp(207) : error C2039: 'directory_string' : is not a member of 'boost::filesystem::path'

and more...

CMake build script

Ben has a cmake build script in his branch, can it be merged in here?

Fix SIGABRT in the searchfiles demo program

Attempting to use the supplied bin/searchfiles example program results in the program aborting with SIGABRT:

arto@ubuntu:/src/LucenePlusPlus/bin$ mkdir input output
arto@ubuntu:/src/LucenePlusPlus/bin$ echo 'Hello, world' > input/hello.txt
arto@ubuntu:/src/LucenePlusPlus/bin$ ./indexfiles input output
Indexing to directory: output...
Adding [1]: hello.txt
Index time: 4 milliseconds
Optimizing...
Optimize time: 2 milliseconds
Total time: 6 milliseconds

arto@ubuntu:/src/LucenePlusPlus/bin$ ./searchfiles -index output
Enter query: hello
Searching for: hello
1 total matching documents
1. input/hello.txt
*** glibc detected *** ./searchfiles: free(): invalid pointer: 0x00007fbb260e1b40 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7fbb24a83b96]
./searchfiles[0x405dbe]
./searchfiles[0x4086ee]
./searchfiles[0x40686b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fbb24a2676d]
./searchfiles[0x407509]
======= Memory map: ========
...
Aborted

This was built on an Ubuntu 12.10 (x86-64) host with GCC 4.7.2, libstdc++, and Boost 1.50 using cmake . && make && make indexfiles searchfiles deletefiles.

Compile errors when using clang

Error Listing while ./waf build:
...............
/usr/include/wctype.h:302:60: error: invalid token after top level declarator
extern wint_t towupper_l (wint_t __wc, __locale_t __locale) __THROW;
^
;
/usr/include/wctype.h:310:8: error: unknown type name 'wint_t'
extern wint_t towctrans_l (wint_t __wc, wctrans_t __desc,
^
/usr/include/wctype.h:310:28: error: use of undeclared identifier 'wint_t'
extern wint_t towctrans_l (wint_t __wc, wctrans_t __desc,
^
/usr/include/wctype.h:311:27: error: invalid token after top level declarator
__locale_t __locale) __THROW;
^

Refactor clone() methods to use copy constructors

The current design of cloning objects is far from ideal at the moment, since mistakes can be made when attempting to clone an object that doesn't have a clone() override. As a consequence, clone() can sometimes return an object of a base type, and not of the object's own type.
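One possible shape for such a refactoring, shown only as a sketch: a small CRTP helper implements clone() once in terms of the derived class's copy constructor, so a class cannot forget its override. `Shape`, `Cloneable` and `Circle` are hypothetical names, and `std::shared_ptr` is an assumption standing in for whatever smart pointer the library uses.

```cpp
#include <memory>
#include <string>

// Hypothetical base class standing in for a clonable library object.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::shared_ptr<Shape> clone() const = 0;
    virtual std::string name() const = 0;
};

// CRTP helper: clone() is written once and always copy-constructs the
// most-derived type named in the template argument.
template <class Derived>
class Cloneable : public Shape {
public:
    std::shared_ptr<Shape> clone() const override {
        return std::make_shared<Derived>(static_cast<const Derived&>(*this));
    }
};

// A concrete type gets a correct clone() just by inheriting the helper;
// its implicit copy constructor copies the radius.
class Circle : public Cloneable<Circle> {
public:
    explicit Circle(double r) : radius(r) {}
    std::string name() const override { return "Circle"; }
    double radius;
};
```

A class that derives further from `Circle` would still need its own `Cloneable` layer, so this narrows the problem described above rather than removing it entirely.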

QueryParser Date Ranges don't work with locales

On my machine (Ubuntu 10.10) and some other Ubuntu versions that I've tried, the QueryParser fails to parse dates according to the date order of the locale.

I've tracked this down to the DateTools call to std::use_facet< std::time_get<wchar_t> >(locale).date_order(), which always returns NO_ORDER (which defaults the date order to YMD). See below for some code, for me it always returns "Date order for locale en_US.utf8 is No order". Does this work for anyone? Perhaps another approach is required for this functionality?

#include <iostream>
#include <locale>
#include <string>

using namespace std;

int main( ) {

   cin.imbue(locale("en_US.utf8"));

   const time_get<char>& dateReader =
     use_facet<time_get<char> >(cin.getloc( ));

   time_base::dateorder d = dateReader.date_order( );

   string s;

   switch (d) {
   case time_base::no_order:
     s = "No order";
     break;
   case time_base::dmy:
     s = "day/month/year";
     break;
   case time_base::mdy:
     s = "month/day/year";
     break;
   case time_base::ymd:
     s = "year/month/day";
     break;
   case time_base::ydm:
     s = "year/day/month";
     break;
   }

   cout << "Date order for locale " << cin.getloc( ).name( )
        << " is " << s << endl;
}
