pmundkur / libcrm114

C library version of CRM114, and a Python binding

Home Page: http://crm114.sourceforge.net/wiki/doku.php?id=download

Objective-C 2.27% C 78.82% C++ 9.24% Python 9.67%

libcrm114's Introduction

From http://crm114.sourceforge.net/wiki/doku.php?id=download:

CRM114 C-callable Library

This is the callable library version of CRM114. It has most of the
same classifiers as the standalone language, with some significant
improvements (one alpha tester reports a 10x speedup in their
application). This version is LGPLed (Library GPL), so you can link it
with your own code, whether open-source or proprietary. You still need
TRE (on Fedora, “yum install tre-devel”). Note that with improvements
come costs: libcrm114 classifiers are NOT compatible with standalone
CRM114 class files. This is necessary because libcrm114 classifiers
must work even on systems that don't have filesystems, such as
embedded processors. The code is now pretty stable and the API is
solidly entrenched by use in several real products, so the API is
unlikely to change in unpleasant ways.

Advantages of libcrm114: It's much faster; everything is
in-memory. You can call everything directly from ANSI C. Because
everything is in memory, it's good for embedded systems where you
don't _have_ a unix-style file system to talk to. No arcane language
to learn, it's all just ANSI C. You can export classifiers as ASCII
“CSV-like” format so trained classifiers are 32/64-bit portable and
cross-platform Linux/Mac/Windows portable (the internal binary
classifier format is still tied to a particular architecture, but
that's never exported any more).

Disadvantages of libcrm114: Not all classifiers are currently
supported (in particular, Neural Net, Correlator, OSBF, and Winnow
are NOT yet supported). There's no crazy language, so you need to get
your data into memory on your own. You still need TRE. You do pay a
(not horrible) startup cost loading a classifier from an ASCII
CSV-like file, but since you can then reuse the classifier for as many
documents as you want, in the long term this cost is amortized down to
nearly zero and you get a significant speedup.


Dependencies

Debian/Ubuntu: libtre5, libtre-dev

Building

$ make && cd python && python setup.py build


libcrm114's Issues

Adding support for Control Block read and write in binary format.

The python binding should add support for binary read and dump.

As per doc/HOWTO.txt:

In either case, just write p_db->datablock_size bytes of data to the
disk when saving the db, and read the full saved file size into memory
to restore the db.  For example:

   FILE *myfile;
   myfile = fopen ("/home/me/my_db_file.crm", "wb");
   if (!myfile) { printf ("Couldn't open\n"); exit (1); }
   fwrite (p_db, 1, p_db->datablock_size, myfile);
   fclose (myfile);

and the corresponding code for reading the db back in.

Binary read and dump looks worthwhile because it should load faster and use less space than the ASCII format.

Quoting doc/HOWTO.txt again:

    //    *** OPTIONAL *** Here's how to read and write the datablocks as 
    //    ASCII text files.  This is NOT recommended for storage (it's ~5x bigger
    //    than the actual datablock, and takes longer to read in as well),
    //    but rather as a way to debug datablocks, or move a db portably
    //    between 32- and 64-bit machines and between Linux and Windows.
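
The HOWTO excerpt quoted in this issue shows only the write side and leaves "the corresponding code for reading the db back in" to the reader. A minimal sketch of that read side follows, assuming the whole saved file is the datablock, as the HOWTO describes. The helper name `read_whole_file` is hypothetical and is not part of libcrm114; it uses only standard C I/O.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (NOT part of libcrm114): read an entire file into a
   freshly malloc'ed buffer.  Returns NULL on any failure; on success the
   byte count is stored in *size_out and the caller must free() the buffer. */
void *read_whole_file(const char *path, size_t *size_out)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size <= 0) {            /* ftell error or empty file */
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    void *buf = malloc((size_t)size);
    if (!buf) {
        fclose(fp);
        return NULL;
    }

    /* Read the full saved file size into memory, as the HOWTO suggests. */
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        fclose(fp);
        return NULL;
    }

    fclose(fp);
    *size_out = (size_t)size;
    return buf;
}
```

In a program that includes the libcrm114 headers, the returned pointer would presumably be cast to the datablock pointer type and used as `p_db`, and comparing the returned size against `p_db->datablock_size` would be a cheap sanity check. As the README notes, the binary format is tied to a particular architecture, so a file saved this way should only be reloaded on the same kind of machine.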

Linking fails under Mac OS X 10.7.2

Changed Makefile:
$(CC) -shared -Wl,-soname,$(LIB_NAME) -o lib/$(LIB_NAME) $(LIBOBJS)
to be:
$(CC) -shared -Wl,-dylib_install_name,$(LIB_NAME) -o lib/$(LIB_NAME) $(LIBOBJS)

While running 'make' it gives:

cc -c -g -Iinclude -fpic -std=c99 -pedantic -Wall -Wextra -Wpointer-arith -Wstrict-prototypes -Wno-sign-compare -Wno-overlength-strings -o lib/crm114_regex_tre.o lib/crm114_regex_tre.c
lib/crm114_regex_tre.c: In function ‘crm114__regfree’:
lib/crm114_regex_tre.c:61: warning: ISO C forbids conversion of function pointer to object pointer type
cc -shared -Wl,-dylib_install_name,libcrm114.so.1 -o lib/libcrm114.so.1  lib/crm114_base.o  lib/crm114_markov.o  lib/crm114_markov_microgroom.o  lib/crm114_bit_entropy.o  lib/crm114_hyperspace.o  lib/crm114_svm.o  lib/crm114_svm_lib_fncts.o  lib/crm114_svm_quad_prog.o  lib/crm114_fast_substring_compression.o  lib/crm114_pca.o  lib/crm114_pca_lib_fncts.o  lib/crm114_matrix.o  lib/crm114_matrix_util.o  lib/crm114_datalib.o  lib/crm114_vector_tokenize.o  lib/crm114_strnhash.o  lib/crm114_util.o  lib/crm114_regex_tre.o
Undefined symbols for architecture x86_64:
  "_tre_regncomp", referenced from:
      _crm114__regncomp in crm114_regex_tre.o
  "_tre_regnexec", referenced from:
      _crm114__regnexec in crm114_regex_tre.o
  "_tre_regerror", referenced from:
      _crm114__regerror in crm114_regex_tre.o
  "_tre_regfree", referenced from:
      _crm114__regfree in crm114_regex_tre.o
  "_tre_free", referenced from:
      _crm114__regfree in crm114_regex_tre.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status
make: *** [lib/libcrm114.so] Error 1

I suppose something needs to be added to the Makefile so that the TRE shared library gets linked in?
