
rosie-pattern-language's People

Contributors

jamiejennings, pkulchenko, subzidion, veratil, vmorris


rosie-pattern-language's Issues

[RFC] Standard RPL Library Refactoring

Proposal

Separate the RPL library definitions into a collection of namespaces for easier use and clarity of identifiers.

Solution

Rework the current rpl files into descriptive filenames and identifiers with namespace prefixes.
Namespaces are to be short names of high level collections.

Currently we have the files:

basic.rpl
common.rpl
csv1.rpl
csv.rpl
datetime.rpl
json.rpl
network.rpl
rfc3986.rpl (URI-exact matching over generic network.rpl)
spark.rpl

along with a few others (grep.rpl, language-comments.rpl, rosie.rpl, rpl-1.0.rpl, rpl-1.1.rpl). These all define a variety of identifiers with and without namespace prefixes.

These files will be reworked into namespaces such as: net, date, time, num, str, etc.

  • network.rpl will be refined into the net namespace.
  • rfc3986.rpl will be merged under the net namespace.
  • datetime.rpl will be split into date and time namespaces; date will contain the "datetime" matching types.
  • common.rpl will be split into num, str, and any other namespaces that turn out to be needed.
  • csv[1].rpl, json.rpl, spark.rpl can be put under a lang namespace to further separate identifiers.
  • Any languages can be put under their own lang.[name] namespace, such as lang.c, lang.json, lang.spark, etc.

Namespace Patterns

core (the global namespace)
net
str
num
os
lang
date
time
datetime (future implementation, dependent on import)

RPM and Debian packaging would be very helpful

In order to make this package more available to developers and people putting together containers, for example, it would be very helpful to have RPL packaged up in at least a source package format (SRPM for rpm-style packages -- I'm not sure what the equivalent is for Debian).

Once the source RPM is available, Linux distros can start making binary builds available through their own repositories, and until then, binary RPMs can be built and maintained on community sites.

Personally, I look forward to having Rosie Pattern Language as an RPM for Fedora, so that it's super easy to install and use.

Question about normalization/transformation

According to https://github.com/jamiejennings/rosie-pattern-language/blob/3d85fe5350bb6caeb802b8b778d9273871a9f44d/doc/raisondetre.md, Rosie can be instructed to perform transformations and normalizations:

"In addition to simply recognizing items of interest in the input, Rosie can be instructed to transform values, enumerate values, and perform other operations while processing each line. Normalization is one kind of transformation, e.g. timestamps in various formats may be normalized to an integer number of milliseconds since the epoch. Sanitizing, such as encryption of sensitive fields, is another kind of transformation."

For example, how can I get a timestamp in milliseconds from Rosie? Could someone please provide an example?
Can that be achieved using RPL commands, or do we have to write extensions in Lua?

Thank you
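To make the normalization in the quoted passage concrete, here is a minimal sketch of the arithmetic, done as post-processing in Python rather than inside Rosie. The function name to_epoch_millis, the ISO-like input format, and the choice to interpret the text as UTC are all illustrative assumptions; nothing here is part of RPL or the Rosie API.

```python
from datetime import datetime, timedelta, timezone

# Reference point for "milliseconds since the epoch".
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(text, fmt="%Y-%m-%dT%H:%M:%S"):
    """Convert a matched timestamp string to integer milliseconds since the epoch.

    Assumption: the text carries no zone information and is treated as UTC.
    """
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    # Integer division of two timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)
```

Other timestamp layouts would need their own fmt string, which is the "various formats" part of the normalization problem.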

BREW package

Two people who attended my talk at All Things Open asked about a Brew package to make it easier to install Rosie on OS X.

Support non-ASCII characters in RPL identifiers

Today, RPL identifiers can contain only POSIX [:alnum:] characters and underscore. Expand the set of characters that can be used in identifiers to be the Unicode general character property called Letters (L), plus underscore.

RPL module system

Add these to RPL:

  1. package declaration
  2. import declaration
  3. export declaration

And reorganize the rpl directory to reflect a logical organization of different kinds of patterns.

Command line long options

I believe using --option to denote a multi-character option would be more consistent with the convention seen almost everywhere than the -option form used currently.

Added benefit would be the ability to use single character options for reduced typing, such as -h instead of -help. Many people will start by typing either -h or --help (like I did) only to see it not actually print help, but an error for invalid command line option.

Suggested arguments:

  • -h, --help
  • -v, --verbose (Aside: Could use verbosity levels, more verbose with more v's, -vvv)
  • --patterns (Since this does a specific function just long option)
  • -i, --repl (or --interactive)
  • --info (It looks like this just outputs installation information, so no single option I think is needed)
  • -o, --encode (-o for output)
  • -s, --wholefile (I suggest -s for 'single string', and possibly renaming this long option)
  • -a, --all (This could be renamed, or possibly combined with verbose)
  • -e, --eval (I realize this conflicts with -e, but I haven't gotten this to do what it says it does yet anyway)
  • -g, --grep
  • -m, --manifest
  • -f, --load (Suggested long option from description of functionality. If using load as long option, change to -l)
  • -e, --rpl (Suggested long option from description of functionality. If using rpl as long option, change to -r. Also, will have to be changed from conflict with eval's -e)
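The short/long pairing suggested above can be sketched with Python's standard argparse module, used here only as a stand-in for the Lua argparse library the issue goes on to mention. The option names are the ones proposed in the list; resolving the -e conflict by giving --rpl the -r short form, and the metavar names, are assumptions made for illustration.

```python
import argparse

def build_parser():
    """Build a parser for a subset of the options proposed in this issue."""
    # argparse adds -h/--help automatically.
    parser = argparse.ArgumentParser(prog="rosie")
    # Repeating -v raises the verbosity level: -v, -vv, -vvv.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    # Long-only option, as suggested for a special-purpose function.
    parser.add_argument("--patterns", action="store_true")
    parser.add_argument("-i", "--repl", action="store_true")
    parser.add_argument("-o", "--encode", metavar="FORMAT")
    parser.add_argument("-g", "--grep", action="store_true")
    parser.add_argument("-m", "--manifest", metavar="FILE")
    # May be given several times; each file is collected into a list.
    parser.add_argument("-f", "--load", metavar="FILE", action="append", default=[])
    # Assumption: -r instead of -e, to avoid the conflict with --eval.
    parser.add_argument("-r", "--rpl", metavar="STATEMENTS")
    return parser
```

With this layout, `build_parser().parse_args(["-vvv", "--grep"])` yields a verbosity of 3 and the grep flag set, while an unknown option produces a usage message instead of a bare error.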

On a somewhat similar but separate note, much of the command line option parsing could be removed in favor of lua-getopt or, what I'd suggest to help with the above, argparse.

EDIT: argparse can be found here: https://github.com/mpeterv/argparse
lua-getopt can be found here: https://github.com/2ion/lua-getopt

Support colorization and pretty-printing as modular operations

The CLI could support accepting Rosie JSON as input, with one command to colorize it, and another command to pretty-print it.

Use case: Run rosie, generate JSON output, filter the JSON using any tool at all, then want to examine the results. These results are JSON objects, 1 object per line, and hard to read. (That's why Rosie colorizes output, to make it easy to see matches.)

Solution: Expose the rosie colorizing feature such that it can take rosie JSON as input. Similarly, expose the rjsonpp pretty-printer (which is not part of rosie today) as a rosie CLI feature.
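The pretty-printing half of this use case can be sketched in a few lines of Python, assuming only what the issue states: the input is one JSON object per line. The function name pretty_print_lines is hypothetical, and this is not the rjsonpp tool itself.

```python
import json

def pretty_print_lines(lines, indent=2):
    """Yield an indented rendering of each non-empty line of JSON input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between objects
        yield json.dumps(json.loads(line), indent=indent)
```

Because the function accepts any iterable of lines, it can sit at the end of a pipeline that has already filtered the JSON with some other tool, for example by passing it sys.stdin.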

Cannot inline double square bracket (]]) with -e option

Attempting to do:

$ rosie -e 'lua_ident = {[[:alpha:]] / "_" / "." / ":"}+' -patterns
/usr/local/share/rosie/bin/lua: (command line):1: unexpected symbol near '}'

Turning on bash tracing:

$ rosie -e 'lua_ident = {[[:alpha:]] / "_" / "." / ":"}+' -patterns
+ [[ -z /usr/local/share/rosie ]]
+ [[ -n '' ]]
+ home=/usr/local/share/rosie
+ ROSIE_SCRIPT_HOME=/usr/local/share/rosie
+ shift
+ executable=/usr/local/share/rosie/bin/lua
++ ps -o args= 1892
+ cmd='bash /usr/local/bin/rosie -e lua_ident = {[[:alpha:]] / "_" / "." / ":"}+ -patterns'
+ [[ ! -d /usr/local/share/rosie ]]
+ [[ ! -x /usr/local/share/rosie/bin/lua ]]
+ i=
+ dev=false
+ [[ -e = \-\D ]]
+ export ROSIE_HOME
+ export ROSIE_SCRIPT_HOME
+ export HOSTNAME
+ export HOSTTYPE
+ export OSTYPE
++ pwd
+ export CWD=/home/klzander/git/veratil-rosie-pattern-language
+ CWD=/home/klzander/git/veratil-rosie-pattern-language
+ /usr/local/share/rosie/bin/lua -e 'ROSIE_HOME=[[/usr/local/share/rosie]]; SCRIPTNAME=[[bash /usr/local/bin/rosie -e lua_ident = {[[:alpha:]] / "_" / "." / ":"}+ -patterns]]; ROSIE_DEV=[[false]];' -- /usr/local/share/rosie/src/run.lua -e 'lua_ident = {[[:alpha:]] / "_" / "." / ":"}+' -patterns
/usr/local/share/rosie/bin/lua: (command line):1: unexpected symbol near '}'

You can see in the SCRIPTNAME variable that the ]] in [[:alpha:]] is read as the end token of the Lua long string, which terminates the string early.
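Lua long strings can be written with a "level", such as [=[ ... ]=], so one way to avoid this kind of collision is to pick a level whose closing token does not occur in the text being quoted. The sketch below shows that selection in Python; the function name lua_long_quote is hypothetical, and this is not necessarily how the rosie wrapper script handles it.

```python
def lua_long_quote(text):
    """Wrap text in a Lua long string whose closing token cannot end it early.

    Not handled in this sketch: Lua drops a newline that immediately
    follows the opening bracket.
    """
    level = 0
    while True:
        closer = "]" + "=" * level + "]"
        # The first occurrence of the closer must be the one we append
        # ourselves; this also covers text that ends in "]".
        if (text + closer).find(closer) == len(text):
            return "[" + "=" * level + "[" + text + closer
        level += 1
```

For the command line in this report, the level-0 form fails the check because of the ]] in [[:alpha:]], so the function moves up to level 1.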

Replacing [[:alpha:]] with letter makes the bug go away:

$ rosie -e 'lua_ident = {letter / "_" / "." / ":"}+' -patterns
---snip---
211 patterns

Ubuntu "make linux"

Ubuntu 14.04.4 LTS
clone git repo
~/Documents/Rosie/rosie-pattern-language$ make linux
...
gcc -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_LINUX -c -o loadlib.o loadlib.c
gcc -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_LINUX -c -o linit.o linit.c
ar rcu liblua.a lapi.o lcode.o lctype.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o lauxlib.o lbaselib.o lbitlib.o lcorolib.o ldblib.o liolib.o lmathlib.o loslib.o lstrlib.o ltablib.o lutf8lib.o loadlib.o linit.o
ranlib liblua.a
gcc -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_LINUX -c -o lua.o lua.c
lua.c:80:31: fatal error: readline/readline.h: No such file or directory
#include <readline/readline.h>
^
compilation terminated.
make[3]: *** [lua.o] Error 1
...

Haven't done any troubleshooting yet or looked for a reference to the missing file.

Matching a pattern that is a grammar with no captures throws an error

Description: This case may be rare in practice, but there's an example in the RPL patterns that are part of the Rosie distribution: json.json_discard. The bug occurs only when the match succeeds.

Versions: This bug is present in v0.99i and appears to go all the way back to v0.99a. It was likely the result of a significant change to the compiler that occurred just prior to that.

Workaround: Capture something! (E.g. the json pattern works, and it is defined as a capturing version of the alias json_discard.)

The bug looks like this:

jjennings$ rosie json.json_discard /tmp/test.json 
/Users/jjennings/Work/Dev/public/rosie-pattern-language/bin/lua: src/core/common.lua:233: bad argument #1 to 'next' (table expected, got number)
stack traceback:
	[C]: in function 'next'
	src/core/common.lua:233: in function 'common.decode_match'
	src/core/color-output.lua:164: in function 'color_string_from_leaf_nodes'
	src/core/engine.lua:177: in function <src/core/engine.lua:146>
	(...tail calls...)
	...nings/Work/Dev/public/rosie-pattern-language/src/run.lua:303: in function 'process_pattern_against_file'
	...nings/Work/Dev/public/rosie-pattern-language/src/run.lua:354: in function 'run'
	...nings/Work/Dev/public/rosie-pattern-language/src/run.lua:371: in main chunk
	[C]: in ?

Allow -o color to show whole line

It would be nice if the color option still displayed the whole line so you knew where in the line something was found.

For example, the date.any pattern will match this line:

server_ip = 192.1.20.255

and with -o color specified displays

2 1 20

Trivial example, but obviously if the line were more complex and still had a match, it might be hard to figure out what was hit. This would be a better output

server_ip = 192.1.20.255

(where those bold numbers would be colored)

Implement an RPL "linter"

A useful tool would be a lint-like program that looks for probable mistakes and violations of best practices.

The linter could start very simple, without any configuration options at all. Some things it might do:

  • Warn that [^:space:] does not find chars that are not whitespace ([:^space:] does)
  • Warn that [abc0-9] does not include 1, 2, ... 8. Use [[abc][0-9]].
  • Warn about patterns defined but not used (or exported, once the module system is available)
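The first two checks in the list above can be sketched as plain text scanning in Python. This is only an illustration of the idea: it works on source text with regular expressions rather than on a parsed RPL syntax tree, so it will misfire on brackets inside string literals or comments. The function name lint and the warning wording are assumptions.

```python
import re

# [^:name:] written where [:^name:] was probably intended.
NEGATED_NAMED_SET = re.compile(r"\[\^:(\w+):\]")
# A bracket expression with no nested brackets and no named set inside.
SIMPLE_SET = re.compile(r"\[([^\[\]:]+)\]")
# Something that looks like a range, e.g. 0-9.
RANGE = re.compile(r".-.")

def lint(source):
    """Return a list of (line number, message) warnings for RPL source text."""
    warnings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in NEGATED_NAMED_SET.finditer(line):
            name = m.group(1)
            warnings.append(
                (lineno, f"[^:{name}:] does not complement {name}; use [:^{name}:]"))
        for m in SIMPLE_SET.finditer(line):
            body = m.group(1).lstrip("^")
            # A lone range such as 0-9 is fine; a range mixed with other
            # characters is the case the issue wants flagged.
            if RANGE.search(body) and len(body) > 3:
                warnings.append(
                    (lineno, f"[{m.group(1)}] mixes a character list with a range; "
                             "split it into separate sets"))
    return warnings
```

The third check (unused patterns) needs knowledge of definitions and references across files, so it is left out of this sketch.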

Memory leak in librosie.c#bootstrap

Reading the librosie.c code, I think we have a memory leak because the "name" variable is never freed. I've added a fix, but since I am not a C expert, I am not sure it is correct.

static int bootstrap (lua_State *L, struct rosieL_string *rosie_home) {
     const char *bootscript = "/src/core/bootstrap.lua";
     LOG("About to bootstrap\n");
     char *name = malloc(rosie_home->len + strlen(bootscript) + 1);
     memcpy(name, rosie_home->ptr, rosie_home->len);
     memcpy(name+(rosie_home->len), bootscript, strlen(bootscript)+1); /* +1 copies the NULL terminator */
     int status = luaL_loadfile(L, name);
     if (status != LUA_OK) {
          free(name);   // new code
          return status;
     }
     status = lua_pcall(L, 0, LUA_MULTRET, 0);
     free(name); // new code
     return status;
}

How to use the nodejs sample

Jamie: I thought I'd work with your nodejs sample and give you back a server-capable version. However, your initial test version fails on line 43, where it attempts to load librosie.so. I cannot find that file on my system, nor does it appear to be created during the make process. Any ideas?

Memory leak in lua ?

Hi,

I am running a bash loop that feeds rosie -repl, and I see the resident memory and CPU usage of the lua process increase until the program ends. I notice a similar steady increase when running librosie via Java.

Any reason why this could be happening?

me@MYMACHINE:~/git/rosie-testing/rosie$ cat rosietest.sh
#!/bin/bash

read -d '' CMD1 <<'ENDCMD1'
foo = [:digit:]+ "!!"
ENDCMD1

read -d '' CMD2 <<'ENDCMD2'
.match foo "42!!"
ENDCMD2

for number in {1..10000}; do
    if [[ $number -eq 1 ]]; then
        echo "$CMD1"
    else
        echo "$CMD2"
    fi
done | rosie -repl

me@MYMACHINE:~/git/rosie-testing/rosie$ time ./rosietest.sh > /dev/null &
[1] 7893
me@MYMACHINE:~/git/rosie-testing/rosie$ This is Rosie v0.99k


me@MYMACHINE:~$ ps -ef | grep rosie | grep lua
me      7922  7919  0  2433 ?        00:00:06 /home/me/git/rosie-pattern-language/bin/lua -e ROSIE_HOME=[[/home/me/git/rosie-pattern-language]]; SCRIPTNAME=[====[bash /usr/local/bin/rosie -repl]====]; ROSIE_DEV=[[false]]; -- /home/me/git/rosie-pattern-language/src/run.lua -repl


me@MYMACHINE:~$ pidstat -p 7922 -r 10
03:44:48 PM   UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
03:44:58 PM  1000      7922      0.00      0.00 687657100 1133672   6.80  lua
03:45:08 PM  1000      7922      0.00      0.00 687819208 1295780   7.78  lua
03:45:18 PM  1000      7922      0.00      0.00 687976016 1452588   8.72  lua
03:45:28 PM  1000      7922      0.00      0.00 688079324 1555892   9.34  lua
03:45:38 PM  1000      7922      0.00      0.00 688178776 1655348   9.93  lua
03:45:48 PM  1000      7922      0.00      0.00 688283016 1759588  10.56  lua
03:45:58 PM  1000      7922      0.00      0.00 688380724 1857296  11.15  lua
03:46:08 PM  1000      7922      0.00      0.00 688481356 1957928  11.75  lua
03:46:18 PM  1000      7922      0.00      0.00 688582988 2059560  12.36  lua
03:46:28 PM  1000      7922      0.00      0.00 688683936 2160508  12.97  lua
03:46:38 PM  1000      7922      0.00      0.00 688784360 2260932  13.57  lua
03:46:48 PM  1000      7922      0.00      0.00 688869220 2345832  14.08  lua
03:46:58 PM  1000      7922      0.00      0.00 688930708 2407320  14.45  lua
03:47:08 PM  1000      7922      0.00      0.00 688981128 2457700  14.75  lua
03:47:18 PM  1000      7922      0.00      0.00 689077768 2554380  15.33  lua
03:47:28 PM  1000      7922      0.00      0.00 689132400 2608972  15.66  lua
03:47:38 PM  1000      7922      0.00      0.00 689234752 2711264  16.27  lua
03:47:48 PM  1000      7922      0.00      0.00 689282032 2758544  16.56  lua
03:47:58 PM  1000      7922      0.00      0.00 689379400 2855972  17.14  lua
03:48:08 PM  1000      7922      0.00      0.00 689427688 2904300  17.43  lua
03:48:18 PM  1000      7922      0.00      0.00 689506872 2983416  17.91  lua
03:48:28 PM  1000      7922      0.00      0.00 689575344 3051916  18.32  lua
03:48:38 PM  1000      7922      0.00      0.00 689618132 3094704  18.57  lua
03:48:48 PM  1000      7922      0.00      0.00 689721100 3197712  19.19  lua
03:48:58 PM  1000      7922      0.00      0.00 689769708 3246320  19.48  lua
03:49:08 PM  1000      7922      0.00      0.00 689866420 3343032  20.06  lua
03:49:18 PM  1000      7922      0.00      0.00 689915752 3392324  20.36  lua
03:49:28 PM  1000      7922      0.00      0.00 689963592 3440204  20.65  lua
03:49:38 PM  1000      7922      0.00      0.00 690019588 3496200  20.98  lua
03:49:48 PM  1000      7922      0.00      0.00 690113148 3589760  21.54  lua

Save/load compiled patterns

The RPL compiler is incremental. To compile a pattern, the compiler needs the AST for the pattern and an environment that maps pattern names to compiled pattern objects.

Without describing here how the pattern object will change when RPL modules are introduced, the following will remain true:

  • Most of the slots in the pattern object record can be saved and loaded as JSON
  • Two slots (peg and uncap) are Lua userdata holding lpeg patterns and we need a way to save these
  • The lpeg patterns used by Rosie should be leveraging only one Lua function: common.create_match; therefore, when saving the pattern object's lpeg userdata, we do not need to be able to save any Lua function; when we encounter common.create_match, we can save a marker of some kind, and then when reading from disk, we replace the marker with the function
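The marker idea in the last bullet can be illustrated in Python, leaving aside the much harder problem of serializing lpeg userdata. In this sketch a record is saved as JSON, any reference to a known function is replaced by a marker object on save, and the marker is swapped back for the function on load. The names KNOWN_FUNCTIONS, MARKER_KEY, save, and load are all hypothetical, and create_match here is only a stand-in for common.create_match.

```python
import json

def create_match(*args):
    """Stand-in for the single function that saved patterns may reference."""
    return args

# Registry of functions that may appear inside a saved record.
KNOWN_FUNCTIONS = {"common.create_match": create_match}
MARKER_KEY = "$function"

def _encode(value):
    if callable(value):
        for name, fn in KNOWN_FUNCTIONS.items():
            if fn is value:
                return {MARKER_KEY: name}  # save a marker, not the function
        raise TypeError("cannot save an unregistered function")
    if isinstance(value, dict):
        return {k: _encode(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_encode(v) for v in value]
    return value

def _decode(value):
    if isinstance(value, dict):
        if set(value) == {MARKER_KEY}:
            return KNOWN_FUNCTIONS[value[MARKER_KEY]]  # marker -> function
        return {k: _decode(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_decode(v) for v in value]
    return value

def save(record):
    return json.dumps(_encode(record))

def load(text):
    return _decode(json.loads(text))
```

Refusing to save an unregistered function keeps the "only one Lua function" assumption checkable: if a pattern ever references something else, saving fails loudly instead of writing a file that cannot be reloaded.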

When the module system is finished, it will be easier to test the saving/loading of compiled patterns. But while developing a save/load technique, something like the following can be used to verify that it is working correctly, showing that matches work before and after the save/load:

> e = lapi.new_engine()
> lapi.load_manifest(e, "$sys/MANIFEST")
true	table: 0x7fe4c634aa90	/Users/jjennings/Work/Dev/private/jones/rosie-pattern-language/MANIFEST
> for k,v in pairs(e.env["common.number"]) do print(k,v); end
original_ast	table: 0x7fe4c63d2330
ast	table: 0x7fe4c3d90480
name	choice
peg	userdata: 0x7fe4c4071e28
uncap	userdata: 0x7fe4c4000028
alias	false
raw	false
> 
> 
> -- save(e.env["common.number"], "filename")
> -- e.env["common.number"] = nil
> -- load(e.env["common.number"], "filename")
> 
> -- test that pattern here, and if working, test patterns that depend on common.number

The 'find' and 'findall' macros fail on tokenized sequences

Using the new file test/quick.txt, this is the desired output:

$ bin/rosie grep '"quick" ("brown" / "blue") "fox"' test/quick.txt 
the quick brown fox
the quick brown fox jumped over the lazy (but adorable) dog
the quick blue fox
the quick blue fox jumped over the sleeping (and adorable) dog
$

But the actual output is empty (no matches). The problem is that the input expression is a tokenized sequence (of words), and it is being passed to the findall macro as if it were a raw/untokenized sequence, as can be seen here in the last two lines of the grammar:

$ bin/rosie -o line expand 'find:("quick" ("brown" / "blue") "fox")'
Expression: 	find:("quick" ("brown" / "blue") "fox")
Parses as: 	find:("quick" ("brown" / "blue") "fox")
At top level: 	find:("quick" ("brown" / "blue") "fox")
Expands to: 	
grammar
	alias find = {<search> <anonymous>}
	alias <search> = {!{~ {"quick" {"brown" / "blue"} "fox"} ~} .}*
	<anonymous> = {{~ {"quick" {"brown" / "blue"} "fox"} ~}}
end
$

running a parallelized make (make -jN) fails

% make -j 10
git submodule init
git submodule init
Creating /home/cjashfor/rosie-pattern-language/bin/rosie
Creating /home/cjashfor/rosie-pattern-language/rosie.lua
error: could not lock config file .git/config: File exists
fatal: Failed to register url for submodule path 'submodules/argparse'
Submodule 'submodules/argparse' (https://github.com/mpeterv/argparse.git) registered for path 'submodules/argparse'
Makefile:115: recipe for target 'submodules/argparse/src/argparse.lua' failed
make: *** [submodules/argparse/src/argparse.lua] Error 128
make: *** Waiting for unfinished jobs....
Submodule 'submodules/lua-readline' (https://github.com/jamiejennings/lua-readline.git) registered for path 'submodules/lua-readline'
git submodule update
Cloning into '/home/cjashfor/rosie-pattern-language/submodules/argparse'...
READLINE TEST: libreadline and readline.h appear to be installed
Cloning into '/home/cjashfor/rosie-pattern-language/submodules/lua-readline'...
Submodule path 'submodules/argparse': checked out 'a40458fdc1507e44b6a829b6c6b969b500e1c337'
Submodule path 'submodules/lua-cjson': checked out 'fac2934964e5aaed298daa105e5664d3311edf07'
Submodule path 'submodules/lua-readline': checked out '4fdbf5cc2a58d0442770f52b1bed174805737f9c'
make: *** wait: No child processes. Stop.

This is easy to work around: just drop the -j option. This is a low-priority bug because Rosie doesn't take long to build without using a parallel build.

"make install" creates symlinks, does not copy files

In most open source projects, "make install" will copy all of the files needed into the system's root (or other selected location). Those files, once installed, must be capable of running there stand-alone without using any files from the original build location. The same goes for any doc files that are associated with the project.

The current code in rosie's Makefile simply creates a soft link to a file that was constructed in the build area. And some of the files in the build area have hard-coded paths to the build area.

This is also affecting RPM packaging, because the standard way RPM spec files are constructed is to rely on having the files installed into the root in a standard way.

The wrapper scripts should be changed so that they don't use hard-coded paths, if possible, or at least so that their paths can be updated during the "make install" execution. Further, as stated above, the Makefile should be changed so that all needed files are copied to the installation root, so that they are completely independent of the build area.

Enable REPL mode input history

Currently when in REPL mode you cannot press UP to repeat your previously entered line. This requires you to either copy/paste with a mouse or retype the entire thing.

I have looked at src/core/repl.lua and noticed that it hard-codes io.stdin:read(). Since the readline (readline-devel) package is already required, it would be useful to exploit this for REPL mode if we can.

Question on calling v1.0.0 librosie using C

Hi, Jamie,
I am trying to call v1.0.0 librosie with the following C Program:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "librosie.h"
#define ROSIE_HOME "/Users/syau/dev/log-analytics/rosie-pattern-language"

str *new_rstr(char *s){
    str *p;
    p=(str *)malloc(sizeof(str));
    p->len=strlen(s);
    p->ptr=(byte_ptr)s;
    return p;
}

void free_rstr(str *p){
    /* zero memory first before freeing */
    memset(p->ptr,0,p->len);
    free(p);
}

int main() {
    int rc;
    str *errors;
    errors=new_rstr("ok");
    /* initialize Rosie engine */
    void *engine=rosie_new(errors);
    printf("*** Instantiate\n");
    printf("Instantiate Rosie engine error:%s\n",errors->ptr);
    /*
    int rosie_import(void *L, int *ok, str *pkgname, str *as, str *errors);
    int rosie_compile(void *L, str *expression, int *pat, str *errors);
    int rosie_match(void *L, int pat, int start, char *encoder, str *input, match *match);
    */
    /* import package */
    str *pkgname;
    pkgname=new_rstr("all");
    str *as;
    as=new_rstr("");
    int ok;
    rc=rosie_import(engine, &ok, pkgname, as, errors);
    printf("*** Import\n");
    printf("Import RPL error:%s\n",errors->ptr);
printf("Import RPL error:%s\n",errors->ptr);

    str *expression;
    expression=new_rstr("\"all.things11\"");
    int pat=0;
    errors=new_rstr("ok2");
    rc=rosie_compile(engine, expression, &pat, errors);
    printf("*** Compile\n");
    printf("Compile pattern error:%s\n",errors->ptr);
    printf("pat:%i\n",pat);

    str *input;
    input=new_rstr("\"1234\"");
    match r;
    rc=rosie_match(engine, pat, 1, "json", input, &r);
    printf("*** Match\n");
    printf("match rc:%i\n",rc);
    printf("leftover:%i\n",r.leftover);
    printf("ttotal:%i\n",r.ttotal);
    printf("tmatch:%i\n",r.tmatch);
    printf("data length:%i\n",r.data.len);
    printf("data%c\n",*(r.data.ptr));

}

I compile with the following:
/Applications/Xcode.app/Contents/Developer/usr/bin/make librosie.so SYSCFLAGS="-DLUA_USE_MACOSX" SYSLDFLAGS="-dynamiclib" CC=clang
clang -o librosie.o -c librosie.c -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_MACOSX -I/Users/syau/dev/log-analytics/rosie-pattern-language/submodules/lua/include
clang -o crosie crosie.c -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_MACOSX -std=gnu99 -I${ROSIE_HOME}/submodules/lua/include -L. -lrosie

When I run the program, I got:
Kwans-MBP:librosie syau$ ./crosie
*** Instantiate
Instantiate Rosie engine error:ok
*** Import
Import RPL error:{}
*** Compile
Compile pattern error:{}
pat:1
*** Match
match rc:0
leftover:6
ttotal:3
tmatch:3
data length:0
Segmentation fault: 11

It looks like every call except rosie_match() completed without error. I don't know what I am missing or have done wrong here. I'd appreciate your advice. Many thanks.
-Steve
PS- I am just referring to the ffi.cdef in rosie.py and test.py to arrive at the above C program.

Lightweight test capability should look for `syntax_error`

The rpl that defines rpl includes a definition for syntax_error. A match can succeed and have within the returned data structure a syntax_error node.

I expect this to be a common idiom among Rosie users (to define some kind of error pattern).

With respect to running tests: when a match succeeds but the match contains an error node, this is a third kind of result. That is, we could have "match failed", "match succeeded with no error nodes", and "match succeeded with one or more error nodes in the output".

Design questions:

  1. Which error nodes should the test system look for? Just syntax_error? (Leaning to YES.)
  2. Should the names of error nodes be configurable? If so, then how? (Prob should defer this.)
  3. Should a new verb be introduced for the detection of syntax_error? (Leaning to YES, and perhaps it should be 'errors' or 'errors_on', as in test language_decl errors_on "rpl 1.a".)
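The three outcomes described above can be sketched as a small classifier over a match result. The sketch assumes the match is a decoded JSON tree in the shape Rosie prints elsewhere in these issues (a single-key object whose value has "text", "pos", and an optional "subs" list) and that a failed match is represented as None. The function names and the outcome labels are hypothetical.

```python
def contains_node(match, label):
    """Return True if any node in the match tree carries the given label."""
    for name, body in match.items():
        if name == label:
            return True
        if any(contains_node(sub, label) for sub in body.get("subs", [])):
            return True
    return False

def classify(match, error_label="syntax_error"):
    """Map a match result onto one of the three test outcomes."""
    if match is None:
        return "failed"
    if contains_node(match, error_label):
        return "matched_with_errors"
    return "matched"
```

Passing the label as a parameter leaves room for design question 2 (configurable error node names) without committing to it.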

New RPL operator for "look behind"

There's a restriction in lpeg: the lpeg "look behind" operator, lpeg.B, will not accept a pattern that has captures (nor patterns that lack a fixed length).

To implement a "look behind" operator in RPL, then, the compiler must strip out all the captures from the argument of the operator. And we must check for fixed length (or let lpeg do it for us).

Related: Once we have a compiler function that takes a pattern and produces a new pattern that is identical except that the new one captures nothing, we may want to expose this in RPL. It would be exposed as an operator that strips all captures out of a pattern, returning a new pattern. (Note that this is different from an alias, which merely does not add a new capture when defining an identifier.)

Can't use "look behind" on patterns that have captures.  (Found this while reading the lpeg code.)  E.g.
	> repl(e)
	Rosie> foo = "xyz"
	Rosie> .match foo "xyz"
	{"foo": 
	   {"text": "xyz", 
	    "pos": 1.0}}
	Rosie> 
	Exiting
	> foo = e.env["foo"].peg
	> (lpeg.P(3)*lpeg.B(foo)):match("xyz")
	stdin:1: bad argument #1 to 'B' (pattern have captures)
	stack traceback:
		[C]: in function 'lpeg.B'
		stdin:1: in main chunk
		[C]: in ?
	> 

README.md Typo

Within the "Building" section there is a typo:
instalaltion

Change to:
installation

Re-organize the standard library

The pattern files that ship with rosie in rpl/*.rpl have accumulated some crud. The module (file) names themselves are not all mnemonic or appropriate, and the contents need reorganization.

  1. Better module names. Maybe basic should become col for "collection"?
  2. Better pattern names. Let's not use "pattern" in a pattern name. (my bad!)
  3. Edit comments that are no longer needed or no longer apply.
  4. Ideally, we should have some simple test cases included. Any that are already present in the form of comments should be converted to the automated format.

Rosie plugin for Logstash (ELK stack)

Rosie would be very useful for parsing log files, which is what Grok does today in ELK. A Rosie plugin could be used in addition to or instead of Grok.

Logstash plugins are written in Ruby and packaged as Ruby gems.

Windows Build Infrastructure and Support

I see that building on Linux (including Ubuntu) with make is supported.

  1. Is Windows expected as an eventual development environment?
  2. Would supporting and documenting development under Windows add significant value to the project as a whole, possibly by making its open aspect more accessible?

Note: This is essentially a feature request I expect as an FAQ item. I personally have a Linux machine I can use instead of my main Windows machine. I may also be able to help with adding this functionality/documentation if it represents a large value added. A wait and see approach might be best unless this is already on a list of needed project enhancements.

Case-insensitive mode

Hi!

I'm looking for a case-insensitive mode for string literals, like (?i: string) or /string/i in other regex dialects. Is it possible to use such a mode in RPL? Or should I write out a sequence of character classes like [Ss][Tt] etc.?

Thanks!
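If the character-class approach from the question is used, the classes do not have to be typed by hand. The sketch below generates the [Ss][Tt] form for a word; it only produces text in the notation the question uses, handles ASCII letters only, and says nothing about whether RPL offers a built-in case-insensitive mode. The function name is hypothetical.

```python
def case_insensitive_classes(word):
    """Return a sequence of two-letter character classes, one per letter."""
    if not (word.isascii() and word.isalpha()):
        raise ValueError("only ASCII letters are handled in this sketch")
    return "".join(f"[{ch.upper()}{ch.lower()}]" for ch in word)
```

How the resulting classes should be grouped in a pattern (for example, whether whitespace between them matters) depends on RPL's sequencing rules and is outside this sketch.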

Problem with RosieL_match_file() call from Python

Hi, Jamie,
I tried calling RosieL_match_file() from Python but did not get the expected result.
Here's what I did:

  1. Added the following to rosie-pattern-language/ffi/samples/python/rosie.py:
    def match_file(self, input, output, error, wholefileflag):
        return self._match_file(input, output, error, wholefileflag, self.rosie.rosie.rosieL_match_file)

    def _match_file(self, input, output, error, wholefileflag, operation):
        input_file_string = to_cstr_ptr(self.rosie.rosie, input)
        output_file_string = to_cstr_ptr(self.rosie.rosie, output)
        error_file_string = to_cstr_ptr(self.rosie.rosie, error)
        flag_string = to_cstr_ptr(self.rosie.rosie, wholefileflag)
        r = operation(self.engine, input_file_string, output_file_string, error_file_string, flag_string)
        retvals = self.rosie.get_retvals(r)
        return retvals

  2. In my Python program "test.py", I called match_file():
    import os, json, sys
    import rosie
    ........
    Rosie = rosie.initialize(ROSIE_HOME, ROSIE_HOME + "/ffi/librosie/librosie.so")
    engine = Rosie.engine()
    r = engine.load_manifest("$sys/MANIFEST")
    config = json.dumps( {'expression': 'linuxsys.matchall', 'encode':'json'} )
    r = engine.configure(config)
    r = engine.match_file("testfile", "", "std.err", "true")

"testfile" has two lines:
Aug 20 18:18:10 beatrice-desktop dhclient[1354]: DHCPREQUEST of 192.168.8.14 on enp0s31f6 to 192.168.8.1 port 67 (xid=0x2830f2c)
Aug 20 20:14:09 beatrice-desktop dhclient[1354]: DHCPACK of 192.168.8.14 from 192.168.8.1

When I execute test.py, it seems to match only the first line and then stop. i.e. it didn't iterate through the rest of the file:
$ python test.py
{"linuxsys.matchall":{"subs":[{"linuxsys.log36":{"subs":[{"linuxsys.prefix":{"subs":[{"basic.datetime_patterns":{"subs":[{"datetime.shortdate":{"text":"Aug 20 ","pos":1}}],"text":"Aug 20 ","pos":1}},{"basic.datetime_patterns":{"subs":[{"datetime.simple_time":{"text":"18:18:10 ","pos":8}}],"text":"18:18:10 ","pos":8}},{"common.identifier_not_word":{"text":"beatrice-desktop","pos":17}}],"text":"Aug 20 18:18:10 beatrice-desktop","pos":1}},{"linuxsys.l5":{"text":"dhclient","pos":34}},{"basic.punctuation":{"text":"[","pos":42}},{"common.number":{"subs":[{"common.int":{"text":"1354","pos":43}}],"text":"1354","pos":43}},{"basic.punctuation":{"text":"]","pos":47}},{"basic.punctuation":{"text":":","pos":48}},{"common.maybe_identifier":{"text":"DHCPREQUEST","pos":50}},{"linuxsys.l6":{"text":"of","pos":62}},{"basic.network_patterns":{"subs":[{"network.ip_address":{"text":"192.168.8.14","pos":65}}],"text":"192.168.8.14","pos":65}},{"linuxsys.l7":{"text":"on","pos":78}},{"common.identifier_not_word":{"text":"enp0s31f6","pos":81}},{"linuxsys.l8":{"text":"to","pos":91}},{"basic.network_patterns":{"subs":[{"network.ip_address":{"text":"192.168.8.1","pos":94}}],"text":"192.168.8.1","pos":94}},{"linuxsys.l9":{"text":"port","pos":106}},{"common.number":{"subs":[{"common.int":{"text":"67","pos":111}}],"text":"67","pos":111}},{"basic.punctuation":{"text":"(","pos":114}},{"linuxsys.l10":{"text":"xid","pos":115}},{"basic.punctuation":{"text":"=","pos":118}},{"common.number":{"subs":[{"common.denoted_hex":{"subs":[{"common.hex":{"text":"2830f2c","pos":121}}],"text":"0x2830f2c","pos":119}}],"text":"0x2830f2c","pos":119}},{"basic.punctuation":{"text":")","pos":128}}],"text":"Aug 20 18:18:10 beatrice-desktop dhclient[1354]: DHCPREQUEST of 192.168.8.14 on enp0s31f6 to 192.168.8.1 port 67 (xid=0x2830f2c)","pos":1}}],"text":"Aug 20 18:18:10 beatrice-desktop dhclient[1354]: DHCPREQUEST of 192.168.8.14 on enp0s31f6 to 192.168.8.1 port 67 (xid=0x2830f2c)","pos":1}}
Garbage collecting engine 118c5a0

When I run the CLI with "rosie linuxsys.matchall testfile", it matches all the lines:
$ rosie linuxsys.matchall testfile
Aug 20 18:18:10 beatrice-desktop dhclient [ 1354 ] : DHCPREQUEST of 192.168.8.14 on enp0s31f6 to 192.168.8.1 port 67 ( xid = 2830f2c )
Aug 20 20:14:09 beatrice-desktop dhclient [ 1354 ] : DHCPACK of 192.168.8.14 from 192.168.8.1

I expected that setting "wholefileflag" to true in r = engine.match_file("testfile", "", "std.err", "true") would make it iterate through all the lines of "testfile", but it did not.

I can't figure out what I have missed. I would appreciate your advice. Many thanks!

Readline support is a little wonky

  • The .match command does not enter the history
  • .patterns does not search for its argument when the argument is a number
  • Tab completes file names, but at the repl prompt it should complete commands
  • Tab could complete pattern names after a command is typed

Lua api (single Lua state)

Create a rosie interface for Lua, where Rosie runs in the same Lua instance as the user program. E.g.

r = require("rosie")
r.initialize(rosie_home)
e = r.new_engine()
...

Color output re-implementation

The current color output code is a proof-of-concept that has not been rewritten. Some thoughts:

  • Output encoders should all be functions that take a Lua table and return a string.
  • An output encoder needs to be able to log errors and warnings (while still returning a string).
  • The color output encoder should rely on a mapping of pattern labels to colors, where a pattern label is the name that appears in the output tree (Lua table).
  • There should be CRUD operations for the color map (which could be done by having the user edit a config file).
  • One way to structure the generation of color output is to walk the output tree (Lua table) depth first, looking up each pattern label (there is one at each node) in the color map. When a color is found, no further descent in that subtree is needed.
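
The depth-first walk described in the last bullet could look roughly like the sketch below. It is written in Python rather than Lua and assumes the node shape seen in Rosie's JSON output (a single-key table mapping a pattern label to a table with text, pos, and optional subs). The color map contents, the ANSI codes, the space separator, and the names encode_color and _walk are all assumptions made for illustration.

```python
import logging

log = logging.getLogger("color_encoder")

# Hypothetical color map: pattern label -> ANSI SGR color code.
COLOR_MAP = {
    "network.ip_address": "31",  # red
    "common.number": "34",       # blue
}

def _walk(node, color_map, pieces):
    """Depth-first walk; stops descending as soon as a label has a color."""
    if not isinstance(node, dict) or len(node) != 1:
        # Log the problem but keep going, so the encoder still returns a string.
        log.warning("skipping malformed node: %r", node)
        return
    label, body = next(iter(node.items()))
    color = color_map.get(label)
    if color is not None:
        # Color found: emit the whole subtree's text, no further descent.
        pieces.append("\x1b[%sm%s\x1b[0m" % (color, body["text"]))
        return
    subs = body.get("subs")
    if subs:
        for sub in subs:
            _walk(sub, color_map, pieces)
    else:
        # Leaf with no color assigned: emit plain text.
        pieces.append(body["text"])

def encode_color(match, color_map=COLOR_MAP):
    """Output encoder: takes a match table and always returns a string."""
    pieces = []
    _walk(match, color_map, pieces)
    return " ".join(pieces)
```

Passing the color map as an argument keeps the encoder independent of how the map is stored, so the CRUD operations (or a user-edited config file) can live elsewhere.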

Current production deployments?

Hi, we are looking at evaluating Rosie for a current need that will require running in a production environment. We would prefer not to be on the bleeding edge if possible. Would you be able to provide a few examples of production deployments?

Thanks!

Add a "source reference" to the Rosie output

IF THIS PROPOSAL IS TO BE IMPLEMENTED AFTER VERSION 1.0, then the change in output format should be made in Version 1.0. (I.e. pos should be replaced with src, a table that contains pos and nothing else. The other fields in src can appear in Version 1.x.)

Design ideas:

  • A source reference is a triple: name, line, position
  • It indicates where a match was found (today there is only the position, which is in the pos field)
  • The caller to librosie should be able to set name to be any string (default null and does not appear in Rosie output)
  • Rosie should increment the line counter each time match is called
  • The caller to librosie should be able to reset the line counter (to any reasonable integer)
  • The Rosie CLI should set name to the current input data file name (or null for stdin)
  • The -wholefile flag of the CLI requires no special treatment (the line will be 1 for all matches)
  • The Rosie output format should be restructured, replacing pos with src; the src field is a table containing the source reference
  • When Rosie prints out a source reference for human consumption, it should follow the (de facto) standard format of name:line:position
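
As a rough illustration of the design ideas above, here is a Python sketch of a source reference, a resettable line counter, and the name:line:position rendering. The names SourceRef, LineCounter, to_src, and format_ref are hypothetical, and printing line:position when name is null is an assumption, since the proposal only says that a null name does not appear in the output.

```python
from collections import namedtuple

# A source reference is a triple: name, line, position.
SourceRef = namedtuple("SourceRef", ["name", "line", "pos"])

class LineCounter:
    """Hypothetical stand-in for the counter an engine would keep."""

    def __init__(self, name=None, start=0):
        self.name = name      # any string chosen by the caller; default null
        self.line = start

    def reset(self, line=0):
        # The caller may reset the counter to any reasonable integer.
        self.line = line

    def next_ref(self, pos):
        # Incremented once per call to match, as proposed.
        self.line += 1
        return SourceRef(self.name, self.line, pos)

def to_src(ref):
    """Build the src table that would replace the bare pos field."""
    src = {"line": ref.line, "pos": ref.pos}
    if ref.name is not None:  # a null name does not appear in the output
        src["name"] = ref.name
    return src

def format_ref(ref):
    """Render for humans in the de facto name:line:position format."""
    if ref.name is None:
        return "%d:%d" % (ref.line, ref.pos)  # assumption: omit the name
    return "%s:%d:%d" % (ref.name, ref.line, ref.pos)
```

In this sketch a Version 1.0 output would carry only pos inside src; line and name could be added in 1.x without changing the shape of the table.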

Reorganize core code into proper Lua modules

The current code evolved organically without strict use of Lua modules. Strict module usage has the form foo = require("foo") where this statement does not introduce any new globals.

Moving to stricter Lua module usage will require reorganizing some of the core code, which will also give a clearer picture of the dependencies.

Eval function output as JSON

It would be great if the eval function would output as JSON so the failure's explanation can be stepped through programmatically.

Test a multi-threaded client of librosie

librosie was designed to be used with pthreads where each thread will initialize its own instance of Rosie using initialize().

This capability needs to be tested.

Note: Check to see if cjson will cause a problem. Do we need to use cjson.safe and cjson.new?

A useful test would ensure the independence of the engines and be invokable from the command line so it can be used in the Rosie automated test suite. It should return an exit status of 0 for success, and other values for failures.
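
One possible shape for such a test is sketched below in Python. It does not call librosie: StandInEngine is a placeholder whose configure and match methods are invented for illustration, and a real test would replace it with a factory that initializes an actual Rosie instance per thread. The sketch only shows the harness structure, with one engine per thread, a different configuration for each, and a result that maps to an exit status.

```python
import threading

class StandInEngine:
    """Placeholder for a per-thread Rosie instance; NOT the librosie API."""

    def __init__(self):
        self.config = {}

    def configure(self, key, value):
        self.config[key] = value

    def match(self, text):
        return "%s:%s" % (self.config.get("tag"), text)

def worker(make_engine, index, rounds, errors):
    engine = make_engine()           # each thread initializes its own instance
    engine.configure("tag", index)   # configured differently from the others
    for i in range(rounds):
        expected = "%s:%s" % (index, i)
        if engine.match(str(i)) != expected:
            errors.append((index, i))  # another thread's state leaked in
            return

def run_independence_test(make_engine, n_threads=8, rounds=1000):
    """Return 0 if every thread saw only its own configuration, else 1."""
    errors = []
    threads = [
        threading.Thread(target=worker, args=(make_engine, n, rounds, errors))
        for n in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return 0 if not errors else 1

# A command-line wrapper could end with:
#     sys.exit(run_independence_test(StandInEngine))
```

Returning the status from a function, rather than exiting inside it, lets the same check run both from the command line and from within an automated test suite.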

Lightweight pattern testing facility

Pattern writers need to test their patterns, and it makes sense to automate this. I propose starting with a lightweight testing capability that checks patterns against sample input and reports only whether each input is accepted or rejected. (In future, more information about a match could be tested.)

Design ideas:

  • Let the pattern writer put tests into their rpl files as comments (see below)
  • Implement a rosie test <filename> command that parses the tests out of an rpl file and runs them
  • When reporting results, a test should be considered a failure if there were leftover characters
  • Many features could be added, but at this stage it is more important to keep it very simple

E.g. rpl file snippet:

common.word = [:alpha:]+
-- common.word accepts "foo"
-- common.word rejects "12356", "  ", "#!"
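
A minimal Python sketch of how a rosie test command might extract and run such tests follows. It assumes tests are written as RPL comments that begin with -- and use straight double quotes. The matcher callable is hypothetical: it stands in for a Rosie engine and returns the number of leftover characters, or None when there is no match. A match with leftover characters is treated as not accepted, which is one reading of the leftover-characters rule above.

```python
import re

# -- <pattern name> accepts|rejects "input", "input", ...
TEST_LINE = re.compile(
    r'^--\s*(?P<name>[\w.]+)\s+(?P<kind>accepts|rejects)\s+(?P<args>.+)$'
)
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

def parse_tests(rpl_text):
    """Collect (pattern name, kind, inputs) triples from test comments."""
    tests = []
    for line in rpl_text.splitlines():
        m = TEST_LINE.match(line.strip())
        if m:
            inputs = QUOTED.findall(m.group("args"))
            tests.append((m.group("name"), m.group("kind"), inputs))
    return tests

def run_tests(tests, matcher):
    """Return the list of failed (name, kind, input) cases.

    matcher(name, text) is a hypothetical callable: it returns the number
    of leftover characters after a match, or None if there was no match.
    """
    failures = []
    for name, kind, inputs in tests:
        for text in inputs:
            accepted = matcher(name, text) == 0  # leftover chars: not accepted
            if accepted != (kind == "accepts"):
                failures.append((name, kind, text))
    return failures

def demo_matcher(name, text):
    # Stand-in for an engine; it only approximates common.word = [:alpha:]+
    m = re.match(r"[A-Za-z]+", text)
    return None if m is None else len(text) - m.end()
```

Calling run_tests(parse_tests(rpl_text), demo_matcher) on the snippet above returns an empty list, meaning all four sample inputs behave as declared.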

Display of an identifier's binding should be inverse of read operation

When an engine environment is queried (with an identifier), the engine returns information about the binding (if any) to that identifier.

One of the attributes returned is the value to which the identifier is bound. The value, an RPL pattern, is given in human readable form using the parse.reveal functions.

First, there are some bugs in parse.reveal, which reconstructs the RPL by walking the parse tree that generated the pattern. Since this output was meant for humans only, and is not validated in any way, errors like this can occur: (note the parens where there should be braces {})

Rosie> foo = [:digit:]+
Rosie> foo
assignment foo = ([[:digit:]])+
Rosie> 

Second, it is very valuable to have a function that is the inverse of read. The read operation accepts RPL input and generates an AST. This is currently an internal operation -- it is part of the load capability of an engine, which reads RPL input and parses it, then compiles it.

The (new in v1-tranche-1) lookup function of engines produces the output shown in the repl snippet above by calling parse.reveal. As shown, not only is the output sometimes incorrect, but it is not even valid RPL. (E.g. the word "assignment" in the output shown above.)

The parse.reveal functions should return valid RPL which, if parsed, should generate an equivalent AST. (It may not be an identical AST because the original AST contains meta-data about the original RPL that generated it, and this meta-data is going to be different in the reconstruction.)

To sum up: The "binding" output of lookup (which also returns the assigned color, etc.) should be valid RPL that could be loaded into another Rosie to produce an equivalent pattern (assuming dependencies are loaded into that other Rosie, of course).

backtrack stack overflow (current limit is 400)

Hi, Jamie,
I encountered the following error when loading MANIFEST:
Traceback (most recent call last):
  File "/opt/log-analytics/bin/mytest", line 33, in <module>
    extract()
  File "/opt/log-analytics/bin/mytest", line 31, in extract
    r = engine.load_manifest("$sys/vmware.MANIFEST")
  File "/opt/log-analytics/module/rosie.py", line 154, in load_manifest
    retvals = self.rosie.get_retvals(r)
  File "/opt/log-analytics/module/rosie.py", line 102, in get_retvals
    return self._get_retvals(messages, self.rosie.rosieL_free_stringArray)
  File "/opt/log-analytics/module/rosie.py", line 113, in _get_retvals
    raise RuntimeError(retvals[1]) # exception indicating that the call failed
RuntimeError: src/core/engine.lua:121: backtrack stack overflow (current limit is 400)

It seems this happens when I have a large number of patterns defined across several rpl files.
Could you advise whether there is a way to work around this?
I would appreciate your advice. Many thanks.
-Steve

RPL file extension conflicts with RPL/2 language

While reading and writing new expressions, I noticed that Vim already has syntax highlighting for the rpl extension. This led me to the Reverse Polish Lisp/2 (RPL/2) language, whose files also use the rpl extension. I'd like to suggest changing Rosie's file extension to something that does not conflict, so that editors with existing RPL/2 syntax coloring don't highlight Rosie files incorrectly.

Possible extensions:

  • .rosie
  • .rgd (Rosie Grammar Definition[s])
  • .rpd (Rosie Pattern Definition[s])

url pattern is incomplete

The provided url pattern covers very simple URLs, but doesn't handle the username, password, query, and fragment fields. See https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax

Ideally, the pattern could break down the query into a list of individual attribute/value pairs. Not all URLs follow this query format, so there could be a fallback "just-a-long-string" match for the query portion when it doesn't follow the typical encoding.
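
To make the requested breakdown concrete, here is a small Python sketch that uses the standard urllib.parse module (not RPL) to show the fields a fuller url pattern might capture, including the fallback for queries that are not attribute/value pairs. The function name url_fields and the keys of the returned dict are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def url_fields(url):
    """Split a URL into the fields the pattern could capture."""
    parts = urlsplit(url)
    query = parts.query
    if query:
        try:
            # strict_parsing raises ValueError for fields without "=".
            query = parse_qsl(query, keep_blank_values=True, strict_parsing=True)
        except ValueError:
            # Fallback: keep the query as "just a long string".
            query = parts.query
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": query,
        "fragment": parts.fragment,
    }
```

For example, url_fields("https://user:secret@example.com:8080/a/b?x=1&y=two#frag") yields a query of [("x", "1"), ("y", "two")], while a query such as ?justalongstring is returned unchanged as a string.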

Lua interface that mirrors the interface for other languages

The ffi interface to Rosie allows multi-threaded programs in languages like Go, Python, js, and more to use Rosie. Each thread gets its own Rosie instance (they are small), so they won't interfere with each other, and each Rosie could be configured differently if needed.

This request is to create a Lua api that works just like the ffi-based api. Each Rosie instance will have its own Lua state, and the user program will be in its own Lua state, possibly running a different version of Lua (or lpeg, etc.).

r = require("rosie")
e = r.new_engine(rosie_home)
...
