jamiejennings / rosie-pattern-language
Rosie Pattern Language (RPL) and the Rosie Pattern Engine have MOVED!
Home Page: https://gitlab.com/rosie-pattern-language/rosie
Separate the RPL library definitions into a collection of namespaces for easier use and clarity of identifiers.
Rework the current rpl files into descriptive filenames and identifiers with namespace prefixes.
Namespaces are to be short names of high level collections.
Currently we have the files:
basic.rpl
common.rpl
csv1.rpl
csv.rpl
datetime.rpl
json.rpl
network.rpl
rfc3986.rpl (URI-exact matching over generic network.rpl)
spark.rpl
along with a few others (grep.rpl, language-comments.rpl, rosie.rpl, rpl-1.0.rpl, rpl-1.1.rpl). These all define a variety of identifiers with and without namespace prefixes.
These files will be reworked into namespaces such as: net, date, time, num, str, etc.
network.rpl will be refined into the net namespace.
rfc3986.rpl will be merged under the net namespace.
datetime.rpl will be split into date and time namespaces; date will contain the "datetime" matching types.
common.rpl will be split into num, str, and whatever else namespaces.
csv[1].rpl, json.rpl, spark.rpl can be put under a lang namespace to further separate identifiers.
lang.[name] namespace, such as lang.c, lang.json, lang.spark, etc.
core This is global namespace
net
str
num
os
lang
date
time
datetime Future implementation dependent on import
In order to make this package more available to developers and people putting together containers, for example, it would be very helpful to have RPL packaged up in at least a source package format (SRPM for rpm-style packages -- I'm not sure what the equivalent is for Debian).
Once the source RPM is available, Linux distros can start making binary builds available through their own repositories, and until then, binary RPMs can be built and maintained on community sites.
Personally, I look forward to having Rosie Pattern Language as an RPM for Fedora, so that it's super easy to install and use.
According to https://github.com/jamiejennings/rosie-pattern-language/blob/3d85fe5350bb6caeb802b8b778d9273871a9f44d/doc/raisondetre.md, Rosie can be instructed to do transformations/normalizations:
"In addition to simply recognizing items of interest in the input, Rosie can be instructed to transform values, enumerate values, and perform other operations while processing each line. Normalization is one kind of transformation, e.g. timestamps in various formats may be normalized to an integer number of milliseconds since the epoch. Sanitizing, such as encryption of sensitive fields, is another kind of transformation."
For example, how do I get a timestamp in milliseconds from Rosie? Could someone please provide an example?
Can that be achieved using RPL commands, or do we have to write extensions in Lua?
Thank you
Two people who attended my talk at All Things Open asked about a Brew package to make it easier to install Rosie on OS X.
Today, RPL identifiers can contain only alphanumeric characters (POSIX [:alnum:]) and underscore. Expand the set of characters that can be used in identifiers to be the Unicode general character property called Letters (L), plus underscore.
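As a rough illustration of the proposed rule, here is a minimal Python sketch that accepts a name only if every character is in a Unicode Letter category (Lu, Ll, Lt, Lm, Lo) or is an underscore. The function names are hypothetical, and this is a literal reading of the proposal: whether digits stay legal is not stated here, so the sketch leaves them out.

```python
import unicodedata


def is_identifier_char(ch: str) -> bool:
    """True if ch is an underscore or has a Unicode general category starting with L."""
    return ch == "_" or unicodedata.category(ch).startswith("L")


def is_valid_identifier(name: str) -> bool:
    """Literal reading of the proposal: one or more letters/underscores.

    Assumption: digits are not handled; the proposal text does not say
    whether they remain allowed.
    """
    return bool(name) and all(is_identifier_char(ch) for ch in name)
```

With this rule, a name such as größe_名前 would be accepted, while a-b would not.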
Add these to RPL:
package declaration
import declaration
export declaration
And reorganize the rpl directory to reflect a logical organization of different kinds of patterns.
I believe using --option to denote a multi-character option would be more consistent with the conventions seen everywhere else than the -option form used currently.
An added benefit would be the ability to use single-character options for reduced typing, such as -h instead of -help. Many people will start by typing either -h or --help (like I did), only to see an error for an invalid command line option instead of the help text.
Suggested arguments:
On a somewhat similar but separate note, much of the command line option parsing could be removed in favor of lua-getopt or, what I'd suggest to help with the above, argparse.
EDIT: argparse can be found here: https://github.com/mpeterv/argparse
lua-getopt can be found here: https://github.com/2ion/lua-getopt
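To show the convention being requested, here is a small sketch using Python's standard argparse module (not the Lua argparse linked above). The program name and the --expression/--patterns options are hypothetical stand-ins loosely modeled on the existing -e and -patterns options; -h/--help is added by argparse automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # -h and --help are registered automatically by argparse.
    parser = argparse.ArgumentParser(prog="rosie-demo")
    # Hypothetical options: each has a short and a long spelling.
    parser.add_argument("-e", "--expression", help="pattern expression to use")
    parser.add_argument("-p", "--patterns", action="store_true",
                        help="list the loaded patterns")
    return parser
```

For example, `build_parser().parse_args(["--patterns", "-e", "letter+"])` sets both options, and an unknown option produces a usage message rather than a stack trace.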
Since Windows is not officially supported, this could be seen as an enhancement request, but it's really a bug. It should not matter whether the line endings are lf or crlf.
The CLI could support accepting Rosie JSON as input, with one command to colorize it, and another command to pretty-print it.
Use case: Run rosie, generate JSON output, filter the JSON using any tool at all, then want to examine the results. These results are JSON objects, 1 object per line, and hard to read. (That's why Rosie colorizes output, to make it easy to see matches.)
Solution: Expose the rosie colorizing feature such that it can take rosie JSON as input. Similarly, expose the rjsonpp pretty-printer (which is not part of rosie today) as a rosie CLI feature.
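Until such a CLI feature exists, the pretty-printing half can be approximated outside Rosie. The sketch below assumes only what is stated above, that the input is one JSON object per line; the function name is hypothetical and this is not the rjsonpp tool.

```python
import json


def pretty_print_lines(lines, indent=2):
    """Yield an indented rendering of each non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between objects
        yield json.dumps(json.loads(line), indent=indent)
```

A caller could pass it `sys.stdin` and print each yielded block, which makes filtered match output much easier to scan.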
Attempting to do:
$ rosie -e 'lua_ident = {[[:alpha:]] / "_" / "." / ":"}+' -patterns
/usr/local/share/rosie/bin/lua: (command line):1: unexpected symbol near '}'
Turning on bash tracing:
$ rosie -e 'lua_ident = {[[:alpha:]] / "_" / "." / ":"}+' -patterns
+ [[ -z /usr/local/share/rosie ]]
+ [[ -n '' ]]
+ home=/usr/local/share/rosie
+ ROSIE_SCRIPT_HOME=/usr/local/share/rosie
+ shift
+ executable=/usr/local/share/rosie/bin/lua
++ ps -o args= 1892
+ cmd='bash /usr/local/bin/rosie -e lua_ident = {[[:alpha:]] / "_" / "." / ":"}+ -patterns'
+ [[ ! -d /usr/local/share/rosie ]]
+ [[ ! -x /usr/local/share/rosie/bin/lua ]]
+ i=
+ dev=false
+ [[ -e = \-\D ]]
+ export ROSIE_HOME
+ export ROSIE_SCRIPT_HOME
+ export HOSTNAME
+ export HOSTTYPE
+ export OSTYPE
++ pwd
+ export CWD=/home/klzander/git/veratil-rosie-pattern-language
+ CWD=/home/klzander/git/veratil-rosie-pattern-language
+ /usr/local/share/rosie/bin/lua -e 'ROSIE_HOME=[[/usr/local/share/rosie]]; SCRIPTNAME=[[bash /usr/local/bin/rosie -e lua_ident = {[[:alpha:]] / "_" / "." / ":"}+ -patterns]]; ROSIE_DEV=[[false]];' -- /usr/local/share/rosie/src/run.lua -e 'lua_ident = {[[:alpha:]] / "_" / "." / ":"}+' -patterns
/usr/local/share/rosie/bin/lua: (command line):1: unexpected symbol near '}'
You can see in the SCRIPTNAME variable that the ]] in [[:alpha:]] conflicts with the actual long string end token.
Replacing [[:alpha:]] with letter makes the bug go away:
$ rosie -e 'lua_ident = {letter / "_" / "." / ":"}+' -patterns
---snip---
211 patterns
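One general way to avoid this class of problem is to pick a Lua long-bracket level whose closing delimiter cannot occur in the text being quoted. The sketch below is written in Python purely for illustration (the actual wrapper is a bash script) and the function name is hypothetical.

```python
def lua_long_string(text: str) -> str:
    """Quote text as a Lua long string, choosing a safe bracket level."""
    level = 0
    while True:
        closer = "]" + "=" * level + "]"
        # The closer must not appear inside the text, nor be formed early
        # by the tail of the text running into the real closer.
        if closer not in text + closer[:-1]:
            break
        level += 1
    opener = "[" + "=" * level + "["
    # Lua drops a newline that immediately follows the opener, so emitting
    # one here keeps any leading newline in `text` intact.
    # Limitation: Lua normalizes end-of-line sequences inside long strings.
    return opener + "\n" + text + closer
```

For the command line above, level 0 is rejected because the text contains ]], so the result is wrapped in [=[ ... ]=] instead.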
Ubuntu 14.04.4 LTS
clone git repo
~/Documents/Rosie/rosie-pattern-language$ make linux
...
gcc -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_LINUX -c -o loadlib.o loadlib.c
gcc -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_LINUX -c -o linit.o linit.c
ar rcu liblua.a lapi.o lcode.o lctype.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o lauxlib.o lbaselib.o lbitlib.o lcorolib.o ldblib.o liolib.o lmathlib.o loslib.o lstrlib.o ltablib.o lutf8lib.o loadlib.o linit.o
ranlib liblua.a
gcc -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_LINUX -c -o lua.o lua.c
lua.c:80:31: fatal error: readline/readline.h: No such file or directory
#include <readline/readline.h>
^
compilation terminated.
make[3]: *** [lua.o] Error 1
...
Haven't done any troubleshooting yet or looked for a reference to the missing file.
Description: This case may be rare in practice, but there's an example in the RPL patterns that are part of the Rosie distribution: json.json_discard. The bug occurs only when the match succeeds.
Versions: This bug is present in v0.99i and appears to go all the way back to v0.99a. It was likely the result of a significant change to the compiler that occurred just prior to that.
Work around: Capture something! (E.g. the json pattern works, and it is defined as a capturing version of the alias json_discard.)
The bug looks like this:
jjennings$ rosie json.json_discard /tmp/test.json
/Users/jjennings/Work/Dev/public/rosie-pattern-language/bin/lua: src/core/common.lua:233: bad argument #1 to 'next' (table expected, got number)
stack traceback:
[C]: in function 'next'
src/core/common.lua:233: in function 'common.decode_match'
src/core/color-output.lua:164: in function 'color_string_from_leaf_nodes'
src/core/engine.lua:177: in function <src/core/engine.lua:146>
(...tail calls...)
...nings/Work/Dev/public/rosie-pattern-language/src/run.lua:303: in function 'process_pattern_against_file'
...nings/Work/Dev/public/rosie-pattern-language/src/run.lua:354: in function 'run'
...nings/Work/Dev/public/rosie-pattern-language/src/run.lua:371: in main chunk
[C]: in ?
It would be nice if the color option still displayed the whole line, so that you can see where in the line something was found.
For example, the date.any pattern will match this line:
server_ip = 192.1.20.255
and with -o color specified, Rosie displays:
2 1 20
This is a trivial example, but if the line were more complex and still had a match, it might be hard to figure out what was hit. This would be better output:
server_ip = 192.1.20.255
(where the matched numbers would be shown in color)
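The requested behavior can be sketched as follows: keep the whole input line and wrap only the matched spans in terminal escape codes. This is an illustration in Python, not Rosie's color output code; it assumes each match is given as a (pos, text) pair with a 1-based pos, and the function name is hypothetical.

```python
BOLD = "\x1b[1m"
RESET = "\x1b[0m"


def highlight(line, matches, start=BOLD, end=RESET):
    """Return line with each (pos, text) match wrapped in escape codes."""
    out = []
    cursor = 0
    for pos, text in sorted(matches):
        begin = pos - 1  # convert 1-based position to 0-based index
        if begin < cursor:
            continue  # skip a match that overlaps one already emitted
        out.append(line[cursor:begin])
        out.append(start + line[begin:begin + len(text)] + end)
        cursor = begin + len(text)
    out.append(line[cursor:])  # unmatched remainder of the line
    return "".join(out)
```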
A useful tool would be a lint-like program that looks for probable mistakes and violations of best practices.
The linter could start very simple, without any configuration options at all. Some things it might do:
[^:space:] does not find chars that are not whitespace ([:^space:] does)
[abc0-9] does not include 1, 2, ... 8. Use [[abc][0-9]]
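As a starting point, the first check above could look something like this minimal Python sketch. It only flags the [^:name:] spelling and suggests [:^name:]; the function name and message wording are hypothetical, and a real linter would work on the parsed RPL rather than on raw text.

```python
import re

# Matches a bracket expression written as [^:name:].
NEGATED_NAMED_CLASS = re.compile(r"\[\^:([a-z]+):\]")


def lint(rpl_source: str):
    """Return (line_number, message) pairs for probable mistakes."""
    warnings = []
    for lineno, line in enumerate(rpl_source.splitlines(), start=1):
        for found in NEGATED_NAMED_CLASS.finditer(line):
            name = found.group(1)
            warnings.append(
                (lineno, f"[^:{name}:] does not negate {name}; did you mean [:^{name}:]?")
            )
    return warnings
```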
Reading the librosie.c code, I think we have a memory leak because the "name" variable is never freed. I've added a fix, but since I am not a C expert, I am not sure it is correct.
static int bootstrap (lua_State *L, struct rosieL_string *rosie_home) {
  const char *bootscript = "/src/core/bootstrap.lua";
  LOG("About to bootstrap\n");
  char *name = malloc(rosie_home->len + strlen(bootscript) + 1);
  memcpy(name, rosie_home->ptr, rosie_home->len);
  memcpy(name+(rosie_home->len), bootscript, strlen(bootscript)+1); /* +1 copies the NULL terminator */
  int status = luaL_loadfile(L, name);
  if (status != LUA_OK) {
    free(name); // new code
    return status;
  }
  status = lua_pcall(L, 0, LUA_MULTRET, 0);
  free(name); // new code
  return status;
}
Jamie: Thought I'd work with your nodejs sample and give you back server capable version. However your initial test version fails on line 43 where you attempt to load librosie.so. I cannot find that file on my system, nor does it appear to be created during the make process. Any ideas?
Hi,
I am running a bash loop that feeds rosie -repl, and I see the resident memory and CPU usage of lua increasing until the program ends. I notice a similar steady increase when running librosie via Java.
Is there any reason why this could be happening?
me@MYMACHINE:~/git/rosie-testing/rosie$ cat rosietest.sh
#!/bin/bash
read -d '' CMD1 <<'ENDCMD1'
foo = [:digit:]+ "!!"
ENDCMD1
read -d '' CMD2 <<'ENDCMD2'
.match foo "42!!"
ENDCMD2
for number in {1..10000}; do
    if [[ $number -eq 1 ]]; then
        echo "$CMD1"
    else
        echo "$CMD2"
    fi
done | rosie -repl
me@MYMACHINE:~/git/rosie-testing/rosie$ time ./rosietest.sh > /dev/null &
[1] 7893
me@MYMACHINE:~/git/rosie-testing/rosie$ This is Rosie v0.99k
me@MYMACHINE:~$ ps -ef | grep rosie | grep lua
me 7922 7919 0 2433 ? 00:00:06 /home/me/git/rosie-pattern-language/bin/lua -e ROSIE_HOME=[[/home/me/git/rosie-pattern-language]]; SCRIPTNAME=[====[bash /usr/local/bin/rosie -repl]====]; ROSIE_DEV=[[false]]; -- /home/me/git/rosie-pattern-language/src/run.lua -repl
me@MYMACHINE:~$ pidstat -p 7922 -r 10
03:44:48 PM UID PID minflt/s majflt/s VSZ RSS %MEM Command
03:44:58 PM 1000 7922 0.00 0.00 687657100 1133672 6.80 lua
03:45:08 PM 1000 7922 0.00 0.00 687819208 1295780 7.78 lua
03:45:18 PM 1000 7922 0.00 0.00 687976016 1452588 8.72 lua
03:45:28 PM 1000 7922 0.00 0.00 688079324 1555892 9.34 lua
03:45:38 PM 1000 7922 0.00 0.00 688178776 1655348 9.93 lua
03:45:48 PM 1000 7922 0.00 0.00 688283016 1759588 10.56 lua
03:45:58 PM 1000 7922 0.00 0.00 688380724 1857296 11.15 lua
03:46:08 PM 1000 7922 0.00 0.00 688481356 1957928 11.75 lua
03:46:18 PM 1000 7922 0.00 0.00 688582988 2059560 12.36 lua
03:46:28 PM 1000 7922 0.00 0.00 688683936 2160508 12.97 lua
03:46:38 PM 1000 7922 0.00 0.00 688784360 2260932 13.57 lua
03:46:48 PM 1000 7922 0.00 0.00 688869220 2345832 14.08 lua
03:46:58 PM 1000 7922 0.00 0.00 688930708 2407320 14.45 lua
03:47:08 PM 1000 7922 0.00 0.00 688981128 2457700 14.75 lua
03:47:18 PM 1000 7922 0.00 0.00 689077768 2554380 15.33 lua
03:47:28 PM 1000 7922 0.00 0.00 689132400 2608972 15.66 lua
03:47:38 PM 1000 7922 0.00 0.00 689234752 2711264 16.27 lua
03:47:48 PM 1000 7922 0.00 0.00 689282032 2758544 16.56 lua
03:47:58 PM 1000 7922 0.00 0.00 689379400 2855972 17.14 lua
03:48:08 PM 1000 7922 0.00 0.00 689427688 2904300 17.43 lua
03:48:18 PM 1000 7922 0.00 0.00 689506872 2983416 17.91 lua
03:48:28 PM 1000 7922 0.00 0.00 689575344 3051916 18.32 lua
03:48:38 PM 1000 7922 0.00 0.00 689618132 3094704 18.57 lua
03:48:48 PM 1000 7922 0.00 0.00 689721100 3197712 19.19 lua
03:48:58 PM 1000 7922 0.00 0.00 689769708 3246320 19.48 lua
03:49:08 PM 1000 7922 0.00 0.00 689866420 3343032 20.06 lua
03:49:18 PM 1000 7922 0.00 0.00 689915752 3392324 20.36 lua
03:49:28 PM 1000 7922 0.00 0.00 689963592 3440204 20.65 lua
03:49:38 PM 1000 7922 0.00 0.00 690019588 3496200 20.98 lua
03:49:48 PM 1000 7922 0.00 0.00 690113148 3589760 21.54 lua
The RPL compiler is incremental. To compile a pattern, the compiler needs the AST for the pattern and an environment that maps pattern names to compiled pattern objects.
Without describing here how the pattern object will change when RPL modules are introduced, the following will remain true:
common.create_match; therefore, when saving the pattern object's lpeg userdata, we do not need to be able to save any Lua function; when we encounter common.create_match, we can save a marker of some kind, and then when reading from disk, we replace the marker with the function
When the module system is finished, it will be easier to test the saving/loading of compiled patterns. But while developing a save/load technique, something like the following can be used to verify that it is working correctly, showing that matches work before and after the save/load:
> e = lapi.new_engine()
> lapi.load_manifest(e, "$sys/MANIFEST")
true table: 0x7fe4c634aa90 /Users/jjennings/Work/Dev/private/jones/rosie-pattern-language/MANIFEST
> for k,v in pairs(e.env["common.number"]) do print(k,v); end
original_ast table: 0x7fe4c63d2330
ast table: 0x7fe4c3d90480
name choice
peg userdata: 0x7fe4c4071e28
uncap userdata: 0x7fe4c4000028
alias false
raw false
>
>
> -- save(e.env["common.number"], "filename")
> -- e.env["common.number"] = nil
> -- load(e.env["common.number"], "filename")
>
> -- test that pattern here, and if working, test patterns that depend on common.number
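The marker idea described above can be sketched independently of lpeg. The Python below uses plain dictionaries as a stand-in for a pattern object (real lpeg userdata cannot be serialized this way), and every name in it is hypothetical: a known function is written out as a marker and swapped back in on load.

```python
import json


def create_match(name, pos, text):
    """Stand-in for the one function that must survive a save/load cycle."""
    return {name: {"pos": pos, "text": text}}


REGISTRY = {"common.create_match": create_match}
NAMES = {fn: name for name, fn in REGISTRY.items()}
MARKER_KEY = "$function"


def to_saved(value):
    """Replace known functions with a marker; copy everything else."""
    if callable(value):
        return {MARKER_KEY: NAMES[value]}  # KeyError for unknown functions
    if isinstance(value, dict):
        return {key: to_saved(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_saved(item) for item in value]
    return value


def from_saved(value):
    """Replace each marker with the function it names."""
    if isinstance(value, dict):
        if set(value) == {MARKER_KEY}:
            return REGISTRY[value[MARKER_KEY]]
        return {key: from_saved(item) for key, item in value.items()}
    if isinstance(value, list):
        return [from_saved(item) for item in value]
    return value


def save(pattern):
    return json.dumps(to_saved(pattern))


def load(data):
    return from_saved(json.loads(data))
```

The same before/after check suggested above applies here: the loaded object should behave like the original, including calling the restored function.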
Using the new file test/quick.txt, this is the desired output:
$ bin/rosie grep '"quick" ("brown" / "blue") "fox"' test/quick.txt
the quick brown fox
the quick brown fox jumped over the lazy (but adorable) dog
the quick blue fox
the quick blue fox jumped over the sleeping (and adorable) dog
$
But the actual output is empty (no matches). The problem is that the input expression is a tokenized sequence (of words), and it is being passed to the findall macro as if it were a raw/untokenized sequence, as can be seen here in the last two lines of the grammar:
$ bin/rosie -o line expand 'find:("quick" ("brown" / "blue") "fox")'
Expression: find:("quick" ("brown" / "blue") "fox")
Parses as: find:("quick" ("brown" / "blue") "fox")
At top level: find:("quick" ("brown" / "blue") "fox")
Expands to:
grammar
alias find = {<search> <anonymous>}
alias <search> = {!{~ {"quick" {"brown" / "blue"} "fox"} ~} .}*
<anonymous> = {{~ {"quick" {"brown" / "blue"} "fox"} ~}}
end
$
% make -j 10
git submodule init
git submodule init
Creating /home/cjashfor/rosie-pattern-language/bin/rosie
Creating /home/cjashfor/rosie-pattern-language/rosie.lua
error: could not lock config file .git/config: File exists
fatal: Failed to register url for submodule path 'submodules/argparse'
Submodule 'submodules/argparse' (https://github.com/mpeterv/argparse.git) registered for path 'submodules/argparse'
Makefile:115: recipe for target 'submodules/argparse/src/argparse.lua' failed
make: *** [submodules/argparse/src/argparse.lua] Error 128
make: *** Waiting for unfinished jobs....
Submodule 'submodules/lua-readline' (https://github.com/jamiejennings/lua-readline.git) registered for path 'submodules/lua-readline'
git submodule update
Cloning into '/home/cjashfor/rosie-pattern-language/submodules/argparse'...
READLINE TEST: libreadline and readline.h appear to be installed
Cloning into '/home/cjashfor/rosie-pattern-language/submodules/lua-readline'...
Submodule path 'submodules/argparse': checked out 'a40458fdc1507e44b6a829b6c6b969b500e1c337'
Submodule path 'submodules/lua-cjson': checked out 'fac2934964e5aaed298daa105e5664d3311edf07'
Submodule path 'submodules/lua-readline': checked out '4fdbf5cc2a58d0442770f52b1bed174805737f9c'
make: *** wait: No child processes. Stop.
This is easy to work around: just drop the -j option. This is a low priority bug because Rosie doesn't take long to build without using a parallel build.
In most open source projects, "make install" will copy all of the files needed into the system's root (or other selected location). Those files, once installed, must be capable of running there stand-alone without using any files from the original build location. The same goes for any doc files that are associated with the project.
The current code in rosie's Makefile simply creates a soft link to a file that was constructed in the build area. And some of the files in the build area have hard-coded paths to the build area.
This is also affecting RPM packaging, because the standard way RPM spec files are constructed is to rely on having the files installed into the root in a standard way.
The wrapper scripts should be changed so that they don't use hard-coded paths, if possible, or at least so that their paths can be updated during the "make install" execution. Further, as stated above, the Makefile should be changed so that all needed files are copied to the installation root, so that they are completely independent of the build area.
Currently when in REPL mode you cannot press UP to repeat your previously entered line. This requires you to either copy/paste with a mouse or retype the entire thing.
I have looked at src/core/repl.lua and noticed that it uses a hardcoded io.stdin:read(). Since the readline (readline-devel) package is already required, it would be useful to exploit this for REPL mode if we can.
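For comparison, this is roughly how a readline-backed loop looks in Python, where importing the standard readline module is enough to give input() line editing and UP-arrow history. It is only an analogy for the Lua change being requested; the function name and the evaluate callback are hypothetical.

```python
try:
    import readline  # noqa: F401  (importing it enables history for input())
except ImportError:
    readline = None  # e.g. platforms without GNU readline


def repl(evaluate, read=input, prompt="Rosie> "):
    """Read lines until end of input, printing evaluate(line) for each."""
    while True:
        try:
            line = read(prompt)
        except EOFError:
            break
        if line.strip():
            print(evaluate(line))
```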
There are some important tests that need to be added to librosie, such as the return values from malloc.
Hi, Jamie,
I am trying to call v1.0.0 librosie with the following C Program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "librosie.h"
#define ROSIE_HOME "/Users/syau/dev/log-analytics/rosie-pattern-language"
str *new_rstr(char *s){
  str *p;
  p=(str *)malloc(sizeof(str));
  p->len=strlen(s);
  p->ptr=(byte_ptr)s;
  return p;
}
void free_rstr(str *p){
  /* zero memory first before freeing */
  memset(p->ptr,0,p->len);
  free(p);
}
int main() {
  int rc;
  str *errors;
  errors=new_rstr("ok");
  /* initialize Rosie engine */
  void *engine=rosie_new(errors);
  printf("*** Instantiate\n");
  printf("Instantiate Rosie engine error:%s\n",errors->ptr);
  /*
  int rosie_import(void *L, int *ok, str *pkgname, str *as, str *errors);
  int rosie_compile(void *L, str *expression, int *pat, str *errors);
  int rosie_match(void *L, int pat, int start, char *encoder, str *input, match *match);
  */
  /* import package */
  str *pkgname;
  pkgname=new_rstr("all");
  str *as;
  as=new_rstr("");
  int ok;
  rc=rosie_import(engine, &ok, pkgname, as, errors);
  printf("*** Import\n");
  printf("Import RPL error:%s\n",errors->ptr);
  str *expression;
  expression=new_rstr("\"all.things11\"");
  int pat=0;
  errors=new_rstr("ok2");
  rc=rosie_compile(engine, expression, &pat, errors);
  printf("*** Compile\n");
  printf("Compile pattern error:%s\n",errors->ptr);
  printf("pat:%i\n",pat);
  str *input;
  input=new_rstr("\"1234\"");
  match r;
  rc=rosie_match(engine, pat, 1, "json", input, &r);
  printf("*** Match\n");
  printf("match rc:%i\n",rc);
  printf("leftover:%i\n",r.leftover);
  printf("ttotal:%i\n",r.ttotal);
  printf("tmatch:%i\n",r.tmatch);
  printf("data length:%i\n",r.data.len);
  printf("data%c\n",*(r.data.ptr));
}
I compile with the following:
/Applications/Xcode.app/Contents/Developer/usr/bin/make librosie.so SYSCFLAGS="-DLUA_USE_MACOSX" SYSLDFLAGS="-dynamiclib" CC=clang
clang -o librosie.o -c librosie.c -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_MACOSX -I/Users/syau/dev/log-analytics/rosie-pattern-language/submodules/lua/include
clang -o crosie crosie.c -O2 -Wall -Wextra -DLUA_COMPAT_5_2 -DLUA_USE_MACOSX -std=gnu99 -I${ROSIE_HOME}/submodules/lua/include -L. -lrosie
When I run the program, I got:
Kwans-MBP:librosie syau$ ./crosie
*** Instantiate
Instantiate Rosie engine error:ok
*** Import
Import RPL error:{}
*** Compile
Compile pattern error:{}
pat:1
*** Match
match rc:0
leftover:6
ttotal:3
tmatch:3
data length:0
Segmentation fault: 11
It looks like every call except rosie_match() completed without error. I don't know what I am missing or have done wrong here. I would appreciate your advice. Many thanks.
-Steve
PS- I am just referring to the ffi.cdef in rosie.py and test.py to arrive at the above C program.
The rpl that defines rpl includes a definition for syntax_error. A match can succeed and have within the returned data structure a syntax_error node.
I expect this to be a common idiom among Rosie users (to define some kind of error pattern).
Wrt running tests: when a match succeeds, but the match contains an error node, this is a 3rd kind of result. I.e. we could have "match failed", "match succeeded with no error nodes", and "match succeeded with one or more error nodes in the output".
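The three outcomes can be made concrete with a short Python sketch. It assumes the JSON match structure that Rosie prints elsewhere (a one-key object whose value has "text", "pos" and an optional "subs" list) and that a failed match is represented as None; the function names and result labels are hypothetical.

```python
def has_error_node(match, error_name="syntax_error"):
    """True if any node in the match tree is named error_name.

    Assumption: a package-qualified name such as "rpl.syntax_error"
    should count as well.
    """
    for name, node in match.items():
        if name == error_name or name.endswith("." + error_name):
            return True
        for sub in node.get("subs", []):
            if has_error_node(sub, error_name):
                return True
    return False


def classify(match, error_name="syntax_error"):
    """Map a match result onto the three kinds of outcome."""
    if match is None:
        return "failed"
    if has_error_node(match, error_name):
        return "matched_with_errors"
    return "matched"
```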
Design questions:
syntax_error? (Leaning to YES.)
syntax_error? (Leaning to YES and perhaps it should be 'errors' or 'errors_on', as in test language_decl errors_on "rpl 1.a".)
There's a restriction in lpeg: the "look behind" operator, lpeg.B, will not accept a pattern that has captures (nor patterns that lack a fixed length).
To implement a "look behind" operator in RPL, then, the compiler must strip out all the captures from the argument of the operator. And we must check for fixed length (or let lpeg do it for us).
Related: Once we have a compiler function that takes a pattern and produces a new pattern that is identical except that the new one captures nothing, we may want to expose this in RPL. It would be exposed as an operator that strips all captures out of a pattern, returning a new pattern. (Note that this is different from an alias, which merely does not add a new capture when defining an identifier.)
Can't use "look behind" on patterns that have captures. (Found this while reading the lpeg code.) E.g.
> repl(e)
Rosie> foo = "xyz"
Rosie> .match foo "xyz"
{"foo":
{"text": "xyz",
"pos": 1.0}}
Rosie>
Exiting
> foo = e.env["foo"].peg
> (lpeg.P(3)*lpeg.B(foo)):match("xyz")
stdin:1: bad argument #1 to 'B' (pattern have captures)
stack traceback:
[C]: in function 'lpeg.B'
stdin:1: in main chunk
[C]: in ?
>
Within the "Building" section there is a typo:
instalaltion
Change to:
installation
The pattern files that ship with rosie in rpl/*.rpl have accumulated some crud. The module (file) names themselves are not all mnemonic or appropriate, and the contents need reorganization.
basic should become col for "collection"?
Rosie would be very useful for parsing log files, which is what Grok does today in ELK. A Rosie plugin could be used in addition to or instead of Grok.
Logstash plugins are written in Ruby and packaged as Ruby gems.
I see build support using Linux (including Ubuntu) and make is supported.
Note: This is essentially a feature request I expect as an FAQ item. I personally have a Linux machine I can use instead of my main Windows machine. I may also be able to help with adding this functionality/documentation if it represents a large value added. A wait and see approach might be best unless this is already on a list of needed project enhancements.
Hi!
I'm looking for case-insensitive mode for string literals like (?i: string) or /string/i in other regexes. Is it possible to use this mode in RPL? Or should I combine list character classes like [Ss][Tt] etc.?
Thanks!
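If character classes turn out to be the only route, they need not be written by hand. The Python sketch below generates the [Ss][Tt] style expression from a literal. It rests on several assumptions about RPL that are not confirmed here: that braces group a raw (untokenized) sequence, that [Ss] is a valid character list, and that string literals escape quotes and backslashes with a backslash. The function names are hypothetical.

```python
def quote(text: str) -> str:
    """Quote text as a string literal (escaping rules are an assumption)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def case_insensitive(literal: str) -> str:
    """Build an expression matching literal regardless of ASCII letter case."""
    parts = []
    plain = ""  # run of characters that have no case distinction
    for ch in literal:
        if ch.isascii() and ch.lower() != ch.upper():
            if plain:
                parts.append(quote(plain))
                plain = ""
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            plain += ch
    if plain:
        parts.append(quote(plain))
    return "{" + " ".join(parts) + "}"
```

Only ASCII letters are expanded, because case mappings for other scripts are not always one-to-one.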
Hi, Jamie,
I tried calling RosieL_match_file() from Python but did not get the expected result.
Here's what I did:
Added the following to rosie-pattern-language/ffi/samples/python/rosie.py:
    def match_file(self, input, output, error, wholefileflag):
        return self._match_file(input, output, error, wholefileflag, self.rosie.rosie.rosieL_match_file)
    def _match_file(self, input, output, error, wholefileflag, operation):
        input_file_string = to_cstr_ptr(self.rosie.rosie, input)
        output_file_string = to_cstr_ptr(self.rosie.rosie, output)
        error_file_string = to_cstr_ptr(self.rosie.rosie, error)
        flag_string = to_cstr_ptr(self.rosie.rosie, wholefileflag)
        r = operation(self.engine, input_file_string, output_file_string, error_file_string, flag_string)
        retvals = self.rosie.get_retvals(r)
        return retvals
In my python program "test.py" , I called match_file():
import os, json, sys
import rosie
........
Rosie = rosie.initialize(ROSIE_HOME, ROSIE_HOME + "/ffi/librosie/librosie.so")
engine = Rosie.engine()
r = engine.load_manifest("$sys/MANIFEST")
config = json.dumps( {'expression': 'linuxsys.matchall', 'encode':'json'} )
r = engine.configure(config)
r = engine.match_file("testfile", "", "std.err", "true")
"testfile" has two lines:
Aug 20 18:18:10 beatrice-desktop dhclient[1354]: DHCPREQUEST of 192.168.8.14 on enp0s31f6 to 192.168.8.1 port 67 (xid=0x2830f2c)
Aug 20 20:14:09 beatrice-desktop dhclient[1354]: DHCPACK of 192.168.8.14 from 192.168.8.1
When I execute test.py, it seems to match only the first line and then stop. i.e. it didn't iterate through the rest of the file:
$ python test.py
{"linuxsys.matchall":{"subs":[{"linuxsys.log36":{"subs":[{"linuxsys.prefix":{"subs":[{"basic.datetime_patterns":{"subs":[{"datetime.shortdate":{"text":"Aug 20 ","pos":1}}],"text":"Aug 20 ","pos":1}},{"basic.datetime_patterns":{"subs":[{"datetime.simple_time":{"text":"18:18:10 ","pos":8}}],"text":"18:18:10 ","pos":8}},{"common.identifier_not_word":{"text":"beatrice-desktop","pos":17}}],"text":"Aug 20 18:18:10 beatrice-desktop","pos":1}},{"linuxsys.l5":{"text":"dhclient","pos":34}},{"basic.punctuation":{"text":"[","pos":42}},{"common.number":{"subs":[{"common.int":{"text":"1354","pos":43}}],"text":"1354","pos":43}},{"basic.punctuation":{"text":"]","pos":47}},{"basic.punctuation":{"text":":","pos":48}},{"common.maybe_identifier":{"text":"DHCPREQUEST","pos":50}},{"linuxsys.l6":{"text":"of","pos":62}},{"basic.network_patterns":{"subs":[{"network.ip_address":{"text":"192.168.8.14","pos":65}}],"text":"192.168.8.14","pos":65}},{"linuxsys.l7":{"text":"on","pos":78}},{"common.identifier_not_word":{"text":"enp0s31f6","pos":81}},{"linuxsys.l8":{"text":"to","pos":91}},{"basic.network_patterns":{"subs":[{"network.ip_address":{"text":"192.168.8.1","pos":94}}],"text":"192.168.8.1","pos":94}},{"linuxsys.l9":{"text":"port","pos":106}},{"common.number":{"subs":[{"common.int":{"text":"67","pos":111}}],"text":"67","pos":111}},{"basic.punctuation":{"text":"(","pos":114}},{"linuxsys.l10":{"text":"xid","pos":115}},{"basic.punctuation":{"text":"=","pos":118}},{"common.number":{"subs":[{"common.denoted_hex":{"subs":[{"common.hex":{"text":"2830f2c","pos":121}}],"text":"0x2830f2c","pos":119}}],"text":"0x2830f2c","pos":119}},{"basic.punctuation":{"text":")","pos":128}}],"text":"Aug 20 18:18:10 beatrice-desktop dhclient[1354]: DHCPREQUEST of 192.168.8.14 on enp0s31f6 to 192.168.8.1 port 67 (xid=0x2830f2c)","pos":1}}],"text":"Aug 20 18:18:10 beatrice-desktop dhclient[1354]: DHCPREQUEST of 192.168.8.14 on enp0s31f6 to 192.168.8.1 port 67 (xid=0x2830f2c)","pos":1}}
Garbage collecting engine 118c5a0
When I use the CLI by "rosie linuxsys.matchall testfile", it matches all the lines:
$ rosie linuxsys.matchall testfile
Aug 20 18:18:10 beatrice-desktop dhclient [ 1354 ] : DHCPREQUEST of 192.168.8.14 on enp0s31f6 to 192.168.8.1 port 67 ( xid = 2830f2c )
Aug 20 20:14:09 beatrice-desktop dhclient [ 1354 ] : DHCPACK of 192.168.8.14 from 192.168.8.1
I expected when I set "wholefileflag" to true in r = engine.match_file("testfile", "", "std.err", "true"), it would iterate through all the lines of "testfile". Somehow, it didn't.
I can't figure out what I have missed. Appreciate your advice. Many thanks!
I think this would go really well in the devops section - basically just a service tile that would point them to the code a user would need to leverage this in an app
.match command does not enter the history
.patterns does not search for when it is a number
tab completes file names, but at the repl prompt it should complete commands
tab could complete pattern names after a command is typed
Create a rosie interface for Lua, where Rosie runs in the same Lua instance as the user program. E.g.
r = require("rosie")
r.initialize(rosie_home)
e = r.new_engine()
...
The current color output code is a proof-of-concept that has not been rewritten. Some thoughts:
Hi, we are looking at evaluating Rosie for a current need that will require running in a production environment. We would prefer not to be on the bleeding edge if possible. Would you be able to provide a few examples of production deployments?
Thanks!
IF THIS PROPOSAL IS TO BE IMPLEMENTED AFTER VERSION 1.0, then the change in output format should be made in Version 1.0. (I.e. pos should be replaced with src, a table that contains pos and nothing else. The other fields in src can appear in Version 1.x.)
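To make the proposed Version 1.0 shape concrete, the following Python sketch rewrites a match in today's format so that every pos field becomes a src table containing only pos. It assumes the JSON match structure that Rosie prints (a one-key object whose value has "text", "pos" and an optional "subs" list); the function name is hypothetical.

```python
def pos_to_src(match):
    """Return a copy of match with each "pos" replaced by "src": {"pos": ...}."""
    converted = {}
    for name, node in match.items():
        new_node = {key: value for key, value in node.items()
                    if key not in ("pos", "subs")}
        if "pos" in node:
            new_node["src"] = {"pos": node["pos"]}
        if "subs" in node:
            new_node["subs"] = [pos_to_src(sub) for sub in node["subs"]]
        converted[name] = new_node
    return converted
```

Later fields could then be added inside src without changing the outer structure again.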
Design ideas:
pos field)
librosie should be able to set name to be any string (default null and does not appear in Rosie output)
match is called
librosie should be able to reset the line counter (to any reasonable integer)
-wholefile flag of the CLI requires no special treatment (the line will be 1 for all matches)
pos with src; the src field is a table containing the source reference
The current code evolved organically without strict use of Lua modules. Strict module usage has the form foo = require("foo") where this statement does not introduce any new globals.
Moving to more strict Lua module usage will require reorganizing some of the core code, giving a more clear dependency picture.
It would be great if the eval function would output as JSON so the failure's explanation can be stepped through programmatically.
librosie was designed to be used with pthreads, where each thread will initialize its own instance of Rosie using initialize().
This capability needs to be tested.
Note: Check to see if cjson will cause a problem. Do we need to use cjson.safe and cjson.new?
A useful test would ensure the independence of the engines and be invokable from the command line so it can be used in the Rosie automated test suite. It should return an exit status of 0 for success, and other values for failures.
Hi,
Like there are samples for Python, Ruby, Go etc. I would like to know if there is plan in the near future to add Java.
Thanks
Pattern writers need to test their patterns and it makes sense to automate this. I propose starting with a lightweight testing capability that checks patterns against sample input and looks for only accepts and rejects outcomes. (In future, more information about a match could be tested.)
Design ideas:
rosie test <filename> command that parses the tests out of an rpl file and runs them
E.g. rpl file snippet:
common.word = [:alpha:]+
-- common.word accepts "foo"
-- common.word rejects "12356", " ", "#!"
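A lightweight version of this idea can be prototyped outside Rosie. The Python sketch below assumes test lines are comments of the form `-- <pattern> accepts|rejects "a", "b"`, as in the snippet above; the function names are hypothetical, and `matches` is a caller-supplied stand-in for whatever actually runs a pattern against an input.

```python
import re

TEST_LINE = re.compile(
    r'^\s*--\s*(?P<pattern>[\w.]+)\s+(?P<kind>accepts|rejects)\s+(?P<inputs>.+)$'
)
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_tests(rpl_source):
    """Extract (pattern, kind, inputs) triples from test comments."""
    tests = []
    for line in rpl_source.splitlines():
        found = TEST_LINE.match(line)
        if found:
            inputs = QUOTED.findall(found.group("inputs"))
            tests.append((found.group("pattern"), found.group("kind"), inputs))
    return tests


def run_tests(tests, matches):
    """Return the (pattern, kind, input) cases whose outcome was unexpected.

    `matches(pattern, text)` is a hypothetical callable returning True
    when the named pattern accepts the text.
    """
    failures = []
    for pattern, kind, inputs in tests:
        for text in inputs:
            if matches(pattern, text) != (kind == "accepts"):
                failures.append((pattern, kind, text))
    return failures
```

An empty list from run_tests means every accepts and rejects expectation held.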
When an engine environment is queried (with an identifier), the engine returns information about the binding (if any) to that identifier.
One of the attributes returned is the value to which the identifier is bound. The value, an RPL pattern, is given in human readable form using the parse.reveal functions.
First, there are some bugs in parse.reveal, which reconstructs the RPL by walking the parse tree that generated the pattern. Since this output was meant for humans only, and is not validated in any way, errors like this can occur (note the parens where there should be braces {}):
Rosie> foo = [:digit:]+
Rosie> foo
assignment foo = ([[:digit:]])+
Rosie>
Second, it is very valuable to have a function that is the inverse of read. The read operation accepts RPL input and generates an AST. This is currently an internal operation: it is part of the load capability of an engine, which reads RPL input and parses it, then compiles it.
The (new in v1-tranche-1) lookup function of engines produces the output shown in the repl snippet above by calling parse.reveal. As shown, not only is the output sometimes incorrect, but it's not even valid RPL. (E.g. the word "assignment" in the output shown above.)
The parse.reveal functions should return valid RPL which, if parsed, should generate an equivalent AST. (It may not be an identical AST because the original AST contains meta-data about the original RPL that generated it, and this meta-data is going to be different in the reconstruction.)
To sum up: The "binding" output of lookup (which also returns the assigned color, etc.) should be valid RPL that could be loaded into another Rosie to produce an equivalent pattern (assuming dependencies are loaded into that other Rosie, of course).
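The round-trip requirement can be stated as a small check, sketched here in Python. Everything in it is an assumption made for illustration: toy_parse handles only assignments of the form name = [:class:]+ and is not the real RPL parser, the dictionary AST and its "meta" key are invented, and buggy_reveal merely imitates the incorrect output quoted above.

```python
import re

ASSIGNMENT = re.compile(
    r'^\s*(?P<name>\w+)\s*=\s*\[:(?P<cls>\w+):\](?P<plus>\+?)\s*$'
)


def toy_parse(source):
    """Toy stand-in for read: handles only  name = [:class:]  with optional +."""
    found = ASSIGNMENT.match(source)
    if not found:
        raise ValueError(f"not valid toy RPL: {source!r}")
    return {
        "type": "assignment",
        "name": found.group("name"),
        "charset": found.group("cls"),
        "repeat": found.group("plus") == "+",
        "meta": {"text": source},  # differs between original and rebuilt AST
    }


def toy_reveal(ast):
    suffix = "+" if ast["repeat"] else ""
    return f'{ast["name"]} = [:{ast["charset"]}:]{suffix}'


def buggy_reveal(ast):
    # Imitates the reported output: extra word, parens, doubled brackets.
    return f'assignment {ast["name"]} = ([[:{ast["charset"]}:]])+'


def strip_metadata(ast):
    """Drop metadata so two ASTs can be compared for equivalence."""
    return {key: value for key, value in ast.items() if key != "meta"}


def round_trips(source, parse, reveal):
    """True when the revealed text parses back to an equivalent AST."""
    original = parse(source)
    try:
        rebuilt = parse(reveal(original))
    except ValueError:
        return False  # the revealed text was not even valid input
    return strip_metadata(original) == strip_metadata(rebuilt)


good = round_trips("foo   =  [:digit:]+", toy_parse, toy_reveal)
bad = round_trips("foo = [:digit:]+", toy_parse, buggy_reveal)
```

Comparing with the metadata stripped is the point of the design: the two ASTs are expected to be equivalent, not identical.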
General categories (Lu, Ll, Nd, etc.)
Script names (Latin, Greek, etc.)
Hi, Jamie,
I encountered the following error when loading MANIFEST:
Traceback (most recent call last):
  File "/opt/log-analytics/bin/mytest", line 33, in <module>
    extract()
  File "/opt/log-analytics/bin/mytest", line 31, in extract
    r = engine.load_manifest("$sys/vmware.MANIFEST")
  File "/opt/log-analytics/module/rosie.py", line 154, in load_manifest
    retvals = self.rosie.get_retvals(r)
  File "/opt/log-analytics/module/rosie.py", line 102, in get_retvals
    return self._get_retvals(messages, self.rosie.rosieL_free_stringArray)
  File "/opt/log-analytics/module/rosie.py", line 113, in _get_retvals
    raise RuntimeError(retvals[1]) # exception indicating that the call failed
RuntimeError: src/core/engine.lua:121: backtrack stack overflow (current limit is 400)
It seems this happens when I have a large number of patterns defined in a number of rpl files.
Could you advise whether there is a way to work around this?
Appreciate your advice. Many thanks.
-Steve
While reading and writing new expressions, I noticed that Vim already has syntax highlighting for the rpl extension. This led me to the Reverse Polish Lisp/2 language, whose files use the rpl extension. I'd like to suggest changing the file extension to something that does not conflict, so that any editor that already has syntax coloring for RPL/2 doesn't highlight Rosie files incorrectly.
Possible extensions:
The provided url pattern covers very simple URLs, but doesn't handle the username, password, query, and fragment fields. See https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax
Ideally, the pattern could break down the query into individual attribute/value pairs in a list. Not all URLs follow this query format, so there could be a fallback "just-a-long-string" for the query portion if it doesn't match the typical encoding pattern.
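To make the requested breakdown concrete, here is a sketch of the fields and the query fallback using Python's standard urllib.parse. It only illustrates the desired output shape and is not an RPL pattern; the function name break_down_url and the dictionary keys are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def break_down_url(url):
    """Split a URL into the fields named above, with a query fallback."""
    parts = urlsplit(url)
    if not parts.query:
        query = []
    else:
        try:
            # strict_parsing rejects fields that are not key=value.
            query = parse_qsl(
                parts.query, keep_blank_values=True, strict_parsing=True
            )
        except ValueError:
            query = parts.query  # fallback: keep it as one long string
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": query,
        "fragment": parts.fragment,
    }


full = break_down_url(
    "https://alice:secret@example.com:8443/search?q=rosie&lang=rpl#results"
)
plain = break_down_url("http://example.com/page?just-a-long-string")
```

Here full["query"] is a list of attribute/value pairs, while plain["query"] stays a single string because it does not follow the key=value encoding.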
The ffi interface to Rosie allows multi-threaded programs in languages like Go, Python, js, and more to use Rosie. Each thread gets its own Rosie instance (they are small), so they won't interfere with each other, and each Rosie could be configured differently if needed.
This request is to create a Lua api that works just like the ffi-based api. Each Rosie instance will have its own Lua state, and the user program will be in its own Lua state, possibly running a different version of Lua (or lpeg, etc.).
r = require("rosie")
e = new_engine(rosie_home)
...