Automatically exported from code.google.com/p/jaql
When using the convert() function in a pipe, conversion of a null value
followed by a conversion of a non-null value will lead to a
NullPointerException.
In the following code example, the first occurrence of {a: "1"} is
converted correctly; its second occurrence produces the error.
[ { a: "1" }, {a: null}, {a: "1"} ]
-> transform convert($, schema { a: long? });
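The intended semantics of `schema { a: long? }` — a nullable long — can be sketched in Python (an illustrative analogy of the expected behavior, not jaql's implementation):

```python
def convert_nullable_long(value):
    """Convert a value to a long, passing null (None) through.

    Mirrors the intent of `schema { a: long? }`: a null input must
    not raise, and conversion must keep working for non-null values
    that follow it.
    """
    if value is None:
        return None
    return int(value)

records = [{"a": "1"}, {"a": None}, {"a": "1"}]
converted = [{"a": convert_nullable_long(r["a"])} for r in records]
```

The bug is that the third record fails even though the first, identical record converted fine; the null in between apparently corrupts the converter's state.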
Original issue reported on code.google.com by [email protected]
on 11 Sep 2009 at 4:21
Hadoop's new api (currently in trunk) differs greatly from current
releases. Our approach to deal with multiple api versions is two-fold: (1)
isolate the deps on hadoop to a small area of the code, and (2) for those
deps that do differ across hadoop/hbase versions, take the code out of
jaql's source tree and place it under the appropriate vendor version.
Problems (and proposed solutions):
(1) while hadoop deps are isolated, the situation can improve:
=> move deps out of com.ibm.jaql.io.registry and com.ibm.jaql.util into
sub-packages that are specific to hadoop/hbase
(2) copying an entire file is brittle from a maintenance perspective.
=> use abstract classes to further isolate the parts of a class that
depend on a particular hadoop/hbase api
A further discussion can be found on the jaql-devel mailing list:
http://groups.google.com/group/jaql-
devel/browse_thread/thread/e312f3f979353090
Original issue reported on code.google.com by [email protected]
on 21 Sep 2009 at 9:26
The "jaql" script currently expects a class name as its first argument.
Since this will almost always be JaqlShell, I propose to hardcode
JaqlShell into the "jaql" script. The old functionality is retained in a
script called "jaql-generic".
Original issue reported on code.google.com by [email protected]
on 26 Mar 2009 at 8:46
What steps will reproduce the problem?
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
jaql trunk r248
Please provide any additional information below.
There seems to be some weird problem with KeyLookup that has popped up
recently. It seems to be unable to read some of the temp files that it
itself generates. This is a new problem since I didn't face it until
last night. Here is how it goes.
Following is the JAQL code:
$ratings = read(hdfs('/user/sudipto/netflix/data/all/json'));
$estrate = 0;
$cust = read(hdfs('/user/sudipto/netflix/data/all/materialized/custparam'));
$movie = read(hdfs('/user/sudipto/netflix/data/all/materialized/movieparam'));
$imHashJoin = fn($outer, $okey, $inner, $ikey) (
$build = $inner -> transform [$ikey($), $],
$outer -> transform [$okey($), $]
-> keyLookup($build)
-> transform {$[1].*, $[2].*}
);
$ratings
-> $imHashJoin(fn($r) $r.tid, $movie, fn($m) $m.movie_id)
-> $imHashJoin(fn($r) $r.cid, $cust, fn($c) $c.cust_id)
-> transform { $.cust_id, $.movie_id, $.rating, diff: $.rating - $estrate,
$.cparam, $.mparam }
-> write(hdfs('/user/sudipto/netflix/data/all/materialized/join'));
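The in-memory hash join that $imHashJoin expresses — build a lookup table on the small inner input, then probe it with each outer record and merge the matched fields — can be sketched in Python (field names are taken from the query above; this is an illustration of the join semantics, not jaql's keyLookup implementation):

```python
def hash_join(outer, outer_key, inner, inner_key):
    # Build phase: hash the (small) inner input on its key.
    build = {inner_key(rec): rec for rec in inner}
    # Probe phase: stream the (large) outer input, look up each key,
    # and merge the matching records' fields.
    result = []
    for rec in outer:
        match = build.get(outer_key(rec))
        if match is not None:
            merged = dict(rec)
            merged.update(match)
            result.append(merged)
    return result

movies = [{"movie_id": 7, "mparam": 0.5}]
ratings = [{"tid": 7, "cid": 1, "rating": 4}]
joined = hash_join(ratings, lambda r: r["tid"], movies, lambda m: m["movie_id"])
```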
In the hash join, it first spawns an MR job to read in the individual
inner tables and Temps them. Then it tries to join the large outer table
and the temped inner table. (This is somewhat new, as I think the earlier
version of key lookup did not do so. Probably you wanted to fix the
inlining problem of this expression?) Nevertheless, when the 3rd MR job,
which does the main join, is spawned, it reports the following error:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
hdfs://impl00.almaden.ibm.com:9000/user/sudipto/jaql_temp_4847551314303483
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:210)
at com.ibm.jaql.io.hadoop.DefaultHadoopInputAdapter.getSplits(DefaultHadoopInputAdapter.java:163)
at com.ibm.jaql.io.hadoop.DefaultHadoopInputAdapter.iter(DefaultHadoopInputAdapter.java:184)
at com.ibm.jaql.lang.expr.io.AbstractReadExpr$1.(AbstractReadExpr.java:100)
at com.ibm.jaql.lang.expr.io.AbstractReadExpr.iter(AbstractReadExpr.java:99)
at com.ibm.jaql.lang.expr.index.KeyLookupFn.iter(KeyLookupFn.java:72)
at com.ibm.jaql.lang.expr.core.BindingExpr.iter(BindingExpr.java:209)
at com.ibm.jaql.lang.expr.core.TransformExpr.iter(TransformExpr.java:148)
at com.ibm.jaql.lang.expr.core.DoExpr.iter(DoExpr.java:126)
at com.ibm.jaql.lang.core.JaqlFunction.iter(JaqlFunction.java:269)
at com.ibm.jaql.lang.core.JaqlFunction.iter(JaqlFunction.java:350)
at com.ibm.jaql.lang.expr.hadoop.MapReduceBaseExpr$MapEval.run(MapReduceBaseExpr.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Following is the explain:
(
$fd_2 = mapReduce({
("input"):(hdfs("/user/sudipto/netflix/data/all/materialized/custparam")),
("output"):(HadoopTemp()), ("map"):(fn($mapIn) ( $mapIn
-> transform each $ ([null, [($).("cust_id"), $]]) )
) }),
$fd_0 = mapReduce({
("input"):(hdfs("/user/sudipto/netflix/data/all/materialized/movieparam")),
("output"):(HadoopTemp()), ("map"):(fn($mapIn) ( $mapIn
-> transform each $ ([null, [($).("movie_id"), $]]) )
) }),
write((
$fd_1 = mapReduce({
("input"):(hdfs("/user/sudipto/netflix/data/all/json")),
("output"):(HadoopTemp()), ("map"):(fn($mapIn) ( keyLookup($mapIn
-> transform each $ ([($).("tid"), $]), read($fd_0))
-> transform each $ ([null, { (index($, 1)).*, (index($, 2)).* }]) )
) }),
keyLookup(read($fd_1)
-> transform each $ ([($).("cid"), $]), read($fd_2))
-> transform each $ ({ (index($, 1)).*, (index($, 2)).* })
)
-> transform each $ ({ (($)).("cust_id"), (($)).("movie_id"),
(($)).("rating"), ("diff"):((($).("rating"))-(0)), (($)).("cparam"),
(($)).("mparam") }), hdfs("/user/sudipto/netflix/data/all/materialized/join"))
)
Note that this problem was encountered when the mapreduce cluster was
running under a username different from the user account used for
submitting jobs from another remote machine.
Original issue reported on code.google.com by [email protected]
on 22 Jul 2009 at 2:01
1. Open a jaqlshell in cygwin.
2. Input some jaql commands.
3. After hitting the UP or DOWN key, command history does not show up.
Original issue reported on code.google.com by [email protected]
on 4 Sep 2009 at 7:50
Currently jaql only supports concatenation of files. This limits the
reusability of jaql scripts considerably. Jaql needs new ways to group
functionality and to import it into a file.
Original issue reported on code.google.com by [email protected]
on 6 Aug 2009 at 9:20
Jaqlshell will hang after `quit;` is issued in jaqlshell.
Original issue reported on code.google.com by [email protected]
on 2 Sep 2009 at 2:10
1+1 returns a JsonLong, while 1/1 results in a JsonDecimal, as does 1/1.0.
In .1/2d, the .1 gets converted from BigDecimal to double.
From the shell, you can't see whether your result is stored as long or
BigDecimal:
jaql> typeof(1);
"number"
jaql> typeof(1.1);
"number"
Original issue reported on code.google.com by [email protected]
on 8 Jul 2009 at 11:21
Hi, Vuk.
Could you please do this code review for me? Thanks.
====================
Review Descriptions
====================
The following points are key points in my implementation. You can
also see my questions in the TODO comments of the source code.
- Is JsonOutputStream a duplicate of JsonTextOutputStream? If
yes, it should be deleted. Otherwise, I need to extract the common
parts of JsonOutputStream and JsonTextOutputStream into an
abstract class.
- A printArrayClose method is added to JsonToStream, and a finish
method is added to ClosableJsonWriter. Both are needed to
support finishing a write without closing the underlying
STDOUT in Jaql.
- AbstractWriteExpr's eval method demands that the JSON value to
be printed be a JSON array. Following this style, Jaql Shell also
prints JSON values in array mode. This means that even if only one
JSON value is printed, it will be wrapped with "[" and "]".
- In batch mode, outputs to STDOUT are disabled. Otherwise, these
outputs would mess up the evaluated result.
===========
Next Steps
===========
- Support for CSV.
- Support for XML.
- Add parsing mechanism for general Jaql IO descriptors.
- Unit test and bug fixing.
Known Issues
-------------
- "]" will be printed if the input is empty. Nothing should be
printed.
Original issue reported on code.google.com by [email protected]
on 22 Sep 2009 at 3:55
The current process of determining the list of available functions
in jaql is to manually add them to a hash table. This process can
be simplified because all jaql functions are marked with the
"JaqlFn" annotation.
The idea now is to automatically add all classes annotated with
this annotation to the hashtable. There are two ways to accomplish
this: either through runtime reflection over all classes, or with a
special annotation processing step at compile time. Because a
one-time effort is generally better, I chose to process the annotations
at compile time.
To accomplish this, a custom Processor class is required that handles
the annotations. Based on the information they provide, it creates
a new class source file which contains a list of all functions. This
class is then used by FunctionLib to fill its hashtable.
Because the new class is generated after FunctionLib is compiled, it
can only be referenced indirectly via a ClassLoader. FunctionLib
also needs to know which methods the generated class provides, so a new
interface was added that is implemented by the generated class.
Because the build process has changed, build.xml needs to be updated.
Two new steps were added to the compile task. After the first compilation
phase, a second run of the java compiler is needed, this time just with the
processor. It generates a new source file. This file is then compiled in a
third compilation run.
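The runtime-reflection alternative rejected above can be sketched as a decorator-based registry in Python (purely illustrative; jaql's chosen mechanism is the compile-time annotation processor described here, and the names below are hypothetical):

```python
# Plays the role of FunctionLib's hashtable of available functions.
FUNCTION_LIB = {}

def jaql_fn(name):
    """Register a function under its jaql name at definition time,
    replacing the manual add-to-hashtable step."""
    def register(fn):
        FUNCTION_LIB[name] = fn
        return fn
    return register

@jaql_fn("strcat")
def strcat(*parts):
    return "".join(str(p) for p in parts)
```

The compile-time approach chosen here avoids paying this registration cost (or a classpath scan) at every startup.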
When reviewing my code changes, please focus on:
- One big problem that is still left is that eclipse does not support that
the annotation processor is in the same project as the code it should
process. The proposed solutions to this are:
1. A separate project for the processor.
2. Supply a jar containing the processor and use that for processing. This
jar needs to be recreated after each change to the processor, which should
happen only seldom.
Which way should we choose?
- I also checked both the original function list and the generated one. The
generated one contains additionally hdfsRead / hdfsWrite. Is that a
problem?
After the review, I'll merge this patch into:
/trunk
Original issue reported on code.google.com by [email protected]
on 19 Jun 2009 at 7:32
Attachments:
We currently support reading JSON files with one JSON value per line, and
JSON files that represent arrays. Support for arbitrary JSON requires hacks
like
read(lines("file")) -> combine(strcat) -> json();
We should provide a nice API along the lines of read(jsontext("file"));
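A reader for arbitrary, whitespace-separated JSON values — the behavior a `jsontext("file")` API might have — can be sketched with Python's incremental JSON decoder (an illustration of the desired semantics, not a proposed implementation):

```python
import json

def read_json_values(text):
    """Yield every JSON value in a string, regardless of how the
    values are split across lines."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between values (including newlines).
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Values spanning multiple lines are handled, unlike one-value-per-line readers.
values = list(read_json_values('{"a": 1}\n[2,\n 3]\n"four"'))
```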
Original issue reported on code.google.com by [email protected]
on 19 Sep 2009 at 12:41
What steps will reproduce the problem?
1. Open a Cygwin console
2. Type `bin/jaqlshell`
The output is similar to the following text.
{{{
cygpath: cannot create short name of
D:\road\hadoop\jaql\vendor\hadoop\0.18.3\logs
Initializing Jaql.
}}}
Original issue reported on code.google.com by [email protected]
on 25 Aug 2009 at 11:19
1. Add generic projection support to the I/O layer
2. Do projection pushdown analysis in the compiler
Original issue reported on code.google.com by [email protected]
on 20 Mar 2009 at 9:24
Some user-defined functions have non-trivial initialization phases
that, in the ideal case, would not be repeated for each record of a
mapper's input. Some work-arounds include the use of static variables
(need to worry about multiple threads) and the registry infrastructure in
jaql (as used for random number sampling). The first option may not be
safe and the second requires jaql code to be modified, which is not a good
long-term option.
While there is merit in exposing the registration infrastructure, a better
option is to support initialization for user-defined functions.
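What such an initialization hook buys can be sketched as a callable object whose setup runs once before the first record (a hypothetical shape for the API, for illustration only):

```python
class ExpensiveUdf:
    """A UDF with a non-trivial init phase that should run once per
    task, not once per record."""

    def __init__(self):
        self.state = None
        self.init_count = 0  # tracks how often init actually runs

    def initialize(self):
        # Imagine loading a model or opening a connection here.
        self.init_count += 1
        self.state = {"bias": 10}

    def __call__(self, record):
        if self.state is None:  # triggered only by the first record
            self.initialize()
        return record + self.state["bias"]

udf = ExpensiveUdf()
out = [udf(r) for r in [1, 2, 3]]  # init runs once, not three times
```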
Original issue reported on code.google.com by [email protected]
on 27 Jan 2009 at 1:39
Input the following code in jaqlshell.
[
{x: 0, text: "zero"},
{x: 1, text: "one"},
{x: 0, text: "two"},
{x: 1, text: "three"},
{x: 0, text: "four"},
{x: 1, text: "five"},
{x: 0, text: "six"},
{x: 1, text: "seven"},
{x: 0, text: "eight"}
]
-> write(hdfs("sample.dat"));
The following error occurs.
line 59:1: unexpected token: j
line 59:1: unexpected token: j
at com.ibm.jaql.lang.parser.JaqlParser.path(JaqlParser.java:3234)
at com.ibm.jaql.lang.parser.JaqlParser.typeExpr(JaqlParser.java:3828)
at com.ibm.jaql.lang.parser.JaqlParser.unaryAdd(JaqlParser.java:3762)
at com.ibm.jaql.lang.parser.JaqlParser.multExpr(JaqlParser.java:3655)
at com.ibm.jaql.lang.parser.JaqlParser.addExpr(JaqlParser.java:3632)
at com.ibm.jaql.lang.parser.JaqlParser.instanceOfExpr(JaqlParser.java:3526)
at com.ibm.jaql.lang.parser.JaqlParser.compare(JaqlParser.java:3455)
at com.ibm.jaql.lang.parser.JaqlParser.inExpr(JaqlParser.java:3383)
at com.ibm.jaql.lang.parser.JaqlParser.kwTest(JaqlParser.java:3357)
at com.ibm.jaql.lang.parser.JaqlParser.notExpr(JaqlParser.java:3321)
at com.ibm.jaql.lang.parser.JaqlParser.andExpr(JaqlParser.java:3269)
at com.ibm.jaql.lang.parser.JaqlParser.orExpr(JaqlParser.java:3246)
at com.ibm.jaql.lang.parser.JaqlParser.expr(JaqlParser.java:727)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:635)
at com.ibm.jaql.lang.parser.JaqlParser.topAssign(JaqlParser.java:420)
at com.ibm.jaql.lang.parser.JaqlParser.stmt(JaqlParser.java:289)
at com.ibm.jaql.lang.parser.JaqlParser.stmtOrFn(JaqlParser.java:211)
at com.ibm.jaql.lang.parser.JaqlParser.parse(JaqlParser.java:154)
at com.ibm.jaql.lang.Jaql.prepareNext(Jaql.java:288)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:399)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:67)
at com.ibm.jaql.util.shell.AbstractJaqlShell.runInteractively(AbstractJaqlShell.java:48)
at com.ibm.jaql.util.shell.AbstractJaqlShell.main(AbstractJaqlShell.java:80)
at JaqlShell.main(JaqlShell.java:272)
Original issue reported on code.google.com by [email protected]
on 26 Aug 2009 at 3:15
In the bin/jaql script, DFLT_HBASE_VERSION is set to 0.18.3. HBase has
releases 0.18.1 and 0.19.3, but it does not have a release 0.18.3.
Original issue reported on code.google.com by [email protected]
on 2 Sep 2009 at 7:01
Add a flag to the jaqlShell to explain an entire script and avoid evaluation.
Original issue reported on code.google.com by [email protected]
on 24 Jun 2009 at 10:31
A JRawComparator has to be registered for each type that supports
comparison without deserializing. Currently, such a comparator is provided
for JBinary only.
Original issue reported on code.google.com by [email protected]
on 5 May 2009 at 12:31
Vuk's Original Design
I think this would be a useful feature, in particular when jaql is used in
batch mode for the purpose of feeding other programs with output produced
by jaql scripts. I don't think it makes much sense for interactive mode,
but let me know what you think about this.
If one calls the shell in batch mode, all outputs should be written in
the given format, say CSV, XML, etc. My guess is that this will be
particularly useful for the case where there is only one query and the
user wants the output piped to another program.
Now, how to specify the format? One option is, as you suggest, to add an
option to the shell (e.g., --format csv) that forces top-level writes to be
formatted accordingly. As you've seen in Jaql, I/O is specified through
descriptors (e.g., {type: 'local', location: 'foo.json', options: {adapter:
"...", format: "..."}}) (you can see the examples in
conf/storage-default.jql or src/test/com/ibm/jaql/storageQueries.txt). One
option is to have the argument to --format correspond to an I/O
descriptor. For example, you may have the argument to --format (e.g.,
'csv') be the key in the storage registry (e.g., storage-default.jql).
Instead of the FileStreamOutputAdapter, I'd use a StreamOutputAdapter
that you bind to System.out; then all should work as if writing to any
stream. In summary, the things to do are:
1. add an option to jaql shell for --format
2. in JaqlShell, if --format is provided, so long as we're in batch
mode, the format is valid, and the format derives from StreamOutputAdapter,
set up a StreamOutputAdapter that is bound to System.out,
initialized properly, etc.
3. Of course, to get this to work with CSV, you'll have to add a CSV
entry to storage-default.jql first (I'd play with this first in interactive
mode, then move
to the other tasks)
Next, do we only want to support options that have an entry in
storage-default.jql? What if I have a new format I want to use? Given
jaql's architecture, this shouldn't be a big deal. Just like when an I/O
descriptor is given for read/write, one could conceive of passing such a
descriptor to "--format". This will make argument parsing a bit trickier
(it will need to parse JSON: if it's a string, then it's a key; if it's a
record, it's a descriptor), but it's doable. What do you think of this
extra generality?
Let me know if some of the above doesn't make sense.
Original issue reported on code.google.com by [email protected]
on 22 Sep 2009 at 2:37
http://groups.google.com/group/jaql-devel/browse_thread/thread/ef8388b0b4f1bd10
Original issue reported on code.google.com by [email protected]
on 21 Sep 2009 at 5:16
Redesign the jaql syntax around a unix pipe-like notation.
Original issue reported on code.google.com by [email protected]
on 16 Oct 2008 at 3:13
Attachments:
There should be an efficient way of invoking R from Jaql.
Original issue reported on code.google.com by [email protected]
on 17 Sep 2009 at 1:17
The delimited output format uses backslash escaping for embedded quotes
(\"), newlines (\n), and non-printable characters (\u...). (I.e., the same
as done for JSON strings.) However, the CSV description at
http://en.wikipedia.org/wiki/Comma-separated_values says that embedded
quotes should be doubled ("this has an embedded quote "" in it") and that
new lines are printed inside the string ("this has an embedded
newline in it"). Also non-printable characters are allowed in the file, I
believe.
We should move to the double quoting and verbatim non-printable characters
(although what about encoding; we are writing UTF-8 right now).
The unfortunate problem with embedded newlines is that the TextInputFormat
will not split the file correctly. On the other hand, other tools will not
import the backslash-escaped values correctly. Perhaps we should have a few
ways to handle newlines: embedded (standard), backslash-escaped, or stripped.
This could be a newline substitution string ("\n", "\\n", or "").
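Python's csv module follows the doubled-quote convention described above, which makes for a quick check of what the output should look like (illustrative, not jaql's delimited writer):

```python
import csv
import io

row = ['has "quote"', "has\nnewline", "plain"]

# Write one row: embedded quotes are doubled ("") and the newline is
# kept verbatim inside the quoted field, per the CSV convention.
buf = io.StringIO()
csv.writer(buf).writerow(row)
encoded = buf.getvalue()

# Reading it back recovers the original values, newline included.
decoded = next(csv.reader(io.StringIO(encoded)))
```

This also demonstrates the splitting problem: a naive line-based reader would break the second field at its embedded newline, while a quote-aware reader does not.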
We should also support the reverse options while reading CSV files. If it
has embedded newlines, we can avoid splitting the file (multiple files
could still be mapped) or perhaps we can fix up the text splitter to not
split on a line with unbalanced quotes (is this sufficient to detect
embedded newlines?).
For JSON serialized fields, we should definitely avoid creating the
formatting newlines.
FYI, IBM DB2 and MS Excel both support embedded newlines and double quoting.
Original issue reported on code.google.com by [email protected]
on 23 Sep 2009 at 2:50
Currently, StreamAdapters include a "formatter" that knows how to convert
stream bytes to JSON. The formatter is currently a StreamToJson converter.
We should change this so that we re-use the serialization framework; it
makes little sense to have two APIs that basically handle the same task.
Original issue reported on code.google.com by [email protected]
on 23 Sep 2009 at 12:53
Add support for Hadoop 0.20.x
Original issue reported on code.google.com by [email protected]
on 18 Sep 2009 at 12:37
Extend JaqlShell with functionality to read its input from files or to
evaluate a command line argument.
Original issue reported on code.google.com by [email protected]
on 26 Mar 2009 at 10:11
Jaql treats the definition of global variables as simply a definition; it
does not evaluate the variable's value immediately. The variable
definition is included in each query evaluation, which means the variable
will be evaluated for each query evaluation. Moreover, the variable can be
inlined inside the query in such a way that causes it to be evaluated
multiple times in a single query.
The "materialize $var" statement can be used to force the evaluation of a
variable. This doesn't seem very clean, but provides a short-term
workaround. One problem with materialization is the value is not stored in
a map/reducible location (eg hdfs), so we do not get map/reduce over the
variable result. This is a general problem with all variables. Moreover,
it is unclear when we should materialize into a distributed location (for
large results this makes sense) vs store in memory (for small results).
Currently, the user has to handle this using an explicit write.
Another issue is that global variables are never redefined; instead a new
variable is created that hides the old one - old references are still to
the old variable. This makes variable definitions feel like they are
evaluated immediately even though the evaluation is lazy, but causes
unexpected results in the case of functions. Consider two examples:
$x = 1;
$y = $x + 1;
$x = 2;
$y; // produces 2, which seems right.
$f = fn() 1;
$g = fn() $f() + 1;
$f = fn() 2; // incorrectly think this redefines $f() and therefore affects $g
$g(); // produces 2, which seems wrong
We need some more thinking here.
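For contrast, Python closures resolve free variables at call time, so the same sequence gives the opposite result (a sketch of the semantic design space, not jaql code):

```python
f = lambda: 1
g = lambda: f() + 1   # g captures the *name* f, not its current value
f = lambda: 2         # rebinding f does affect g in Python...
result = g()          # ...so g() now yields 3, where jaql yields 2
```

Neither behavior is obviously right on its own; the problem described above is that jaql mixes the two intuitions, looking eager for values but surprising users for functions.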
Original issue reported on code.google.com by [email protected]
on 12 Mar 2009 at 8:43
Jaql is difficult to use when multiple users want to run jaql on the same
machine. The issue stems from poor default values (e.g., /tmp
directories). The solution is to make sure that usernames are included in
such default values so that there are no clashes when multiple jaql users
are on the same machine.
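Folding the username into such defaults can be sketched as follows (an illustrative pattern; the names here are hypothetical and not jaql's actual configuration code):

```python
import getpass
import os
import tempfile

def default_temp_dir(app="jaql"):
    """Per-user temp directory, e.g. /tmp/jaql_alice, so that
    concurrent users on one machine do not clash."""
    return os.path.join(tempfile.gettempdir(), f"{app}_{getpass.getuser()}")

path = default_temp_dir()
```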
Original issue reported on code.google.com by [email protected]
on 26 Jan 2009 at 6:39
When reading using lines(), an empty line should be treated as a null.
Similarly, when writing, a null value should be written as an empty line.
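The proposed round-trip — an empty line on write for null, and null on read for an empty line — can be sketched as (illustrative only):

```python
def write_lines(values):
    """Null (None) becomes an empty line."""
    return "\n".join("" if v is None else str(v) for v in values)

def read_lines(text):
    """An empty line becomes null (None)."""
    return [None if line == "" else line for line in text.split("\n")]

# Writing then reading should preserve nulls exactly.
round_trip = read_lines(write_lines(["a", None, "b"]))
```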
Original issue reported on code.google.com by [email protected]
on 18 Sep 2009 at 10:57
The lines() function provides support for reading in lines of text. But if
the lines of text are single columns and the user wants to convert them to a
specific atomic type, then s/he has to pipe the output of lines() to
convert(). The lines() function should be designed along similar lines as
del() and take an additional argument that can be used to specify the schema
in case the input file has single columns.
Additionally, write(lines()) should write text files where each data item
is converted to a single line of text. At present, write(lines()) behaves
similar to write(hdfs()) and generates sequence files.
Original issue reported on code.google.com by [email protected]
on 16 Sep 2009 at 10:52
Remove support for 0.15.3 and 0.16.1
Original issue reported on code.google.com by [email protected]
on 26 Jan 2009 at 6:50
When a large object is read from an input along with a bunch of other
fields (eg metadata about the object) and then the metadata is processed by
several expensive operations, it sometimes makes sense to defer the
retrieval of the large object and go back to the input to fetch it
after processing. We could do a few things here:
1. support large object handles that are fetched later.
2. a cost-based rewrite that considers early vs late fetching of the large
object.
At a minimum, we should do what is necessary to allow a user to do this
manually without a lot of effort.
Original issue reported on code.google.com by [email protected]
on 20 Mar 2009 at 9:29
expand unroll copies the whole array that should be unrolled, instead of
just one element per result.
To reproduce: run the following statement in the jaql shell:
[ [1, [9,8,7] ] ]-> expand unroll $[1];
Expected result:
[ [1, 9], [1, 8], [1, 7] ]
Actual result:
[ [1, [9, 8, 7]], [1, [9, 8, 7]], [1, [9, 8, 7]] ]
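The expected semantics — one copy of the row per element of the unrolled array, with that element substituted in place — can be sketched in Python (an illustration of the expected output, not jaql's implementation):

```python
def expand_unroll(rows, index):
    """For each row, emit one copy per element of row[index],
    with that single element substituted for the array."""
    out = []
    for row in rows:
        for elem in row[index]:
            copy = list(row)
            copy[index] = elem   # one element, not the whole array
            out.append(copy)
    return out

result = expand_unroll([[1, [9, 8, 7]]], 1)
```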
Original issue reported on code.google.com by [email protected]
on 31 Jul 2009 at 10:44
This includes support for UDAs written in Jaql itself as well as UDAs
written in Java.
Original issue reported on code.google.com by [email protected]
on 11 Sep 2009 at 7:46
There are several issues with path expressions involving field selections
from a record. Examples 1 and 2 below show the current way of inferring field
names and also how it should be. Example 3 should be disallowed; field
exclusion should only allow one level.
#  QUERY                              CURRENT RESULT  SHOULD BE
1  { a: {b: 1, c: 2} }{.a.b}          { a: 1 }        { b: 1 }
2  { a: {b: 1, c: 2} }{.a.b, .a.c}    throws error    { b: 1, c: 2 }
3  { a: {b: 1, c: 2} }{*-.a.b}        {}              throw an error
Original issue reported on code.google.com by [email protected]
on 14 Jul 2009 at 2:17
What steps will reproduce the problem?
1. Load a large data set
2. Try to run e.g. a group into { num: count($) } statement
3. It will fail with the exception shown below
What is the expected output? What do you see instead?
expected: It should properly process the query
result:
java.lang.NullPointerException
com.ibm.jaql.util.BaseUtil.writeVUInt(BaseUtil.java:297)
com.ibm.jaql.io.serialization.binary.def.DefaultBinaryFullSerializer.copy(DefaultBinaryFullSerializer.java:154)
com.ibm.jaql.io.serialization.binary.def.DefaultBinaryFullSerializer.copy(DefaultBinaryFullSerializer.java:1)
com.ibm.jaql.json.type.SpilledJsonArray.addCopySerialized(SpilledJsonArray.java:443)
What version of the product are you using? On what operating system?
- Trunk revision 208, hadoop 0.18.3
Please provide any additional information below.
The problem is that the "addCopySerialized(DataInput input,
BinaryFullSerializer serializer)" method of SpilledJsonArray does not
always ensure that there is a spill file. If there are many records and
the values get written into the spill file, it throws a null pointer
exception because the file has not been created.
Original issue reported on code.google.com by [email protected]
on 12 Jun 2009 at 6:47
The interfaces KeyValueImport and KeyValueExport do not allow their
convert() method to throw an exception. I think that it is crucial that
these methods can report conversion errors back to Jaql (w/o using runtime
exceptions), either using declared exceptions or a return value.
Original issue reported on code.google.com by [email protected]
on 24 Mar 2009 at 10:58
The reverse function fails with the following exception:
reverse([1,2,3]);
java.lang.NullPointerException: undefined variable: $
java.lang.NullPointerException: undefined variable: $
at com.ibm.jaql.lang.core.Var.getValue(Var.java:208)
at com.ibm.jaql.lang.expr.core.VarExpr.eval(VarExpr.java:102)
at com.ibm.jaql.lang.expr.core.IndexExpr.eval(IndexExpr.java:142)
at com.ibm.jaql.lang.expr.core.CmpSpec.eval(CmpSpec.java:104)
at com.ibm.jaql.lang.expr.core.CmpSingle.eval(CmpSingle.java:67)
at com.ibm.jaql.lang.expr.core.SortExpr.iter(SortExpr.java:113)
at com.ibm.jaql.lang.expr.core.BindingExpr.iter(BindingExpr.java:213)
at com.ibm.jaql.lang.expr.core.TransformExpr.iter(TransformExpr.java:148)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:405)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:67)
at com.ibm.jaql.util.shell.AbstractJaqlShell.runInteractively(AbstractJaqlShell.java:48)
at com.ibm.jaql.util.shell.AbstractJaqlShell.main(AbstractJaqlShell.java:84)
at JaqlShell.main(JaqlShell.java:272)
The explain output is the following:
explain reverse([1,2,3]);
Invalid query... Undefined variables:
$
$ = ( [1, 2, 3]
-> enumerate() )
-> sort using ((index($, 0)) desc)
-> transform each $ (index($, 1))
The working solution should look like this:
( [1, 2, 3]-> enumerate() )
-> sort using (fn($) cmp [(index($, 0)) desc])
-> transform each $ (index($, 1));
Original issue reported on code.google.com by [email protected]
on 8 Sep 2009 at 6:50
Jaql currently only supports hbase tables that use jaql's binary JSON
format for column values. We need to generalize the support in a way that
adds conversion from an hbase row (the column byte array values) into a
JSON view, like the converters for other InputFormats.
Original issue reported on code.google.com by [email protected]
on 17 Sep 2009 at 12:04
The introduction of schema made all type names keywords. As a
consequence, the conversion functions do not work anymore. For example,
"double(1)" will throw an exception but should produce "1.0d".
Workarounds:
- rename conversion functions
- adapt parser accordingly
Original issue reported on code.google.com by [email protected]
on 22 Jun 2009 at 10:02
1. Push selections earlier in the query.
2. Add generic support to push selections down to the I/O layer (eg, HBase
filters)
Original issue reported on code.google.com by [email protected]
on 20 Mar 2009 at 9:25
localRead('/home/user/foo',
{format: 'com.acme.extensions.data.FromJSONTxtConverter'})
works for reading but
localWrite('/home/user/bar.json',
{format: 'com.acme.extensions.data.ToJSONTxtConverter'},$blah);
does not work for writing.
java.lang.Exception: formatter must implement ItemOutputStream
java.lang.Exception: formatter must implement ItemOutputStream
at com.ibm.jaql.io.stream.StreamOutputAdapter.initializeFrom(StreamOutputAdapter.java:57)
at com.ibm.jaql.io.AbstractOutputAdapter.initializeFrom(AbstractOutputAdapter.java:48)
at com.ibm.jaql.io.AdapterStore$OptionHandler.getAdapter(AdapterStore.java:305)
at com.ibm.jaql.lang.expr.io.StWriteExpr.eval(StWriteExpr.java:129)
at com.ibm.jaql.lang.expr.top.QueryExpr.eval(QueryExpr.java:92)
at com.ibm.jaql.lang.Jaql.main1(Jaql.java:86)
at JaqlShell.runInteractively(JaqlShell.java:173)
at JaqlShell.main(JaqlShell.java:378)
Original issue reported on code.google.com by [email protected]
on 26 Jan 2009 at 6:42
Adds a file to store command history. As a result, when a user
launches Jaql Shell again, he can see the commands entered in
the previous Jaql Shell session.
Original issue reported on code.google.com by [email protected]
on 23 Sep 2009 at 12:59
Add support for HBase 0.20.0 for Hadoop 0.20.x line
Original issue reported on code.google.com by [email protected]
on 19 Sep 2009 at 12:47
Open a jaqlshell in Cygwin and input the following code.
mapReduce(
{ input: {type: "hdfs", location: "sample.dat"},
output: {type: "hdfs", location: "results.dat"},
map: fn($v) ( $v -> transform [$.x, 1] ),
reduce: fn($x, $v) ( $v -> aggregate {x: $x, num: count($)} )
});
The following error occurs.
line 9:44: unexpected token: {
line 9:44: unexpected token: {
at com.ibm.jaql.lang.parser.JaqlParser.aggregate(JaqlParser.java:997)
at com.ibm.jaql.lang.parser.JaqlParser.op(JaqlParser.java:844)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:641)
at com.ibm.jaql.lang.parser.JaqlParser.optAssign(JaqlParser.java:569)
at com.ibm.jaql.lang.parser.JaqlParser.block(JaqlParser.java:469)
at com.ibm.jaql.lang.parser.JaqlParser.parenExpr(JaqlParser.java:1815)
at com.ibm.jaql.lang.parser.JaqlParser.basic(JaqlParser.java:4317)
at com.ibm.jaql.lang.parser.JaqlParser.fnCall(JaqlParser.java:2516)
at com.ibm.jaql.lang.parser.JaqlParser.path(JaqlParser.java:3179)
at com.ibm.jaql.lang.parser.JaqlParser.typeExpr(JaqlParser.java:3828)
at com.ibm.jaql.lang.parser.JaqlParser.unaryAdd(JaqlParser.java:3762)
at com.ibm.jaql.lang.parser.JaqlParser.multExpr(JaqlParser.java:3655)
at com.ibm.jaql.lang.parser.JaqlParser.addExpr(JaqlParser.java:3632)
at com.ibm.jaql.lang.parser.JaqlParser.instanceOfExpr(JaqlParser.java:3526)
at com.ibm.jaql.lang.parser.JaqlParser.compare(JaqlParser.java:3455)
at com.ibm.jaql.lang.parser.JaqlParser.inExpr(JaqlParser.java:3383)
at com.ibm.jaql.lang.parser.JaqlParser.kwTest(JaqlParser.java:3357)
at com.ibm.jaql.lang.parser.JaqlParser.notExpr(JaqlParser.java:3321)
at com.ibm.jaql.lang.parser.JaqlParser.andExpr(JaqlParser.java:3269)
at com.ibm.jaql.lang.parser.JaqlParser.orExpr(JaqlParser.java:3246)
at com.ibm.jaql.lang.parser.JaqlParser.expr(JaqlParser.java:727)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:635)
at com.ibm.jaql.lang.parser.JaqlParser.function(JaqlParser.java:884)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:658)
at com.ibm.jaql.lang.parser.JaqlParser.fieldValue(JaqlParser.java:3164)
at com.ibm.jaql.lang.parser.JaqlParser.field(JaqlParser.java:2974)
at com.ibm.jaql.lang.parser.JaqlParser.record(JaqlParser.java:2920)
at com.ibm.jaql.lang.parser.JaqlParser.basic(JaqlParser.java:4297)
at com.ibm.jaql.lang.parser.JaqlParser.fnCall(JaqlParser.java:2516)
at com.ibm.jaql.lang.parser.JaqlParser.path(JaqlParser.java:3179)
at com.ibm.jaql.lang.parser.JaqlParser.typeExpr(JaqlParser.java:3828)
at com.ibm.jaql.lang.parser.JaqlParser.unaryAdd(JaqlParser.java:3762)
at com.ibm.jaql.lang.parser.JaqlParser.multExpr(JaqlParser.java:3655)
at com.ibm.jaql.lang.parser.JaqlParser.addExpr(JaqlParser.java:3632)
at com.ibm.jaql.lang.parser.JaqlParser.instanceOfExpr(JaqlParser.java:3526)
at com.ibm.jaql.lang.parser.JaqlParser.compare(JaqlParser.java:3455)
at com.ibm.jaql.lang.parser.JaqlParser.inExpr(JaqlParser.java:3383)
at com.ibm.jaql.lang.parser.JaqlParser.kwTest(JaqlParser.java:3357)
at com.ibm.jaql.lang.parser.JaqlParser.notExpr(JaqlParser.java:3321)
at com.ibm.jaql.lang.parser.JaqlParser.andExpr(JaqlParser.java:3269)
at com.ibm.jaql.lang.parser.JaqlParser.orExpr(JaqlParser.java:3246)
at com.ibm.jaql.lang.parser.JaqlParser.expr(JaqlParser.java:727)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:635)
at com.ibm.jaql.lang.parser.JaqlParser.exprList(JaqlParser.java:4564)
at com.ibm.jaql.lang.parser.JaqlParser.fnArgs(JaqlParser.java:2405)
at com.ibm.jaql.lang.parser.JaqlParser.builtinCall(JaqlParser.java:4334)
at com.ibm.jaql.lang.parser.JaqlParser.fnCall(JaqlParser.java:2521)
at com.ibm.jaql.lang.parser.JaqlParser.path(JaqlParser.java:3179)
at com.ibm.jaql.lang.parser.JaqlParser.typeExpr(JaqlParser.java:3828)
at com.ibm.jaql.lang.parser.JaqlParser.unaryAdd(JaqlParser.java:3762)
at com.ibm.jaql.lang.parser.JaqlParser.multExpr(JaqlParser.java:3655)
at com.ibm.jaql.lang.parser.JaqlParser.addExpr(JaqlParser.java:3632)
at com.ibm.jaql.lang.parser.JaqlParser.instanceOfExpr(JaqlParser.java:3526)
at com.ibm.jaql.lang.parser.JaqlParser.compare(JaqlParser.java:3455)
at com.ibm.jaql.lang.parser.JaqlParser.inExpr(JaqlParser.java:3383)
at com.ibm.jaql.lang.parser.JaqlParser.kwTest(JaqlParser.java:3357)
at com.ibm.jaql.lang.parser.JaqlParser.notExpr(JaqlParser.java:3321)
at com.ibm.jaql.lang.parser.JaqlParser.andExpr(JaqlParser.java:3269)
at com.ibm.jaql.lang.parser.JaqlParser.orExpr(JaqlParser.java:3246)
at com.ibm.jaql.lang.parser.JaqlParser.expr(JaqlParser.java:727)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:635)
at com.ibm.jaql.lang.parser.JaqlParser.topAssign(JaqlParser.java:420)
at com.ibm.jaql.lang.parser.JaqlParser.stmt(JaqlParser.java:289)
at com.ibm.jaql.lang.parser.JaqlParser.stmtOrFn(JaqlParser.java:211)
at com.ibm.jaql.lang.parser.JaqlParser.parse(JaqlParser.java:154)
at com.ibm.jaql.lang.Jaql.prepareNext(Jaql.java:288)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:399)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:67)
at com.ibm.jaql.util.shell.AbstractJaqlShell.runInteractively(AbstractJaqlShell.java:48)
at com.ibm.jaql.util.shell.AbstractJaqlShell.main(AbstractJaqlShell.java:80)
at JaqlShell.main(JaqlShell.java:272)
Original issue reported on code.google.com by [email protected]
on 26 Aug 2009 at 2:49
We currently have a large number of strict keywords, i.e., keywords that
cannot be used as identifiers. We should make most of them soft, keeping
strict only those keywords that have to be.
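The distinction can be sketched as follows: a soft keyword acts as a keyword only where the grammar expects one, and as an ordinary identifier everywhere else. The class, the keyword sets, and the method below are hypothetical illustrations, not jaql's actual parser API.

```java
import java.util.Set;

// Illustrative sketch of soft vs. strict keywords. Strict keywords can never
// be identifiers; soft keywords can, except in positions where the grammar
// expects a keyword. The word lists here are examples only.
public class SoftKeywords {
    private static final Set<String> STRICT = Set.of("fn", "if", "else");
    private static final Set<String> SOFT = Set.of("transform", "expand", "sort");

    // Decide whether 'word' may be parsed as an identifier, given whether
    // the parser is currently at a keyword position.
    public static boolean usableAsIdentifier(String word, boolean keywordPosition) {
        if (STRICT.contains(word)) return false;
        if (SOFT.contains(word)) return !keywordPosition;
        return true;  // not a keyword at all
    }
}
```

With this scheme, a variable named transform would be legal, while fn would remain reserved everywhere.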
Original issue reported on code.google.com by [email protected]
on 24 Sep 2009 at 3:48
Jaql Shell currently does not support multi-line Jaql commands: every
single line of a multi-line command is treated as a separate history entry.
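A minimal sketch of the desired behavior: buffer continuation lines until the statement terminator ';' is seen, then record the whole statement as one history entry. The class and its completeness heuristic are hypothetical, not jaql's actual shell code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accumulate physical lines into one logical statement
// and commit a single history entry per statement, not per line. The
// ';'-at-end-of-line check is a simplified completeness heuristic.
public class MultiLineBuffer {
    private final StringBuilder pending = new StringBuilder();
    private final List<String> history = new ArrayList<>();

    // Returns true when a complete statement was committed to history.
    public boolean feed(String line) {
        pending.append(line);
        if (line.trim().endsWith(";")) {
            history.add(pending.toString());
            pending.setLength(0);
            return true;
        }
        pending.append('\n');  // statement continues on the next line
        return false;
    }

    public List<String> history() { return history; }
}
```

A real implementation would need to ignore ';' inside string literals and comments, but the buffering structure is the same.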
Original issue reported on code.google.com by [email protected]
on 4 Sep 2009 at 7:55
Provide each Expr with a chance to do one-time initialization before
evaluation. We should probably also include a close call after all
evaluation. A call after compilation is over, but before initialization,
might be useful for any post-compilation needs.
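The proposed lifecycle could look like the sketch below: a hook after compilation, one-time setup before evaluation, and teardown after all evaluation. The method names and class are illustrative; jaql's actual Expr class differs.

```java
// Hypothetical sketch of the proposed Expr lifecycle: compileDone() runs
// after compilation but before initialization, initialize() runs once
// before the first eval(), and close() runs after all evaluation.
public abstract class LifecycleExpr {
    private boolean initialized = false;

    // Hook for any post-compilation needs.
    public void compileDone() {}

    // One-time setup before evaluation begins.
    public void initialize() { initialized = true; }

    // Per-input evaluation; subclasses implement the actual logic.
    public abstract Object eval(Object input);

    // Release resources once all evaluation is finished.
    public void close() { initialized = false; }

    public boolean isInitialized() { return initialized; }
}
```

The runtime would then drive each Expr through compileDone(), initialize(), repeated eval() calls, and finally close().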
Original issue reported on code.google.com by [email protected]
on 12 Mar 2009 at 8:20
Long-running functions can time out during map/reduce when a single record
is processed for a long time. We should provide a way for any Expr to
report that progress is being made. This probably involves placing the
Hadoop reporter inside the jaql runtime context, but we will probably want
to abstract away from Hadoop a bit so that it works in serial code as well.
This feature should be considered as part of a larger feature: reporting on
the status of a query, even when multiple map/reduce jobs are running
concurrently. It should also include reporting of things besides progress,
e.g., the number of exceptions that were handled.
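The abstraction could be as small as the sketch below: the runtime context wraps a reporter interface that is backed by Hadoop's reporter in map/reduce mode and by a no-op in serial mode. All names here are hypothetical, not jaql's actual runtime API.

```java
// Hypothetical sketch: hide the progress-reporting mechanism behind a small
// interface in the runtime context, so an Expr can signal liveness the same
// way in map/reduce and in serial execution.
public class ProgressContext {
    public interface Reporter { void progress(); }

    private final Reporter reporter;
    private long heartbeats = 0;

    // In map/reduce, wrap Hadoop's reporter; in serial mode, pass null
    // and a no-op reporter is used instead.
    public ProgressContext(Reporter reporter) {
        this.reporter = reporter != null ? reporter : () -> {};
    }

    // Long-running Exprs call this periodically so the task is not
    // killed for apparent inactivity.
    public void reportProgress() {
        heartbeats++;
        reporter.progress();
    }

    public long heartbeats() { return heartbeats; }
}
```

The heartbeat counter doubles as a hook for the broader status-reporting feature, since the same channel could carry other statistics such as handled-exception counts.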
Original issue reported on code.google.com by [email protected]
on 12 Mar 2009 at 8:26
Add Hadoop 0.19 support.
Original issue reported on code.google.com by [email protected]
on 26 Jan 2009 at 6:49