Automatically exported from code.google.com/p/jaql
When using the convert() function in a pipe, conversion of a null value
followed by a conversion of a non-null value will lead to a
NullPointerException.
In the following code example, the first occurrence of {a: "1"} is
converted correctly; its second occurrence produces the error.
[ { a: "1" }, {a: null}, {a: "1"} ]
-> transform convert($, schema { a: long? });
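The intended semantics of `schema { a: long? }` — a nullable long — can be sketched in Python (an illustrative analogy of the expected behavior, not jaql's implementation):

```python
def convert_nullable_long(value):
    """Convert a value to a long, passing null (None) through.

    Mirrors the intent of `schema { a: long? }`: a null input must
    not raise, and conversion must keep working for non-null values
    that follow it.
    """
    if value is None:
        return None
    return int(value)

records = [{"a": "1"}, {"a": None}, {"a": "1"}]
converted = [{"a": convert_nullable_long(r["a"])} for r in records]
```

The bug is that the third record fails even though the first, identical record converted fine; the null in between apparently corrupts the converter's state.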
Original issue reported on code.google.com by [email protected]
on 11 Sep 2009 at 4:21
Hadoop's new api (currently in trunk) differs greatly from current
releases. Our approach to deal with multiple api versions is two-fold: (1)
isolate the deps on hadoop to a small area of the code, and (2) for those
deps that do differ across hadoop/hbase versions, take the code out of
jaql's source tree and place it under the appropriate vendor version.
Problems (and proposed solutions):
(1) while hadoop deps are isolated, the situation can improve:
=> move deps out of com.ibm.jaql.io.registry and com.ibm.jaql.util into
sub-packages that are specific to hadoop/hbase
(2) copying an entire file is brittle from a maintenance perspective.
=> use abstract classes to further isolate the parts of a class that
depend on a particular hadoop/hbase api
A further discussion can be found on the jaql-devel mailing list:
http://groups.google.com/group/jaql-
devel/browse_thread/thread/e312f3f979353090
Original issue reported on code.google.com by [email protected]
on 21 Sep 2009 at 9:26
The "jaql" script currently expects a class name as its first argument.
Since this will almost always be JaqlShell, I propose to hardcode
JaqlShell into the "jaql" script. The old functionality is retained in a
script called "jaql-generic".
Original issue reported on code.google.com by [email protected]
on 26 Mar 2009 at 8:46
What steps will reproduce the problem?
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
jaql trunk r248
Please provide any additional information below.
There seems to be some weird problem with KeyLookup that has popped up
recently. It seems to be unable to read some of the temp files that it
itself generates. This is a new problem since I didn't face it until
last night. Here is how it goes.
Following is the JAQL code:
$ratings = read(hdfs('/user/sudipto/netflix/data/all/json'));
$estrate = 0;
$cust = read(hdfs('/user/sudipto/netflix/data/all/materialized/custparam'));
$movie = read(hdfs('/user/sudipto/netflix/data/all/materialized/movieparam'));
$imHashJoin = fn($outer, $okey, $inner, $ikey) (
$build = $inner -> transform [$ikey($), $],
$outer -> transform [$okey($), $]
-> keyLookup($build)
-> transform {$[1].*, $[2].*}
);
$ratings
-> $imHashJoin(fn($r) $r.tid, $movie, fn($m) $m.movie_id)
-> $imHashJoin(fn($r) $r.cid, $cust, fn($c) $c.cust_id)
-> transform { $.cust_id, $.movie_id, $.rating, diff: $.rating - $estrate,
$.cparam, $.mparam }
-> write(hdfs('/user/sudipto/netflix/data/all/materialized/join'));
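The in-memory hash join that $imHashJoin expresses — build a lookup table on the small inner input, then probe it with each outer record and merge the matched fields — can be sketched in Python (field names are taken from the query above; this is an illustration of the join semantics, not jaql's keyLookup implementation):

```python
def hash_join(outer, outer_key, inner, inner_key):
    # Build phase: hash the (small) inner input on its key.
    build = {inner_key(rec): rec for rec in inner}
    # Probe phase: stream the (large) outer input, look up each key,
    # and merge the matching records' fields.
    result = []
    for rec in outer:
        match = build.get(outer_key(rec))
        if match is not None:
            merged = dict(rec)
            merged.update(match)
            result.append(merged)
    return result

movies = [{"movie_id": 7, "mparam": 0.5}]
ratings = [{"tid": 7, "cid": 1, "rating": 4}]
joined = hash_join(ratings, lambda r: r["tid"], movies, lambda m: m["movie_id"])
```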
In the hash join, it first spawns an MR job to read in the individual
inner tables and Temps them. Then it tries to join the large outer table
and the temped inner table. (This is somewhat new, as I think the earlier
version of key lookup did not do so. Probably you wanted to fix the
inlining problem of this expression?) Nevertheless, when the 3rd MR job,
which does the main join, is spawned, it reports the following error:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
hdfs://impl00.almaden.ibm.com:9000/user/sudipto/jaql_temp_4847551314303483
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:210)
at com.ibm.jaql.io.hadoop.DefaultHadoopInputAdapter.getSplits(DefaultHadoopInputAdapter.java:163)
at com.ibm.jaql.io.hadoop.DefaultHadoopInputAdapter.iter(DefaultHadoopInputAdapter.java:184)
at com.ibm.jaql.lang.expr.io.AbstractReadExpr$1.(AbstractReadExpr.java:100)
at com.ibm.jaql.lang.expr.io.AbstractReadExpr.iter(AbstractReadExpr.java:99)
at com.ibm.jaql.lang.expr.index.KeyLookupFn.iter(KeyLookupFn.java:72)
at com.ibm.jaql.lang.expr.core.BindingExpr.iter(BindingExpr.java:209)
at com.ibm.jaql.lang.expr.core.TransformExpr.iter(TransformExpr.java:148)
at com.ibm.jaql.lang.expr.core.DoExpr.iter(DoExpr.java:126)
at com.ibm.jaql.lang.core.JaqlFunction.iter(JaqlFunction.java:269)
at com.ibm.jaql.lang.core.JaqlFunction.iter(JaqlFunction.java:350)
at com.ibm.jaql.lang.expr.hadoop.MapReduceBaseExpr$MapEval.run(MapReduceBaseExpr.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Following is the explain:
(
$fd_2 = mapReduce({
("input"):(hdfs("/user/sudipto/netflix/data/all/materialized/custparam")),
("output"):(HadoopTemp()), ("map"):(fn($mapIn) ( $mapIn
-> transform each $ ([null, [($).("cust_id"), $]]) )
) }),
$fd_0 = mapReduce({
("input"):(hdfs("/user/sudipto/netflix/data/all/materialized/movieparam")),
("output"):(HadoopTemp()), ("map"):(fn($mapIn) ( $mapIn
-> transform each $ ([null, [($).("movie_id"), $]]) )
) }),
write((
$fd_1 = mapReduce({
("input"):(hdfs("/user/sudipto/netflix/data/all/json")),
("output"):(HadoopTemp()), ("map"):(fn($mapIn) ( keyLookup($mapIn
-> transform each $ ([($).("tid"), $]), read($fd_0))
-> transform each $ ([null, { (index($, 1)).*, (index($, 2)).* }]) )
) }),
keyLookup(read($fd_1)
-> transform each $ ([($).("cid"), $]), read($fd_2))
-> transform each $ ({ (index($, 1)).*, (index($, 2)).* })
)
-> transform each $ ({ (($)).("cust_id"), (($)).("movie_id"),
(($)).("rating"), ("diff"):((($).("rating"))-(0)), (($)).("cparam"),
(($)).("mparam") }), hdfs("/user/sudipto/netflix/data/all/materialized/join"))
)
Note that this problem was encountered when the mapreduce cluster was
running under a username different from the user account used for
submitting jobs from another remote machine.
Original issue reported on code.google.com by [email protected]
on 22 Jul 2009 at 2:01
1. Open a jaqlshell in cygwin.
2. Input some jaql commands.
3. After hitting the UP or DOWN key, command history does not show up.
Original issue reported on code.google.com by [email protected]
on 4 Sep 2009 at 7:50
Currently jaql only supports concatenation of files. This limits the
reusability of jaql scripts considerably. Jaql needs new ways to group
functionality and to import it into a file.
Original issue reported on code.google.com by [email protected]
on 6 Aug 2009 at 9:20
Jaqlshell will hang after `quit;` is issued in jaqlshell.
Original issue reported on code.google.com by [email protected]
on 2 Sep 2009 at 2:10
1+1 returns a JsonLong, while 1/1 results in a JsonDecimal, as does 1/1.0.
In .1/2d, the .1 gets converted from BigDecimal to double.
From the shell, you can't see whether your result is stored as long or
BigDecimal:
jaql> typeof(1);
"number"
jaql> typeof(1.1);
"number"
Original issue reported on code.google.com by [email protected]
on 8 Jul 2009 at 11:21
Hi, Vuk.
Could you please do this code review for me? Thanks.
====================
Review Descriptions
====================
The following points are key points in my implementation. You can
also see my questions in the TODO comments of the source code.
- Is JsonOutputStream a duplicate of JsonTextOutputStream? If
yes, it should be deleted. Otherwise, I need to extract the common
parts of JsonOutputStream and JsonTextOutputStream into an
abstract class.
- A printArrayClose method is added to JsonToStream, and a finish
method is added to ClosableJsonWriter. Both are needed to
support finishing a write without closing the underlying
STDOUT in Jaql.
- AbstractWriteExpr's eval method demands that the JSON value to
be printed be a JSON array. Following this style, Jaql Shell also
prints JSON values in array mode. This means that even if only one
JSON value is printed, it will be wrapped with "[" and "]".
- In batch mode, outputs to STDOUT are disabled. Otherwise, these
outputs would mess up the evaluated result.
===========
Next Steps
===========
- Support for CSV.
- Support for XML.
- Add parsing mechanism for general Jaql IO descriptors.
- Unit test and bug fixing.
Known Issues
-------------
- "]" will be printed if the input is empty. Nothing should be
printed.
Original issue reported on code.google.com by [email protected]
on 22 Sep 2009 at 3:55
The current process of determining the list of available functions
in jaql is to manually add them to a hash table. This process can
be simplified because all jaql functions are marked with the
"JaqlFn" annotation.
The idea now is to automatically add all classes annotated with
this annotation to the hashtable. There are two ways to accomplish
this: either through runtime reflection over all classes, or with a
special annotation processing step at compile time. Because a
one-time effort is generally better, I chose to process the annotations
at compile time.
To accomplish this, a custom Processor class is required that handles
the annotations. Based on the information they provide, it creates
a new class source file which contains a list of all functions. This
class is then used by FunctionLib to fill its hashtable.
Because the new class is generated after FunctionLib is compiled, it
can only be referenced indirectly via a ClassLoader. FunctionLib
also needs to know which methods the generated class provides, so a new
interface was added that is implemented by the generated class.
Because the build process has changed, build.xml needs to be updated.
Two new steps were added to the compile task. After the first compilation
phase, a second run of the java compiler is needed, this time just with the
processor. It generates a new source file. This file is then compiled in a
third compilation run.
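The runtime-reflection alternative rejected above can be sketched as a decorator-based registry in Python (purely illustrative; jaql's chosen mechanism is the compile-time annotation processor described here, and the names below are hypothetical):

```python
# Plays the role of FunctionLib's hashtable of available functions.
FUNCTION_LIB = {}

def jaql_fn(name):
    """Register a function under its jaql name at definition time,
    replacing the manual add-to-hashtable step."""
    def register(fn):
        FUNCTION_LIB[name] = fn
        return fn
    return register

@jaql_fn("strcat")
def strcat(*parts):
    return "".join(str(p) for p in parts)
```

The compile-time approach chosen here avoids paying this registration cost (or a classpath scan) at every startup.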
When reviewing my code changes, please focus on:
- One big problem that is still left is that eclipse does not support that
the annotation processor is in the same project as the code it should
process. The proposed solutions to this are:
1. A separate project for the processor.
2. Supply a jar containing the processor and use that for processing. This
jar needs to be recreated after each change to the processor, which should
happen only seldom.
Which way should we choose?
- I also checked both the original function list and the generated one. The
generated one contains additionally hdfsRead / hdfsWrite. Is that a
problem?
After the review, I'll merge this patch into:
/trunk
Original issue reported on code.google.com by [email protected]
on 19 Jun 2009 at 7:32
Attachments:
We currently support reading JSON files with one JSON value per line, and
JSON files that represent arrays. Support for arbitrary JSON requires hacks
like
read(lines("file")) -> combine(strcat) -> json();
We should provide a nice API along the lines of read(jsontext("file"));
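A reader for arbitrary, whitespace-separated JSON values — the behavior a `jsontext("file")` API might have — can be sketched with Python's incremental JSON decoder (an illustration of the desired semantics, not a proposed implementation):

```python
import json

def read_json_values(text):
    """Yield every JSON value in a string, regardless of how the
    values are split across lines."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between values (including newlines).
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Values spanning multiple lines are handled, unlike one-value-per-line readers.
values = list(read_json_values('{"a": 1}\n[2,\n 3]\n"four"'))
```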
Original issue reported on code.google.com by [email protected]
on 19 Sep 2009 at 12:41
What steps will reproduce the problem?
1. Open a Cygwin console
2. Type `bin/jaqlshell`
The output is similar to the following text.
{{{
cygpath: cannot create short name of
D:\road\hadoop\jaql\vendor\hadoop\0.18.3\logs
Initializing Jaql.
}}}
Original issue reported on code.google.com by [email protected]
on 25 Aug 2009 at 11:19
1. Add generic projection support to the I/O layer
2. Do projection pushdown analysis in the compiler
Original issue reported on code.google.com by [email protected]
on 20 Mar 2009 at 9:24
Some user-defined functions have non-trivial initialization phases
that, in the ideal case, would not be repeated for each record of a
mapper's input. Some work-arounds include the use of static variables
(need to worry about multiple threads) and the registry infrastructure in
jaql (as used for random number sampling). The first option may not be
safe and the second requires jaql code to be modified, which is not a good
long-term option.
While there is merit in exposing the registration infrastructure, a better
option is to support initialization for user-defined functions.
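What such an initialization hook buys can be sketched as a callable object whose setup runs once before the first record (a hypothetical shape for the API, for illustration only):

```python
class ExpensiveUdf:
    """A UDF with a non-trivial init phase that should run once per
    task, not once per record."""

    def __init__(self):
        self.state = None
        self.init_count = 0  # tracks how often init actually runs

    def initialize(self):
        # Imagine loading a model or opening a connection here.
        self.init_count += 1
        self.state = {"bias": 10}

    def __call__(self, record):
        if self.state is None:  # triggered only by the first record
            self.initialize()
        return record + self.state["bias"]

udf = ExpensiveUdf()
out = [udf(r) for r in [1, 2, 3]]  # init runs once, not three times
```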
Original issue reported on code.google.com by [email protected]
on 27 Jan 2009 at 1:39
Input the following code in jaqlshell.
[
{x: 0, text: "zero"},
{x: 1, text: "one"},
{x: 0, text: "two"},
{x: 1, text: "three"},
{x: 0, text: "four"},
{x: 1, text: "five"},
{x: 0, text: "six"},
{x: 1, text: "seven"},
{x: 0, text: "eight"}
]
-> write(hdfs("sample.dat"));
The following error occurs.
line 59:1: unexpected token: j
line 59:1: unexpected token: j
at com.ibm.jaql.lang.parser.JaqlParser.path(JaqlParser.java:3234)
at com.ibm.jaql.lang.parser.JaqlParser.typeExpr(JaqlParser.java:3828)
at com.ibm.jaql.lang.parser.JaqlParser.unaryAdd(JaqlParser.java:3762)
at com.ibm.jaql.lang.parser.JaqlParser.multExpr(JaqlParser.java:3655)
at com.ibm.jaql.lang.parser.JaqlParser.addExpr(JaqlParser.java:3632)
at com.ibm.jaql.lang.parser.JaqlParser.instanceOfExpr(JaqlParser.java:3526)
at com.ibm.jaql.lang.parser.JaqlParser.compare(JaqlParser.java:3455)
at com.ibm.jaql.lang.parser.JaqlParser.inExpr(JaqlParser.java:3383)
at com.ibm.jaql.lang.parser.JaqlParser.kwTest(JaqlParser.java:3357)
at com.ibm.jaql.lang.parser.JaqlParser.notExpr(JaqlParser.java:3321)
at com.ibm.jaql.lang.parser.JaqlParser.andExpr(JaqlParser.java:3269)
at com.ibm.jaql.lang.parser.JaqlParser.orExpr(JaqlParser.java:3246)
at com.ibm.jaql.lang.parser.JaqlParser.expr(JaqlParser.java:727)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:635)
at com.ibm.jaql.lang.parser.JaqlParser.topAssign(JaqlParser.java:420)
at com.ibm.jaql.lang.parser.JaqlParser.stmt(JaqlParser.java:289)
at com.ibm.jaql.lang.parser.JaqlParser.stmtOrFn(JaqlParser.java:211)
at com.ibm.jaql.lang.parser.JaqlParser.parse(JaqlParser.java:154)
at com.ibm.jaql.lang.Jaql.prepareNext(Jaql.java:288)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:399)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:67)
at com.ibm.jaql.util.shell.AbstractJaqlShell.runInteractively(AbstractJaqlShell.java:48)
at com.ibm.jaql.util.shell.AbstractJaqlShell.main(AbstractJaqlShell.java:80)
at JaqlShell.main(JaqlShell.java:272)
Original issue reported on code.google.com by [email protected]
on 26 Aug 2009 at 3:15
In the bin/jaql script, DFLT_HBASE_VERSION is set to 0.18.3. HBase has
releases 0.18.1 and 0.19.3, but it does not have a release 0.18.3.
Original issue reported on code.google.com by [email protected]
on 2 Sep 2009 at 7:01
Add a flag to the jaqlShell to explain an entire script and avoid evaluation.
Original issue reported on code.google.com by [email protected]
on 24 Jun 2009 at 10:31
A JRawComparator has to be registered for each type that supports
comparison without deserializing. Currently, such a comparator is provided
for JBinary only.
Original issue reported on code.google.com by [email protected]
on 5 May 2009 at 12:31
Vuk's Original Design
I think this would be a useful feature, in particular when jaql is used in
batch mode for the purpose of feeding other programs with output produced
by jaql scripts. I don't think it makes much sense for interactive mode,
but let me know what you think about this.
If one calls the shell in batch mode, all outputs should be written in
the given format, say CSV, XML, etc. My guess is that this will be
particularly useful for the case where there is only one query and the
user wants the output piped to another program.
Now, how to specify the format? One option is, as you suggest, to add an
option to the shell (e.g., --format csv) that forces top-level writes to be
formatted accordingly. As you've seen in Jaql, I/O is specified through
descriptors (e.g., {type: 'local', location: 'foo.json', options: {adapter:
"...", format: "..."}}) (you can see the examples in
conf/storage-default.jql or src/test/com/ibm/jaql/storageQueries.txt). One
option is to have the argument to --format correspond to an I/O
descriptor. For example, you may have the argument to --format (e.g.,
'csv') be the key in the storage registry (e.g., storage-default.jql).
Instead of the FileStreamOutputAdapter, I'd use a StreamOutputAdapter
that you bind to System.out; then all should work as if writing to any
stream. In summary, the things to do are:
1. add an option to jaql shell for --format
2. in JaqlShell, if --format is provided, so long as we're in batch
mode, the format is valid, and the format derives from StreamOutputAdapter,
set up a StreamOutputAdapter that is bound to System.out,
initialized properly, etc.
3. Of course, to get this to work with CSV, you'll have to add a CSV
entry to storage-default.jql first (I'd play with this first in interactive
mode, then move
to the other tasks)
Next, do we only want to support options that have an entry in
storage-default.jql? What if I have a new format I want to use? Given
jaql's architecture, this shouldn't be a big deal. Just like when an I/O
descriptor is given for read/write, one could conceive of passing such a
descriptor to "--format". This will make argument parsing a bit trickier
(it will need to parse JSON: if it's a string, then it's a key; if it's a
record, it's a descriptor), but it's doable. What do you think of this
extra generality?
Let me know if some of the above doesn't make sense.
Original issue reported on code.google.com by [email protected]
on 22 Sep 2009 at 2:37
http://groups.google.com/group/jaql-devel/browse_thread/thread/ef8388b0b4f1bd10
Original issue reported on code.google.com by [email protected]
on 21 Sep 2009 at 5:16
Redesign the jaql syntax around a unix pipe-like notation.
Original issue reported on code.google.com by [email protected]
on 16 Oct 2008 at 3:13
Attachments:
There should be an efficient way of invoking R from Jaql.
Original issue reported on code.google.com by [email protected]
on 17 Sep 2009 at 1:17
The delimited output format uses backslash escaping for embedded quotes
(\"), newlines (\n), and non-printable characters (\u...). (I.e., the same
as done for JSON strings.) However, the CSV description at
http://en.wikipedia.org/wiki/Comma-separated_values says that embedded
quotes should be doubled ("this has an embedded quote "" in it") and that
new lines are printed inside the string ("this has an embedded
newline in it"). Also non-printable characters are allowed in the file, I
believe.
We should move to the double quoting and verbatim non-printable characters
(although what about encoding; we are writing UTF-8 right now).
The unfortunate problem with embedded newlines is that the TextInputFormat
will not split the file correctly. On the other hand, other tools will not
import the backslash-escaped values correctly. Perhaps we should have a few
ways to handle newlines: embedded (standard), backslash-escaped, or stripped.
This could be a newline substitution string ("\n", "\\n", or "").
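Python's csv module follows the doubled-quote convention described above, which makes for a quick check of what the output should look like (illustrative, not jaql's delimited writer):

```python
import csv
import io

row = ['has "quote"', "has\nnewline", "plain"]

# Write one row: embedded quotes are doubled ("") and the newline is
# kept verbatim inside the quoted field, per the CSV convention.
buf = io.StringIO()
csv.writer(buf).writerow(row)
encoded = buf.getvalue()

# Reading it back recovers the original values, newline included.
decoded = next(csv.reader(io.StringIO(encoded)))
```

This also demonstrates the splitting problem: a naive line-based reader would break the second field at its embedded newline, while a quote-aware reader does not.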
We should also support the reverse options while reading CSV files. If it
has embedded newlines, we can avoid splitting the file (multiple files
could still be mapped) or perhaps we can fix up the text splitter to not
split on a line with unbalanced quotes (is this sufficient to detect
embedded newlines?).
For JSON serialized fields, we should definitely avoid creating the
formatting newlines.
FYI, IBM DB2 and MS Excel both support embedded newlines and double quoting.
Original issue reported on code.google.com by [email protected]
on 23 Sep 2009 at 2:50
Currently, StreamAdapters include a "formatter" that knows how to convert
stream bytes to JSON. The formatter is currently a StreamToJson converter.
We should change this so that we re-use the serialization framework; it
makes little sense to have two APIs that basically handle the same task.
Original issue reported on code.google.com by [email protected]
on 23 Sep 2009 at 12:53
Add support for Hadoop 0.20.x
Original issue reported on code.google.com by [email protected]
on 18 Sep 2009 at 12:37
Extend JaqlShell with functionality to read its input from files or to
evaluate a command line argument.
Original issue reported on code.google.com by [email protected]
on 26 Mar 2009 at 10:11
Jaql treats the definition of global variables as simply a definition; it
does not evaluate the variable's value immediately. The variable
definition is included in each query evaluation, which means the variable
will be evaluated for each query evaluation. Moreover, the variable can be
inlined inside the query in such a way that causes it to be evaluated
multiple times in a single query.
The "materialize $var" statement can be used to force the evaluation of a
variable. This doesn't seem very clean, but provides a short-term
workaround. One problem with materialization is the value is not stored in
a map/reducible location (eg hdfs), so we do not get map/reduce over the
variable result. This is a general problem with all variables. Moreover,
it is unclear when we should materialize into a distributed location (for
large results this makes sense) vs store in memory (for small results).
Currently, the user has to handle this using an explicit write.
Another issue is that global variables are never redefined; instead a new
variable is created that hides the old one - old references are still to
the old variable. This makes variable definitions feel like they are
evaluated immediately even though the evaluation is lazy, but causes
unexpected results in the case of functions. Consider two examples:
$x = 1;
$y = $x + 1;
$x = 2;
$y; // produces 2, which seems right.
$f = fn() 1;
$g = fn() $f() + 1;
$f = fn() 2; // incorrectly think this redefines $f() and therefore affects $g
$g(); // produces 2, which seems wrong
We need some more thinking here.
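For contrast, Python closures resolve free variables at call time, so the same sequence gives the opposite result (a sketch of the semantic design space, not jaql code):

```python
f = lambda: 1
g = lambda: f() + 1   # g captures the *name* f, not its current value
f = lambda: 2         # rebinding f does affect g in Python...
result = g()          # ...so g() now yields 3, where jaql yields 2
```

Neither behavior is obviously right on its own; the problem described above is that jaql mixes the two intuitions, looking eager for values but surprising users for functions.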
Original issue reported on code.google.com by [email protected]
on 12 Mar 2009 at 8:43
Jaql is difficult to use when multiple users want to run jaql on the same
machine. The issue stems from poor default values (e.g., /tmp
directories). The solution is to make sure that usernames are included in
such default values so that there are no clashes when multiple jaql users
are on the same machine.
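Folding the username into such defaults can be sketched as follows (an illustrative pattern; the names here are hypothetical and not jaql's actual configuration code):

```python
import getpass
import os
import tempfile

def default_temp_dir(app="jaql"):
    """Per-user temp directory, e.g. /tmp/jaql_alice, so that
    concurrent users on one machine do not clash."""
    return os.path.join(tempfile.gettempdir(), f"{app}_{getpass.getuser()}")

path = default_temp_dir()
```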
Original issue reported on code.google.com by [email protected]
on 26 Jan 2009 at 6:39
When reading using lines(), an empty line should be treated as a null.
Similarly, when writing, a null value should be written as an empty line.
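The proposed round-trip — an empty line on write for null, and null on read for an empty line — can be sketched as (illustrative only):

```python
def write_lines(values):
    """Null (None) becomes an empty line."""
    return "\n".join("" if v is None else str(v) for v in values)

def read_lines(text):
    """An empty line becomes null (None)."""
    return [None if line == "" else line for line in text.split("\n")]

# Writing then reading should preserve nulls exactly.
round_trip = read_lines(write_lines(["a", None, "b"]))
```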
Original issue reported on code.google.com by [email protected]
on 18 Sep 2009 at 10:57
The lines() function provides support for reading in lines of text. But if
the lines of text are single columns and the user wants to convert them to a
specific atomic type, then s/he has to pipe the output of lines() to
convert(). The lines() function should be designed along similar lines as
del() and take an additional argument that can be used to specify the schema
in case the input file has single columns.
Additionally, write(lines()) should write text files where each data item
is converted to a single line of text. At present, write(lines()) behaves
similar to write(hdfs()) and generates sequence files.
Original issue reported on code.google.com by [email protected]
on 16 Sep 2009 at 10:52
Remove support for 0.15.3 and 0.16.1
Original issue reported on code.google.com by [email protected]
on 26 Jan 2009 at 6:50
When a large object is read from an input along with a bunch of other
fields (eg metadata about the object) and then the metadata is processed by
several expensive operations, it sometimes makes sense to defer the
retrieval of the large object and go back to the input to fetch it
after processing. We could do a few things here:
1. support large object handles that are fetched later.
2. a cost-based rewrite that considers early vs late fetching of the large
object.
At a minimum, we should do what is necessary to allow a user to do this
manually without a lot of effort.
Original issue reported on code.google.com by [email protected]
on 20 Mar 2009 at 9:29
expand unroll copies the whole array that should be unrolled, instead of
just one element per result.
To reproduce: run the following statement in the jaql shell:
[ [1, [9,8,7] ] ]-> expand unroll $[1];
Expected result:
[ [1, 9], [1, 8], [1, 7] ]
Actual result:
[ [1, [9, 8, 7]], [1, [9, 8, 7]], [1, [9, 8, 7]] ]
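The expected semantics — one copy of the row per element of the unrolled array, with that element substituted in place — can be sketched in Python (an illustration of the expected output, not jaql's implementation):

```python
def expand_unroll(rows, index):
    """For each row, emit one copy per element of row[index],
    with that single element substituted for the array."""
    out = []
    for row in rows:
        for elem in row[index]:
            copy = list(row)
            copy[index] = elem   # one element, not the whole array
            out.append(copy)
    return out

result = expand_unroll([[1, [9, 8, 7]]], 1)
```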
Original issue reported on code.google.com by [email protected]
on 31 Jul 2009 at 10:44
This includes support for UDAs written in Jaql itself as well as UDAs
written in Java.
Original issue reported on code.google.com by [email protected]
on 11 Sep 2009 at 7:46
There are several issues with path expressions involving field selections
from a record. Examples 1 and 2 below show the current way of inferring field
names and also how it should be. Example 3 should be disallowed; field
exclusion should only allow one level.
#  QUERY                              CURRENT RESULT  SHOULD BE
1  { a: {b: 1, c: 2} }{.a.b}          { a: 1 }        { b: 1 }
2  { a: {b: 1, c: 2} }{.a.b, .a.c}    throws error    { b: 1, c: 2 }
3  { a: {b: 1, c: 2} }{*-.a.b}        {}              throw an error
Original issue reported on code.google.com by [email protected]
on 14 Jul 2009 at 2:17
What steps will reproduce the problem?
1. Load a large data set
2. Try to run e.g. a group into { num: count($) } statement
3. It will fail with the exception shown below
What is the expected output? What do you see instead?
expected: It should properly process the query
result:
java.lang.NullPointerException
com.ibm.jaql.util.BaseUtil.writeVUInt(BaseUtil.java:297)
com.ibm.jaql.io.serialization.binary.def.DefaultBinaryFullSerializer.copy(DefaultBinaryFullSerializer.java:154)
com.ibm.jaql.io.serialization.binary.def.DefaultBinaryFullSerializer.copy(DefaultBinaryFullSerializer.java:1)
com.ibm.jaql.json.type.SpilledJsonArray.addCopySerialized(SpilledJsonArray.java:443)
What version of the product are you using? On what operating system?
- Trunk revision 208, hadoop 0.18.3
Please provide any additional information below.
The problem is that the "addCopySerialized(DataInput input,
BinaryFullSerializer serializer)" method of SpilledJsonArray does not
always ensure that there is a spill file. If there are many records and
the values get written into the spill file, it throws a null pointer
exception because the file has not been created.
Original issue reported on code.google.com by [email protected]
on 12 Jun 2009 at 6:47
The interfaces KeyValueImport and KeyValueExport do not allow their
convert() method to throw an exception. I think that it is crucial that
these methods can report conversion errors back to Jaql (w/o using runtime
exceptions), either using declared exceptions or a return value.
Original issue reported on code.google.com by [email protected]
on 24 Mar 2009 at 10:58
The reverse function fails with the following exception:
reverse([1,2,3]);
java.lang.NullPointerException: undefined variable: $
java.lang.NullPointerException: undefined variable: $
at com.ibm.jaql.lang.core.Var.getValue(Var.java:208)
at com.ibm.jaql.lang.expr.core.VarExpr.eval(VarExpr.java:102)
at com.ibm.jaql.lang.expr.core.IndexExpr.eval(IndexExpr.java:142)
at com.ibm.jaql.lang.expr.core.CmpSpec.eval(CmpSpec.java:104)
at com.ibm.jaql.lang.expr.core.CmpSingle.eval(CmpSingle.java:67)
at com.ibm.jaql.lang.expr.core.SortExpr.iter(SortExpr.java:113)
at com.ibm.jaql.lang.expr.core.BindingExpr.iter(BindingExpr.java:213)
at com.ibm.jaql.lang.expr.core.TransformExpr.iter(TransformExpr.java:148)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:405)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:67)
at com.ibm.jaql.util.shell.AbstractJaqlShell.runInteractively(AbstractJaqlShell.java:48)
at com.ibm.jaql.util.shell.AbstractJaqlShell.main(AbstractJaqlShell.java:84)
at JaqlShell.main(JaqlShell.java:272)
The explain output is the following:
explain reverse([1,2,3]);
Invalid query... Undefined variables:
$
$ = ( [1, 2, 3]
-> enumerate() )
-> sort using ((index($, 0)) desc)
-> transform each $ (index($, 1))
The working solution should look like this:
( [1, 2, 3]-> enumerate() )
-> sort using (fn($) cmp [(index($, 0)) desc])
-> transform each $ (index($, 1));
Original issue reported on code.google.com by [email protected]
on 8 Sep 2009 at 6:50
Jaql currently only supports hbase tables that use jaql's binary JSON
format for column values. We need to generalize the support in a way that
adds conversion from an hbase row (the column byte array values) into a
JSON view, like the converters for other InputFormats.
Original issue reported on code.google.com by [email protected]
on 17 Sep 2009 at 12:04
The introduction of schema made all type names keywords. As a
consequence, the conversion functions do not work anymore. For example,
"double(1)" will throw an exception but should produce "1.0d".
Workarounds:
- rename conversion functions
- adapt parser accordingly
Original issue reported on code.google.com by [email protected]
on 22 Jun 2009 at 10:02
1. Push selections earlier in the query.
2. Add generic support to push selections down to the I/O layer (eg, HBase
filters)
Original issue reported on code.google.com by [email protected]
on 20 Mar 2009 at 9:25
localRead('/home/user/foo',
{format: 'com.acme.extensions.data.FromJSONTxtConverter'})
works for reading but
localWrite('/home/user/bar.json',
{format: 'com.acme.extensions.data.ToJSONTxtConverter'},$blah);
does not work for writing.
java.lang.Exception: formatter must implement ItemOutputStream
java.lang.Exception: formatter must implement ItemOutputStream
at com.ibm.jaql.io.stream.StreamOutputAdapter.initializeFrom(StreamOutputAdapter.java:57)
at com.ibm.jaql.io.AbstractOutputAdapter.initializeFrom(AbstractOutputAdapter.java:48)
at com.ibm.jaql.io.AdapterStore$OptionHandler.getAdapter(AdapterStore.java:305)
at com.ibm.jaql.lang.expr.io.StWriteExpr.eval(StWriteExpr.java:129)
at com.ibm.jaql.lang.expr.top.QueryExpr.eval(QueryExpr.java:92)
at com.ibm.jaql.lang.Jaql.main1(Jaql.java:86)
at JaqlShell.runInteractively(JaqlShell.java:173)
at JaqlShell.main(JaqlShell.java:378)
Original issue reported on code.google.com by [email protected]
on 26 Jan 2009 at 6:42
Adds a file to store command history. As a result, when a user
launches Jaql Shell again, he can see the commands entered in
the previous Jaql Shell session.
Original issue reported on code.google.com by [email protected]
on 23 Sep 2009 at 12:59
Add support for HBase 0.20.0 for Hadoop 0.20.x line
Original issue reported on code.google.com by [email protected]
on 19 Sep 2009 at 12:47
Open a jaqlshell in Cygwin and input the following code.
mapReduce(
{ input: {type: "hdfs", location: "sample.dat"},
output: {type: "hdfs", location: "results.dat"},
map: fn($v) ( $v -> transform [$.x, 1] ),
reduce: fn($x, $v) ( $v -> aggregate {x: $x, num: count($)} )
});
The following error occurs.
line 9:44: unexpected token: {
line 9:44: unexpected token: {
at com.ibm.jaql.lang.parser.JaqlParser.aggregate(JaqlParser.java:997)
at com.ibm.jaql.lang.parser.JaqlParser.op(JaqlParser.java:844)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:641)
at com.ibm.jaql.lang.parser.JaqlParser.optAssign(JaqlParser.java:569)
at com.ibm.jaql.lang.parser.JaqlParser.block(JaqlParser.java:469)
at com.ibm.jaql.lang.parser.JaqlParser.parenExpr(JaqlParser.java:1815)
at com.ibm.jaql.lang.parser.JaqlParser.basic(JaqlParser.java:4317)
at com.ibm.jaql.lang.parser.JaqlParser.fnCall(JaqlParser.java:2516)
at com.ibm.jaql.lang.parser.JaqlParser.path(JaqlParser.java:3179)
at com.ibm.jaql.lang.parser.JaqlParser.typeExpr(JaqlParser.java:3828)
at com.ibm.jaql.lang.parser.JaqlParser.unaryAdd(JaqlParser.java:3762)
at com.ibm.jaql.lang.parser.JaqlParser.multExpr(JaqlParser.java:3655)
at com.ibm.jaql.lang.parser.JaqlParser.addExpr(JaqlParser.java:3632)
at com.ibm.jaql.lang.parser.JaqlParser.instanceOfExpr(JaqlParser.java:3526)
at com.ibm.jaql.lang.parser.JaqlParser.compare(JaqlParser.java:3455)
at com.ibm.jaql.lang.parser.JaqlParser.inExpr(JaqlParser.java:3383)
at com.ibm.jaql.lang.parser.JaqlParser.kwTest(JaqlParser.java:3357)
at com.ibm.jaql.lang.parser.JaqlParser.notExpr(JaqlParser.java:3321)
at com.ibm.jaql.lang.parser.JaqlParser.andExpr(JaqlParser.java:3269)
at com.ibm.jaql.lang.parser.JaqlParser.orExpr(JaqlParser.java:3246)
at com.ibm.jaql.lang.parser.JaqlParser.expr(JaqlParser.java:727)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:635)
at com.ibm.jaql.lang.parser.JaqlParser.function(JaqlParser.java:884)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:658)
at com.ibm.jaql.lang.parser.JaqlParser.fieldValue(JaqlParser.java:3164)
at com.ibm.jaql.lang.parser.JaqlParser.field(JaqlParser.java:2974)
at com.ibm.jaql.lang.parser.JaqlParser.record(JaqlParser.java:2920)
at com.ibm.jaql.lang.parser.JaqlParser.basic(JaqlParser.java:4297)
at com.ibm.jaql.lang.parser.JaqlParser.fnCall(JaqlParser.java:2516)
at com.ibm.jaql.lang.parser.JaqlParser.path(JaqlParser.java:3179)
at com.ibm.jaql.lang.parser.JaqlParser.typeExpr(JaqlParser.java:3828)
at com.ibm.jaql.lang.parser.JaqlParser.unaryAdd(JaqlParser.java:3762)
at com.ibm.jaql.lang.parser.JaqlParser.multExpr(JaqlParser.java:3655)
at com.ibm.jaql.lang.parser.JaqlParser.addExpr(JaqlParser.java:3632)
at com.ibm.jaql.lang.parser.JaqlParser.instanceOfExpr(JaqlParser.java:3526)
at com.ibm.jaql.lang.parser.JaqlParser.compare(JaqlParser.java:3455)
at com.ibm.jaql.lang.parser.JaqlParser.inExpr(JaqlParser.java:3383)
at com.ibm.jaql.lang.parser.JaqlParser.kwTest(JaqlParser.java:3357)
at com.ibm.jaql.lang.parser.JaqlParser.notExpr(JaqlParser.java:3321)
at com.ibm.jaql.lang.parser.JaqlParser.andExpr(JaqlParser.java:3269)
at com.ibm.jaql.lang.parser.JaqlParser.orExpr(JaqlParser.java:3246)
at com.ibm.jaql.lang.parser.JaqlParser.expr(JaqlParser.java:727)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:635)
at com.ibm.jaql.lang.parser.JaqlParser.exprList(JaqlParser.java:4564)
at com.ibm.jaql.lang.parser.JaqlParser.fnArgs(JaqlParser.java:2405)
at com.ibm.jaql.lang.parser.JaqlParser.builtinCall(JaqlParser.java:4334)
at com.ibm.jaql.lang.parser.JaqlParser.fnCall(JaqlParser.java:2521)
at com.ibm.jaql.lang.parser.JaqlParser.path(JaqlParser.java:3179)
at com.ibm.jaql.lang.parser.JaqlParser.typeExpr(JaqlParser.java:3828)
at com.ibm.jaql.lang.parser.JaqlParser.unaryAdd(JaqlParser.java:3762)
at com.ibm.jaql.lang.parser.JaqlParser.multExpr(JaqlParser.java:3655)
at com.ibm.jaql.lang.parser.JaqlParser.addExpr(JaqlParser.java:3632)
at com.ibm.jaql.lang.parser.JaqlParser.instanceOfExpr(JaqlParser.java:3526)
at com.ibm.jaql.lang.parser.JaqlParser.compare(JaqlParser.java:3455)
at com.ibm.jaql.lang.parser.JaqlParser.inExpr(JaqlParser.java:3383)
at com.ibm.jaql.lang.parser.JaqlParser.kwTest(JaqlParser.java:3357)
at com.ibm.jaql.lang.parser.JaqlParser.notExpr(JaqlParser.java:3321)
at com.ibm.jaql.lang.parser.JaqlParser.andExpr(JaqlParser.java:3269)
at com.ibm.jaql.lang.parser.JaqlParser.orExpr(JaqlParser.java:3246)
at com.ibm.jaql.lang.parser.JaqlParser.expr(JaqlParser.java:727)
at com.ibm.jaql.lang.parser.JaqlParser.pipe(JaqlParser.java:635)
at com.ibm.jaql.lang.parser.JaqlParser.topAssign(JaqlParser.java:420)
at com.ibm.jaql.lang.parser.JaqlParser.stmt(JaqlParser.java:289)
at com.ibm.jaql.lang.parser.JaqlParser.stmtOrFn(JaqlParser.java:211)
at com.ibm.jaql.lang.parser.JaqlParser.parse(JaqlParser.java:154)
at com.ibm.jaql.lang.Jaql.prepareNext(Jaql.java:288)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:399)
at com.ibm.jaql.lang.Jaql.run(Jaql.java:67)
at com.ibm.jaql.util.shell.AbstractJaqlShell.runInteractively(AbstractJaqlShell.java:48)
at com.ibm.jaql.util.shell.AbstractJaqlShell.main(AbstractJaqlShell.java:80)
at JaqlShell.main(JaqlShell.java:272)
Original issue reported on code.google.com by [email protected]
on 26 Aug 2009 at 2:49
We currently have a large number of strict keywords, i.e., keywords that
cannot be used as identifiers. We should make most of them soft, keeping
strict only those keywords that have to be.
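The distinction can be sketched as follows: a soft keyword acts as a keyword only where the grammar expects one, and as an ordinary identifier everywhere else. The class, the keyword sets, and the method below are hypothetical illustrations, not jaql's actual parser API.

```java
import java.util.Set;

// Illustrative sketch of soft vs. strict keywords. Strict keywords can never
// be identifiers; soft keywords can, except in positions where the grammar
// expects a keyword. The word lists here are examples only.
public class SoftKeywords {
    private static final Set<String> STRICT = Set.of("fn", "if", "else");
    private static final Set<String> SOFT = Set.of("transform", "expand", "sort");

    // Decide whether 'word' may be parsed as an identifier, given whether
    // the parser is currently at a keyword position.
    public static boolean usableAsIdentifier(String word, boolean keywordPosition) {
        if (STRICT.contains(word)) return false;
        if (SOFT.contains(word)) return !keywordPosition;
        return true;  // not a keyword at all
    }
}
```

With this scheme, a variable named transform would be legal, while fn would remain reserved everywhere.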
Original issue reported on code.google.com by [email protected]
on 24 Sep 2009 at 3:48
Jaql Shell currently does not support multi-line Jaql commands: every
single line of a multi-line command is treated as a separate history entry.
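A minimal sketch of the desired behavior: buffer continuation lines until the statement terminator ';' is seen, then record the whole statement as one history entry. The class and its completeness heuristic are hypothetical, not jaql's actual shell code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accumulate physical lines into one logical statement
// and commit a single history entry per statement, not per line. The
// ';'-at-end-of-line check is a simplified completeness heuristic.
public class MultiLineBuffer {
    private final StringBuilder pending = new StringBuilder();
    private final List<String> history = new ArrayList<>();

    // Returns true when a complete statement was committed to history.
    public boolean feed(String line) {
        pending.append(line);
        if (line.trim().endsWith(";")) {
            history.add(pending.toString());
            pending.setLength(0);
            return true;
        }
        pending.append('\n');  // statement continues on the next line
        return false;
    }

    public List<String> history() { return history; }
}
```

A real implementation would need to ignore ';' inside string literals and comments, but the buffering structure is the same.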
Original issue reported on code.google.com by [email protected]
on 4 Sep 2009 at 7:55
Provide each Expr with a chance to do one-time initialization before
evaluation. We should probably also include a close call after all
evaluation. A call after compilation is over, but before initialization,
might be useful for any post-compilation needs.
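The proposed lifecycle could look like the sketch below: a hook after compilation, one-time setup before evaluation, and teardown after all evaluation. The method names and class are illustrative; jaql's actual Expr class differs.

```java
// Hypothetical sketch of the proposed Expr lifecycle: compileDone() runs
// after compilation but before initialization, initialize() runs once
// before the first eval(), and close() runs after all evaluation.
public abstract class LifecycleExpr {
    private boolean initialized = false;

    // Hook for any post-compilation needs.
    public void compileDone() {}

    // One-time setup before evaluation begins.
    public void initialize() { initialized = true; }

    // Per-input evaluation; subclasses implement the actual logic.
    public abstract Object eval(Object input);

    // Release resources once all evaluation is finished.
    public void close() { initialized = false; }

    public boolean isInitialized() { return initialized; }
}
```

The runtime would then drive each Expr through compileDone(), initialize(), repeated eval() calls, and finally close().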
Original issue reported on code.google.com by [email protected]
on 12 Mar 2009 at 8:20
Long-running functions can time out during map/reduce when a single record
is processed for a long time. We should provide a way for any Expr to
report that progress is being made. This probably involves placing the
Hadoop reporter inside the jaql runtime context, but we will probably want
to abstract away from Hadoop a bit so that it works in serial code as well.
This feature should be considered as part of a larger feature: reporting on
the status of a query, even when multiple map/reduce jobs are running
concurrently. It should also include reporting of things besides progress,
e.g., the number of exceptions that were handled.
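The abstraction could be as small as the sketch below: the runtime context wraps a reporter interface that is backed by Hadoop's reporter in map/reduce mode and by a no-op in serial mode. All names here are hypothetical, not jaql's actual runtime API.

```java
// Hypothetical sketch: hide the progress-reporting mechanism behind a small
// interface in the runtime context, so an Expr can signal liveness the same
// way in map/reduce and in serial execution.
public class ProgressContext {
    public interface Reporter { void progress(); }

    private final Reporter reporter;
    private long heartbeats = 0;

    // In map/reduce, wrap Hadoop's reporter; in serial mode, pass null
    // and a no-op reporter is used instead.
    public ProgressContext(Reporter reporter) {
        this.reporter = reporter != null ? reporter : () -> {};
    }

    // Long-running Exprs call this periodically so the task is not
    // killed for apparent inactivity.
    public void reportProgress() {
        heartbeats++;
        reporter.progress();
    }

    public long heartbeats() { return heartbeats; }
}
```

The heartbeat counter doubles as a hook for the broader status-reporting feature, since the same channel could carry other statistics such as handled-exception counts.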
Original issue reported on code.google.com by [email protected]
on 12 Mar 2009 at 8:26
Add Hadoop 0.19 support.
Original issue reported on code.google.com by [email protected]
on 26 Jan 2009 at 6:49