jakartaee / batch
The Jakarta Batch project produces the Batch Specification and API.
Home Page: https://projects.eclipse.org/projects/ee4j.batch
License: Apache License 2.0
Originally opened as bug 6288 by ScottKurz
There is a noticeable asymmetry between partitioned and non-partitioned steps: there is no equivalent to beforeStep/afterStep on the partition thread.
True, the ChunkListener runs on the partition thread, but you don't have a clear way without inventing one to know it's the last chunk. True, the PartitionAnalyzer runs at the end of the partition, but that's back on the main thread.
Would be curious to hear other opinions on whether this would be desirable for a future release.
I agree with your comments. The closest fit, as you say, is the ChunkListener. It requires some extra coding to know that it is the final chunk and overall doesn't feel as clean as a PartitionListener (or extended StepListener) interface would.
That leaves partitioned batchlet steps without any before/after listener options at all. Of course it could be coded/called directly from each batchlet, but I think the listener approach is cleaner.
Let me follow-up and propose adding beforePartition/afterPartition to StepListener.
So beforeStep/afterStep will continue to be called only on the top-level thread, and the new beforePartition/afterPartition will only be called on the partition threads.
(Also relevant to keep in mind, the PartitionReducer#beginPartitionedStep/afterPartitionedStepCompletion methods allow for before/after calls as well, and only from the top-level. Both these and beforeStep/afterStep can continue to be useful in different cases from the top-level.)
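A hypothetical shape for the proposed methods might look like the sketch below (the interface and method names are taken from the proposal above; this is not part of the released jakarta.batch API, and the driver method only simulates how a runtime might invoke the callbacks on a partition thread):

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionListenerSketch {
    // Hypothetical listener: both methods would run on the partition thread,
    // unlike beforeStep/afterStep which stay on the top-level thread.
    interface PartitionThreadListener {
        void beforePartition(); // before the first chunk of the partition
        void afterPartition();  // after the last chunk of the partition
    }

    // Simulates how a runtime might drive a single partition thread.
    static List<String> runOnePartition(PartitionThreadListener listener, Runnable partitionWork) {
        List<String> trace = new ArrayList<>();
        listener.beforePartition();
        trace.add("beforePartition");
        partitionWork.run();
        trace.add("partitionWork");
        listener.afterPartition();
        trace.add("afterPartition");
        return trace;
    }
}
```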
Let's deal with any other new listener method proposals separately (e.g. https://java.net/bugzilla/show_bug.cgi?id=7374)
Bug 7374 is now #91
Originally opened as bug 6427 by ScottKurz
Since the PartitionedStepControllerImpl doesn't currently catch exceptions thrown by analyzeStatus() or analyzeCollectorData(), an exception thrown from either of these two can immediately fail the job, while allowing the partitions to continue executing.
This can be a problem if the job is restarted, since the RI will determine that the partitions cannot be restarted at that point (if they are still executing from the original execution).
I think a better behavior is to wait around for the partitions to complete, and then call the rollbackPartitionedStep() method.
(Yet another interesting question is whether to continue "dispatching" not-previously running partitions after an analyzer exception. I'm choosing to continue to do so for now in the patch I'll push shortly).
I know I need to supply a test at some point. This also might be worthy of a future TCK addition.
Let's consider the RI complete (even absent a new test).
The spec question still: what should happen to an executing partitioned step when an exception is thrown from the analyzer.
Possibly we consider execution of new partitions (i.e. whether to start a partition that hasn't been started yet because we've only run a subset of the total #) as well.
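The proposed behavior can be sketched as below. This is an assumed structure, not the actual PartitionedStepControllerImpl code: if an analyzer callback throws, the controller waits for the in-flight partitions to finish and only then performs the rollback, instead of failing the job while partitions are still running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AnalyzerFailureSketch {
    static String runPartitionedStep(List<Callable<String>> partitions, Runnable analyzer)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        List<Future<String>> futures = new ArrayList<>();
        for (Callable<String> p : partitions) {
            futures.add(pool.submit(p));            // dispatch the partitions
        }
        try {
            analyzer.run();                         // stand-in for analyzeCollectorData()/analyzeStatus()
        } catch (RuntimeException analyzerFailure) {
            for (Future<String> f : futures) {
                try { f.get(); } catch (Exception ignored) { } // wait for partitions to complete
            }
            pool.shutdown();
            return "rollbackPartitionedStep";       // then invoke the PartitionReducer rollback
        }
        pool.shutdown();
        return "COMPLETED";
    }
}
```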
Originally opened as bug 5776 by cf126330
JobContext may contain contextual data a step artifact needs. Currently the only way for a batch artifact to obtain StepContext or JobContext is through injection. So if both StepContext and JobContext are needed, there will be 2 field injections. It would be nice to be able to retrieve JobContext from StepContext on demand.
@Inject StepContext stepContext;

public Object readItem() {
    if (needToAccessJobContext) {
        JobContext jobContext = stepContext.getJobContext();
        ...
    }
}
Organization Name ("Organization") and, if applicable, URL
IBM - OpenLiberty - https://openliberty.io/
Implementation Name, Version and download URL (if applicable)
The former reference implementation, informally known as "jbatch", Version 1.0.3, together with the Jakarta Batch API v1.0.2. For the "jbatch" implementation, we provide Maven download links for implementation and SPI jars. The project source repository is here. The Jakarta Batch API Maven coordinates are: jakarta.batch:jakarta.batch-api:1.0.2.
Product Name, Version and download URL (if applicable)
N/A - the implementation is tested outside of a "product".
Specification Name, Version and download URL
Jakarta Batch, Version 1.0.
TCK Version, digital SHA-256 fingerprint and download URL
Jakarta Batch TCK Version 1.0.2 - (will be) available for official download at URL: https://download.eclipse.org/jakartaee/batch/1.0/eclipse-batch-tck-1.0.2.zip.
Provisional results were obtained via running against staged zip here. TODO: We need to come back and adjust SHA256 for the final binary.
(Staged zip) SHA256: 26855ba168b814f65af1251cc853dc6455d2f9ecd2b215f06327ec090b32542a
Public URL of TCK Results Summary
https://github.com/WASdev/standards.jsr352.jbatch/wiki/jbatch-v1.0.3-tck-v.1.0.2
Any Additional Specification Certification Requirements
N/A
Java runtime used to run the implementation
Java 8
$ java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
Summary of the information for the certification environment, operating system, cloud, ...
$ uname -a
Linux skr 3.10.0-957.21.3.el7.x86_64 #1 SMP Fri Jun 14 02:54:29 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
By checking this box I acknowledge that the Organization I represent accepts the terms of the EFTL.
By checking this box I attest that all TCK requirements have been met, including any compatibility rules.
The getJobOperator() method currently logs a warning message when no JobOperator implementation is found. Given that failure to find a proper JobOperator implementation is a fatal error, throwing BatchRuntimeException
is probably better.
Logging a warning may not make clear to the user what caused the program to fail. It also requires users to do a null check after calling this method.
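The suggested fail-fast behavior might look like the sketch below. The service interface is a stand-in and the RuntimeException stands in for jakarta.batch.operations.BatchRuntimeException; the released BatchRuntime does not behave this way.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

public class StrictBatchRuntime {
    interface JobOperatorService { }   // stand-in for the real JobOperator SPI type

    static JobOperatorService getJobOperator() {
        Iterator<JobOperatorService> it = ServiceLoader.load(JobOperatorService.class).iterator();
        if (!it.hasNext()) {
            // Fail fast instead of logging and returning null:
            // callers no longer need a null check, and the cause is explicit.
            throw new RuntimeException("BatchRuntimeException: no JobOperator implementation found");
        }
        return it.next();
    }
}
```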
Brought up discussion here to understand what the EE 9 direction is.
Maybe it would be better to include this change along with the package rename.
I'll add more detail if/when we decide this is a valid candidate.
Originally opened as bug 6509 by ScottKurz
The spec text, in describing "item-count" as
"Specifies the number of items to process per chunk"
is ambiguous.
(That's one example.. the "process" verb is used in a few places in the spec.)
Cheng helpfully raised this exact point in this thread:
https://java.net/projects/jbatch/lists/public/archive/2013-04/message/88
In the thread it was decided that "item-count" was actually the number of items READ, not PROCESSED (which could be interpreted as not including filtered items).
Anyway, I'd like to update the spec in the next revision here.
I just got confused myself on this issue, and recently delivered a change (@1679181) to the RI which is excluding filtered items from being counted against the item-count checkpoint. (I'll reference this bug to fix this recent "fix").
Note: The link referenced above is bad, but we had it archived.
pg 20 defines item-count as "Specifies the number of items to process per chunk when using the item checkpoint policy." Just to confirm, is this the number of items that were attempted, whether skipped, filtered, or processed successfully? So for item-count 10, if 9 of them were skipped, will we still commit the buffered 1 output as a checkpoint?
Thanks,
Cheng
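The clarified semantics from the thread (item-count counts items READ, so items filtered out by the processor still count toward the checkpoint) can be sketched as below. The even/odd filter is just an illustration, not anything from the spec:

```java
import java.util.ArrayList;
import java.util.List;

public class ItemCountSketch {
    // Returns the list of committed chunks for the given input.
    static List<List<Integer>> runStep(List<Integer> input, int itemCount) {
        List<List<Integer>> commits = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        int readInThisChunk = 0;
        for (Integer item : input) {                          // ItemReader.readItem
            readInThisChunk++;
            Integer processed = (item % 2 == 0) ? item : null; // processor filters odd items
            if (processed != null) buffer.add(processed);
            if (readInThisChunk == itemCount) {                // checkpoint on items READ, not written
                commits.add(new ArrayList<>(buffer));          // commits even a mostly-filtered chunk
                buffer.clear();
                readInThisChunk = 0;
            }
        }
        if (!buffer.isEmpty()) commits.add(buffer);            // final partial chunk
        return commits;
    }
}
```

So with item-count 10 and 9 of 10 items filtered, the single remaining output item is still committed as one checkpoint, matching Cheng's reading.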
Originally opened as bug 6490 by ScottKurz
As noted in the Bug 5675 discussion, it would be nice to perhaps get persistent user data for the individual partitions.
This would have to be added in a future spec update.
I think another important use case to mention is using the reducer to compensate for missed calls to the analyzer (since the analyzer calls are not guaranteed to complete, and one might need to reconcile this fact on restart against persistent user data).
Originally opened as bug 6511 by BrentDouglas
See discussion at https://java.net/projects/jbatch/lists/public/archive/2014-11/message/4
Note that the link above is bad, but we had it archived... pasting it here.
Hi,
I have been reading the spec a bit today and I have some questions relating
to section 8.2.1.4.
* Skipping/Retrying an exception
I think the terminology used in this section could be improved. There are a
lot of phrases that could be better worded to make their intent more clear,
e.g.
These changes would make it more clear that the entire chunk is being
retried. As for 'skipping an exception' , I can't find a clear
interpretation in the spec of what this means but my understanding is that
it is something like 'ignore that the method invocation threw and continue
to the next thing on chunk table 11.8'. No matter if this is correct or not
I think the spec would be easier to understand with an update to better
explain the scope of what is being skipped.
While I was a little unclear on 'skipping', I found the explanation for
'retrying' even less clear. The explanation of what happens when a
retryable-exception-class matches an exception is summed up in this one
sentence:
When a retryable exception occurs, the default behavior is for the batch
runtime to rollback the current chunk and re-process it with an
item-count of 1 and a checkpoint policy of item.
The first time I read this I saw it as, abandon the running chunk, run a
chunk with an item count 1 and if that works proceed as before the rollback
with the configured checkpoint policy. After looking at the RI, BatchEE and
JBeret, I saw that this is not a common interpretation. From what I can see
(I'm going to ignore BatchEE as its implementation of retry looks
unimplemented/broken), both these implementations process the number of
items already read from the failed chunk. I don't really like this
interpretation, assuming we are 'reprocessing the current chunk', why is
the unprocessed portion excluded? The interpretation I have decided I like
the most is that the entire chunk where the failure occurred should be
reprocessed. This has the advantage of keeping the chunks aligned (as in if
you are running item-size=4 with a reader that produces 16 things and
failed in the second chunk you won't end up with chunk sizes 4,(1,1,1),4,4,1
but 4,(1,1,1,1),4,4). I would like to find out why these two only process a
portion which IMO is not in the description in the spec.
* Metrics when skipping from #checkpointInfo
As exceptions thrown from the #checkpointInfo methods of ItemReader and
ItemWriter are able to be skipped/etc (8.2.1.4.1):
It also applies to exceptions thrown during
checkpoint commit processing. A failed commit will be treated the same as
a failed write.
I would like to know if the relevant READ_SKIP_COUNT or WRITE_SKIP_COUNT
metric should be incremented if the corresponding #checkpointInfo method
throws a skippable exception. (btw, the RI doesn't actually try
skipping/retrying these). My understanding of 10.2 is that they should be
e.g.
6. readSkipCount - the number of skippable exceptions thrown by the
ItemReader.
But they should not be passed to a listener (from the javadoc), e.g.:
even though 8.2.1.4.1 says:
A Skip Listener receives control after a skippable exception is thrown by
the reader, processor, or writer.
Is this interpretation correct?
Brent
From EE Platform meeting notes.
Decided:
Originally opened as bug 6581 by cf126330
Various scripting languages have extensive support for data processing, which makes it easy to write batch components. It would be nice to support certain batch artifacts written in scripting languages, and open up the Java batch framework to scripting developers as well.
For instance, a batchlet written as a Groovy script to perform a short task. Or an ItemReader, ItemProcessor, and ItemWriter written as JRuby scripts to process CSV resources.
JSR 223, Scripting for the Java Platform (https://www.jcp.org/en/jsr/detail?id=223), may be leveraged to plug in various scripting engines.
We may also need to adjust the job XML schema to accommodate scripting-related configurations.
Is this really necessary? You can run Groovy batchlets the same way you would Java batchlets; they are both going to produce regular class files. I just knocked up an example doing just that (https://github.com/BrentDouglas/groovy-batch-example) and it works as expected. Regarding scripts in other languages, it would be trivial to knock up a batchlet/other artifact that plugs a script loaded based on a property into a ScriptEngine as it is; is there really any need to change anything?
Regarding the job schema, section 10.6 is pretty clear that implementation specific loaders are allowed.
While it is certainly possible to achieve all of this within application space with the current spec, it would be even better to avoid any intermediary if the spec can standardize it in future releases. This way any batch spec implementation can implement it in a portable way, without resorting to any implementation-specific loader.
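The application-space adapter described above can be sketched as a batchlet-style process() method that feeds a script to a JSR 223 ScriptEngine. In a real artifact the engine name and script would come from injected @BatchProperty values (an assumption here); which engines exist depends entirely on the classpath, so this sketch degrades gracefully when the engine is missing:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class ScriptingBatchletSketch {
    static String process(String engineName, String script) throws Exception {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            // e.g. no Groovy/JRuby jar on the classpath
            return "FAILED: no engine named " + engineName;
        }
        Object result = engine.eval(script); // the script's value becomes the exit status
        return String.valueOf(result);
    }
}
```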
Originally opened as bug 6458 by ScottKurz
When readItem() returns null to begin the end of the chunk loop, do we bother calling ItemReadListener.afterRead?
I think the afterRead() Javadoc sort of suggests we only call it if it's non-null:
The updated flow outline in Sec. 11.8 seems to suggest we call it with a null.
...
c. <repeat until checkpoint criteria reached OR readItem returns 'null'> {
i. <->[ItemReadListener.beforeRead] // thread A
ii. <->ItemReader.readItem // thread A
iii. ->[ItemReadListener.afterRead] // thread A
I can see both sides: It is nice to not have to check for null in afterRead(). OTOH, I suppose someone could come up with a case where it is nice to know the step is about to end (perhaps?)..and this is the precedent we've put out there in the RI.
I guess I'm ending up saying it's too late to change the behavior, and we should clarify the Javadoc. Any thoughts?
In JBeret, we always call ItemReadListener.afterRead(item), whether the item is null or not. I think a listener is primarily interested in an action being performed, more than in the outcome of the action (the item returned from readItem). Since we already called beforeRead, it's good to be symmetrical by calling afterRead.
This doesn't seem like such a big deal picking this up again. Now the spec even seems clear enough to me. Nice that JBeret and the RI are consistent.
I suppose we could consider adding a TCK test at some point so I'll leave it open for now.
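The agreed behavior (RI and JBeret are consistent) can be sketched as below: afterRead is invoked even when readItem returns null at end of input, so listener implementations should tolerate a null item. The listener interface here is a simplified stand-in for the real API:

```java
import java.util.Iterator;

public class AfterReadSketch {
    interface ItemReadListener {
        void beforeRead();
        void afterRead(Object item); // item may be null on the final call
    }

    // Drives a chunk read loop over the reader; returns the number of real items read.
    static int drive(Iterator<?> reader, ItemReadListener listener) {
        int itemsRead = 0;
        while (true) {
            listener.beforeRead();
            Object item = reader.hasNext() ? reader.next() : null; // ItemReader.readItem
            listener.afterRead(item);   // symmetric: called even for the trailing null
            if (item == null) break;    // null ends the chunk loop
            itemsRead++;
        }
        return itemsRead;
    }
}
```

For a 3-item reader, beforeRead/afterRead each fire 4 times: once per item plus once for the terminating null.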
Originally opened as bug 5386 by ScottKurz
In reviewing the TCK, I suggested this might be overly restrictive and that lifting this restriction was worth considering, and Michael agreed.
Originally opened as bug 5760 by ScottKurz
Added 'future' to whiteboard to consider idea of extending Closeable in a future release.
Not committing to address in future, just listing as candidates and pointing out they're excluded from current Maintenance Rel.
As I'm looking back on this now I'm realizing that I got too hung up on the idea that it was too late to start implementing Closeable (with idempotent close()). We could have still considered saying this was the required behavior even without the interface to enforce this further.
For what it's worth, I'm almost positive the RI may call close() twice in some cases, for example if the writer close() throws an exception on certain paths we might retry to close both reader and writer.
Noticing this in walking through my fix for Bug 6259.
Another thing to clarify: the diagrams in Section 11, showing reader and writer close() in a single transaction, could lead to some confusion. E.g. if the writer close() throws an exception, and the tran rolls back, what does this mean for the reader? Do we still call reader close()? In what transaction?
This should be clarified. Note that there are now several close()-related issues open (e.g. Bug 6456, Bug 7684) and it might make sense to tackle them together instead of via the current breakdown.
In https://jakarta.ee/xml/ns/jakartaee/jobXML_2_0.xsd we kept the version at "1.0".
This is OK in the sense that it's the first version in this target namespace, but the other schemas at https://jakarta.ee/xml/ns/jakartaee/ all seem to have updated their versions to match the spec version.
Originally opened as bug 5926 by cf126330
There should be a portable configuration for apps to enable distributed execution for partitioned steps. For example,
If the above job is executed in a clustered application server environment, distributed='true' allows the batch container to allocate partitions to different cluster nodes for execution.
This is similar to the distributable element in web.xml (see http://www.oracle.com/webfolder/technetwork/jsc/xml/ns/javaee/web-app_2_5.xsd)
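A hypothetical job XML shape for this proposal (the distributed attribute is an assumption from this issue, not part of any released schema; the step/chunk/partition structure follows the existing JSL):

```xml
<step id="processRecords">
  <chunk item-count="100">
    <reader ref="myReader"/>
    <writer ref="myWriter"/>
  </chunk>
  <!-- Proposed: distributed='true' would let the container place
       partitions on different cluster members for execution -->
  <partition distributed="true">
    <plan partitions="8"/>
  </partition>
</step>
```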
Why would we limit this type of functionality to only application server specific use cases? In Spring Batch we offer an extension point (PartitionHandler: http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/partition/PartitionHandler.html) that allows for the ability to distribute partition work in any way the developer sees fit.
Thanks for the linked resource. I agree batch clustering should not be tied to application server or Java EE environment.
If we were to add this, I could see a case for the default being true... more compatible w/ 1.0 where nothing special was required in some implementations.
Originally opened as bug 5432 by cf126330
StepContext has a getException method:
http://docs.oracle.com/javaee/7/api/javax/batch/runtime/context/StepContext.html
But StepExecution does not:
http://docs.oracle.com/javaee/7/api/javax/batch/runtime/StepExecution.html
Any exception that occurred during a step execution is vital data for a batch client, which can only get hold of StepExecution. StepContext is intended more for the code running inside batch container, not the client. So how about adding getException method to StepExecution interface?
A common use is, a batch job monitoring tool shows step execution data in a table:
id, name, batch status, exit status, exception, metrics
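The proposed addition could look like the sketch below. The interface is a hypothetical stand-in for what StepExecution might gain, not the released jakarta.batch API; toRow() shows the monitoring-table use case from the issue:

```java
public class StepExecutionSketch {
    // Hypothetical client-side view with the proposed getException() added.
    interface StepExecutionWithException {
        long getStepExecutionId();
        String getStepName();
        String getBatchStatus();
        String getExitStatus();
        Exception getException(); // proposed: null if the step had no exception
    }

    // e.g. a batch job monitoring tool rendering one table row per step execution
    static String toRow(StepExecutionWithException e) {
        Exception ex = e.getException();
        return e.getStepExecutionId() + ", " + e.getStepName() + ", " + e.getBatchStatus()
                + ", " + e.getExitStatus() + ", " + (ex == null ? "-" : ex.getClass().getSimpleName());
    }
}
```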
Originally opened as bug 5383 by cf126330
PostConstruct and PreDestroy callbacks enable applications to influence how the batch container manages batch artifact lifecycles. Both annotations are already in Java SE 6:
http://docs.oracle.com/javase/6/docs/api/javax/annotation/PostConstruct.html
http://docs.oracle.com/javase/6/docs/api/javax/annotation/PreDestroy.html
For example,
@PostConstruct
private void checkData() throws BatchRuntimeException {
//make sure all injected properties are in place and valid.
//do the necessary data conversion from string properties to other types
//or consolidate several properties into 1
//or other initialization work
}
@PreDestroy
void cleanUp() {
//clean up
}
Currently there is no portable way to perform init and cleanup work in a batch artifact. So applications have to do it at the beginning of business processing methods, and have to take care to only perform it once.
If the batch artifact is loaded via CDI then PostConstruct and PreDestroy will be called.
So it seems the concern would be to standardize this across non-CDI artifact loading.
Originally opened as bug 6234 by cf126330
When a batch artifact class (e.g., MyItemReader) contains @BatchProperty fields, the @BatchProperty annotation does not appear in the generated javadoc. Can we add the @Documented meta-annotation to the BatchProperty definition, so any usage of @BatchProperty will appear in the javadoc?
For example,
@Inject
@BatchProperty
protected String queueCapacity;
I want both @Inject and @BatchProperty to appear in the javadoc, so readers can know this is a batch property injection field.
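The requested change can be sketched with a stand-in declaration (this is not the real jakarta.batch.api.BatchProperty source, which also carries @Qualifier; the point is only the @Documented meta-annotation):

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class DocumentedBatchProperty {
    @Documented // proposed addition: usages now show up in generated javadoc
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER})
    @interface BatchProperty {
        String name() default "";
    }

    @BatchProperty
    protected String queueCapacity; // would now appear annotated in the javadoc
}
```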
The BatchRuntime.getJobOperator() method currently always does its work inside a doPrivileged block, which is not necessary when no security manager is used.
I suggest we first check for the presence of a security manager, and wrap the work in a doPrivileged block only if a security manager is present.
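The suggestion can be sketched as a generic helper (a sketch, not the actual BatchRuntime code; note that SecurityManager and AccessController are deprecated for removal in recent JDKs, which is worth weighing against this optimization):

```java
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.function.Supplier;

public class ConditionalDoPriv {
    static <T> T maybePrivileged(Supplier<T> work) {
        if (System.getSecurityManager() == null) {
            return work.get(); // common case: no security manager, skip doPrivileged overhead
        }
        return AccessController.doPrivileged((PrivilegedAction<T>) work::get);
    }
}
```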
Create/Update CONTRIBUTING files
Per input from the Eclipse EMO, each Specification Project needs to ensure that a CONTRIBUTING.md or CONTRIBUTING.adoc file exists in each specification-related repository maintained by Specification Projects.
In addition, the CONTRIBUTING.md or CONTRIBUTING.adoc file needs to include the following text:
## Eclipse Development Process
This Eclipse Foundation open project is governed by the Eclipse Foundation
Development Process and operates under the terms of the Eclipse IP Policy.
The Jakarta EE Specification Committee has adopted the Jakarta EE Specification
Process (JESP) in accordance with the Eclipse Foundation Specification Process
v1.2 (EFSP) to ensure that the specification process is complied with by all
Jakarta EE specification projects.
* https://eclipse.org/projects/dev_process
* https://www.eclipse.org/org/documents/Eclipse_IP_Policy.pdf
* https://jakarta.ee/about/jesp/
* https://www.eclipse.org/legal/efsp_non_assert.php
Please do this at your earliest convenience.
Thank you!
-- EE4J PMC ([email protected])
Bump the version to 2.0.0.
Add updated dates when appropriate.
Originally opened as bug 6512 by BrentDouglas
See discussion at https://java.net/projects/jbatch/lists/public/archive/2014-11/message/4
Note that the link above is bad, but we had it archived...pasting it here
Hi,
I have been reading the spec a bit today and I have some questions relating
to section 8.2.1.4.
* Skipping/Retrying an exception
I think the terminology used in this section could be improved. There are a
lot of phrases that could be better worded to make their intent more clear,
e.g.
These changes would make it more clear that the entire chunk is being
retried. As for 'skipping an exception' , I can't find a clear
interpretation in the spec of what this means but my understanding is that
it is something like 'ignore that the method invocation threw and continue
to the next thing on chunk table 11.8'. No matter if this is correct or not
I think the spec would be easier to understand with an update to better
explain the scope of what is being skipped.
While I was a little unclear on 'skipping', I found the explanation for
'retrying' even less clear. The explanation of what happens when a
retryable-exception-class matches an exception is summed up in this one
sentence:
When a retryable exception occurs, the default behavior is for the batch
runtime to rollback the current chunk and re-process it with an
item-count of 1 and a checkpoint policy of item.
The first time I read this I saw it as, abandon the running chunk, run a
chunk with an item count 1 and if that works proceed as before the rollback
with the configured checkpoint policy. After looking at the RI, BatchEE and
JBeret, I saw that this is not a common interpretation. From what I can see
(I'm going to ignore BatchEE as its implementation of retry looks
unimplemented/broken), both these implementations process the number of
items already read from the failed chunk. I don't really like this
interpretation, assuming we are 'reprocessing the current chunk', why is
the unprocessed portion excluded? The interpretation I have decided I like
the most is that the entire chunk where the failure occurred should be
reprocessed. This has the advantage of keeping the chunks aligned (as in if
you are running item-size=4 with a reader that produces 16 things and
failed in the second chunk you won't end up with chunk sizes 4,(1,1,1),4,4,1
but 4,(1,1,1,1),4,4). I would like to find out why these two only process a
portion which IMO is not in the description in the spec.
* Metrics when skipping from #checkpointInfo
As exceptions thrown from the #checkpointInfo methods of ItemReader and
ItemWriter are able to be skipped/etc (8.2.1.4.1):
It also applies to exceptions thrown during
checkpoint commit processing. A failed commit will be treated the same as
a failed write.
I would like to know if the relevant READ_SKIP_COUNT or WRITE_SKIP_COUNT
metric should be incremented if the corresponding #checkpointInfo method
throws a skippable exception. (btw, the RI doesn't actually try
skipping/retrying these). My understanding of 10.2 is that they should be
e.g.
6. readSkipCount - the number of skippable exceptions thrown by the
ItemReader.
But they should not be passed to a listener (from the javadoc), e.g.:
even though 8.2.1.4.1 says:
A Skip Listener receives control after a skippable exception is thrown by
the reader, processor, or writer.
Is this interpretation correct?
Brent
Originally opened as bug 6457 by ScottKurz
At the end of Sec. 11.9 in "Rollback Procedure"
...
6. <->ItemWriter.open // thread A, pass last committed checkpoint info
7. <->ItemReader.open // thread A, pass last committed checkpoint info
...
it clearly says to call open with the last committed checkpoint info (it strikes me as odd that it would say to call the writer open first).
The RI doesn't do this.
Since the TCK is weak in this area let's also look over carefully the RI code with the text in "8.2.1.4.4" in mind:
When a retryable exception occurs, the default behavior is for the batch runtime to rollback the current chunk and re-process it with an item-count of 1 and a checkpoint policy of item. If the optional ChunkListener is configured on the step, the onError method is called before rollback.
Sorry... the problem as I'd originally written it up is not a problem.
I got confused unwrapping a pile of exceptions.. my mistake.
The RI does indeed call open with the last checkpoint on retry-with-rollback.
One thing we're doing wrong though: we should only be processing the last chunk of data, one item at a time, in one-item chunks. Instead we go and process the rest of the step in the one-item chunk mode.
It also looks like, from code inspection, that we fail to implement this behavior in the case of a custom checkpoint algorithm. I read 8.2.1.4.4 to imply that we should use the single-item chunk even if there had been a custom checkpoint algorithm. (This implies that we count the previous "working chunk size".. I don't think anyone would suggest using another method to keep track of when we've reprocessed the previous chunk.)
So I renamed the bug accordingly to capture this problem.
I just checked what we did in JBeret chunk processing: during a retry, it doesn't use or modify itemCount value, since the retry is by single item. After the retry, it's back to normal phase and the original itemCount or checkpoint policy resumes.
So, getting back to this thread on retry-with-rollback.
A tricky point is how do we know when we're done processing the "original chunk"? I.e. when should we stop processing the data in single-item chunks (one-at-a-time), and resume using the normal chunk boundary.
Here's how I'd naturally propose to solve this problem, (given that the RI has so far sidestepped this):
If you are using item-count to checkpoint, then, on retry-with-rollback, we read-process-write (item-count)# of items one-at-a-time before reverting to normal (item-count)#-sized chunks.
How does that sound? What are the other implementations doing?
Cheng, I don't think your response went into that level of detail, or if you meant it to I missed it. (Thanks).
If this sounds good I could take a stab at writing this up more formally.
org.jberet.runtime.runner.ChunkRunner has a couple of internal fields to track progress:
readPosition: Current position in the reader input data
checkpointPosition: the last checkpoint position in the reader input data
failurePoint: Where the failure occurred that caused the current retry. The retry should stop after the item at failurePoint has been retried.
These 3 fields are synchronized with each other at various points. During a retry, readPosition rewinds to the last checkpointPosition and increments from there until it reaches the last failurePoint. After that point, all items are back to normal treatment. So whether we use item-count or custom checkpointAlgorithm, we only retry failed items. I think that's the basic retry flow in JBeret.
In your proposal, if we read-process-write (item-count)# of items one-at-a-time, that may result in many single-item checkpointing, especially if it fails early during a chunk (e.g., failed at 3rd item when item-count is 100).
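The JBeret-style bookkeeping described above can be sketched as below. This is an assumed simplification, not JBeret's ChunkRunner: on a retryable failure, readPosition rewinds to the last checkpointPosition, items are re-run in one-item chunks only until the item at failurePoint has been retried, and then normal item-count-sized chunks resume. The numbers model chunk sizes only:

```java
import java.util.ArrayList;
import java.util.List;

public class RetryRewindSketch {
    // Returns the sequence of committed chunk sizes for a run that hits one
    // retryable failure at the given item index (failurePoint < 0 = no failure).
    static List<Integer> chunkSizes(int totalItems, int itemCount, int failurePoint) {
        List<Integer> sizes = new ArrayList<>();
        int readPosition = 0;
        int checkpointPosition = 0;
        boolean retrying = false;
        boolean alreadyRetried = false;
        while (readPosition < totalItems) {
            int size = retrying ? 1 : Math.min(itemCount, totalItems - readPosition);
            boolean failsHere = !alreadyRetried
                    && failurePoint >= readPosition && failurePoint < readPosition + size;
            if (failsHere) {
                retrying = true;                    // rollback the current chunk...
                alreadyRetried = true;
                readPosition = checkpointPosition;  // ...and rewind to the last checkpoint
                continue;
            }
            readPosition += size;
            checkpointPosition = readPosition;      // each committed chunk is a checkpoint
            sizes.add(size);
            if (retrying && readPosition > failurePoint) {
                retrying = false;                   // failure point retried: back to normal
            }
        }
        return sizes;
    }
}
```

With 16 items, item-count 4, and a failure at item index 5, only the two items between the last checkpoint and the failure point are retried one at a time before normal chunking resumes.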
Thanks for your explanation Cheng.
Agree your implementation is better. I just committed something similar in:
WASdev/standards.jsr352.jbatch@1679181
(this fixed a few other bugs too).
Doing so made me give up on the thought that we should amend the spec any here.
Let's call it a day and let this stand as an RI bug.
There are still a couple small ambiguities... e.g. what happens if one of my
items gets filtered (by the processor).. do I read another or immediately commit the tran (the MR update clarifies no writeItems is called if all items have been filtered)? I'd entertained the spec update idea thinking a change was relatively unlikely to break any app behavior.. but I don't want to get into a whole set of things..
So a retry-with-rollback exception on read and process only retries up until the last-read item 1-by-1 before reverting to normal chunk processing, though a retry-with-rollback exception on write retries the whole chunk.
Consider if something like Roberto's suggestion would be helpful:
"After the faulty chunk is retried, the
implementation should/must return to the configured checkpoint policy. "
See:
https://java.net/projects/jbatch/lists/public/archive/2014-12/message/13
Assigning to SPEC.
Note: The link above is bad, but we had it archived. Here it is:
Hi everyone,
I've been working and testing the behaviours of Batch implementations because
of https://github.com/javaee-samples/javaee7-samples/issues/281 (related with
skip and retry exceptions).
We're observing different behaviours with Glassfish (JBatch) and Wildfly
(JBeret). I believe that we need a few Spec clarifications in order to fix
the issues.
The Spec states:
8.2.1.4.4 Default Retry Behavior - Rollback
When a retryable exception occurs, the default behavior is for the batch runtime to rollback the current chunk and re-process it with an item-count of 1 and a checkpoint policy of item.
It's not very clear what should happen with subsequent chunks. Current GF 4.1 just processes the rest of the step 1 by 1 in case of a retry. JBeret just processes the current chunk 1 by 1. I did find an issue about this in JBatch: https://java.net/bugzilla/show_bug.cgi?id=6457. JBatch will follow the behaviour of JBeret, but probably the Spec text could use an improvement / clarification on this. Something like:
When a retryable exception occurs, the default behavior is for the batch
runtime to rollback the current chunk and re-process it with an item-count of
1 and a checkpoint policy of item. After the faulty chunk is retried, the
implementation should/must return to the configured checkpoint policy.
About Skip:
Skipping on Reader and Processor is not a problem since they handle one
item at a time, but the Writer handles a list of items. Should all the items be
skipped? Or somehow try to detect the faulty element and skip only that one?
Example:
Read Item 1, 2, 3
Process Item 1, 2, 3
Write Item 1, 2, 3 (faulty element)
You already handed items 1, 2 and 3 to the Writer. To know which element is
failing you need to retry with an item count of one. This works fine with
Retryable Exceptions, but what if you only have Skippable Exceptions
configured? I couldn't find in the Spec what should be the behaviour about
this with only Skippable Exceptions and Writers. I guess that everything in
the Writer should be skipped and if you want the proper behaviour, just
configure the same Exception for Retry and Skip. Maybe we should include a
clarification about this as well in the SPEC?
Sorry for the long email.
Cheers, Roberto
Fix adoc links like:
https://java.net/bugzilla/show_bug.cgiid=5746
There are many asterisks in code snippets in various spec adocs. For instance,
*@return* batch status string
*@return* exit status string
*@see* jakarta.batch.runtime.metric.Metric for definition of standard
public *enum* MetricType
public *long* getValue();
public *static* JobOperator getJobOperator()
The above are all copied from the generated PDF spec file.
Originally opened as bug 6356 by fantarama
In section 11.7, actions 6-18 define a transaction that lives for the entire duration of all partitions, but in long-running scenarios this exposes the runtime to a transaction timeout exception.
My test case (with the JBeret implementation) is a partitioned chunk step with 8 partitions. All partitions finish in about 30 minutes; the transaction timeout occurs after 5 minutes, so the runtime throws an exception when it tries to commit (action 18) because the transaction isn't alive anymore.
Should the transaction begin/commit instead be executed around each analyzeCollectorData call?
I don't think your proposal really sits well with the way collectors work.
The spec allows a couple of ways to deal with your issue. The simplest and most portable way is to increase the transaction timeout for your step:
... ...This will increase the timeout for each chunk though.
A less portable way is to ask the JBeret guys to add a property to set the transaction timeout for this transaction individually so you would end up with something like:
... ...The workaround is clear, but I am reporting this as a bug because I would like to understand why the spec defines this approach.
Transactions are suited to short, atomic operations. The chunk-oriented step is a powerful way to handle this aspect, and I agree that it is the developer's responsibility to handle the transaction timeout, but only for the single chunk, not the full step process. In a chunk step I can tune the item-count to be sure that a transaction will never run longer than a fixed time.
My step works on a data set of unpredictable size, anywhere from 1,000 items to over a million. I can't predict the entire step duration, and I shouldn't have to: chunks support checkpoints, but what happens if the analyzer throws an exception? Does it roll back an hour of work?
I think it is wrong to have a transaction that lives for the entire step lifecycle, because the developer cannot manage that pattern.
I don't really see setting the transaction timeout to 30m when you need to run a 30m transaction as a workaround. A more correct way to deal with it that I wrote before would be something like:
@Named("fiveMinutesAndTenUnits")
final class FiveMinutesAndTenUnitsCheckpointAlgorithm implements CheckpointAlgorithm {
private static final int UNITS = 10; //What you would have set the 'units' attribute to
private int current;
@Override
public int checkpointTimeout() throws Exception {
return TimeUnit.SECONDS.toMillis(5);
}
@Override
public void beginCheckpoint() throws Exception {
current = 0;
}
@Override
public boolean isReadyToCheckpoint() throws Exception {
if (current > UNITS) {
throw new IllegalStateException();
}
return UNITS == ++current;
}
@Override
public void endCheckpoint() throws Exception {
//
}
}
I just noticed a post from Cheng on the forum regarding accessing a DB in an analyzer or collector, which I am assuming is the underlying issue you are talking about here. The spec provides a straightforward way to persist data across restarts in the StepContext#setPersistentUserData(Serializable) method. This gets updated at the end of every step regardless of its outcome (section 11.7 #21). It will then be loaded when you retry the failed step to complete the unfinished partitions.
If you leave the transaction timeout at 5m for a half-hour step, you will obviously have to ensure the step's 'start-limit' attribute is either not set (unlimited) or set to at least 6.
I accidentally posted that before I was finished and can't work out how to edit the comment. The #checkpointTimeout() method should just return 5 as it is in seconds not millis and the checkpoint-algorithm should be in the unterminated chunk element. It should also say 'than I wrote before' on the second line rather than 'that I wrote before'.
I understand your solution, but maybe I wasn't able to explain the issue.
The chunk works inside a transaction, and I agree that I have to size the timeout correctly, but the chunk is not the step; it is a part of it. The spec defines transaction begin/commit only for the chunk, not the step. As I wrote, I can't predict how long the entire step will take; I can only define the business logic inside the chunk to take the smallest time possible, and in my case a single chunk takes only a few seconds to complete.
A non-partitioned step doesn't have this issue because there is no thread that begins/commits a transaction for the entire step duration.
But the partitioned one does! The thread that waits for all partitions to end, and that calls the partition analyzer methods, works inside a single transaction.
The question is the same: why? Why do I need a transaction whose duration no one can predict? I have millions of items to process; my single chunk never takes more than a minute, but processing the whole data set is a very long process whose duration depends on too many variables.
If a single chunk takes 70s instead of 30s, this changes the total time a lot.
Even if in the analyzer I simply write to a log (no DB access), I'm exposed to the transaction timeout. I really just want to understand the reason for this design: why does a regular step work well while a partitioned one doesn't? I need partitions to save time, but in fact I can't use them because I would have to know how long the entire step will take.
fantarama, thanks for raising this issue.
I agree there's definitely a spec clarification needed (if not a change/fix).
The RI seems wrong here as well.
My first reaction is to agree with you, and to argue the spec intention was that a new tran is started around each call to the analyzer. I.e. each analyzer call to either analyzeCollectorData or analyzeStatus is done within a global tran, and if the tran rolls back then PartitionReducer.rollbackPartitionedStep is called. (We'd call this a bug in laying out the outline flow chart then.)
I think the behavior becomes coherent with that approach.. but need to think it through.
For what it's worth, I don't see that the RI uses a global tran at any point when calling the analyzer, which is clearly wrong.
The timeout becomes a more pressing concern for partitioned chunks, but for the sake of symmetry I think the same comment would apply to partitioned batchlets, don't you? That would leave us knowing that the analyzer methods always run in a global tran.
Also we should note this is a potentially disruptive change. It could break any analyzer already using the RI, say, depending on the lack of a global tran.
Picking up this thread again... we have a case where the spec defines a behavior that I agree is undesirable, the RI implements a different behavior, and the TCK is silent. And switching between a global tran or not can break apps.
Since there's not an easy way out, I'd like to back up to square one.
I don't see why a transaction would be so commonly needed around the analyzer calls that the spec should insist that one is present. The analyzer itself could start one if it really needed one.
Now, it sounds like JBeret is already providing a global tran surrounding the analyzer calls.
Note that with the RI not starting global trans in the analyzer at all currently, we already have a case today where portability across Glassfish & JBeret is potentially an issue.
So let me pose the question to Cheng. What would you think about changing the behavior of JBeret here to not surround the analyzer calls with a global tran? Do you agree this would be preferable?
Yet another alternative to consider would be a single global tran around each analyzer call (this is roughly doubling the total # of trans, since there's basically one for each chunk tran). This would be less disruptive to any app running in JBeret, though still could break certain apps. Still, I wonder if the overhead of the tran is worth it enough to start it all the time.
I'm trying to especially incorporate the JBeret perspective here since I don't want to punish them for actually implementing what the spec said all along. Of course this is a public list, so don't take that to exclude other people commenting.
At some point we'd need to update the spec too, but I thought we could continue the conversation before then here.
Thanks,
Scott
On second thought, my thoughts were incomplete, as I failed to consider
PartitionReducer.rollbackPartitionedStep.
Need to think a bit more here..
Between the 2 options, I feel no transaction around analyzers is cleaner. I personally think this is the right move despite the risk of incompatibility. I will also check with WildFly and JBeret users for any concerns and objections.
I also think there should be no transactions around analyzer calls.
Once there is no single transaction around analyzer calls any more I do not see any reason for retaining transactions around calls to PartitionReducer.rollbackPartitionedStep or PartitionReducer.beforePartitionedStepCompletion.
Some java source files contain comments like:
// TODO Auto-generated constructor stub
They should be removed from java source files.
./api/src/main/java/jakarta/batch/operations/NoSuchJobInstanceException.java
./api/src/main/java/jakarta/batch/operations/BatchRuntimeException.java
./api/src/main/java/jakarta/batch/operations/NoSuchJobExecutionException.java
./api/src/main/java/jakarta/batch/operations/JobExecutionNotRunningException.java
./api/src/main/java/jakarta/batch/operations/JobExecutionAlreadyCompleteException.java
./api/src/main/java/jakarta/batch/operations/JobExecutionNotMostRecentException.java
./api/src/main/java/jakarta/batch/operations/NoSuchJobException.java
./api/src/main/java/jakarta/batch/operations/JobExecutionIsRunningException.java
Originally opened as bug 4834 by ScottKurz
--------------Original Comment History----------------------------
If I do this for a running job:
JobExecution exec = jo.getJobExecution(execId);
Do I see, in this 'exec' object, the updates as the job executes (e.g. BatchStatus getting set)?
Or am I only getting a snapshot, and might I need to get a new one to see the ongoing execution?
I.e. does the container have a reference (PBR) back to this JobExecution or is it taking a snapshot of the "value" (PBV)?
Since the spec has no remotable use cases... you might think it should maintain a reference.
I don't think the TCK requires PBR behavior.. I think we ask for a new object each time we want to check the status whether we need to or not.
Since there are no setters on JobExecution, JobInstance, it seems PBR would have no undesirable side effects...
I think it should be spec'd and not left to the implementation, since it is a very basic aspect of the JobOperator programming model.
The JobExecution should be viewed as a value object with no guarantees of its ties to a running job. jo.getJobExecution(execId) may return a previously run JobExecution, as an example. Also, with regard to the "no remotable use cases" point: while the spec does not address remote execution, it does not preclude it either (there is nothing to prevent a user from developing a RemoteJobOperator implementation of the JobOperator interface, for example).
I don't think we can say within the spec that the JobExecution returned by the JobOperator is the same instance being acted upon by an executing job.
(In reply to comment #1)
Michael,
I agree with your angle on this one, which then leads me to the question of whether the spec should say that it is by value, e.g. that
JobExecution exec = jo.getJobExecution(execId);
JobExecution exec2 = jo.getJobExecution(execId);
return different instances?
Since there's no setters maybe it doesn't matter.
The RI uses "by-reference" for a currently-executing job and "by-value" for a terminated job.
The TCK assumes "by-value", e.g. in the polling routine we get a new JobExecution each time we poll rather than rechecking status on the existing. But it doesn't require "by-value" either (obviously since the RI doesn't always use it).
I guess it could be left as an implementation detail. The only code I can conjure up where it matters would be:
// JobOperator "monitor"
while (..running..) {
JobExecution exec = jo.getJobExecution(execId);
logMonitorEntry(exec);
}
// now go process logged executions
If I have PBR then instead of a list of "snapshots" I'm going to get a bunch of copies of the execution in its final state.
Specifying PBV "breaks" the RI though...however, we could adjust still I believe. Maybe it would be a clearer API to just specify "by value".
Leaving it open that a spec clarification might be desirable.
I'm not sure if this is settled, but can I put in a vote for the spec and TCK to be changed to allow JobExecution, StepExecution and JobInstance to be value classes?
The way I see it, these values are tied to the implementation of the job repository, and requiring these classes to provide an up-to-date view of the repository would add unnecessary performance penalties to some implementations.
This hasn't been settled, so thanks Brent too for your comments.
I had another thought in this area: that the spec might say that StepExecution#getPersistentUserData() is returned by value.
The way the RI is currently coded, it's theoretically possible for an operator to introduce side effects to the persistent user data that then affect say a Decider operating on the StepExecution.
We might also consider waiting to see if it turns out to be a real-world issue, since it might not be too likely someone would do that.
(In reply to ScottKurz from comment #5)
By exposing only java.io.Serializable to the client, the expectation is the client should not modify it.
I think it's more likely the application itself (which knows the actual type of the user data) will use getPersistentUserData() to pass data between different parts, and the app may expect the returned object to be by reference. StepContext.getPersistentUserData() and StepExecution.getPersistentUserData() should be consistent in this regard.
(In reply to cf126330 from comment #6)
StepContext.getPersistentUserData() and
StepExecution.getPersistentUserData should be consistent in this regard.
Cheng, I agree with everything up until this last sentence, but wasn't sure where you were trying to end up.
You seemed to think that StepContext returns the persistent data by-reference and StepExecution returns it by-value, but then what did you mean in saying they should be "consistent"?
(In reply to ScottKurz from comment #7)
Application can access persistent user data via StepContext, and a job client can access persistent user data via StepExecution. So whether we settle on by-reference or by-value, the same rule should apply to both StepContext.getPersistentUserData() and StepExecution.getPersistentUserData(). I think by-reference is a better fit. If an application is really concerned with the risk of persistent user data being modified by a job client, it can choose immutable or unmodifiable types for persistent data.
I strongly think this should be by value (per my previous statements). One thing to keep in mind is that the StepExecution and JobExecution may not be retrieved within the same JVM as the job actually executing. I could have a job running in JVM 1 and use the JobOperator to retrieve the current state of the job from the job repository in JVM 2. How would this work if it were pass-by-reference and the expectation was that modifications to the data by the client in JVM 2 would impact JVM 1? Allowing the client to modify the StepExecution/JobExecution in a way that the runtime is expected to react to is a dangerous path IMHO.
Not committing to address in future, just listing as candidates and pointing out they're excluded from current Maintenance Rel.
Need to do:
targetNamespace="http://xmlns.jcp.org/xml/ns/javaee"
Brought up discussion here.
E.g.:
Java EE environments. It requires Java 6 or higher.
Originally opened as bug 5416 by ScottKurz
The JSR 352 specification was developed with the intent to leave open the question of what technology would be used for:
a) dynamic injection - of batch properties and job/step context
b) artifact loading - mapping @ref values to classes and object instances
In working on the EE TCK (in CTS), we realized the following dilemma: the presence of beans.xml in the application archive forces the batch runtime to be able to satisfy a) via CDI.
How should the spec be updated to reflect this de facto requirement for CDI support?
Our position from Spring Batch was to ensure that CDI was not the assumption, but I was never really happy with how far we swung in saying no assumption about DI. I had proposed a similar approach for JSR-107, but cannot recall the details right now; I'd have to reinvestigate. I'm not for implying CDI as the DI.
Wayne,
Again, the core problem is this scenario: in an EE environment, if you have a beans.xml in your app archive, then you must be able to satisfy the @Inject for properties and contexts via CDI. (Not sure if SE has an analogous case with beans.xml or not.) CDI doesn't care that you wanted to use some other DI technology at that point in your batch app; you blow up since CDI can't satisfy the injection.
Yes, another DI could put its own extra burden on the batch implementation, like CDI does, and as you know we have worked to keep the 352 language DI technology-neutral.
But CDI is part of the EE platform... so could a batch implementation really choose to blow up completely just because the batch artifacts are part of a bigger application using CDI? That's not very friendly from the EE platform viewpoint.
That's where I was headed in wondering if we could "require CDI support for injection when beans.xml is present" (note we wouldn't necessarily have to require its use for artifact loading, i.e. mapping @ref values to artifact instances). I'd probably need some help phrasing it correctly from a CDI expert. Another thing I wondered is if this is really for the EE 7 platform spec to clarify..not for our individual spec (352).
In any case, I'd be curious to hear if you'd developed any new angles (e.g. in JSR 107) that might be helpful in resolving this.
It seems to me that we've gone down a path that has backfired on us. The use of @Inject was intended to be a portable way to allow for properties/contexts to be injected.
What I think Scott is saying is that it is, in fact, not portable and ends up tying us to CDI. I was already planning on suggesting the removal of the @Inject annotation from properties (it's redundant to the @BatchProperty annotation). The only other place @Inject is used (off the top of my head) is the contexts. I'd advocate for an update that addresses the injection of contexts without @Inject rather than require CDI support in any scenario.
As I've noted in other issues, the expert group was clear on this. The spec should not require DI but should be open to its use. To not only require DI, but a specific implementation of it, goes directly against this.
I'm currently using the RI in one of my projects.
My problem is that in the Java EE environment where CDI is present I can use @Named for my artifacts, and in the JSL file I use that name as the reference.
I also could use CDI interceptors, decorators etc.
But when my batch runs in Java SE this wouldn't work.
Therefore my batch will not be portable anymore.
CDI is based on JSR-330 (javax.inject), and both Google Guice and the Spring Framework implement JSR-330.
Could it be an option to allow everything that is in javax.inject?
(In reply to mminella from comment #3)
+1 on the removal of @Inject annotation from properties already marked w/ @BatchProperty.
(In reply to mminella from comment #3)
We've collectively spent a lot of time and effort on this nuanced approach to DI, and I don't see why we now have to throw up our hands.
We had originally failed to recognize, however, that there is such a thing as a CDI app, i.e. an EE app archive (someone feel free to state this more precisely) that includes beans.xml, which will trigger CDI-managed injection.
(This case could exist in some form in SE but without CDI in the platform, we don't have to address it today.)
As it stands, we've left open that a 352 impl can inject into a non-CDI Batch application however it sees fit. There's no reason to now view this case any differently.
Requiring CDI support for a CDI app, in an EE server that presumably already has support for CDI in general, is a particular case, and I don't think strays from the original goal of not being unnecessarily tied to CDI.
Allow a CDI bean to do:
@Inject JobOperator jobOperator;
(JSR 352 Bug 5075).
Possibly an associated task here is to specify that batch artifacts are considered managed beans, whether they have bean-defining annotations (BDA) or not.. though that could be done separately or later.
Currently, the Jakarta Batch Specification is in Word format. We need to convert this to asciidoc. If we can get this done for Jakarta EE 8, great! But, if not, my PR #2 will suffice.
Originally opened as bug 5382 by cf126330
Currently only java.lang.String is supported in @BatchProperty field injection. For other types of data, applications have to convert from String. I propose we support all common java types for @BatchProperty field injections, to make batch api easier to use. A related bug was Bug 4353.
We don't need to change the job XML schema, where properties are still declared as string values. The batch container is responsible for data conversion behind the scenes. For example,
@BatchProperty
private int count;
A list of common java types pertaining to batch API:
all primitive types (int, short, byte, long, double, float, boolean, char)
array of all primitive types
wrapper type of all primitive types
array of all wrapper types of primitive types
List
Map<String, String>
Set
BigInteger
BigDecimal
java.util.Date
java.io.File
java.util.jar.JarFile
java.util.regex.Pattern
URL
URI
Inet4Address
Inet6Address
java.util.logging.Logger
java.lang.Class
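As a rough illustration of the proposal (this is not spec text; the registry and method names here are invented), the container-side conversion could be as simple as a lookup from target type to a String-parsing function:

```java
import java.io.File;
import java.math.BigDecimal;
import java.util.Map;
import java.util.function.Function;

final class PropertyConverters {
    // Illustrative subset only; a real implementation would cover the full list above.
    private static final Map<Class<?>, Function<String, Object>> CONVERTERS = Map.of(
            int.class, Integer::parseInt,
            long.class, Long::parseLong,
            boolean.class, Boolean::parseBoolean,
            BigDecimal.class, BigDecimal::new,
            File.class, File::new);

    // Convert the raw job-XML string value to the injected field's declared type.
    static Object convert(Class<?> target, String raw) {
        Function<String, Object> f = CONVERTERS.get(target);
        if (f == null) {
            throw new IllegalArgumentException("No converter for " + target);
        }
        return f.apply(raw);
    }
}
```

The container would call something like convert(field.getType(), rawValue) before injecting, so `private int count;` annotated with @BatchProperty would receive the parsed integer.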
Getting back to this.
I like the idea.. just wondering
A) precisely which ones to include/exclude
B) How to reference the specifics of each type of conversion (without having to define it in the batch spec itself). Not that it would be that hard to do so in our spec..just wondering if this was already defined.
E.g. it seems BatchEE accomplishes this using
org.apache.xbean.propertyeditor.PropertyEditors
I wonder where that would leave JSONObject? Not as easy as I'm naively assuming for some reason, or did it just not happen to be in that list?
Most probably something like the github.com/tomitribe/sabot mechanism (same idea as JAX-RS with its fromString or constructor handling) would match very well, I agree.
I think the spec should only guarantee primitives (and their wrapper types), String, BigDecimal and BigInteger as the allowed injection types. Maybe primitive arrays too. Implementations can provide converters for more types if they'd like.
Seems kind of arbitrary to pick the set beyond the primitives, I agree...
I guess another direction to take this in is to model after JSF Converter, and define some built-in converters and the opportunity for custom converters.
Since JSF and JPA each have a bit different converters, seems like the precedent would be set for batch to define its own.
Would need to be clear if this is part of CDI integration or DI-neutral.
See here for detailed instructions.
Originally opened as bug 5701 by ScottKurz
The wording of the spec suggests @BatchProperty is a field, not method injection.
Though the Java code snippet pasted actually suggests support for method injection as well, field injection is all the RI fully supports.
One could argue method injection is a better practice in general, but it should probably at least be an alternative.
Per a discussion with the Spec Committee, the generated javadoc needs to include a reference to the EFSL (Eclipse Foundation Specification License). The Java code itself still contains the ASL v2 license. That's for development. But, the generated javadoc, which is considered part of the Spec, needs to use the EFSL.
We would just need to modify our pom.xml to include something like this (modified for this team, of course):
<bottom>
<![CDATA[Copyright © 1996-2019,
<a href="http://www.oracle.com">Oracle</a>
and/or its affiliates. All Rights Reserved.
Use is subject to
<a href="{@docRoot}/doc-files/speclicense.html" target="_top">license terms</a>.
]]>
</bottom>
Per the discussions on the Spec Committee and Platform Dev mailing lists, it's been discovered that many of the Javadoc and Spec references to the EFSL need updating. Please reference the following required updates and keep them in mind as your Spec Project is updating the files for Jakarta EE 9.
Note: Some Spec Projects have already started or even completed these updates. If so, just kindly return/close this Issue. Thanks!
Required EFSL updates for Javadoc
For javadoc, the EFSL.html located in src/main/javadoc/doc-files should be modified as follows:
- <<url to this license>> needs to be replaced with the efsl.php link [1]
- [title and URI of the Eclipse Foundation specification document] needs to be replaced with the Specification Name and URL (see [2] for an example)
Required EFSL updates for Specifications
For specification, the license-efsl.adoc located in src/main/asciidoc should be modified as follows:
- <<url to this license>> needs to be replaced with the efsl.php link [1]
- [title and URI of the Eclipse Foundation specification document] needs to be replaced with the Specification Name and URL (see [2] for an example)
[1] https://www.eclipse.org/legal/efsl.php
[2] https://jakarta.ee/specifications/enterprise-beans/4.0/
Is your feature request related to a problem? Please describe.
Jakarta EE 9 is the next major release, with the main focus of moving from the javax namespace to the jakarta namespace.
Describe the solution you'd like
This issue will be used to track the progress of this work effort via the Jakarta EE 9 Project board.
Additional context
Jakarta EE Specification Process
Eclipse Project Handbook
Eclipse Release Record for Jakarta EE 9
ToDo
Originally opened as bug 5779 by mminella
In DeciderTests#restartJobParameters(), line 683 creates a new set of properties for the restart of the job under test. Line 684 correctly overwrites a previously set job parameter in the new Properties object. However, line 685, incorrectly overwrites the job parameter in the old Properties object (not the new one). Line 685 should read:
restartJobParameters.setProperty("stop.job.after.this.step2", "None");
instead of
jobParameters.setProperty("stop.job.after.this.step2", "None");
Michael,
Good catch. I think this error does not affect the test logic though.
The MultipleExitStatusBatchlet looks for this property and returns a special exit status if the value matches the current step name.
This check is significant since this same batchlet is reused in multiple steps across this job in decider_transitions_from_split_on_restart.xml.
So in failing to do:
restartJobParameters.setProperty("stop.job.after.this.step2", "None");
we essentially did:
restartJobParameters.setProperty("stop.job.after.this.step2", null);
which leads to the same result in the MultipleExitStatusBatchlet logic.
That is, since there's no step named "None", failing to set the property has the same effect as setting it to "None", even though explicitly setting it was clearly the intent.
--
Will leave this open to eventually clean up next time we do an update, but it shouldn't affect anyone's compliance.
That isn't correct. By not doing
restartJobParameters.setProperty("stop.job.after.this.step2", "None");
you are actually doing
restartJobParameters.setProperty("stop.job.after.this.step2", "split1flow2step2");
because on line 683, you copy the parameters from the first run to the second run. This causes flow2 to end up the same way it did in the first run, breaking the test.
For the record, it does impact our compliance.
(In reply to mminella from comment #2)
Michael,
Yes, I was wrong. I was incorrectly working backwards from the fact that this test passes for me.
Hmm...I wonder if our JREs are implementing java.util.Properties differently?
Seems the IBM and Oracle JREs I've tested with implement the constructor:
Properties(Properties properties)
WITHOUT doing a clone.
So updating jobParameters "after the fact"
jobParameters.setProperty("stop.job.after.this.step2", "None");
still has an effect on restartJobParameters allowing the test to pass.
Maybe you're using a JRE where this is implemented via a clone of the default properties?
Per the java.util.Properties javadoc:
A property list can contain another property list as its "defaults"; this second property list is searched if the property key is not found in the original property list.
The constructor you're using for this test is the one that specifies the defaults. So I would expect that if the property isn't set on the current Properties instance, it would look in the one provided via the constructor and find it there. I think the difference is that I'm using Properties.getProperty("keyName") and you're probably using Properties.get("keyName"). The latter is really inherited from the Map interface and doesn't provide the full Properties functionality. The test case below exhibits that behavior:
@Test
public void testPropertiesClone() {
Properties props1 = new Properties();
props1.put("key1", "value1");
props1.put("key2", "value2");
Properties props2 = new Properties(props1);
assertEquals("value1", props2.getProperty("key1"));
assertEquals("value2", props2.getProperty("key2"));
assertNull(props2.get("key1"));
assertNull(props2.get("key2"));
}
Michael, I agree that the code in the test is not as clear as it could be. Updating the default Properties list, "jobParameters" after the restartJobParameters constructor makes things a bit confusing. However, I don't see how this is an issue.
restartJobParameters.getProperty("stop.job.after.this.step2") should still return "None" instead of "split1flow2step2" since it defaults to the Properties list passed into the constructor.
```java
@Test
public void testPropertiesDefaultingAfterConstructor() {
    Properties props1 = new Properties();
    props1.put("key1", "value1");
    props1.put("key2", "value2");

    Properties props2 = new Properties(props1);
    // Override in the defaults AFTER the constructor:
    props1.put("key2", "value2override");

    assertEquals("value1", props2.getProperty("key1"));
    assertNotEquals("value2", props2.getProperty("key2"));
    assertEquals("value2override", props2.getProperty("key2"));
}
```
As Scott mentioned, is it possible that your JRE is performing a clone during Properties(Properties properties)?
I stand corrected. It is a bug in our code. For those reading, if you're interested: when calling putAll() on a Properties object and passing it another Properties object, the defaults are not brought along. In our case, we were dropping the defaults when we created the new Properties object to use when restarting a job. Sorry for the confusion.
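The pitfall described above can be sketched with plain java.util (the class and key names here are hypothetical): copying a Properties via putAll() transfers only the directly-set entries and silently drops anything visible only through the defaults chain.

```java
import java.util.Properties;

public class PutAllDropsDefaults {

    // Returns a copy of 'source' made via putAll(): only directly-set
    // entries survive; entries visible only through defaults are lost.
    public static Properties copyViaPutAll(Properties source) {
        Properties copy = new Properties();
        copy.putAll(source);
        return copy;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("fromDefaults", "yes");

        Properties original = new Properties(defaults); // wraps, does not clone
        original.setProperty("direct", "yes");

        Properties copy = copyViaPutAll(original);
        System.out.println(original.getProperty("fromDefaults")); // "yes"
        System.out.println(copy.getProperty("fromDefaults"));     // null: defaults dropped
        System.out.println(copy.getProperty("direct"));           // "yes"
    }
}
```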
Thanks for clearing that up. No problem, sorry for the confusing test parameters. So let's tag with "tck_1.0" and come back whenever the next time is we need to do an update.
Remembering this issue, it occurred to me the spec should explicitly say what this test is expecting, which is that the jobParameters substitution operator is resolved using the java.util.Properties#getProperty(String) method, with its behavior concerning (recursive) defaults.
This is a natural pattern for batch where the previous execution's Properties serves as the default for the next execution's.
As we see here, the 1.0 TCK already enforces it, so this mention would just be to make it even more clear.
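That pattern can be sketched as follows (the parameter names are hypothetical): the previous execution's Properties serves as the defaults for the next, getProperty(String) resolves through the defaults chain, and Map#get does not.

```java
import java.util.Properties;

public class ChainedJobParameters {

    // Layer restart parameters over the previous execution's, via the
    // java.util.Properties defaults mechanism (no copy is made).
    public static Properties withDefaults(Properties previousExecution) {
        return new Properties(previousExecution);
    }

    public static void main(String[] args) {
        Properties firstExecution = new Properties();
        firstExecution.setProperty("infile", "in.csv");
        firstExecution.setProperty("chunkSize", "10");

        Properties restart = withDefaults(firstExecution);
        restart.setProperty("chunkSize", "50"); // override one value only

        // getProperty(String) searches the defaults chain (recursively):
        System.out.println(restart.getProperty("infile"));    // "in.csv" (from defaults)
        System.out.println(restart.getProperty("chunkSize")); // "50" (overridden)
        // Map#get does NOT consult defaults:
        System.out.println(restart.get("infile"));            // null
    }
}
```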
We ended up thinking the spec should be clarified, the existing TCK test should be cleaned up, and possibly a new one added.
Recast summary. We should say something in the spec and maybe add another test that makes the same point as the one discussed earlier. (And look at this original test to clarify it as well).
From @rmannibucau in #13.....
if you take a simple "chain": reader R -> writer W. Today both use Objects, so they can coerce as they want, and if there is a ClassCastException it is in user code, not jbatch. Tomorrow R will issue a Record and W will read a Map (for example) because I reuse R and W from a batch component library. Therefore the runtime (the jbatch implementation) will fail.
Indeed we can say the user can wrap the components, but that goes against reusability and the whole component model (if so, we would be better off injecting a jbatch state saver than handling the flow in the framework).
This is where the ability to register implicit mappers/converters when launching the job, and in CDI, comes from.
Side note: we could also require defining a generic Record type to get implicit coercion, as Apache Beam/Spark/Flink and most workflow frameworks do, so as to work on generic flows. But that can come later, since with converters we at least wouldn't break composition.
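To make the converter idea concrete, here is a minimal sketch of what an implicit converter registry could look like. This is NOT a jbatch API and nothing like it exists in the spec; the class and method names are entirely hypothetical, just illustrating how a runtime could coerce a reader's output type into the type a writer expects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: a registry of type-to-type converters that a
// runtime could consult between ItemReader output and ItemWriter input.
public class ConverterRegistry {

    private final Map<String, Function<Object, Object>> converters = new HashMap<>();

    private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
    }

    public <F, T> void register(Class<F> from, Class<T> to, Function<F, T> fn) {
        @SuppressWarnings("unchecked")
        Function<Object, Object> erased = (Function<Object, Object>) fn;
        converters.put(key(from, to), erased);
    }

    // Coerce 'item' into 'targetType' if a converter is registered;
    // pass it through unchanged if it is already assignable.
    public Object coerce(Object item, Class<?> targetType) {
        if (targetType.isInstance(item)) {
            return item;
        }
        Function<Object, Object> fn = converters.get(key(item.getClass(), targetType));
        if (fn == null) {
            throw new ClassCastException(item.getClass() + " -> " + targetType);
        }
        return fn.apply(item);
    }
}
```

With such a registry, the ClassCastException in the R -> W mismatch scenario above would move from user code into a pluggable conversion step.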
I think we want to decide which CDI constructs to integrate with (e.g. interceptors/decorators, whatever) as we look more at this.
Originally opened as bug 6456 by ScottKurz
Oh..one more minor note:
In "Rollback Procedure" at the end of Sec 11.9 we for some reason have the writer open() call before the reader open().
This seems like a copy/paste error. I would like to queue up a spec update to treat it as such.
Any objections?
Fixed in RI (along with other fixes) at:
WASdev/standards.jsr352.jbatch@1679181
Moving this to 'SPEC' to address the typo in Comment 1.
was there a test for it in the TCK or was it described that way in the API?
If so, then we need to keep it imo (according to JCP rules).
If it was ambiguous or even contradictory then we are free to 'fix' it.
There is no TCK test for it, or if there is, it is broken.
I don't know if it's ambiguous; section 11.9 says ItemWriter.open before ItemReader.open, but it does go against the convention every other time these interfaces are mentioned, where the pattern is always ItemReader.open, ItemWriter.open, ItemWriter.close, ItemReader.close.
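For illustration, the conventional ordering can be spelled out with simple stand-ins (these are NOT the real jakarta.batch interfaces) that just record their lifecycle calls:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins only: record the lifecycle call order that the
// rest of the spec uses everywhere except the Sec 11.9 "Rollback Procedure".
public class LifecycleOrder {

    public static List<String> runStep() {
        List<String> calls = new ArrayList<>();
        calls.add("ItemReader.open");
        calls.add("ItemWriter.open");
        // ... chunk processing loop: readItem / processItem / writeItems ...
        calls.add("ItemWriter.close");
        calls.add("ItemReader.close");
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(runStep());
        // [ItemReader.open, ItemWriter.open, ItemWriter.close, ItemReader.close]
    }
}
```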
(In reply to struberg from comment #3)
was there a test for it in the TCK or was it described that way in the API?
If so, then we need to keep it imo (according to JCP rules). If it was ambiguous or even contradictory then we are free to 'fix' it.
No, I'm almost 100% certain there was no test specifically enforcing the relative order of reader vs. writer open() in "Rollback Procedure". (As there was no test enforcing the order during normal processing, given that the RI contained this bug in 1.0).
So I think we're free to update the spec doc. And we really should add some TCK test coverage too.
(This order still surprises me though.)
Originally opened as bug 5075 by arungupta
```java
JobOperator jo = BatchRuntime.getJobOperator();
```

does not align very well with the CDI way of doing things in Java EE 7. A better way would be:

```java
@Inject JobOperator jo;
```
This request has been asked multiple times at different conferences around the world.
I also think batch must integrate with cdi asap.
+1 for me.
In Spring this would be handled with a FactoryBean. Couldn't this just be handled with a Producer without the need to add CDI specific language to the spec?
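For concreteness, a producer along these lines is one way it could be wired without adding CDI-specific language to the spec. This is a sketch only, assuming the jakarta.* namespace and a running CDI container; it is not mandated anywhere and would not run standalone.

```java
import jakarta.batch.operations.JobOperator;
import jakarta.batch.runtime.BatchRuntime;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;

// Hypothetical wiring: expose the JobOperator as a CDI bean via a producer,
// delegating to the existing bootstrap API.
@ApplicationScoped
public class JobOperatorProducer {

    @Produces
    public JobOperator jobOperator() {
        return BatchRuntime.getJobOperator();
    }
}
```

With that in place, application code could simply write `@Inject JobOperator jo;` in any CDI-managed bean.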
So say we were to take the same approach as we did with injecting the properties and contexts, and say that the API was:
@Inject JobOperator jobOp
without specifying whether it's CDI, or some other DI.
How would that then fit with the Spring approach?
With Spring Batch's implementation of the JSR, we have a FactoryBean that creates the Job/Step contexts to handle the @Inject scenario. I understand how @Inject for the JobOperator looks in the EE world. How does this proposal work in the SE world? Right now, all of the @Inject requirements still work in the SE world. This one is a bit different since the JobOperator is the entrance point for a batch application and probably provides/needs some bootstrapping.
Originally opened as bug 6473 by ScottKurz
AbstractItemWriteListener is in the RI, and AbstractItemReadListener and AbstractItemProcessListener are both in the spec.
So clearly an omission from the spec document.
Originally opened as bug 5502 by ScottKurz
I think it was Cheng who pointed out there was some ambiguity as to what metrics should show on a retry. Consider tightening up the spec by clarifying.
Not committing to address in future, just listing as candidates and pointing out they're excluded from current Maintenance Rel.
Originally opened as bug 5728 by BrentDouglas
Jobs must have their before listeners and after listeners invoked in the same thread (11.3). Similarly, partitioned Steps must have various parts invoked in the same thread the Step started in (11.5, 11.7).
Throughout these sections the main thread per section is referred to as 'Thread A', and it runs many of the items. This definition should be relaxed so that items run in 'Thread A' relate to the other numbered items in the diagram via a 'happens-before' relationship, with the guarantee that only a single thread runs each item at a time.
The current 'Thread A' definition means that we have a blocked thread for the duration of the job, and more blocked threads for the duration of each partitioned step, with really no convenient way to get around it. This proposal will not cause any issues with that approach but will allow other approaches, such as evented execution (which you could do now, but if 'Thread A' from one task got assigned another task longer-running than the partitions, a completing partition would have to wait for its 'Thread A' to be freed before resuming execution, so it would not be very useful).
```
'Thread A'     ---> |------------->
'Px 1'                  |-->
'Px 2'                  |------>
'Other thing'           |--------------------->
```

FIG. 1: If 'Thread A' spawns Px1 and Px2, but 'Thread A' gets put back in the pool and gets assigned to 'Other thing' from some other job, 'Thread A' can't resume the first job until 'Other thing' finishes.
The only downside to this proposal is that clients will no longer be able to use ThreadLocals, but realistically they shouldn't have been doing that in managed code anyhow; they should have been storing things in the Job/Step Context instead.
Brent,
Thanks for writing that up. It had crossed my mind that we might want to relax this.
Since I don't see this as a current spec or TCK ambiguity, I'm going to file this with a "future" tag, to look at for a possible future 1.1 release. Hopefully we can get some other feedback too at that point.
A README file could come in handy to the newcomers :-)
The BatchRuntime class is documented in the spec doc, but its method body just contains `return null;`:

```java
public static JobOperator getJobOperator() {
    return null;
}
```

If we don't want to include the full method body, we may want to use "..." or add a comment like "the method body is omitted". The current form gives the reader the wrong impression that it always returns null.
Originally opened as bug 5515 by ScottKurz
See public ML discussion:
https://java.net/projects/jbatch/lists/public/archive/2013-10/message/2
Note: the link above is bad, but we have it archived. Pasting it here
While integrating our batch implementation into Java EE, we noticed this is an area not adequately covered by either batch spec or Java EE 7 Platform spec, which is understandable in the first release of the new spec. And in an earlier discussion in this list, Chris also indicated interest to remedy that in next version.
Our main concern with batch EE integration is portability and compatibility. Product A and B may have batch integrated into Java EE differently, and even though they both passed TCK, batch applications deployed to different appservers may still experience different behaviors.
Even though we tried to best align batch with EE platform during our batch integration, it's still possible the next version of batch spec may include different integration requirements we will need to implement at the risk of breaking product backward compatibility.
So I propose we start discussion on batch Java EE integration. Any new requirements will not be in batch spec 1.0, but they can provide guidance and expectations for application developers and spec implementers.
One area that can benefit from clarification is how to propagate various EE application contexts (e.g., classloader, security context, jndi naming context, transaction context) to batch threads. Inside an appserver, the batch container may use threads that are not managed by EE containers (ejb or web containers), and EE spec does not require context propagation to such unmanaged threads.
But we think it's common use case for batch artifacts running on unmanaged threads to use Java EE services such as jndi lookup, or EJB invocations, or resource injections. To achieve that, some form of context propagation is needed from the EE component that started batch job.
What types of contextual data must be propagated? If we follow the approach in JSR 236 Concurrency Utils for EE, it will be classloader, naming context and security context, but not transaction context. Batch runtime manages global transaction on its own so no tx propagation is needed.
When it comes to jndi naming context propagation, which jndi namespaces must be propagated, java:global/, java:app/, java:module/, and java:comp/? We believe java:global, java:app, java:module should be propagated, but java:comp namespace should not. All constant entries in java:comp required by EE spec will still be propagated, such as java:comp/UserTransaction. In webapp, java:comp/ mirrors java:module and so will also be made available. For this specific naming context propagation issue, we recently filed an issue with CTS, and I will be happy to share more details when we are on that topic.
Thanks,
Cheng