talendcomp_tjobinstance's Introduction

talendcomp_tjobinstance's People

Contributors

dependabot[bot], jlolling, mattywausb

talendcomp_tjobinstance's Issues

tJobInstanceLiveCheck problems to get PIDs on Debian 11

Hi Jan,

This issue concerns how the tJobInstanceLiveCheck component retrieves PIDs via "ps -eo pid". We've discovered that the live check stopped working properly after upgrading Debian from 9 to 11 and Java from 8 to 11.

It seems to us that on Debian 11 this command doesn't always return the complete set of PIDs.

We suggest that the component use "pgrep ." to retrieve the list instead. We hope this will resolve the problem.
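
A minimal sketch of how the PID list could be parsed defensively, whichever command produces it. It assumes one PID per line with an optional header line (as "ps -eo pid" prints) and skips anything that is not a number. The class and method names are hypothetical and not taken from the component. On Java 9 and later, ProcessHandle.allProcesses() is an alternative that needs no external command at all.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of tJobInstanceLiveCheck.
class PidListParser {

    // Parses command output with one PID per line. The "PID" header and
    // blank or non-numeric lines are skipped instead of causing a failure.
    static Set<Long> parsePids(String commandOutput) {
        Set<Long> pids = new TreeSet<>();
        for (String line : commandOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            try {
                pids.add(Long.parseLong(trimmed));
            } catch (NumberFormatException e) {
                // header line or unexpected text: skip it
            }
        }
        return pids;
    }

    public static void main(String[] args) {
        // Java 9+: list all visible PIDs without calling ps or pgrep.
        ProcessHandle.allProcesses().forEach(p -> System.out.println(p.pid()));
    }
}
```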

Thanks in advance!
Jens

Extend job_instance_status by job revision

Besides the general name of a running job, it would be great to know its exact revision, build ID, version, or save date.

I'm suggesting the following solution:
Add a column "job_revision_information" to the job_instance_status table.
Add a new field "Revision information" to tJobInstanceStart, where you can declare an expression that is written into the new column. Default is "$Id$". Add a checkbox "use timestamp from jobInfo.properties". Default is "yes".

In default mode, the component searches for the jobInfo.properties file of the job and uses the date information in that file. The file is created by the Talend build process and does not exist when running the job in the Studio.

Unchecking "use timestamp from jobInfo.properties" gives the developer full control over how version information is inserted. Here are two possible solutions:
The developer can put a replacement string here that is detected and rewritten by the version control system (e.g. commit to svn -> build by command line).
The developer can use a context variable and rewrite the information in the context properties files of the job during the build or deployment process. (Probably better when not using the Talend toolchain for build and deployment, since the build might happen before the version control system adds its information.)
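
A minimal sketch of the default mode described above: read the date from jobInfo.properties and fall back to the configured expression when the file or the entry is missing (for example when running in the Studio). The property key "date" and all class and method names are assumptions made for illustration, not confirmed details of the file or the component.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; the key "date" is an assumption about jobInfo.properties.
class JobRevisionInfo {

    // Returns the date entry from the given properties content, or the
    // fallback (e.g. the configured expression) if nothing usable is found.
    static String resolveRevision(Reader jobInfo, String fallback) {
        if (jobInfo == null) {
            return fallback; // file not present, e.g. job started in the Studio
        }
        Properties props = new Properties();
        try {
            props.load(jobInfo);
        } catch (IOException e) {
            return fallback;
        }
        String date = props.getProperty("date");
        return (date == null || date.trim().isEmpty()) ? fallback : date.trim();
    }

    public static void main(String[] args) {
        String content = "job=demo\ndate=2021-10-12\n";
        System.out.println(resolveRevision(new StringReader(content), "$Id$"));
        System.out.println(resolveRevision(null, "$Id$"));
    }
}
```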

Rename table column job_display_name to task_name

Often a job runs for multiple purposes, called tasks. To store the task name, the column job_display_name is currently misused. On the other hand, the column job_display_name was never really used for its original purpose.

Dump Counter Elements to stdout

When configuring counters in tJobInstanceEnd, the named counters can be saved to a table (Save named counters).

An additional checkbox to dump the counter output to stdout would be great.

That way, the counters are available as part of the Log in TAC when clicking on Display last execution details.

It's quicker to find the counters there (and get a rough idea whether this is the job responsible for a failure in a chain of jobs) than to look them up in the database when searching for the job that did not process the expected amount of data.
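
A minimal sketch of what such a dump could look like, assuming the named counters are available as a map from counter name to value. The class name and the "counter name=value" line format are made up for illustration; the component may hold its counters differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical formatter, not component code.
class CounterDump {

    // Builds one line per counter in the (assumed) form "counter <name>=<value>".
    static String format(Map<String, ? extends Number> counters) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, ? extends Number> entry : counters.entrySet()) {
            sb.append("counter ")
              .append(entry.getKey())
              .append('=')
              .append(entry.getValue())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("ReadMetadataEntries", 5L);
        // Anything printed to stdout ends up in the job's execution log.
        System.out.print(format(counters));
    }
}
```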

Change Timestamp column in job_instance_status for mysql

Because of some "crazy" default behaviour with timestamps in MySQL, at least the timestamps for job start and job end should be declared with default NULL as follows:

JOB_STARTED_AT TIMESTAMP(3) NULL default NULL,
JOB_ENDED_AT TIMESTAMP(3) NULL default NULL,

Otherwise MySQL assigns 0 as the default, which results in exceptions when reading the timestamps with Talend.

tJobInstanceStart: option to exit on startup error

Status:
In general, tJobInstanceStart is put into the preJob phase.
With Talend's standard behaviour, an error in the preJob phase does not stop the job completely.
Instead, the job immediately enters the main phase and, after that, the postJob phase.

Request
When tJobInstanceStart fails, something must be completely wrong, and the job should not continue.
This could be done by killing the JVM, but that should be optional so that independent Java jobs in the same JVM container are not killed.
Another approach would be a "hack" into the generated code structure to encapsulate all code behind tJobInstanceStart.
Maybe a feature request to Talend for "Die on error" on tPrejob would be a good idea.

Workaround
The main part of a job must check that the job instance ID is not null; otherwise it should not do anything.

Refactor Counter Configuration Elements

I know the following is hard to build in a backwards-compatible way, but I think it would make tJobInstanceStop less complex and provide potential extensibility.
Instead of providing dedicated table elements in the component for input/output counters, it could be more efficient to have just one table element to configure all counters. Each entry would be mapped to its summarizing category by an additional first column with a dropdown box to select the category (input, output, reject, ...).

This single table could use more space and, for normal jobs, would show all configured elements without any scrolling. Since categorization would no longer be a "hard-coded" feature, extending it to more categories would be easy.

To migrate from the old version, all item files of a job need to be migrated. This might be the hard part to provide on rollout. A softer method would be the possibility to switch the behaviour and concept with a checkbox and let the developers decide which way they want to use. Both methods would be compatible with the new data model of issue #7.
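
A minimal sketch of the single-table idea, assuming each configured row carries a category, a counter name and a value. All names are hypothetical. The point it illustrates is that summing per category needs no category-specific code, so adding a new category is only a new dropdown value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the proposed single counter table; not component code.
class CounterTable {

    // One configured row: category from the dropdown, counter name, counted value.
    static final class Row {
        final String category;
        final String name;
        final long value;

        Row(String category, String name, long value) {
            this.category = category;
            this.name = name;
            this.value = value;
        }
    }

    // Sums all rows per category; unknown categories are handled like any other.
    static Map<String, Long> sumByCategory(List<Row> rows) {
        Map<String, Long> totals = new TreeMap<>();
        for (Row row : rows) {
            totals.merge(row.category, row.value, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("input", "customers", 10),
                new Row("input", "orders", 5),
                new Row("reject", "orders_invalid", 2));
        System.out.println(sumByCategory(rows));
    }
}
```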

Return Errorcode 0 after an exception

It is not possible to overwrite the return code with 0 when catching an exception inside the job.
This would be useful to catch irrelevant errors and declare the run as valid (e.g. an error in some post-processing after all data has been transformed and loaded).

Add Ignore counter to job_instance_status and tJobInstanceStop

I would like to measure the ignored rows. "Ignored" means that the incoming information is deliberately not copied to the destination. With this counter, one can measure the completeness of the transformation by checking
input = output + reject + ignore
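
The check above can be sketched as a small helper. The class and method names are hypothetical; only the formula comes from the request.

```java
// Hypothetical helper illustrating the completeness formula.
class CompletenessCheck {

    // True if every input row is accounted for as output, reject or ignore.
    static boolean isComplete(long input, long output, long reject, long ignore) {
        return input == output + reject + ignore;
    }

    // Number of input rows that are not accounted for (0 means complete).
    static long unaccounted(long input, long output, long reject, long ignore) {
        return input - (output + reject + ignore);
    }

    public static void main(String[] args) {
        System.out.println(isComplete(100, 90, 4, 6));
        System.out.println(unaccounted(100, 90, 4, 3));
    }
}
```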

tJobInstanceEnd does not always commit

tJobInstanceEnd does not commit its update under the following circumstances:
The connection in the main job is set with autocommit "off" or "on" (it doesn't matter in this case).
The job calls a subjob which reuses the connection but does not contain tJobInstance components. (It's a module for some extended management of job metadata.) The subjob also sets autocommit to "off" in the connection component and uses an explicit commit.

In this case, only the initial start data is written to job_instance_status. The tJobInstanceEnd data is not committed.

When I deactivate the subjob, everything works fine in the main job.
When I set the connection to autocommit in the subjob, it also works fine.

JMX Interface

MBean for Log4j-Logging
MBean for globalMap access
MBean for context var access

Get Counts from monitored flows

As an alternative to retrieving counters from the globalMap, it would be nice to get the count information from "monitored" flows. Working with this feature would look as follows:

  1. Mark flows to count as "monitored"
  2. In tJobInstanceEnd: enter the name* of the flow and declare a counter group and counter name
    * or select name from a dropdown, if possible

tJobInstanceEnd should act as follows:

  • in generation/compile phase
    • throw an error if the flow does not exist or is not marked
    • throw a warning if a monitored flow is not entered as counter
  • capture and summarize the count data (flows in iterations will throw a value for each iteration)
  • if no data was captured for the counter in the whole jobrun, register a zero for the counter
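
The last two points can be sketched as follows, assuming each monitored flow reports one count per iteration. The class and its methods are hypothetical and only illustrate the summing and the zero default; the generation-time checks are not covered.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical collector for counts reported by monitored flows.
class MonitoredFlowCounters {

    private final Map<String, Long> totals = new LinkedHashMap<>();

    // Registering starts the counter at zero, so it is reported even if
    // the flow never delivers a value during the job run.
    void register(String counterName) {
        totals.putIfAbsent(counterName, 0L);
    }

    // Called once per iteration with the row count of that iteration.
    void add(String counterName, long count) {
        totals.merge(counterName, count, Long::sum);
    }

    // Returns a copy of the summed counters.
    Map<String, Long> summary() {
        return new LinkedHashMap<>(totals);
    }

    public static void main(String[] args) {
        MonitoredFlowCounters counters = new MonitoredFlowCounters();
        counters.register("rows_in");
        counters.register("rows_rejected");
        counters.add("rows_in", 3);
        counters.add("rows_in", 4);
        System.out.println(counters.summary());
    }
}
```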

Provide tDie Message in return_message

Currently, when ending the job with a tDie, the return_message is NULL.
(Tested with TOS 6.5.1 and tJobInstanceEnd V 8.5 (20220405))
It would be helpful to get the text of the triggered tDie in the return_message.

Add miscellaneous counters

Sometimes I would like to log a counter without counting it towards the general input/output/reject categories.
An additional counter configuration for "miscellaneous" named counters would do the job.

Extend Job_instance_counters table by "CounterGroup" column

It would be helpful to have a marker in the job_instance_counter table which represents the "general counter group" in tJobInstanceEnd. This can be a 1-char column holding one of the following markers:
I = input
O = output
U = update
D = delete
R = reject
G = ignore (see "Ignore" counter issue)
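
A minimal sketch of how the markers listed above could be represented in code. The enum and its method names are hypothetical; only the letters and their meanings come from the list.

```java
// Hypothetical enum mapping the one-character markers to counter groups.
enum CounterGroup {
    INPUT('I'), OUTPUT('O'), UPDATE('U'), DELETE('D'), REJECT('R'), IGNORE('G');

    private final char marker;

    CounterGroup(char marker) {
        this.marker = marker;
    }

    // The character that would be stored in the 1-char column.
    char marker() {
        return marker;
    }

    // Looks up the group for a stored marker; unknown markers are rejected.
    static CounterGroup fromMarker(char marker) {
        for (CounterGroup group : values()) {
            if (group.marker == marker) {
                return group;
            }
        }
        throw new IllegalArgumentException("Unknown counter group marker: " + marker);
    }

    public static void main(String[] args) {
        System.out.println(fromMarker('G'));
    }
}
```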

Need to add column COUNTER_TYPE to table JOB_INSTANCE_COUNTERS

In the DDL in the PDF document you need to add column COUNTER_TYPE to table JOB_INSTANCE_COUNTERS.

When the check box "Save named counters" is ticked for tJobInstanceEnd, an error is raised on the underlying INSERT statement:

insert into dwh.JOB_INSTANCE_COUNTERS (JOB_INSTANCE_ID,COUNTER_NAME,COUNTER_TYPE,COUNTER_VALUE) values (15,'ReadMetadataEntries','output',5) was aborted: ERROR: column "counter_type" of relation "job_instance_counters" does not exist.

That column is currently missing from the DDL in the documentation.

Workaround: alter table dwh.job_instance_counters add column counter_type varchar(25);

Inconsistency in memory logging feature

Release 8.2, Talend TOS 6.x:
When activating "Memory Usage Monitoring", only the time of the peak usage is output, but not the usage itself.
The usage must now be logged by custom code. Former versions printed all values to the console.

This is what I get now:

|134 |Van Buren|Missouri |
|134 |Kennedy |Alaska |
|134 |Carter |Missouri |
'--------------------+---------+-----------'

Maximum of memory usage measured at: 2021-10-12 16:05:44
2021-10-12 16:05:44|eGvJPX|eGvJPX|eGvJPX|41596|ETLFRAMEWORK|template_demo_no_context_load__pg|_AEkDECtsEeyJ0sp_R7CZAQ|1.0|dev1||end|success|396

Removing the console output and providing the values only as return variables has a huge impact when upgrading older projects, since we would need to add extra components to every job to print the values.
Please integrate the console output again.

Compilation error in Talend 7.3.1

Talend 7.3.1 doesn't provide log4j anymore. This causes compilation errors when using tJobInstance components. The tJobInstanceEnd part contains code to close appenders even if log4j is not explicitly activated in the tJobInstanceStart component.
The line marked with --> error causes the compilation error:

// close all other known appenders in this job
for (java.util.Map.Entry<String, Object> entry : globalMap.entrySet()) {
	if (entry.getKey().endsWith("_APPENDER")) {
		Object object = entry.getValue();
		if (object instanceof org.apache.log4j.Appender) {
			// detach appender
			// --> error: the following statement fails to compile
			org.apache.log4j.Logger.getLogger("talend")
					.removeAppender((org.apache.log4j.Appender) object);
			try {
				// close appender
				((org.apache.log4j.Appender) object).close();
			} catch (Throwable t) {
				// ignore errors
			}
		}
	}
}
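
One possible way to avoid the compile-time dependency is to look up the log4j type by name and call close() through reflection. This is a sketch under that assumption, not the fix used by the component; the helper class is hypothetical, and detaching the appender from the logger would need the same reflective treatment.

```java
import java.lang.reflect.Method;

// Hypothetical helper: closes an object via reflection so that the
// generated code compiles even when the named type is not on the classpath.
class OptionalAppenderCloser {

    // Calls close() if the candidate is an instance of the named type.
    // Returns false if the type is missing, the object is of another type,
    // or the call fails.
    static boolean closeIfInstanceOf(Object candidate, String typeName) {
        try {
            Class<?> type = Class.forName(typeName);
            if (!type.isInstance(candidate)) {
                return false;
            }
            Method close = type.getMethod("close");
            close.invoke(candidate);
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            // type not available or close() failed: ignore, as the original code does
            return false;
        }
    }

    public static void main(String[] args) {
        Object fromGlobalMap = new Object(); // stands in for a globalMap value
        System.out.println(closeIfInstanceOf(fromGlobalMap, "org.apache.log4j.Appender"));
    }
}
```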

Housekeeping function

Define a job name pattern and how long the data should be kept.
Delete by date range or by number of records.
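
A minimal sketch of the age-based part of such a rule, assuming the job name pattern is a regular expression and the retention is a duration measured from the job's end time. All names are hypothetical, and deletion by number of records is not covered.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

// Hypothetical housekeeping rule: which job instance records may be deleted.
class HousekeepingRule {

    private final Pattern jobNamePattern;
    private final Duration retention;

    HousekeepingRule(String jobNameRegex, Duration retention) {
        this.jobNamePattern = Pattern.compile(jobNameRegex);
        this.retention = retention;
    }

    // True if the job name matches the pattern and the record is older
    // than the retention period relative to "now".
    boolean shouldDelete(String jobName, Instant jobEndedAt, Instant now) {
        return jobNamePattern.matcher(jobName).matches()
                && jobEndedAt.isBefore(now.minus(retention));
    }

    public static void main(String[] args) {
        HousekeepingRule rule = new HousekeepingRule("template_.*", Duration.ofDays(30));
        Instant now = Instant.now();
        System.out.println(rule.shouldDelete("template_demo", now.minus(Duration.ofDays(31)), now));
    }
}
```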
