The CloudSlang Orchestration Engine (Score) is a general-purpose orchestration engine which is process-based, embeddable, lightweight, scalable and multilingual.
Score is the core engine for running workflows. It supports multiple workflow languages (DSL) using a pluggable compiler approach. Adding a new workflow DSL requires adding a new compiler that will translate the DSL (written in xml, yaml, etc.) to a generic workflow representation called an ExecutionPlan.
Worker recovery is checked every 2 minutes by Central (WorkerRecoveryServiceImpl.java). A worker is deemed to need recovery if
(the worker has not sent a keepalive recently) OR (the queue contains unacknowledged messages for this worker)
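The condition above can be read as a simple predicate. The following is only a sketch of that rule, not the code in WorkerRecoveryServiceImpl.java: the class name, the method name, and the keepAliveTimeout parameter are all hypothetical, and the issue does not say how "recently" is measured.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of the recovery rule described in this issue.
final class RecoveryCheck {

    // Returns true when the worker would be picked for recovery:
    // (no recent keepalive) OR (unacknowledged messages in the queue).
    static boolean needsRecovery(Instant lastKeepAlive,
                                 int unackedMessages,
                                 Instant now,
                                 Duration keepAliveTimeout) {
        // keepAliveTimeout is an assumed parameter; the real threshold is not given here.
        boolean keepAliveMissed =
                Duration.between(lastKeepAlive, now).compareTo(keepAliveTimeout) > 0;
        boolean hasUnacked = unackedMessages > 0;
        return keepAliveMissed || hasUnacked;
    }
}
```

Note that with this rule a worker that sent a keepalive seconds ago is still selected for recovery as long as a single unacknowledged message remains in the queue.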
What happens:
There is only one worker in the RAS_Operator_Path group alias. The worker can connect to Central fine; there is no connection issue.
The queue contains 8 older messages that were not acknowledged by the above RAS.
Every 2 minutes, Central triggers recovery of this worker, which in turn triggers the recovery/restart of the worker itself (due to the WRW change).
This obviously stops all running flows on the worker, slowing down all executions on Central.
Because this is the only worker in RAS_Operator_Path, and it has already been marked as "not alive" by the Central recovery mechanism, the unacknowledged messages can't be assigned to it (despite it being very much alive).
This makes it impossible to reassign those 8 messages, so recovery for this worker is attempted every 2 minutes, forever, until another worker joins RAS_Operator_Path.
Operations that need more than 2 minutes will NEVER complete, and their flows remain stuck.
Suggestions:
Worker recovery should not be triggered by the presence of unacknowledged queue messages. If the worker is alive (has sent keep-alives) and it is the only one in that group, Central should attempt to reassign those messages to it again rather than trigger the recovery process, which brutally destroys all flow runs on that worker.
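One way to picture the suggested behaviour is a decision function that only recovers a worker when its keep-alives stop, and redelivers unacknowledged messages otherwise. This is a minimal sketch of the proposal, not existing Score code; ProposedRecoveryPolicy, Action, and decide are hypothetical names.

```java
// Hypothetical sketch of the proposed policy; none of these names exist in Score.
final class ProposedRecoveryPolicy {

    enum Action { NONE, REDELIVER_UNACKED, RECOVER_WORKER }

    static Action decide(boolean keepAliveRecent, int unackedMessages) {
        if (!keepAliveRecent) {
            // Only a silent worker is treated as dead and recovered.
            return Action.RECOVER_WORKER;
        }
        if (unackedMessages > 0) {
            // The worker is alive: hand the stale messages back to it
            // instead of restarting it and killing its running flows.
            return Action.REDELIVER_UNACKED;
        }
        return Action.NONE;
    }
}
```

Under this policy the scenario described above (one live worker, 8 stale messages) would result in REDELIVER_UNACKED rather than a restart every 2 minutes.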
Let's say we have a flow flow_system_properties_subflow in the file flow_system_properties_subflow.sl, which uses simplecp.flows.flow_system_properties_subflow_level1, located in the file simplecp.flows.flow_system_properties_subflow_level1.sl in the same folder.
When executing the CLI with the command
run --f c:/Temp/slang/tests/SimpleCp/flow_system_properties_subflow.sl --cp c:/Temp/slang/tests
We get the error
Command failed java.lang.RuntimeException: java.lang.RuntimeException: Reference: 'simplecp.flows.flow_system_properties_subflow_level1' in executable: 'flow_system_properties_subflow', wasn't found in path
When executing the command with --cp pointing at the flow's own folder
run --f c:/Temp/slang/tests/SimpleCp/flow_system_properties_subflow.sl --cp c:/Temp/slang/tests/SimpleCp
We get the error
Command failed java.lang.RuntimeException: java.lang.IllegalArgumentException: Source flow_system_properties_subflow has dependencies but no path was given to the compiler
I've embedded Score in a standalone application to try it out, and now I'm trying to embed it in ours.
The first problem was that we have quite old libraries (Spring 3.2.3, Hibernate 3.6.10, Spring Data JPA 1.4.1), so I had to adjust for that, but luckily it did not require any drastic changes and it still works in the standalone application.
Now I think I have a problem with the database. During startup it creates a Worker and saves it to WorkerNodeRepository. The save does not throw any error, but the repository stays empty, so an exception is thrown when the worker is looked up by uuid.
If I don't configure any datasource for Score, does it use H2 by default? Should I connect it to our datasource? Does it persist anything between application starts, or does the database serve only current executions?
I am kind of stuck at this point.
Maven project score
!!!! DependencyServiceTest is failing !!!!
!!!! Running with -DskipTests for now !!!!
!!!! These tests pass matrix, not sure why it fails here… !!!!
Exception in thread "main" org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'io.cloudslang.worker.management.WorkerRegistration#0': Invocation of init method failed; nested exception is java.net.UnknownHostException: alonbecker.local: alonbecker.local: nodename nor servname provided, or not known
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:136)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:407)
There are 35 occurrences of the old sl syntax (action syntax) in the language repository. We need to check whether they need to be updated and act accordingly (P.S. the cool IntelliJ plugin prompts for them).
Hi all,
My application embeds both Spring Boot v2.7.x and score-all v0.3.280.
I would like to upgrade Spring Boot to v3.x, which comes with Spring Framework 6.x, targets JDK 17, and uses jakarta.persistence instead of javax.persistence.
My question is: is there a chance that a new release of score-all, built with JDK 17 and the jakarta libraries, will come soon?
The busyWorkersService seems to run by default at intervals of 200 milliseconds, configured in scoreEngineSchedulerContext.xml (from score-all).
This rapid run adds load on the database.
Q1: Can this interval (safely) be changed to, e.g., 5 seconds?
Q2: Since it's in the XML, I guess the XML has to be patched into the jar after changing the config? Are there any other options to reconfigure it?
Score seems very interesting for several use cases. To make it easier to work with, I would like to make a few requests:
Add a compile command to the CLI, so one can do a "dry run" on the YAML flow file.
A visual EP tool for slang: a way to see your flow from a slang file.
A parallel feature is important. An Ant-like structure is a good example.
Docs for content (Java + YAML). Content docs are lacking, both for the YAML flows/operations and for the Java actions.
... for example, if your Java content throws an exception, this RuntimeException will suppress the exception message (e.g. if your content throws a NullPointerException, the execution.log will not tell you that)
Fix: add the exception message to the RuntimeException message
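A minimal sketch of what the fix could look like, assuming the wrapping happens in a helper of this shape. ActionErrors and wrap are hypothetical names, not the actual Score code. The cause's class name is included because a NullPointerException often has no message of its own.

```java
// Hypothetical helper; illustrates carrying the original exception details
// into the wrapping RuntimeException's message and cause.
final class ActionErrors {

    static RuntimeException wrap(String actionName, Throwable cause) {
        String detail = cause.getClass().getName();
        if (cause.getMessage() != null) {
            detail += ": " + cause.getMessage();
        }
        // Passing 'cause' as the second argument also preserves the stack trace.
        return new RuntimeException(
                "Action '" + actionName + "' failed: " + detail, cause);
    }
}
```

With this shape, a log line that prints only getMessage() of the wrapper would still show which exception the content threw.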
The Score finished event is not synchronous with the execution, so we need to reconsider whether that is the right way to listen for run termination or for the language events.
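For context, a caller that needs to block until an asynchronous finished event arrives typically bridges the two threads with a latch. The sketch below shows that general pattern only; FinishedEventWaiter and its methods are hypothetical and do not correspond to Score's event API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical bridge between an asynchronous "finished" callback and a waiting caller.
final class FinishedEventWaiter {
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicReference<String> status = new AtomicReference<>();

    // Assumed to be invoked on the event-dispatch thread when the run ends.
    void onFinished(String finalStatus) {
        status.set(finalStatus);
        done.countDown();
    }

    // Blocks until the event arrives; returns null on timeout or interruption.
    String await(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit) ? status.get() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return null;
        }
    }
}
```

The caller still only learns about termination when the event is dispatched, which is the delay this issue is concerned about.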
Adding a worker more than once to the same worker group causes a database integrity exception, because the primary key on OO_WORKER_GROUPS is composed of worker_id and group_name.
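One way to avoid the exception is to make the add operation idempotent, so a repeated (worker_id, group_name) pair is ignored instead of inserted again. The sketch below models this in memory; WorkerGroups and addWorkerToGroup are hypothetical names, and a real fix would check the table (or handle the constraint violation) rather than a map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the (worker_id, group_name) uniqueness rule.
final class WorkerGroups {
    // group name -> ids of the workers in that group
    private final Map<String, Set<String>> workersByGroup = new HashMap<>();

    // Returns true if the membership was added, false if it already existed.
    boolean addWorkerToGroup(String workerId, String groupName) {
        return workersByGroup
                .computeIfAbsent(groupName, g -> new HashSet<>())
                .add(workerId);
    }
}
```

A second call with the same pair returns false and changes nothing, which mirrors what the composite primary key enforces at the database level.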