durabletask-java's People

Contributors

bachuv, cgillum, davidmrdavid, kaibocai, kamperiadis, microsoft-github-policy-service[bot], shreyas-gopalakrishna

durabletask-java's Issues

Durable retry policies

Retry policies are not yet implemented for the Java SDK. Retry policies will cover:

  • Activities and sub-orchestrations
  • Declarative retry policies (max. number of retries, exponential backoff, etc.)
  • Imperative retry policies, implemented as a lambda function

I consider this to be in scope for public preview. I'm currently working on it for .NET Isolated and I'd like to also implement it for Java to help make sure the design is solid and applies well to multiple languages.
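As a rough illustration of what a declarative policy with exponential backoff might compute, here is a minimal sketch. The class `BackoffSketch`, its method, and its parameters are hypothetical and not part of the SDK; the actual design may differ.

```java
import java.time.Duration;

// Hypothetical helper (not an SDK type): computes the wait time before a retry.
final class BackoffSketch {
    private BackoffSketch() {}

    /**
     * Returns the delay before retry number {@code attempt} (1-based):
     * first * coefficient^(attempt - 1), capped at {@code max}.
     */
    static Duration delayFor(int attempt, Duration first, double coefficient, Duration max) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        double millis = first.toMillis() * Math.pow(coefficient, attempt - 1);
        long capped = (long) Math.min(millis, (double) max.toMillis());
        return Duration.ofMillis(capped);
    }
}
```

With a first interval of 1 second, a coefficient of 2.0, and a 10 second cap, attempts 1, 3, and 5 would wait 1, 4, and 10 seconds respectively.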

Rename Task.get() to Task.await()

Per internal discussion, using .await() instead of .get() will make more intuitive sense to the developer. It's inconsistent with CompletableFuture, but there's no strong reason to try and be consistent with CompletableFuture (it might actually be beneficial to avoid this association).

Continue-as-new support

Requirement

Orchestrations should be able to use the continue-as-new pattern as described here.

One or more integration tests should also be written to exercise the continue-as-new codepath.

Implementation notes

A good starting point for writing an integration test would be to look at the Continue-as-new integration test for .NET Isolated: https://github.com/microsoft/durabletask-dotnet/blob/0998d1719e98c3d340f964d7ee659dc4e2916643/test/DurableTask.Sdk.Tests/OrchestrationPatterns.cs#L404-L432

Custom status support

Requirement

Orchestrations should be able to set a custom status, as described here.

One or more integration tests should also be written to exercise the custom status code path.

Implementation notes

When writing the integration test, this .NET Isolated custom status test can be used as a reference.

Improved orchestrator and activity input/output handling

Background

The current TaskOrchestration and TaskActivity interfaces require implementations to use the context object to receive inputs. Consider the following example (copied from the integration tests):

final String orchestratorName = "SingleActivity";
final String activityName = "Echo";
final String input = Instant.now().toString();
DurableTaskGrpcWorker worker = this.createWorkerBuilder()
    .addOrchestrator(orchestratorName, ctx -> {
        String activityInput = ctx.getInput(String.class);
        String output = ctx.callActivity(activityName, activityInput, String.class).get();
        ctx.complete(output);
    })
    .addActivity(activityName, ctx -> {
        return String.format("Hello, %s!", ctx.getInput(String.class));
    })
    .buildAndStart();

This design, while it works, has a few issues:

  1. The fetching of inputs is not type-safe because the call to ctx.getInput(String.class) could fail if the input isn't actually a string.
  2. The call to ctx.getInput() isn't intuitive, making the programming model harder for developers to learn.
  3. The call to ctx.complete(output) is both unintuitive and inconsistent with how activity functions are defined.

This issue tracks improving the programming model to make processing inputs and outputs more intuitive and type-safe.

Proposal

The proposal is to change these interface definitions so that a developer could write the following, simpler orchestration and activity implementations:

final String orchestratorName = "SingleActivity";
final String activityName = "Echo";
final String input = Instant.now().toString();
DurableTaskGrpcWorker worker = this.createWorkerBuilder()
    .addOrchestrator(orchestratorName, (ctx, activityInput) -> {
        return ctx.callActivity(activityName, activityInput, String.class).get();
    })
    .addActivity(activityName, (ctx, name) -> {
        return String.format("Hello, %s!", name);
    })
    .buildAndStart();

The differences are:

  • The orchestrator and activity functions have the input passed to them explicitly.
  • The orchestrator can use the return value to set the output.

This results in a simpler, more intuitive programming model and is also consistent with the proposed C# programming model.

Azure Functions Templates

This item tracks any tooling work that's required for Durable Functions templates. I don't have enough context to know what exactly is required for Java, but I assume at a minimum work needs to be done to support templates in Core Tools.

Update README with getting started instructions

The README.md for this repo needs to be updated with getting started instructions. For now, let's focus on getting started for Azure Functions since the sidecar model isn't officially supported yet. A few things we'll need to include:

The exact content can be flexible, but ideally advanced folks can use this information to get started without necessarily needing to refer to the quick start.

Use middleware to replace OrchestrationRunner.loadAndRun in orchestrator functions

See Azure/azure-functions-java-worker#595 for reference.

The idea is to simplify the orchestrator function programming model from this:

@FunctionName("Chaining")
public String helloCitiesOrchestrator(
        @DurableOrchestrationTrigger(name = "runtimeState") String runtimeState) {
    return OrchestrationRunner.loadAndRun(runtimeState, ctx -> {
        String input = ctx.getInput(String.class);
        int x = ctx.callActivity("F1", input, int.class).await();
        int y = ctx.callActivity("F2", x, int.class).await();
        int z = ctx.callActivity("F3", y, int.class).await();
        return ctx.callActivity("F4", z, double.class).await();
    });
}

...to something that looks like this...

@FunctionName("Chaining")
public double helloCitiesOrchestrator(
        @DurableOrchestrationTrigger(name = "runtimeState") String input) {
    int x = ctx.callActivity("F1", input, int.class).await();
    int y = ctx.callActivity("F2", x, int.class).await();
    int z = ctx.callActivity("F3", y, int.class).await();
    return ctx.callActivity("F4", z, double.class).await();
}

Add a CHANGELOG.md file to the project root

Having a single file that tracks changes across versions will be helpful for users to understand what fixes were released across different builds. This will be very useful when doing servicing since going through the GitHub releases can be tedious.

  • Add the initial file and include links to PRs/fixes since the previous release
  • Create a PR template in this repo that includes a checkbox asking if the PR author has updated CHANGELOG.md.

See here for an example: https://github.com/microsoft/durabletask-mssql/blob/main/CHANGELOG.md.

Basically, it should look something like this. Note that I like to explicitly highlight external users that contributed PRs, when possible.

## v1.0.1

### Updates

* Fixed bug ([#198](https://github.com/microsoft/durabletask-java/issues/198))
* Fixed bug ([#199](https://github.com/microsoft/durabletask-java/issues/199))

## v1.0.0

### New

* New feature A ([#196](https://github.com/microsoft/durabletask-java/pull/196)) - contributed by [@user](https://github.com/user)
* New feature B ([#197](https://github.com/microsoft/durabletask-java/pull/197)) - contributed by [@user](https://github.com/user)

### Updates

* Fixed bug ([#187](https://github.com/microsoft/durabletask-java/issues/187))
* Updated dependency ([#188](https://github.com/microsoft/durabletask-java/issues/188))

### Breaking changes

* Renamed method from `Foo` to `Bar`

The purgeInstances API should take a timeout parameter

Purging lots of orchestration data can take a very long time. We should add a timeout parameter that allows the caller to limit how much time is spent purging instance data. If the timeout expires, we should ideally still return the number of instances purged, and possibly return an additional value in the PurgeResult class that indicates whether the operation was cancelled.
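The behavior described above could look roughly like the following sketch. Everything here is hypothetical: `PurgeTimeoutSketch`, its `Outcome` type, and the `deleteOne` callback are stand-ins for illustration and do not reflect the actual PurgeResult class or any SDK API.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: purge instances until done or until the time budget is spent.
final class PurgeTimeoutSketch {
    private PurgeTimeoutSketch() {}

    // Hypothetical result: how many instances were purged, and whether we stopped early.
    static final class Outcome {
        final int purgedCount;
        final boolean timedOut;

        Outcome(int purgedCount, boolean timedOut) {
            this.purgedCount = purgedCount;
            this.timedOut = timedOut;
        }
    }

    // deleteOne is an assumed callback that returns true if the instance was deleted.
    static Outcome purge(List<String> instanceIds, Predicate<String> deleteOne, Duration timeout) {
        long start = System.nanoTime();
        long budgetNanos = timeout.toNanos();
        int purged = 0;
        for (String id : instanceIds) {
            // Check the budget before each delete so the partial count is still reported.
            if (System.nanoTime() - start >= budgetNanos) {
                return new Outcome(purged, true);
            }
            if (deleteOne.test(id)) {
                purged++;
            }
        }
        return new Outcome(purged, false);
    }
}
```

The key design point is that an expired timeout is reported as data rather than as an exception, so the caller still learns how much work was done.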

Add createHttpManagementPayload client method

This is an existing feature supported by most of the other languages. More details here.

This will need to be created in the durabletask-azure-functions package since it's specific to Azure Functions.

The documentation linked above will also need to be updated with a code sample.

Sub-orchestration support

Requirement

Orchestrations should be able to create sub-orchestrations, as described here. Retry and error handling should work the same way as in the APIs for invoking activities.

One or more integration tests should also be written to exercise the sub-orchestration codepath.

Implementation notes

In addition to adding new public methods on the TaskOrchestrationContext interface, these lines of code need to be uncommented and fully implemented in order to handle history events associated with sub-orchestrations.

When writing the integration test, this .NET Isolated sub-orchestration test can be used as a reference.

API to restart an orchestration instance

The HTTP URL for restarting an orchestration instance: restartPostUri
We use eternal orchestrations in our use case. When a failure occurs that we can recover from, we would like to restart the orchestration instance. However, the restart API is not available in the Java SDK. Could this be prioritized?
cc: @cgillum @ChrisRomp

TaskOrchestrationContext.allOf should throw TaskFailedException instead of RuntimeException

The implementation of this method in TaskOrchestrationExecutor.java currently throws a RuntimeException if any of the inner tasks fails. This seems wrong because tasks are only supposed to fail with TaskFailedException.

One challenge is that TaskFailedException is designed to represent a single named task. We may need to create a new exception type for composite exceptions. Alternatively, we could surface the details of the first exception. It may be useful to use the current .NET implementation design as a reference for this design.

As part of this work item, we should also do the following:

  • Update the API documentation for TaskOrchestrationContext.allOf to accurately reflect the expected exception behavior
  • Write a unit or integration test that ensures the correct behavior

Remove thenXXX methods from Task class

After some discussion with the Azure CDAs, it was decided that the Task.thenXXX(...) methods should be removed to 1) simplify the API surface area and 2) avoid confusing Java developers who have experience with similarly named APIs in the CompletableFuture class. None of the primary orchestration patterns require these methods, so it's generally safe to remove them.

As part of this work, we should also remove the comparison with CompletableFuture in the class's JavaDoc documentation. This is to help further prevent confusion through comparison.

Suspend and Resume client APIs

Add option to sort orchestration query results in descending order

There has been an ask to support querying for orchestration instances and sorting by descending order. We should investigate if this is practical at the storage provider level and, if it is, expose it all the way up to the Java client SDK.

Suggested implementation

[Request] Add ability to filter instance status history by eventType

On the Get Instance Status API there's a showHistory parameter which returns a historyEvents array for the instance. Would it be possible to add a $filter query parameter or an explicit includeEventType type of parameter to enable filtering of the historyEvents result to only specific types of events?

Current example response:

{
  "createdTime": "2018-02-28T05:18:49Z",
  "historyEvents": [
      {
          "EventType": "ExecutionStarted",
          "FunctionName": "E1_HelloSequence",
          "Timestamp": "2018-02-28T05:18:49.3452372Z"
      },
      {
          "EventType": "TaskCompleted",
          "FunctionName": "E1_SayHello",
          "Result": "Hello Tokyo!",
          "ScheduledTime": "2018-02-28T05:18:51.3939873Z",
          "Timestamp": "2018-02-28T05:18:52.2895622Z"
      },
      {
          "EventType": "TaskCompleted",
          "FunctionName": "E1_SayHello",
          "Result": "Hello Seattle!",
          "ScheduledTime": "2018-02-28T05:18:52.8755705Z",
          "Timestamp": "2018-02-28T05:18:53.1765771Z"
      },
      {
          "EventType": "TaskCompleted",
          "FunctionName": "E1_SayHello",
          "Result": "Hello London!",
          "ScheduledTime": "2018-02-28T05:18:53.5170791Z",
          "Timestamp": "2018-02-28T05:18:53.891081Z"
      },
      {
          "EventType": "ExecutionCompleted",
          "OrchestrationStatus": "Completed",
          "Result": [
              "Hello Tokyo!",
              "Hello Seattle!",
              "Hello London!"
          ],
          "Timestamp": "2018-02-28T05:18:54.3660895Z"
      }
  ],
  "input": null,
  "customStatus": { "nextActions": ["A", "B", "C"], "foo": 2 },
  "lastUpdatedTime": "2018-02-28T05:18:54Z",
  "output": [
      "Hello Tokyo!",
      "Hello Seattle!",
      "Hello London!"
  ],
  "runtimeStatus": "Completed"
}
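To make the requested semantics concrete, here is a small sketch of what filtering by event type would do to a historyEvents array like the one above. It is a client-side illustration only: `HistoryFilterSketch` and its method are hypothetical, and the request itself is for a server-side query parameter.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: keep only the history events whose "EventType" is in the requested set.
final class HistoryFilterSketch {
    private HistoryFilterSketch() {}

    static List<Map<String, Object>> filterByEventType(
            List<Map<String, Object>> historyEvents, Set<String> includeEventTypes) {
        return historyEvents.stream()
                // Events without an "EventType" key become the string "null" and are dropped.
                .filter(event -> includeEventTypes.contains(String.valueOf(event.get("EventType"))))
                .collect(Collectors.toList());
    }
}
```

Applied to the response above with the set {"TaskCompleted"}, this would keep the three E1_SayHello events and drop ExecutionStarted and ExecutionCompleted.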

Typed proxy support

Requirements

In order to create a type-safe experience for working with orchestrations and activities, the Java SDK for the Durable Task Framework should support type-safe invocation of orchestration and activities.

For example, when calling an activity, instead of the following:

Object activityInput = "World";
String result = context.callActivity("SayHello", activityInput, String.class).get();

...we should be able to instead write:

MyActivities activities = // get a pointer to an activity proxy
String result = activities.sayHello("World");

We'd want the same experience also for orchestrations (created by a client) and sub-orchestrations.

Ideally this support will also allow developers to easily unit test their code.

Design considerations

Java's Proxy support could be a good starting point for such a feature.
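A minimal sketch of how java.lang.reflect.Proxy could back such a typed proxy is shown below. The `ActivityProxySketch` class, the `invoker` callback, and the rule that maps a method name to an activity name (capitalizing the first letter) are all assumptions made for illustration; in a real implementation the callback would presumably delegate to something like context.callActivity.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.BiFunction;

// Hypothetical sketch: turn calls on a typed interface into (activityName, input) invocations.
final class ActivityProxySketch {
    private ActivityProxySketch() {}

    // Example activity contract, mirroring the MyActivities type used above.
    interface MyActivities {
        String sayHello(String name);
    }

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> contract, BiFunction<String, Object, Object> invoker) {
        InvocationHandler handler = (proxy, method, args) -> {
            // equals, hashCode, and toString are also routed here; answer them locally.
            if (method.getDeclaringClass() == Object.class) {
                switch (method.getName()) {
                    case "equals": return proxy == args[0];
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return "ActivityProxy(" + contract.getSimpleName() + ")";
                }
            }
            // Assumed naming rule: sayHello -> "SayHello".
            String name = method.getName();
            String activityName = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            // This sketch forwards only the first argument as the activity input.
            Object input = (args == null || args.length == 0) ? null : args[0];
            return invoker.apply(activityName, input);
        };
        return (T) Proxy.newProxyInstance(
                contract.getClassLoader(), new Class<?>[] {contract}, handler);
    }
}
```

Because the invoker is just a function, a unit test can pass a fake one and exercise orchestration logic without a running sidecar, which fits the testability goal mentioned above.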

Automated tests for Azure Functions

We currently have a GitHub Actions workflow for the Java SDK. However, we don't have one set up for Azure Functions yet. We should add this to ensure any changes we make don't break Azure Functions scenarios.

Here are my suggestions for this work item:

  1. Create a new Durable Functions Java sample app with some HTTP triggers that run some orchestrations. Ideally these will be used for our documentation samples as well as automated testing. Importantly, it should reference the locally built version of the Java SDK and annotations.
  2. Create a Dockerfile that containerizes the sample app.
  3. Create a new integration test project under azurefunctions that runs the sample app's orchestration via its HTTP triggers.
  4. Add a new job to the existing build-validation.yml workflow to build this docker image and run these tests as part of the CI.

Here's an example workflow definition that demonstrates how we can set up and run a containerized Function app as part of a CI job. The actual setup logic is implemented as a PowerShell script so that it can be run locally. It does a few important things for us:

The script then runs a simple orchestration test. However, instead of doing this from the script, I suggest we use a ./gradlew test command to execute tests as a separate step in the GitHub Actions workflow job. That will make it easier to analyze the results.

As a final step, we should probably dump the container logs so that we can debug any issues that arise more easily. Alternatively, we could configure the function app to publish logs to Application Insights, which might make it easier to query for issues compared to digging through text logs.

Structured logging

Requirement

Structured logging should be added to the SDK to support tracing and debugging by developers and operators. Ideally, these logs integrate naturally with the logging system used by the application host.

Proposal

The java.util.logging.Logger class can be used as the main abstraction for logging, similar to the ILogger interface in .NET.

Other considerations

Azure Functions has a particular way of doing logging and tracing for Java-based function apps. Ideally any design we consider should integrate nicely with external runtime hosts (especially Azure Functions). Azure Functions code can call context.getLogger() to get an instance of java.util.logging.Logger to use for logging.

https://docs.microsoft.com/en-us/azure/azure-functions/functions-reference-java?tabs=bash%2Cconsumption#logger

Instance termination support

Requirement

Orchestration instances should be able to be terminated, as described here.

One or more integration tests should also be written to exercise the instance termination code path.

Implementation notes

When writing the integration test, this .NET Isolated termination test can be used as a reference.

DurableTaskGrpcClient.getInstanceMetadata does not return null when no instance found.

According to the API documentation, getInstanceMetadata should return null when no instance is found.

public abstract OrchestrationMetadata getInstanceMetadata(String instanceId, boolean getInputsAndOutputs);

     * @return a metadata record that describes the orchestration instance and its execution status, or
     *         <code>null</code> if no such instance is found.
     */
    @Nullable
    public abstract OrchestrationMetadata getInstanceMetadata(String instanceId, boolean getInputsAndOutputs);

However, in that case, I found that DurableTaskGrpcClient.getInstanceMetadata returns DEFAULT_INSTANCE with orchestrationState_=0 (ORCHESTRATION_STATUS_RUNNING).

// Protobuf code.

    /**
     * <code>.OrchestrationState orchestrationState = 2;</code>
     * @return The orchestrationState.
     */
    @java.lang.Override
    public com.microsoft.durabletask.implementation.protobuf.OrchestratorService.OrchestrationState getOrchestrationState() {
      return orchestrationState_ == null ? com.microsoft.durabletask.implementation.protobuf.OrchestratorService.OrchestrationState.getDefaultInstance() : orchestrationState_;
    }
 
 
    // @@protoc_insertion_point(class_scope:OrchestrationState)
    private static final com.microsoft.durabletask.implementation.protobuf.OrchestratorService.OrchestrationState DEFAULT_INSTANCE;
    static {
      DEFAULT_INSTANCE = new com.microsoft.durabletask.implementation.protobuf.OrchestratorService.OrchestrationState();
    }
    public static com.microsoft.durabletask.implementation.protobuf.OrchestratorService.OrchestrationState getDefaultInstance() {
      return DEFAULT_INSTANCE;
    }

This issue prevents us from invoking client.scheduleNewOrchestrationInstance in the singleton pattern described in the official documentation here. It also means that library users cannot determine the accurate state of an instance.

https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-singletons?tabs=java#singleton-example

@FunctionName("HttpStartSingle")
public HttpResponseMessage runSingle(
        @HttpTrigger(name = "req") HttpRequestMessage<?> req,
        @DurableClientInput(name = "durableContext") DurableClientContext durableContext) {

    String instanceID = "StaticID";
    DurableTaskClient client = durableContext.getClient();

    // Check to see if an instance with this ID is already running
    OrchestrationMetadata metadata = client.getInstanceMetadata(instanceID, false);
    if (metadata.isRunning()) {
        return req.createResponseBuilder(HttpStatus.CONFLICT)
                .body("An instance with ID '" + instanceID + "' already exists.")
                .build();
    }

    // No such instance exists - create a new one. De-dupe is handled automatically
    // in the storage layer if another function tries to also use this instance ID.
    client.scheduleNewOrchestrationInstance("MyOrchestration", null, instanceID);
    return durableContext.createCheckStatusResponse(req, instanceID);
}

Change OrchestratorBlockedEvent and TaskFailedException to be unchecked exceptions

To improve the ease of use for Durable Functions for Java, we should change OrchestratorBlockedEvent and TaskFailedException to both derive from the unchecked RuntimeException. The main drivers for this are:

  1. Users don't have to declare that every orchestrator function throws OrchestratorBlockedEvent and TaskFailedException when using the new middleware.
  2. Users can more easily use context APIs from within lambdas.

We should also rename OrchestratorBlockedEvent to OrchestratorBlockedException now that it derives from Exception.
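The second driver is easy to see in a small sketch. The two exception types below are hypothetical stand-ins for the checked and unchecked variants, not the SDK classes: a lambda passed to a standard functional interface such as java.util.function.Function cannot let a checked exception escape, so the checked variant forces a try/catch inside every lambda.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch contrasting checked and unchecked failures inside lambdas.
final class UncheckedLambdaSketch {
    private UncheckedLambdaSketch() {}

    // Stand-ins for the exception type before and after the proposed change.
    static final class CheckedTaskFailure extends Exception {
        CheckedTaskFailure(String message) { super(message); }
    }

    static final class UncheckedTaskFailure extends RuntimeException {
        UncheckedTaskFailure(String message) { super(message); }
    }

    static String callChecked(String input) throws CheckedTaskFailure {
        if (input.isEmpty()) throw new CheckedTaskFailure("empty input");
        return input.toUpperCase(Locale.ROOT);
    }

    static String callUnchecked(String input) {
        if (input.isEmpty()) throw new UncheckedTaskFailure("empty input");
        return input.toUpperCase(Locale.ROOT);
    }

    // Checked variant: the lambda must catch and wrap, or it will not compile.
    static List<String> withChecked(List<String> inputs) {
        return inputs.stream().map(input -> {
            try {
                return callChecked(input);
            } catch (CheckedTaskFailure e) {
                throw new IllegalStateException(e);
            }
        }).collect(Collectors.toList());
    }

    // Unchecked variant: a plain method reference is enough.
    static List<String> withUnchecked(List<String> inputs) {
        return inputs.stream().map(UncheckedLambdaSketch::callUnchecked).collect(Collectors.toList());
    }
}
```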

Non-blocking APIs for Durable Task clients

Issue summary

The DurableTaskClient APIs are all blocking APIs and use gRPC blocking-stubs internally. This limits the scalability of apps that depend on these client APIs.

Proposal

The DurableTaskClient abstract class should add support for non-blocking variants of the various APIs (starting orchestrations, waiting for their completion, etc.).

Other considerations

Azure Functions for Java doesn't yet support non-blocking functions: Azure/azure-functions-java-worker#244, so non-blocking client APIs likely won't benefit Azure Functions users. Unfortunately, there's no indication about if or when this support will be added to Azure Functions.

DurableClientContext.createCheckStatusResponse status code

Hi,
I have a project that invokes a durable function from a logic app (through an HTTP handler). When using this approach against a durable function written in C#, the logic app's asynchronous pattern processing is able to poll the status because the orchestrator code returns a response with the status code set to 202 (see doc https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.webjobs.extensions.durabletask.idurableorchestrationclient.createcheckstatusresponse?view=azure-dotnet#microsoft-azure-webjobs-extensions-durabletask-idurableorchestrationclient-createcheckstatusresponse(system-net-http-httprequestmessage-system-string-system-boolean)).

Returns
[HttpResponseMessage](https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpresponsemessage)
An HTTP 202 response with a Location header and a payload containing instance control URLs.

However, when the logic app is pointed at a durable function written in Java, DurableClientContext.createCheckStatusResponse returns a response with a status code of 201, which means the logic app doesn't poll for progress and immediately continues.

My question is: should this function return a 202 status code to be consistent with the .NET SDK?

The waitForOrchestrationStart/Complete methods should throw TimeoutException

Currently these methods raise an io.grpc.StatusRuntimeException if the timeout deadline is exceeded.

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 0.998280100s. [closed=[], open=[[remote_addr=/127.0.0.1:4001]]]

This is problematic because we don't want the details of gRPC to bleed out into the programming model, particularly for things that are "expected".

Instead, we should catch the gRPC exception internally, validate that it's a DEADLINE_EXCEEDED exception, and then throw a TimeoutException. This is a checked exception, which means that callers will be required to catch it.
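The translation described above might be sketched as follows. To keep the example free of a gRPC dependency, `TransportException` is a hypothetical stand-in for io.grpc.StatusRuntimeException and carries the status code as a string; with the real type, the check would presumably compare `e.getStatus().getCode()` against `Status.Code.DEADLINE_EXCEEDED`.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: hide a transport-level deadline error behind TimeoutException.
final class TimeoutTranslationSketch {
    private TimeoutTranslationSketch() {}

    // Stand-in for io.grpc.StatusRuntimeException (assumption, not the real class).
    static final class TransportException extends RuntimeException {
        final String statusCode;

        TransportException(String statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
    }

    static <T> T callWithTimeoutTranslation(Supplier<T> transportCall) throws TimeoutException {
        try {
            return transportCall.get();
        } catch (TransportException e) {
            if ("DEADLINE_EXCEEDED".equals(e.statusCode)) {
                // Keep the original exception as the cause for diagnostics.
                TimeoutException timeout =
                        new TimeoutException("Timed out waiting for the orchestration.");
                timeout.initCause(e);
                throw timeout;
            }
            // Any other transport failure is rethrown unchanged.
            throw e;
        }
    }
}
```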

Note that this is a breaking change and should be done before the 1.0.0 release.

Purge instance support

Requirement

Orchestration clients should be able to purge orchestration data, as described here. Purging single instances and multiple instances using filters should be supported.

One or more integration tests covering this scenario should also be added.

Implementation notes

Note that there's an equivalent work item being tracked to add this for .NET Isolated support. This would be a good reference for the Java work-item, if available.

Suggested implementation strategy

The Durable Task sidecar works by making API calls into the Durable Task Framework (DTFx). Unfortunately, DTFx doesn't support a "purge" API as part of its core abstraction. Rather, this was added only to specific DTFx backends. To implement this feature correctly, we need to make purge a first-class feature of DTFx. Here are some suggestions for how to do this:

  1. Add a new public IOrchestrationServicePurgeClient interface to the DurableTask.Core project and define a PurgeInstanceStateAsync method:
public interface IOrchestrationServicePurgeClient
{
    Task<PurgeHistoryResult> PurgeInstanceStateAsync(string instanceId);
}
  2. Update the AzureStorageOrchestrationService class in Azure/durabletask (the same repo) to implement this new interface and map it to the existing PurgeInstanceHistoryAsync method.

  3. Update the SqlOrchestrationService class in microsoft/durabletask-mssql to implement this new interface and map it to the existing PurgeOrchestrationHistoryAsync method.

  4. Update the NetheriteOrchestrationService class in microsoft/durabletask-netherite to implement this new interface and map it to the existing PurgeInstanceHistoryAsync method.

It's not strictly necessary for us to do this work for all backends at once. We can start with (1) and skip ahead to the next section so we can more quickly get to integration testing and validate the design. I suggest we open GitHub issues to separately track the different steps above.

  1. Update the InMemoryOrchestrationService class in the microsoft/durabletask-sidecar repo to implement this new interface. There is no existing method that implements similar functionality, so an implementation will need to be written from scratch. This is an in-memory storage provider so it's fine to just remove the appropriate objects in InMemoryInstanceStore.store. This is the storage provider used by integration tests so this implementation should probably be the first one to work on.

  2. Uncomment and finish implementing the DeleteInstance gRPC definition. For local development, these changes can be done directly in the sub-module under the microsoft/durabletask-sidecar project. This API will need to match the API defined in the new IOrchestrationServicePurgeClient method mentioned above.

  3. Using the updated gRPC definition, add a new overload method to the sidecar's TaskHubGrpcServer class. This method will take the IOrchestrationServiceClient object (which should already be accessible) and try to cast it into IOrchestrationServicePurgeClient. If the cast succeeds, then invoke the new method. Otherwise, throw a new NotSupportedException with a helpful error message.

  4. In the Java SDK, add two new methods to DurableTaskClient with a matching method signature for the single-instance purge API and another for the multi-instance purge API. The implementations should both call the new gRPC method that was defined in a previous step. The multi-instance purge API should internally use the new query API to select all the instances that need to be deleted (in pages) and issue individual deletes, ideally in parallel (but with some limit on the amount of parallelism to avoid thread starvation). There should also be a parameter for either implementing a timeout or cancellation since this operation could be very slow. The exact design should be based on what's most familiar to Java developers. In the future, we can consider optimizing the multi-instance purge API such that the sidecar implements multi-instance purge, but that's not critical at this point.
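The paged, bounded-parallelism delete described in the last step could be sketched roughly as below. All names are hypothetical: `ParallelPurgeSketch`, the `pages` input (standing in for pages returned by the query API), and the `deleteOne` callback (standing in for the single-instance purge call) are assumptions, and timeout or cancellation handling is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: delete instances page by page with a fixed cap on parallelism.
final class ParallelPurgeSketch {
    private ParallelPurgeSketch() {}

    static int purgeAll(List<List<String>> pages, Predicate<String> deleteOne, int maxParallelism) {
        // A fixed-size pool bounds how many deletes run at once.
        ExecutorService pool = Executors.newFixedThreadPool(maxParallelism);
        try {
            int purged = 0;
            for (List<String> page : pages) {
                List<Future<Boolean>> pending = new ArrayList<>();
                for (String id : page) {
                    Callable<Boolean> task = () -> deleteOne.test(id);
                    pending.add(pool.submit(task));
                }
                // Finish the current page before moving to the next one.
                for (Future<Boolean> result : pending) {
                    if (result.get()) {
                        purged++;
                    }
                }
            }
            return purged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while purging", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A delete operation failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```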

com.microsoft.durabletask.DataConverter$DataConverterException in case of no default constructor

If javac does not generate a default constructor for the input class, the following deserialization warning is raised in the OrchestrationTrigger. The message is not clear to users of this library.

WARNING: The orchestrator failed with an unhandled exception: com.microsoft.durabletask.DataConverter$DataConverterException: Failed to deserialize the JSON text to net.yutobo.durablefunc.Person.

According to the Java debugger output, this warning originates from a Jackson exception.

InvalidDefinitionException@92 "com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `net.yutobo.durablefunc.Person` (no Creators, like default constructor, exist): cannot deserialize from Object value (no delegate- or property-based Creator)
 at [Source: (String)"{"name":"yusuke","age":30}"; line: 1, column: 2]"

Client Function

        Person person = new Person("foobar", 30);
        client.scheduleNewOrchestrationInstance("orchestrateSingleton", person, instanceID);

OrchestrationTrigger

return OrchestrationRunner.loadAndRun(runtimeState, ctx -> {

            Person person = ctx.getInput(Person.class); // com.microsoft.durabletask.DataConverter$DataConverterException
            context.getLogger().info(person.toString());
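For context, javac only generates a default constructor when a class declares no constructors at all, so a class like Person with just a (String, int) constructor has none. One common workaround is to declare a no-argument constructor explicitly. The `PersonSketch` class below is a hypothetical stand-in for the reporter's Person type, whose source is not shown in the report.

```java
// Hypothetical stand-in for the Person input type from the report.
final class PersonSketch {
    private String name;
    private int age;

    // Explicit no-argument constructor: lets reflection-based deserializers create the object
    // and then populate it through the setters.
    public PersonSketch() {}

    // Convenience constructor; declaring only this one would suppress the default constructor.
    public PersonSketch(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }

    public void setAge(int age) { this.age = age; }

    @Override
    public String toString() {
        return "Person{name=" + name + ", age=" + age + "}";
    }
}
```

This does not address the unclear error message itself, which is the actual subject of the report.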

JavaDoc documentation

This issue tracks adding detailed JavaDoc documentation to the public APIs and ensuring this documentation can be viewed by users using IDEs like VS Code and IntelliJ.

Sample: Custom encryption for orchestration state

Scenario

There are a variety of users who will need to encrypt their durable state using encryption keys they control. While it's possible to do this directly in the storage layer, it may be beneficial to support this directly in the API layer. The most natural way to do this is to use the DataConverter interface.

One challenge will be with how to deal with key rotation. For example, what happens if an orchestration can run for up to 1 year, but a company has a key rotation policy of 30 days? Do we require that old keys be kept around to decrypt old state, or do we create a mechanism for re-encrypting all orchestration state so that old keys can be fully decommissioned? These are some of the challenges that the sample should try to address.

Besides creating a reference for users to follow, one of the other outcomes could be changes to the API surface.

Update package name and version for public preview

Recommended name change:

durabletask-sdk --> durabletask-client

This is in line with how we named the .NET Isolated SDK. The reasoning is that this package is simply a "client" for the sidecar process that actually drives the execution.

Recommended version change:

0.1.0 --> 1.0.0-beta.1

Query instance history support

Requirement

Orchestration clients should be able to query for an orchestration instance and also get back the history events for that orchestration, similar to the showHistory flag mentioned here.

Implementation notes

The .NET in-process Durable Functions implementation returned a JSON array. For this work-item, we should instead return something structured - e.g., an array of strongly typed objects.

Non-blocking Activity task implementations

Issue

The TaskActivity interface currently has a synchronous run method that returns the output of the activity:

public interface TaskActivity {
    Object run(TaskActivityContext ctx);
}

There's currently no way to define a non-blocking, asynchronous activity implementation. This can result in issues related to the scalability of applications if many activity tasks need to be executed concurrently.

Proposal

There should be an alternative mechanism for implementing non-blocking, asynchronous activity tasks. For example:

public interface AsyncTaskActivity {
    CompletionStage<Object> run(TaskActivityContext ctx);
}

The activity will finish when the CompletionStage is completed.
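A sketch of how a worker might consume such an interface without blocking a thread is shown below. The nested `Context` interface is a hypothetical stand-in for TaskActivityContext, and the `dispatch` method and its "completed:"/"failed:" result strings are illustrative assumptions rather than SDK behavior.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Hypothetical sketch of an asynchronous activity and a non-blocking dispatcher.
final class AsyncActivitySketch {
    private AsyncActivitySketch() {}

    // Stand-in for TaskActivityContext (assumption): exposes only a string input.
    interface Context {
        String getInput();
    }

    // Mirrors the proposed AsyncTaskActivity shape.
    interface AsyncActivity {
        CompletionStage<Object> run(Context ctx);
    }

    // Example activity: produces its output on another thread.
    static final AsyncActivity ECHO =
            ctx -> CompletableFuture.supplyAsync(() -> "Hello, " + ctx.getInput() + "!");

    // Attaches a continuation instead of waiting; the caller's thread is never blocked here.
    static CompletionStage<String> dispatch(AsyncActivity activity, Context ctx) {
        return activity.run(ctx).handle((output, error) ->
                error == null ? "completed:" + output : "failed:" + error.getMessage());
    }
}
```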

Other considerations

It should be noted that Azure Functions for Java doesn't yet support non-blocking functions: Azure/azure-functions-java-worker#244. Support for non-blocking activity tasks likely won't benefit Azure Functions users until the Java worker for Azure Functions adds async function support. Unfortunately, there's no indication about if or when this support will be added.

Remove builders for non "entry point" classes

This is feedback from the Azure SDK team. The primary feedback is that only "entry point" or "front door" classes, like DurableTaskGrpcClient and DurableTaskGrpcWorker, should have builders. We should avoid them for other "non-entry point" classes, based on UX research done over the past 10 years or so.

Builders can be replaced with constructors, getters, and setters. We can still support the fluent code style by having setters return the current object.

Builders to be replaced currently live in the following classes:

  • TaskOptions
  • RetryPolicy
  • PurgeInstanceCriteria
  • OrchestrationStatusQuery
  • NewOrchestrationInstanceOptions
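The constructor-plus-fluent-setter style mentioned above could look roughly like the following. RetryPolicySketch and its fields are illustrative only and are not the SDK's actual RetryPolicy API.

```java
import java.time.Duration;

// Hypothetical options class using a constructor, getters, and fluent setters
// in place of a builder.
final class RetryPolicySketch {
    private int maxNumberOfAttempts;
    private Duration firstRetryInterval;
    private double backoffCoefficient = 1.0;

    // Required values go in the constructor.
    RetryPolicySketch(int maxNumberOfAttempts, Duration firstRetryInterval) {
        this.maxNumberOfAttempts = maxNumberOfAttempts;
        this.firstRetryInterval = firstRetryInterval;
    }

    // Each setter returns this so calls can be chained.
    RetryPolicySketch setMaxNumberOfAttempts(int maxNumberOfAttempts) {
        this.maxNumberOfAttempts = maxNumberOfAttempts;
        return this;
    }

    RetryPolicySketch setFirstRetryInterval(Duration firstRetryInterval) {
        this.firstRetryInterval = firstRetryInterval;
        return this;
    }

    RetryPolicySketch setBackoffCoefficient(double backoffCoefficient) {
        this.backoffCoefficient = backoffCoefficient;
        return this;
    }

    int getMaxNumberOfAttempts() { return maxNumberOfAttempts; }
    Duration getFirstRetryInterval() { return firstRetryInterval; }
    double getBackoffCoefficient() { return backoffCoefficient; }
}
```

Usage then reads much like builder code, minus the final build() call: new RetryPolicySketch(3, Duration.ofSeconds(5)).setBackoffCoefficient(2.0).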

API review with Azure SDK

Azure SDK has agreed to help review our API surface area for consistency with other Microsoft-related API best practices. This review needs to be completed prior to our GA v1.0.0 release (ideally even before public preview).

Some things we expect will be brought up:

  • General API design guidelines
  • JavaDocs
  • Dependencies, especially dependencies on things like Jackson

Multi-instance query support

Requirement

Orchestration clients should be able to issue filtered queries to the state store for orchestration instances, as described here.

Basic Instructions

There are quite a few steps required to get this working end-to-end involving updating the Java SDK, updating the gRPC contract, updating the sidecar, and updating the Durable Task Framework (DTFx). The good news is that the changes should all be fairly simple. Here are some instructions for how this can be done:

Durable Task Framework updates

The https://github.com/Azure/durabletask repo is where DTFx lives. It defines the core data types and backend storage providers that are common across all architectures. Unfortunately, there is no common abstraction for multi-instance query support. We will need to create this abstraction and then update the existing storage providers to support it. This shouldn't require much work but will require several PRs across several GitHub repos.

  1. Define a new public IOrchestrationServiceQueryClient interface. It can define a single method for querying for multiple orchestration instances. The design needs to be compatible with the existing methods mentioned below so that each storage provider can successfully implement it. The methods below should hopefully all be compatible with each other.

  2. Update the AzureStorageOrchestrationService class in Azure/durabletask to implement this new interface and map it to the existing GetOrchestrationStateAsync method.

  3. Update the SqlOrchestrationService class in microsoft/durabletask-mssql to implement this new interface and map it to the existing GetManyOrchestrationsAsync method.

  4. Update the NetheriteOrchestrationService class in microsoft/durabletask-netherite to implement this new interface and map it to the existing QueryOrchestrationStatesAsync method.

It's not strictly necessary for us to do this work for all backends at once. We can start with (1) and skip ahead to the next section so we can more quickly get to integration testing and validate the design. I suggest we open GitHub issues to separately track the different steps above.

Sidecar Updates

These changes are required for integration testing to be unblocked. This work will also benefit the .NET Isolated support for Durable Functions.

  1. Update the InMemoryOrchestrationService class in the microsoft/durabletask-sidecar repo to implement this new interface. There is no existing method that implements similar functionality, so an implementation will need to be written from scratch. This is an in-memory storage provider so it's fine to just enumerate all the objects in InMemoryInstanceStore.store and return the objects that match the filter. This is the storage provider used by integration tests so this implementation should probably be the first one to work on.

  2. Uncomment and finish implementing the QueryInstances gRPC definition. For local development, these changes can be done directly in the sub-module under the microsoft/durabletask-sidecar project. This API will need to match the API defined in the new IOrchestrationServiceQueryClient method mentioned above.

  3. Using the updated gRPC definition, add a new overload method to the sidecar's TaskHubGrpcServer class. This method will take the IOrchestrationServiceClient object (which should already be accessible) and try to cast it into IOrchestrationServiceQueryClient. If the cast succeeds, then invoke the method. Otherwise, throw a new NotSupportedException with a helpful error message.

Java SDK Updates (finally)

These changes are required specifically for Java.

  1. In the Java SDK, add a new method to DurableTaskClient with a matching method signature for the new multi-instance query gRPC API. The implementation should simply call the new gRPC method that was defined in a previous step.

  2. Several test permutations can be written for integration testing. My suggestion is to take a look at the MultiInstanceQueries tests in the SQL backend project. It has pretty good coverage with a relatively simple test definition. The test could be broken up into multiple tests or kept as one. Either way is fine.

So there's a decent amount of infrastructure work required for this work item, but it will also benefit the .NET Isolated work as well as any direct users of the Durable Task Framework (of which there are many within Microsoft), so the impact is much larger than just our Java users.

GitHub Actions PR validation workflow

Requirement

Each PR should trigger a GitHub Actions workflow that does a build and runs the integration tests. Additionally, the project README file should include a badge that displays the current status of the build, as well as instructions for how developers can do the validations on their local machines.

Approach

The integration tests require a sidecar to be present since the Java orchestrations can't be run end-to-end in the test process. A sidecar implementation exists here and can be packaged as a publicly available Docker container and used for these tests.

Other considerations

The build.gradle file currently disables running these integration tests because of their dependency on an external sidecar. I'm not very familiar with the Gradle build system, but we'll need to make sure there is a simple way to execute these tests from the command line.

Entity support

For feature parity, durable entities should be supported at some point.

Add waitForCompletionOrCreateCheckStatusResponse client API

Other languages support a client API that allows you to wait for an orchestration to complete and either a) return the output if it completes within the specified timeout or b) return an HTTP 202 response if the timeout expires. The first release of Java doesn't support this, although it's possible to implement manually.

More information on this feature can be found here.

This is an Azure Functions-specific API that will need to be added to the durabletask-azure-functions package.

Once the API exists, the documentation linked above will need to have its sample updated to reference it.
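The manual approach mentioned above can be sketched as a timed wait that falls back to a 202 response. Everything here is an assumption made for illustration: the Response type, the method name, the use of a CompletableFuture to represent orchestration completion, and the choice of 500 for a failed orchestration. None of it reflects the eventual API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class WaitOrCheckStatusSketch {
    // Hypothetical response type: an HTTP status code plus a body.
    static final class Response {
        final int statusCode;
        final String body;

        Response(int statusCode, String body) {
            this.statusCode = statusCode;
            this.body = body;
        }
    }

    // Waits up to timeoutMillis for the orchestration output. Returns 200 with
    // the output on completion, or 202 with a status-query URL on timeout.
    static Response waitForCompletionOrCheckStatus(
            CompletableFuture<String> completion, long timeoutMillis, String statusQueryUrl) {
        try {
            String output = completion.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return new Response(200, output);
        } catch (TimeoutException e) {
            // Still running: tell the caller where to poll for status.
            return new Response(202, statusQueryUrl);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return new Response(202, statusQueryUrl);
        } catch (ExecutionException e) {
            // Assumption for this sketch: surface a failed orchestration as 500.
            return new Response(500, String.valueOf(e.getCause()));
        }
    }
}
```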

Rename gRPC protobuf package

The gRPC protobuf package should be renamed to com.microsoft.durabletask.implementation.protobuf as this is for internal use and not public API.

Azure Functions documentation updates

Several Azure Durable Functions articles need to be either written or updated with Java sample code and references.

New content

Tutorials to update

Conceptual content to update

Reference articles
