

Portals


Project Information

Portals is a framework written in Scala under the Apache 2.0 License for stateful serverless applications.

The Portals framework aims to unify the distributed dataflow streaming model with the actor model, providing flexibility, data-parallel processing capabilities and strong guarantees. The framework is designed to be used in a serverless environment, where the user can focus on the business logic of the application, while the framework takes care of the infrastructure and failure management.

Key features:

  • Multi-dataflow applications: define, connect and compose multiple dataflows into complex services.
  • Inter-dataflow services, the Portal service abstraction: expose dataflows as services that can be used by other dataflows.
  • Decentralized cloud/edge execution: API primitives for connecting runtimes, and deploying on edge/cloud devices.

Find out more about Portals at https://portals-project.org.

Note

Disclaimer: Portals is a research project under development and not yet ready for production use. This repository contains a single-node runtime for testing and development purposes. A distributed runtime is currently under development.

Project Status and Roadmap

The Portals project is in an early stage of development. We are working towards a first release, with ongoing work on a distributed, decentralized runtime; some of this preliminary work is done in parallel in other (private) repositories. A release is planned for fall 2023. Besides these ongoing developments, the release will include a Scala API, a JS API, an interpreter, a compiler, benchmarks, and examples.

Note

Features that are currently in development are marked as experimental and are likely to change.

Project Setup

To use Portals in your project, add the following dependency to your build.sbt file:

libraryDependencies += "org.portals-project" %% "portals" % "0.1.0-RC1"

A full project setup with instructions for executing a hello world example is available at https://github.com/portals-project/Hello-World.

Note

Portals has not yet been published to Maven Central. It can be published locally using the sbt publishLocal command. To use Portals in your project, import the local snapshot instead: libraryDependencies += "org.portals-project" %% "portals-core" % "0.1.0-SNAPSHOT".
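The local-publish workflow might look as follows in a shell (a sketch, assuming a standard sbt and git setup):

```shell
# Clone the Portals repository and publish a local snapshot,
# so that the SNAPSHOT dependency above resolves in downstream projects.
git clone https://github.com/portals-project/portals.git
cd portals
sbt publishLocal
```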

Getting Started Guide

We recommend the following steps to get started.

  • Install Scala; we recommend working with sbt, together with Metals on VS Code.
  • Clone the Hello World repository.
  • Compile and run the project: sbt compile, sbt run.
  • To get some inspiration, check out the examples or read the tutorial.
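Concretely, the steps above might look like this in a shell (a sketch, assuming git and sbt are installed):

```shell
# Clone and run the Hello World project locally.
git clone https://github.com/portals-project/Hello-World.git
cd Hello-World
sbt compile
sbt run
```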

You can find further reading material on the website.

Examples

The Portals library comes with an API for defining multi-dataflow applications, and a serverless runtime for executing these applications. The most basic example would involve defining a workflow and a generator within the context of a PortalsApp, and executing this on the test runtime interpreter.

import portals.api.dsl.DSL.*
import portals.system.Systems

object HelloWorld extends App:
  val app = PortalsApp("HelloWorld"):
    // A generator that emits a single event.
    val generator = Generators.fromList(List("Hello World!"))
    // A workflow that uppercases the generated events and logs them.
    val workflow = Workflows[String, String]()
      .source(generator.stream)
      .map(_.toUpperCase())
      .logger()
      .sink()
      .freeze()
  // Execute the application on the single-node test runtime.
  val system = Systems.test()
  system.launch(app)
  system.stepUntilComplete()
  system.shutdown()

Check out an extensive Tutorial and the Examples Directory for more examples.

Support and Contact

For help or questions, contact the Portals developers and community on the Portals Google Groups mailing list.

If you find a bug in Portals, then open an issue.

Contributing

If you are interested in contributing to the project, please check out our contributing guidelines.

Cite Our Work

If you want to cite our work, please consider citing the following publication:

  • Jonas Spenger, Paris Carbone, and Philipp Haller. 2022. Portals: An Extension of Dataflow Streaming for Stateful Serverless. In Proceedings of the 2022 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward! '22), December 8-10, 2022, Auckland, New Zealand. ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3563835.3567664


Issues

System registry

There is currently no access to the registry via the PortalsSystem. This was hacked in for the current version of the portalsjs branch, to enable accessing the registry for interactive querying. We should implement this in a cleaner way, and expose it from the PortalsSystem trait as a method with an interface similar to that of the registry in the builder, with the additional capability of returning everything in the registry.

After some time, it seems that we can do without a registry for now. We are leaving this issue open as a discussion topic on whether this is really something we want to include in the API.

Widen the Task Type when calling withWrapper

Calling withWrapper should widen the Task Type.

Example:

Tasks.map[Int, Int] { _ + 5 }
  .withWrapper[String]{ ctx ?=> wrapped => event =>
    if event < 3 then ctx.emit("hello") else wrapped(event)
  }

The final type of this Task should be from Int to Int | String. The same applies to the method on the FlowBuilder.
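In plain Scala 3 (without the Portals API), the desired widening can be illustrated by wrapping an ordinary function; the names below are illustrative stand-ins, not the actual Task types:

```scala
// Stand-in for the wrapped Task logic: Int => Int.
val base: Int => Int = _ + 5

// The wrapper may short-circuit to a String, so the combined
// output type widens to the union type Int | String.
val wrapped: Int => Int | String =
  event => if event < 3 then "hello" else base(event)

@main def widenDemo(): Unit =
  println(wrapped(1))  // "hello": short-circuited by the wrapper
  println(wrapped(10)) // 15: forwarded to the wrapped function
```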

OnAtomComplete or OnAtomPreComplete/OnAtomPostComplete

Currently we have implemented only the onAtomComplete event handler on the Tasks. We should consider either implementing the promised OnAtomPreComplete/OnAtomPostComplete handlers, or deciding whether it is sufficient to keep the onAtomComplete handler as is.

Automatic code formatting improvements

The automatic code formatting with sbt scalafmt is currently not set up in a way that formats the code nicely. We should either modify its configuration so that it works well with our code, or reduce the amount of formatting it does, so that we retain more control over the final look.

Examples

Examples are useful both for showcasing what our programming model can do, and also for planning and discussions on what features to include in our programming model.

The following example programs should be implemented and documented.

Canonical Stateful Serverless Applications

  • Shopping cart
  • Bank account
  • Social media / twitter clone

Other examples

  • Virtual actors
  • kTables + CSC + SQL (Optional)
  • Real-time analytics (e.g. Uber) (Optional)

Should have an API that waits for multiple futures

Users can currently only attach a handler to the response of a single Future. When multiple requests are sent from one task, there is not yet an API to wait until all the responses have been received.

Such code will not work, because the future result is cleared after the future handler exits:

await(future1) {
    await(future2) {
        // future1.get will return None here
    }
}

A workaround looks like this:

var v1: Option[Int] = None // assuming an Int-valued future

await(future1) {
    v1 = Some(future1.value.get)
}

await(future2) {
    // v1 can be used here
}
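One possible shape for such an API is an awaitAll combinator that caches each value until every callback has fired. The sketch below uses a minimal hand-rolled callback-style future so that it stays self-contained; none of these names exist in the Portals API:

```scala
import scala.collection.mutable

// Minimal callback-style future, standing in for the Portals future.
class CbFuture[T]:
  private var cb: Option[T => Unit] = None
  private var result: Option[T] = None
  def complete(v: T): Unit =
    result = Some(v)
    cb.foreach(f => f(v))
  def onComplete(f: T => Unit): Unit =
    result match
      case Some(v) => f(v)
      case None    => cb = Some(f)

// Invoke `cont` exactly once, with all values, after every future has
// completed; each value is cached so it outlives its own handler.
def awaitAll[T](futures: Seq[CbFuture[T]])(cont: Seq[T] => Unit): Unit =
  val buf = mutable.ArrayBuffer.fill[Option[T]](futures.length)(None)
  var remaining = futures.length
  for (f, i) <- futures.zipWithIndex do
    f.onComplete { v =>
      buf(i) = Some(v)
      remaining -= 1
      if remaining == 0 then cont(buf.toSeq.map(_.get))
    }
```

With such a combinator, the nested-await workaround would collapse to a single call: awaitAll(Seq(future1, future2)) { case Seq(v1, v2) => ... }.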

Tests should test both the parallel and the synchronous runtimes.

The tests currently exercise only the parallel or only the synchronous runtime. To test both, the current practice is to modify the code in Systems.scala to select the runtime backend to test. It would be better if the tests covered both runtimes, such that running sbt test always tests both.

Instead, we should add functionality to run a test case on all runtime backends, while leaving open the possibility to select a specific backend. This entails adding a utility to TestUtils that enables this form of execution. As part of this change, we should also clean up the Systems.scala file and add a Systems.default runtime option.

Scalafmt width

The current scalafmt configuration sets maxColumn = 120, which leads to very wide code. We should consider reducing it to either 80 or 100.
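Assuming the standard .scalafmt.conf configuration file, the change itself would be a one-line edit:

```hocon
maxColumn = 100
```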

Implementing Other Programming Models

We want to implement other programming models in Portals, to demonstrate and exercise the expressiveness of Portals. As a start, the following programming models should be implemented as API extensions of Portals.

  • Virtual Actors

State backend

We should integrate a state backend into Portals, such as RocksDB or LevelDB. For this we can reuse code from previous projects.

Package structure

Right now the package structure is mostly flat, as almost everything resides in the portals package. It should be made hierarchical and mirror the directory structure.

Get Started Contributing

Hi! Thanks for showing interest in contributing :). To get started, we recommend the following steps.

  • Clone the repository.
  • Compile: sbt compile.
  • Take a look at and execute some examples.
  • Execute the tests: sbt test.
  • Read up on our contribution guidelines.

Saga Transactions in Portals, Design Choice

I managed to get a first version of Saga transactions working with nested Portals over the weekend. However, there is a potential downside to the current solution. There are two ways we could go about this; I think both have pros and cons. We should decide on one of them before continuing.

Both solutions allow a task to act both as a requester and as a replier; that is, it can send asks to some portal, but it can also handle requests and reply from another (or potentially the same) portal.

Option 1

We allow a task to call "reply" from within an "await" block. The flow which could cause this between Tasks A, B, C is the following:
A -- Ask --> B
B -- Receive Ask --
B -- Ask --> C
B -- Await --
C -- Receive Ask --
C -- Reply --
B -- Receive Reply, Exec Continuation --
B -- Reply Within Continuation --
A -- Receive Reply, ... ---

The problem with this first approach is that, together with the continuation, we also need to save the context from the ask that caused the continuation; this includes identifying information on the asking key, task, and workflow. This information may be somewhat heavy to keep for every continuation.

On the plus side, this allows us to express the flow of messages in a natural way.

In this first model, we can express a Saga transaction as an unfolding sequence of requests, for which each link in the chain will pre-commit to the transaction. Once this chain has reached its end, each task can commit, and fold back by calling "reply" from within the continuation.

Option 2

The second approach does not allow the continuation to capture the ask-handling context; that is, it should not be possible to reply from within a continuation block.

This has the obvious benefit that we do not need to keep track of the asking context together with the continuation.

On the downside, this limits the expressiveness of the model. In particular, we can no longer express the Saga pattern by simply having tasks request, await the response, and reply.

In this second model we can still express Saga transactions, but in a style akin to programming with typed actors. The requester can send a replyTo address together with its request; the replier can save this replyTo address to some local state, such that it can reply to the corresponding address later, when needed. Indeed, doing this we can emulate the unfolding and folding sequence of pre-commits and commits. However, it is less natural: not via request/reply, but via request and request-to-reply-address.

TL;DR

The first option (1) allows us to express the Saga pattern naturally as request/reply steps; this, however, requires us to capture information about the caller in the continuation (potentially inefficient). Option (2) allows us to express the Saga pattern as we would in a typed actor model without a reply function (for this we need to include the reply address in the message); this may be handled more efficiently by the runtime, but is perhaps more awkward to program. Personally, I think we could go with option 1.

Make TaskBuilder accessible from DSL

Currently the TaskBuilder cannot be accessed from the DSL; it has to be accessed via portals.application.task.TaskBuilder. It would be better if all the relevant methods for building an application were available directly from the DSL.

Therefore I suggest adding a Tasks entry point in the DSL that returns the TaskBuilder. This is also consistent with the naming convention currently in use.
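The forwarding itself would be small. A sketch with stand-in types (none of these definitions are the real Portals ones):

```scala
// Stand-in for portals.application.task.TaskBuilder: here the
// builder just produces plain functions.
object TaskBuilderStub:
  def map[A, B](f: A => B): A => B = f

// Proposed DSL entry point that forwards to the builder.
object DSLStub:
  def Tasks: TaskBuilderStub.type = TaskBuilderStub

@main def tasksDemo(): Unit =
  val double = DSLStub.Tasks.map[Int, Int](_ * 2)
  println(double(21)) // 42
```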

Splitter

We should implement a splitter that takes a multi-port stream and splits it into separate single-port streams. This depends on first finishing the Ports issue #25.

Integrate Spores3 into the Portals Actor library

It would be a great addition to integrate Spores3 into the actor library, such that the "behaviors" are "spores".

An issue was encountered in a small test, as there seemed to be problems integrating typed ActorRefs. The following example illustrates this (at the time of writing, the example causes an error for the Spore with the typed actor ref).

import com.phaller.spores.Spore

object Untyped extends App:
  case class ActorRef(key: Int)
  case class Message(msg: Int, aref: ActorRef)

  val sp = Spore[Message, Unit] {x => x match
    case Message(key, aref) => 
      println(x)
  }

  val aref = ActorRef(0)
  sp(Message(1, aref))

object Typed extends App:
  case class ActorRef[T](key: Int)
  case class Message(msg: Int, aref: ActorRef[Message])
  
  val sp = Spore[Message, Unit] {x => x match
    case Message(key, aref) => 
      println(x)
  }

  val aref = ActorRef[Message](0)
  sp(Message(1, aref))

Error message:

Exception occurred while executing macro expansion.
java.lang.AssertionError: NoDenotation.owner
	at dotty.tools.dotc.core.SymDenotations$NoDenotation$.owner(SymDenotations.scala:2511)
	at scala.quoted.runtime.impl.QuotesImpl$reflect$SymbolMethods$.owner(QuotesImpl.scala:2497)
	at scala.quoted.runtime.impl.QuotesImpl$reflect$SymbolMethods$.owner(QuotesImpl.scala:2497)
	at com.phaller.spores.Spore$package$.$anonfun$3(Spore.scala:76)
	at scala.collection.immutable.List.map(List.scala:250)
	at com.phaller.spores.Spore$package$.checkCaptures$1(Spore.scala:76)
	at com.phaller.spores.Spore$package$.checkBodyExpr(Spore.scala:121)
	at com.phaller.spores.Spore$.applyCode(Spore.scala:299)
	at com.phaller.spores.Spore$.inline$applyCode(Spore.scala:298)

Well-formedness check

We need to check the application for well-formedness before it is launched. Some of the information required for the check is dynamic, such as checking for naming conflicts.

The check method of the ApplicationBuilder should be implemented to perform all checks that we can perform with knowledge of only the application. Further, we also need to perform a check during deployment on the runtime to check if the declared dependencies exist, this should be implemented on the runtime system.

Data parallel execution.

The Portals parallel execution runtime should also be data parallel. We have already implemented a data-parallel runtime for the benchmarks; however, it does not work with shuffles (when a key is modified by the workflow). Although we can reuse some of the ideas from that implementation, we should consider rewriting it from scratch.

Sequencer, Rework?

The way in which a sequencer currently chooses the next atom in the sequence is strange. It takes a list of StreamRefs which have atoms available, and returns either Some StreamRef if a choice was made, or None if no choice was made. We should rethink whether there is a better way to express this.

Actor runtime config option

Add a config to the actor runtime, such that it is possible, for example, to disable logging. The config should be an optional argument to the ActorWorkflow factory method. For now, it is sufficient to have only a log-level argument. Logging should be disabled for the tests.
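A sketch of what the optional config could look like, in plain Scala with hypothetical names (the real ActorWorkflow factory and its signature may differ):

```scala
// Hypothetical config for the actor runtime; only a log level for now.
case class ActorRuntimeConfig(logLevel: String = "info"):
  def loggingEnabled: Boolean = logLevel != "off"

// A logging helper gated on the config. The ActorWorkflow factory
// would take the config as an optional argument, so existing call
// sites keep working; tests would pass ActorRuntimeConfig("off").
def log(config: ActorRuntimeConfig)(msg: => String): Unit =
  if config.loggingEnabled then println(msg)
```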

Broadcast operators and broadcast events

There is currently no "broadcast" operator or event type in Portals. Having a broadcast operator would enable sending a message to all instances of a task. This can be used to set some state on all instances of an operator. Other use cases are to send a Portal request to all instances of a task. I think it is fairly clear that this operation can be very beneficial. However, it is not clear what the semantics of this operation should be, and what the operator should look like, etc. This will need further deliberation.

Merge distributed examples into examples

We should merge the distributed examples directory into the examples directory, as both contain examples and it makes little sense to have two separate projects for this.

PortalsJS logger should print time

The PortalsJS logger should print the time, which it currently does not. The general format of it should mimic that of the logger used in the Portals Core branch.

See
https://github.com/portals-project/portals/blob/portalsjs/core/src/main/scala/portals/util/Logger.scala

One alternative is to insert time stamps into the logger output; another is to replace the logger used in the core branch with a logger that is supported by both the portals core main branch and the portalsjs branch.

The logger always prints the same TaskId

The logger currently logs the same TaskId for all tasks in a workflow. The following example in the playground shows this error: the first and second logger are in different tasks, yet print the same TaskId:

var builder = PortalsJS.ApplicationBuilder("rangeFilter")
var _ = builder.workflows
  .source(builder.generators.fromRange(0, 1024, 8).stream)
  .logger()
  .filter(x => x % 2 == 0)
  .logger()
  .sink()
  .freeze()
var rangeFilter = builder.build()
var system = PortalsJS.System()
system.launch(rangeFilter)
system.stepUntilComplete()

The problem is that the TaskExecutor initializes a single shared execution context for all tasks in the workflow (for the interpreter) https://github.com/portals-project/portals/blob/main/core/src/main/scala/portals/runtime/executor/TaskExecutorImpl.scala#L22; and this is then only assigned once lazily https://github.com/portals-project/portals/blob/main/core/src/main/scala/portals/application/task/TaskContextImpl.scala#L38.

Benchmark tests

We should add a test suite for the benchmarks, so that we can also test that the benchmarks are correct / running. This should be done after refactoring the benchmarks into a separate project (see #2).

Test/Synchronous Runtime Rewrite

The current synchronous runtime suffers from being overly complicated, albeit with good performance. We should rewrite it with the aim of making it easy to understand while mimicking the core atomic processing contract.

Split Project Build Into Multiple Projects

We should split the project build into multiple projects instead. I suggest the following structure.

  • core, with the Portals Core.
  • benchmarks, with various Portals Benchmarks.
  • examples, with executable examples and samples of Portals Applications.
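In sbt, the proposed layout could be sketched as follows in build.sbt (project names from the list above; all other settings elided):

```scala
// Proposed multi-project layout for build.sbt.
lazy val core = (project in file("core"))

lazy val benchmarks = (project in file("benchmarks"))
  .dependsOn(core)

lazy val examples = (project in file("examples"))
  .dependsOn(core)

// Root project aggregating the subprojects, so that e.g.
// `sbt compile` and `sbt test` run across all of them.
lazy val root = (project in file("."))
  .aggregate(core, benchmarks, examples)
```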

Runtime Names: Async, Sync

We currently name our synchronous runtime sync and our parallel runtime async; however, this is confusing. We should choose better names and rename the systems accordingly.

A first suggestion: rename the current sync runtime to test, and the async runtime to parallel (or local). Further, the parallel runtime should become the default runtime: Systems.default().

Obfuscation script for NodeJS

We currently obfuscate the PortalsJS code manually via a website; it would be better practice to have a Node.js script that can be called instead. This is to be used on the PortalsJS branch to obfuscate the generated JS code.

More information on the obfuscation can be found at https://github.com/portals-project/portals/tree/portalsjs/portals-js.

The default params from https://obfuscator.io/ are a good guideline. The code can be obfuscated using https://github.com/javascript-obfuscator/javascript-obfuscator.

Interactive shell

There should be an implementation of an interactive shell that allows us to perform live interactive queries against the Portals system. As a first suggestion, this should involve querying the runtime for the currently running applications and the available atomic streams. Further, it should be possible to send events into the system, read events, and connect to a portal to send queries and receive the query responses. We should first come up with some concrete use cases for how we want this interaction to work.

Ports

We should implement ports. Atomic streams can span multiple lanes, where each lane corresponds to a port. For this, we also need the ability to process such composite streams, by having Workflows and Tasks also support Ports.

We have done some initial testing, and it seems like Tuple Types are well suited for this in Scala 3. We should finalize the ports test, and then integrate it into the current API, and runtime.
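A small plain-Scala 3 sketch of the idea, with hypothetical type names: a multi-lane stream is typed by a tuple, and a single lane's element type is recovered with the standard match types in scala.Tuple:

```scala
// A three-lane stream type: lane 0 carries Strings, lane 1 Ints,
// lane 2 Doubles.
type Lanes = (String, Int, Double)

// The element type of a single lane, via the standard Tuple.Elem
// match type from the Scala 3 standard library.
type Lane[P <: Tuple, N <: Int] = Tuple.Elem[P, N]

@main def portsDemo(): Unit =
  val x: Lane[Lanes, 1] = 42       // compiles: lane 1 carries Int
  val y: Lane[Lanes, 0] = "event"  // compiles: lane 0 carries String
  println((x, y))
```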

Problems with nesting portal tasks within an Init

There is a pervasive issue with nesting portal tasks within an init. The problem is that the type of the task will be init and not one of the portal task types (asker, replier); this is a problem because launching the tasks correctly in the runtime requires knowing this information. For this reason, there is a compilation well-formedness check that checks for this.

A better solution would be to leak the information about the nested task type to the outer init, such that we can access this information during compilation and runtime dispatching.

Example:

TaskBuilder.init[Int, Nothing] {
  // initialize the aggregate
  val sum = PerTaskState[Int]("sum", 0)
  TaskBuilder
    .replier[Int, Nothing, Query, QueryReply](portal) { x =>
      sum.set(sum.get() + x) // aggregates sum
    } { case Query() =>
      reply(QueryReply(PerTaskState("sum", 0).get()))
    }
}

As explained above, the example will be problematic down the line: it will not be recognized as a replier by the dispatcher or by the compiler.

Portals Tutorial

We need a Portals tutorial that steps through the Portals programming model and showcases the core features of Portals. This is to be integrated with the docs (#3), and also provided as a stand-alone worksheet or similar for step-by-step tutorials on the Portals programming model.

Portals-Project or Portals Project

I have spotted both versions in various places in the docs, website, etc. We should decide on one or the other; perhaps "Portals Project" without the hyphen makes more sense.

Distributed File System

We need a distributed file system (DFS). For this, we should implement a state management API that allows reading from and writing to the DFS, as well as deploying and changing the DFS.

Move contributions guide to file

We should move the contributions guide (currently in the Readme.md) to a separate CONTRIBUTING.md file, and update it with the latest contribution guidelines.
