sakerbuild / saker.build

A modern build system focusing on fast and incremental builds for all project sizes

Home Page: https://saker.build

License: GNU General Public License v3.0

Java 99.23% C++ 0.53% C 0.05% Shell 0.19%

build-system distributed-builds incremental-builds saker-build

saker.build's Issues

Allow setting global default inputs for tasks

There are many cases where repeating configuration may occur. In build scripts where the same tasks are invoked multiple times with slightly different inputs, some task inputs need to be duplicated.
An example:

saker.maven.resolve(
    Dependencies: #...
    Configuration: {
        Repositories: #...
    }
)
saker.maven.resolve(
    Dependencies: # different from above
    Configuration: # same as above, but repeated
)

A quick solution for this is to export the value of the recurring configuration into a static or global variable:

static(THE_CONFIGURATION) = # ...
saker.maven.resolve(
    Dependencies: #...
    Configuration: static(THE_CONFIGURATION)
)
saker.maven.resolve(
    Dependencies: # different from above
    Configuration: static(THE_CONFIGURATION)
)

This works, but the developer still needs to assign the same parameter for each task invocation. This makes the build script harder to write and maintain.

A solution should be made to avoid repeating the same inputs for the tasks.

Possible solutions

1. with build system support

We could add support for this in the core build system. This can cross scripting language boundaries, and needs a unified way to define default inputs for given tasks.

This is harder to achieve, as this may limit different script language developments.

2. scripting language support

This solution requires support from each scripting language. This has the advantage that the default values can be handled by the scripting language implementation, therefore the values can be defined in a way similar to the script language.

The location of the build script that contains the default values could be set with a scripting option. This allows full customization on the build language side.

Proposal

Go with option 2. Defaults support should be implemented by each build language itself.

For SakerScript:

A new script option is added, that is a path to the defaults file.
E.g.

-SOdefaults.file=defaults.build

The option value is to be interpreted against the working directory of the build execution.

The defaults file should be part of the script language configuration. That is, the language configuration wildcard should match the defaults file as well.

The syntax of the defaults file is similar to normal build script files, additionally with the following rules:

- No build targets.
    - Only the top level scope is available for expressions.
- A new built-in build task is added that can only be used in this file.
    - The defaults() build task. See below.

`defaults()` task

This built-in task can only be used in the defaults build file.

This task has an unnamed first parameter that is a literal or list of literal task names that the defaults apply to. Any additional parameter is the default value for the given task.

The task can only be used as a top level expression. That is, it cannot be used in ifs, foreaches, or as a nested expression. It has no return value.

Example

The defaults file:

defaults(
    my.task,
    Parameter: val1
    Default2: 123
)

With a build file:

my.task(
    Second: foo
    Default2: abc
)

This is equivalent to writing the build file:

my.task(
    Second: foo
    Default2: abc
    Parameter: val1
)

Any parameter that was set as default in the defaults file can be overridden at the call site.
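The override rule can be sketched in Java as a simple map merge. This is only an illustration of the intended semantics, not the SakerScript implementation: the class `DefaultsMergeSketch` and its `merge` method are hypothetical names, and parameter values are simplified to strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the saker.build API.
final class DefaultsMergeSketch {
    // Returns the effective parameters of a task invocation.
    // Values given at the call site take precedence over the defaults.
    static Map<String, String> merge(Map<String, String> defaults, Map<String, String> callSite) {
        Map<String, String> result = new LinkedHashMap<>(callSite);
        for (Map.Entry<String, String> entry : defaults.entrySet()) {
            // Only fill in parameters that the call site did not specify.
            result.putIfAbsent(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the defaults(my.task, ...) example above.
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("Parameter", "val1");
        defaults.put("Default2", "123");

        // Mirrors the my.task(...) invocation above.
        Map<String, String> callSite = new LinkedHashMap<>();
        callSite.put("Second", "foo");
        callSite.put("Default2", "abc");

        // prints {Second=foo, Default2=abc, Parameter=val1}
        System.out.println(merge(defaults, callSite));
    }
}
```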

Implementation notes

This should not require any modification to build system API. The defaults file should be read as part of a new internal build task. This ensures that a single defaults file is only read at most once per build execution, and is not re-read in case of incremental builds.

The defaults file should report no build targets.

The script positions in the defaults file should be correctly reported in case of exceptions.

The script modelling implementation should be modified to add proposals for the newly added defaults() task, but only in the defaults file. Information retrieval for the defaults() task should be available in non-defaults files as well.

Classpath loading test flakiness

There are some unreleased resources for the loaded classpaths by the build system. This may include script and repository classpaths.

Some tests of the build system sometimes fail when the loaded classpath metric is asserted. Each test should clean up its loaded classpaths after itself; however, the tests still fail occasionally.

The tests are flaky: they don't always fail. This might be an issue with the test metric instrumentation, or the classpath loading.

This is not a breaking issue for using the build system. If confirmed, it may cause higher resource usage for long-running daemons, or locked file-system resources if classpaths are loaded from JARs.

When can this issue be closed if fixed?

Let's say 2 months after the fix, if the issue no longer surfaces. Due to the flakiness of the bug, we can't reliably determine if it has actually been fixed. Alternatively, a test that reproduces the issue could be created.

Build cache

This issue serves as a place of discussion for build cache related implementation.

As of the current state (2020.01.12.), there is a basic implementation of the build cache that passes tests. It is not finished: the build daemons don't support this feature yet, and there is no persistence behind the build caches. There is a memory-based implementation that is used for testing only.

For the implementation, we should consider the following.

Task requirements

Evaluate what kind of requirements we impose on tasks that can be cached.

Communication

Communication with the build cache can be done in two ways. Either using the saker.rmi library, or using a more common protocol.

Since using the build cache for purposes other than the saker.build system is not a design goal, the saker.rmi solution seems more appropriate.

Either way, the build cache will be accessible through an abstract interface and the protocol could be replaced without disruption.

Security

The build caches will usually run on a shared server that is accessible from outside. There need to be security measures that ensure that only authorized clients can access data from the cache, and only authorized clients can publish to the build cache.

The authorization could be implemented using certificates that will be used for an SSL connection with the build cache server. The server examines this certificate, and provides access to the features that the client is allowed to use.

The certificates don't need to be issued by some known provider; they can be managed in-house by the maintainer of the build cache. In general, there should be read and write certificates that the server recognizes.

The read certificates can be used to download content from the build cache, but don't allow publishing. These can be used by the developers. The write certificate allows publishing to the cache. It should be used on CI servers that publish the results to the cache. These results can later be retrieved by the clients of the cache.
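The read/write split could be modelled on the server side roughly as follows. This is a hedged sketch only: `CacheAuthorizationSketch`, the `Permission` enum, and the use of a certificate fingerprint string as the lookup key are all assumptions made for illustration. The actual SSL handshake and certificate validation are out of scope here.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-certificate permissions on the cache server.
final class CacheAuthorizationSketch {
    enum Permission { READ, WRITE }

    // Certificate fingerprint -> permissions granted by the cache maintainer.
    private final Map<String, Set<Permission>> grants = new HashMap<>();

    void register(String fingerprint, Set<Permission> permissions) {
        Set<Permission> copy = EnumSet.noneOf(Permission.class);
        copy.addAll(permissions);
        grants.put(fingerprint, copy);
    }

    // Unknown certificates are denied everything.
    boolean isAllowed(String fingerprint, Permission requested) {
        Set<Permission> granted = grants.get(fingerprint);
        return granted != null && granted.contains(requested);
    }

    public static void main(String[] args) {
        CacheAuthorizationSketch auth = new CacheAuthorizationSketch();
        // Assumption: developers hold a read certificate, the CI server a write certificate.
        auth.register("developer-cert", EnumSet.of(Permission.READ));
        auth.register("ci-cert", EnumSet.of(Permission.WRITE));

        System.out.println(auth.isAllowed("developer-cert", Permission.WRITE)); // prints false
        System.out.println(auth.isAllowed("ci-cert", Permission.WRITE)); // prints true
    }
}
```

Whether a write certificate should also imply read access is left open by the text above; the sketch treats the two as independent grants so either policy can be configured.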

Performance

We need to determine the suitable use-cases for contacting the build cache during build execution. If the build cache is contacted for small incremental changes, then it can degrade performance. However, if we only use the build cache for clean project builds, then it may be used too rarely to provide an advantage.

This part is open for discussion. We probably should do some heuristic based cache tries.

Settings

In order for the build cache to work, one very likely needs hash based file change tracking.

When a build is run with a build cache, we could change the default mechanism to be hash based instead of file attribute based. It can be overridden by the user, but the default may be changed.
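For illustration, hash based change detection can be sketched with the standard `java.security.MessageDigest` API. The class and method names below are hypothetical, and SHA-256 is an assumed choice; the text above does not say which hash function saker.build uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch of content-hash based change detection.
final class ContentHashSketch {
    static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // A file counts as changed only if its content hash differs,
    // regardless of modification time or other file attributes.
    static boolean changed(byte[] previousHash, byte[] currentContent) {
        return !Arrays.equals(previousHash, sha256(currentContent));
    }

    public static void main(String[] args) {
        byte[] before = "int main() {}".getBytes(StandardCharsets.UTF_8);
        byte[] recorded = sha256(before);

        byte[] untouched = "int main() {}".getBytes(StandardCharsets.UTF_8);
        byte[] edited = "int main() { return 1; }".getBytes(StandardCharsets.UTF_8);

        System.out.println(changed(recorded, untouched)); // prints false
        System.out.println(changed(recorded, edited)); // prints true
    }
}
```

Unlike timestamps, the hash is the same on every machine that has the same file content, which is why it suits a cache shared between machines.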

Persistence

The published build cache data should be persisted by the server. The frequently queried data could be kept in memory.

A mechanism for efficient lookup and storing should be implemented. The build cache works with byte blobs and not structured data. We could either use some third party library/software, or implement our own solution. Using third party software may impose license and other maintenance related restrictions on the build system.

Warn about unrecognized build task input parameters

If an input parameter is specified for a build task, and that task doesn't recognize that parameter, a warning should be issued during build.

This should be done by enhancing the TaskUtils.initParametersOfTask method to warn about the unrecognized parameters.

This enhancement is only applicable if a task uses the default parameter initializing behaviour specified by TaskUtils (i.e. the @SakerInput and related annotations). If a task initializes the parameters manually by overriding ParameterizableTask.initParameters, then it is the responsibility of the task to issue warnings.

The issued warnings should report the position of the build task in the script.
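The check itself amounts to comparing the supplied parameter names against the recognized ones. The following is a minimal sketch, not the TaskUtils implementation: the class, the method, the message wording, and the position strings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of detecting unrecognized task parameters.
final class UnrecognizedParameterSketch {
    // recognized: parameter names the task declares.
    // suppliedWithPosition: parameter name -> script position of the task invocation.
    static List<String> warnings(Set<String> recognized, Map<String, String> suppliedWithPosition) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : suppliedWithPosition.entrySet()) {
            if (!recognized.contains(entry.getKey())) {
                result.add("Unrecognized parameter '" + entry.getKey() + "' at " + entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> recognized = new HashSet<>(Arrays.asList("Dependencies", "Configuration"));

        Map<String, String> supplied = new LinkedHashMap<>();
        supplied.put("Dependencies", "saker.build:3:1-40");
        supplied.put("Configuraton", "saker.build:3:1-40"); // misspelled on purpose

        for (String warning : warnings(recognized, supplied)) {
            System.out.println(warning);
        }
    }
}
```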

Add option for performing build without incremental state (CI mode)

CI builds are often performed from a clean state, and rarely use incremental builds. This means that there is no reason for the build system to store and persist the incremental state between builds, as it isn't going to be used anyway.

There should be an option that instructs the build system to throw away the incremental state that was created during the build, and not to write it to disk or attempt to load it at the start of the build. This could improve the performance of the build, and could also lower the memory usage somewhat.

This configuration option should also be available for build tasks to query, as they themselves can optimize their behaviour based on this.

It is important to note that the build system itself shouldn't automatically configure itself for CI mode by querying environment variables or similar. This option should be explicit.

Relevancy

This option could increase the performance of the saker.java.test task, as it will need less instrumentation to run. It would still need some instrumentation, as the file synchronization needs to be performed for the test cases. That could also be avoided if there was an option that marks the tests as not using files. In that case, any kind of instrumentation can be avoided.

Make comma (',') optional as a separator character in build script

Currently we require commas to be used in parameter lists, map entries, list entries, and in other places. Examine if these commas could be omitted in some cases.

In general, if one declares the parameters, entries, or elements on separate lines, using commas is not strictly necessary, as the new line could be interpreted as a separator.

Example:

my.task(
    Param1: 123,
    Param2: 456,
)
$list = [
    1,
    2,
    3,
]
$map = {
    Key1: val1,
    Key2: val2,
}
buildtarget(
    in inparam,
    in defin = 123,
    out outparam,
    out defoutparam = 456,
)

Becomes:

my.task(
    Param1: 123
    Param2: 456
)
$list = [
    1
    2
    3
]
$map = {
    Key1: val1
    Key2: val2
}
buildtarget(
    in inparam
    in defin = 123
    out outparam
    out defoutparam = 456
)

While this solution is still as readable as the above, it can be easier to edit the script itself in an IDE.
Inserting a new element in the enumeration doesn't require the developer to start with the insertion of a new comma, but rather can just insert the entry with a new line.

Support dynamic reallocation of computation tokens

The build system should support dynamic reallocation of the computation tokens of duplicated inner tasks to increase concurrency.

A scenario when this matters:

Task A runs a high number of inner tasks with computation tokens. E.g. a C++ compilation task.
Task B wants to run with computation tokens.
Task C depends on task B but not on task A.

In cases where task A is started first, it could quickly exhaust all of the computation tokens available on a PC. This causes task B to wait until A is finished as it cannot start due to the lack of computation tokens.

With some additional timing information, we can see how this affects build times.

Task A: 25 CPUmin
Task B: 1 CPUmin
Task C: 4 CPUmin

If the current PC has 5 computation tokens, it will run as follows without token reallocation:
Task A, Task B, Task C = 5 + 1 + 4 = 10 min

With token reallocation, Task B and C can run alongside Task A (as 1 token from A was reallocated to B and C); however, Task A would take longer.
5 min of running 20 CPUmin of Task A (4 tokens) alongside 1 + 4 CPUmin for Task B and C (1 token).
1 min for running the remaining 5 CPUmin of Task A (5 tokens).
This totals 6 minutes, which is much less than without reallocation.

The solution is to dynamically reduce the inner task computation tokens allocated to a given task, down to a minimum of 1 (so it still runs), and reallocate them to new tasks.
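The arithmetic of the scenario can be reproduced with a small model. It rests on simplifying assumptions that are not part of the issue: the work of Task A is perfectly divisible across its tokens, Task B and Task C each use exactly one token, and C starts right after B. The class and method names are hypothetical.

```java
// Hypothetical model of the A/B/C scenario; times are in minutes, work in CPU minutes.
final class TokenReallocationSketch {
    // A grabs every token first, so B and then C can only run after A is done.
    static double withoutReallocation(double workA, double workB, double workC, int tokens) {
        return workA / tokens + workB + workC;
    }

    // One token is taken from A and given to the B -> C chain.
    static double withReallocation(double workA, double workB, double workC, int tokens) {
        double chain = workB + workC; // B then C on the single reallocated token
        double doneByA = (tokens - 1) * chain; // work A completes while the chain runs
        if (doneByA >= workA) {
            // A finishes before the chain does, so the chain decides the total.
            return chain;
        }
        // A gets all tokens back for its remaining work.
        return chain + (workA - doneByA) / tokens;
    }

    public static void main(String[] args) {
        System.out.println(withoutReallocation(25, 1, 4, 5)); // prints 10.0
        System.out.println(withReallocation(25, 1, 4, 5)); // prints 6.0
    }
}
```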

Examine possible issues with case-insensitive file systems

Saker.build primarily uses case-sensitive representation of files and their hierarchies. This may cause some synchronization or other file management issues when dealing with case-insensitive file systems.

In general, we expect developers to use proper casing when referencing files. That is, they should reference the files in a case-sensitive way in case-insensitive file systems as well. It is an acceptable error if there is a case-insensitive file conflict during build.

However, there may be issues when the developers change the capitalization of files, therefore triggering build tasks and various other file related operations (e.g. synchronization). These effects and possible scenarios should be examined as part of this issue.

Context based build task invocation

Feature description

The build tasks in the script could be declared based on the enclosing task context. It should reduce the boilerplate that surrounds a task invocation. E.g.:

some.task(
    Configuration: .config(data),
    # note the two dots
    Parameter: ..foo.bar(baz),
)

The above would be equivalent to:

some.task(
    Configuration: some.task.config(data),
    Parameter: some.foo.bar(baz),
)

Starting a task declaration with a dot can mean that the following task identifier is to be interpreted against the enclosing task.
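One possible reading of the rule, inferred from the two examples above, is that a single leading dot appends to the enclosing task name, and each additional dot first drops one trailing component of it. The Java sketch below implements that reading; it is an interpretation rather than a specification, and the class and method names are hypothetical.

```java
// Hypothetical sketch of resolving dot-prefixed task names against the enclosing task.
final class RelativeTaskNameSketch {
    static String resolve(String enclosingTask, String name) {
        int dots = 0;
        while (dots < name.length() && name.charAt(dots) == '.') {
            dots++;
        }
        if (dots == 0) {
            return name; // already a fully qualified task name
        }
        String rest = name.substring(dots);
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("Missing task name after dots: " + name);
        }
        String base = enclosingTask;
        // Every dot after the first removes one trailing component of the enclosing name.
        for (int i = 1; i < dots; i++) {
            int idx = base.lastIndexOf('.');
            if (idx < 0) {
                throw new IllegalArgumentException("Too many leading dots: " + name);
            }
            base = base.substring(0, idx);
        }
        return base + "." + rest;
    }

    public static void main(String[] args) {
        System.out.println(resolve("some.task", ".config")); // prints some.task.config
        System.out.println(resolve("some.task", "..foo.bar")); // prints some.foo.bar
    }
}
```

Because the rewrite is purely textual, it stays within the stated non-goal: actual task invocation and name resolution are unchanged once the full name is produced.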

Workarounds
Use fully qualified task names.

Use-case
For long task names this can reduce the boilerplate surrounding it. This is mostly prevalent when tasks are called in a way that they can be configured with other tasks that bear the same name.

Non-goals

The solution should not take multiple levels of task declarations into account. The solution should only be a syntactic enhancement, and the behaviour of actual task invocation and name resolution should not change.

Other aspects

We should also examine how this feature integrates with task names: whether or not we should allow multiple dots at the start of task identifiers, and what the limit may be. We may not want to allow too many preceding dots, as that would limit readability.

Deadlock detection fails if threads are transitively waited for

Given the following scenario:

The build task T is being invoked.
The task starts a new thread W.
The thread W starts to wait for another task D. (However, the task D is never started.)
The build task attempts to wait for W.
The execution deadlocks, as the waiting for D will never complete. However, the deadlock is not detected by the build system.

The build execution can be stopped by manually interrupting the execution thread. This can be done inside an IDE, however, it's unclear for command line execution.

This behaviour can be disruptive in case of CI builds, as the build will never stop.

Workaround

Don't configure the build to deadlock. It can usually be avoided in a straightforward way.
Don't implement tasks that concurrently perform waiting for threads and waiting for tasks.

Solution

In general, advising task authors to wait for the input tasks first, and do the work last, should be enough. This is already the recommended workflow for build task implementations, therefore there won't be much change.

Another solution is to allow the above transitive waiting, but require the build task authors to delegate the Thread.join call through the build system. In this case we can detect the number of waiting threads.

Notice

There's a chance that this issue may remain open for a prolonged amount of time and be a known bug of saker.build. Generally, this is a rarely occurring bug that can be mitigated by proper implementations of the build tasks. The delegating through build system solution is still a viable partial solution that has a high chance to be implemented.

Allow directly calling build targets in same script

We should be able to call a build target in the same file directly as a build task rather than using the include task.

That is, the following:

build() {
    $compile = include(compile, Input: 123)
}
compile(
    in input
) {
    # ...
}

Should turn into this:

build() {
    $compile = compile(Input: 123)
}
compile(
    in input
) {
    # ...
}

Reasoning

It's much simpler. The intention is clear, and it can't really be confused with tasks that come from other places.

We already limit task names which consist of only a single component to be reserved by the scripting language. This enhancement takes advantage of this as it automatically includes the declared build targets to be directly callable.

Conflicts

If the user declares a build target that conflicts in name with already existing builtin tasks, then the name resolution would be ambiguous.

In this case the build target should not shadow the builtin tasks. That is, they cannot be replaced by a build target.

In cases where the user declares a build target with the same name as a builtin task, a warning can be emitted.
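The resolution order described above can be sketched as follows. The class, the `Kind` enum, and both methods are hypothetical; the builtin names in the demo are only the ones that appear elsewhere on this page (`include`, `print`, `abort`).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of resolving a single-component task name.
final class TargetNameResolutionSketch {
    enum Kind { BUILTIN, BUILD_TARGET, UNKNOWN }

    static Kind resolve(String name, Set<String> builtins, Set<String> declaredTargets) {
        if (builtins.contains(name)) {
            return Kind.BUILTIN; // builtin tasks are never shadowed
        }
        if (declaredTargets.contains(name)) {
            return Kind.BUILD_TARGET;
        }
        return Kind.UNKNOWN;
    }

    // Build target names that collide with a builtin and deserve a warning.
    static Set<String> conflicts(Set<String> builtins, Set<String> declaredTargets) {
        Set<String> result = new TreeSet<>(declaredTargets);
        result.retainAll(builtins);
        return result;
    }

    public static void main(String[] args) {
        Set<String> builtins = new HashSet<>(Arrays.asList("include", "print", "abort"));
        Set<String> targets = new HashSet<>(Arrays.asList("build", "compile", "print"));

        System.out.println(resolve("compile", builtins, targets)); // prints BUILD_TARGET
        System.out.println(resolve("print", builtins, targets)); // prints BUILTIN
        System.out.println(conflicts(builtins, targets)); // prints [print]
    }
}
```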

Reasons:

The main reason why we don't allow shadowing builtin tasks is that modifying one part of the build script should not affect the behaviour of another part without clear indication. This makes the build scripts more maintainable.

Revise deadlock detection

The deadlock detection mechanism of saker.build depends on the ThreadGroup API. The code itself is not really robust as lingering threads may cause the deadlock not to be noticed. Additionally, the issue #2 is also a side effect of the current mechanism.

As per https://mail.openjdk.java.net/pipermail/loom-dev/2020-July/001471.html, the intention is for the ThreadGroup API to be deprecated over time. It also includes heavy synchronization and may not be reliable enough for proper use for deadlock detection. The solution for this is to revise the current implementation and come up with some different mechanism for deadlock detection. This may include placing additional restrictions on the build tasks.

One solution might be to only allow waiting for other tasks on the main thread of a task. The main thread is the one that Task.run() is invoked on. This could make it easier to detect the deadlocks as only a single thread needs to be kept in check for the detection.

In order for this to work additional APIs may need to be added to allow waiting for multiple tasks at once. Existing task implementations also need to be checked and tested to be conforming.

With this solution the build will be considered as deadlocked if all running main threads are in waiting state.

This would also make the deadlock detection faster as no polling would be required, the deadlock would be detected instantaneously as the last thread enters the waiting state.
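The counting idea can be sketched as below. This is a simplified illustration and not the saker.build mechanism: the class and its methods are hypothetical, only main task threads are counted, and the caller is assumed to call `exitWait()` for every thread that was waiting on a task before reporting that task as finished.

```java
// Hypothetical sketch: the build is deadlocked when every running main thread is waiting.
final class MainThreadDeadlockSketch {
    private int running; // main task threads that have started and not yet finished
    private int waiting; // main task threads currently waiting for another task

    synchronized void taskStarted() {
        running++;
    }

    // Returns true if the remaining threads are all waiting after this task finished.
    synchronized boolean taskFinished() {
        running--;
        return isDeadlocked();
    }

    // Returns true if entering the wait leaves no thread able to make progress.
    synchronized boolean enterWait() {
        waiting++;
        return isDeadlocked();
    }

    synchronized void exitWait() {
        waiting--;
    }

    private boolean isDeadlocked() {
        return running > 0 && waiting == running;
    }

    public static void main(String[] args) {
        MainThreadDeadlockSketch detector = new MainThreadDeadlockSketch();
        detector.taskStarted();
        detector.taskStarted();
        System.out.println(detector.enterWait()); // prints false, one thread can still progress
        System.out.println(detector.enterWait()); // prints true, the last thread started waiting
    }
}
```

The check runs at the moment a thread enters the waiting state, so no polling is involved.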

The solution would still allow retrieving task results without waiting on worker threads. E.g. getFinished should work.

Inner task threads also need to be checked for deadlock. Starting new tasks should also be constrained to the main task threads or main inner task threads.

AL2.0 and/or MIT

Is there a chance to change the license of Saker and related projects to a permissive license like AL2.0 and/or MIT?

Build execution deadlocks if the executor machine is also present as a cluster

If the build is configured to have the build executor machine present as a cluster, then the build may deadlock as the machine will not be able to initialize as a cluster.

This is caused by the common locking scheme that is employed for execution and cluster initialization. The build actually finishes successfully, however, the cluster initialization of the executor machine will halt.

Workaround

Don't configure the executor machine as a cluster.

Solution

Fix the locking scheme in SakerProjectCache class.

java.lang.NoSuchFieldError: VERSION_FULL_COMPOUND when running saker build

Describe the bug
I was following the guide on how to publish packages to Maven when the error occurred.
To Reproduce
To reproduce follow This
Expected behavior
The build should successfully create the jar at build/saker.jar.create/output.jar
Environment information

Operating System: Linux, Ubuntu 20.04
Build system full version: 0.8.0
Java runtime version:
openjdk version "1.8.0_265"
OpenJDK Runtime Environment (build 1.8.0_265-8u265-b01-0ubuntu2~20.04-b01)
OpenJDK 64-Bit Server VM (build 25.265-b01, mixed mode)
IDE: Command Line

Bug nature
Build execution
$ java -jar saker.build.jar -bd build export
java.lang.NoSuchFieldError: VERSION_FULL_COMPOUND
at saker.build:9:14-79
at saker.build:7:11-152
at saker.build:7:2-152
at saker.build:4:1-182
at saker.build:20:11-26
at saker.build:20:11-33
at saker.build:20:2-33
at saker.build:17:1-173
at saker.maven.classpath.main.MavenClassPathTaskFactory$1.run(MavenClassPathTaskFactory.java:99)
Exception in thread "main" saker.build.exception.BuildExecutionFailedException: saker.build.task.exception.MultiTaskExecutionFailedException: BuildTargetBootstrapperTaskIdentifier[buildFilePath=wd:/saker.build, buildTargetName=export, workingDirectory=wd:, buildDirectory=]
at saker.build.launching.BuildCommand.runBuild(BuildCommand.java:661)
at saker.build.launching.BuildCommand.call(BuildCommand.java:558)
at saker.build.launching.Launcher.lambda$parse$1(Launcher.java:1404)
at saker.build.launching.Launcher.callCommand(Launcher.java:1807)
at saker.build.launching.Launcher.main(Launcher.java:1785)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at saker.build.launching.Main.main(Main.java:67)
Build configuration
https://github.com/Sipkab/example-gh-maven
cloned to https://github.com/unlimitedcoder2/example-gh-maven
so I could upload the package to gh maven

Use locking primitives instead of synchronized blocks

Project Loom introduces virtual threads for the JVM. It doesn't support unscheduling virtual threads that are in a synchronized block. As virtual threads will be a very important feature for the build system, and it can greatly improve performance, it is necessary to switch to using locking primitives instead of synchronized blocks where appropriate.

The usages of synchronized blocks should be refactored where I/O or RMI operations are expected to happen.
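The refactoring pattern is the usual one from `java.util.concurrent.locks`. The sketch below uses a hypothetical class with a counter standing in for the guarded state; in the real code the critical section would contain the I/O or RMI call.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example of replacing a synchronized block with an explicit lock.
final class LockInsteadOfSynchronizedSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private int counter;

    // Before the refactoring this would have been:
    //     synchronized (this) { return ++counter; }
    int increment() {
        lock.lock();
        try {
            // An I/O or RMI operation would be performed here while holding the lock.
            return ++counter;
        } finally {
            // Always release in finally so an exception cannot leave the lock held.
            lock.unlock();
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }

    public static void main(String[] args) {
        LockInsteadOfSynchronizedSketch sketch = new LockInsteadOfSynchronizedSketch();
        sketch.increment();
        System.out.println(sketch.increment()); // prints 2
        System.out.println(sketch.isLocked()); // prints false
    }
}
```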

Switch expression in build script

Add syntax for switch expressions in build script.

One commonly occurring scenario is to assign some variable or perform some action based on the value of another variable. Currently this can be done using if-else statements:

$platform = # ...
if $platform == win32 {
    # ...
} else if $platform == macos || $platform == ios {
    # ...
} else {
    abort("Unrecognized platform { $platform }")
}

This is harder to maintain as the variable name needs to be repeated and is uncomfortable to edit.
This can be solved by introducing switch expressions.

switch $platform {
    case win32 {
        # ...
    }
    case macos, ios {
        # ...
    }
    default {
        abort("Unrecognized platform { $platform }")
    }
}

There is no fallthrough between the case blocks.
Multiple values to match for can be separated using a comma.

The statement checks the value of the subject for equality with each case label in order, and executes the first one that matches.

The execution order is the following:

1. Start the subject statement task and the labels all at once.
    - If the label declarations contain complex expressions, they will always be executed.
2. With the value of the subject, check each label for equality.
3. Execute the first one that equals.
4. If no equal label is found, run the default branch if present.
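The selection step can be sketched in Java as follows. The sketch assumes that the subject and all labels have already been evaluated to strings, which matches the eager evaluation described above; the class names and the representation of a branch as a plain string are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of selecting the branch of a switch expression.
final class SwitchEvaluationSketch {
    static final class Case {
        final List<String> labels; // already evaluated label values
        final String branch; // stands in for the block or result value

        Case(String branch, String... labels) {
            this.branch = branch;
            this.labels = Arrays.asList(labels);
        }
    }

    // Returns the branch of the first case with an equal label, without fallthrough.
    // Returns defaultBranch (which may be null) if no label equals the subject.
    static String select(String subject, List<Case> cases, String defaultBranch) {
        for (Case c : cases) {
            if (c.labels.contains(subject)) {
                return c.branch;
            }
        }
        return defaultBranch;
    }

    public static void main(String[] args) {
        List<Case> cases = Arrays.asList(
                new Case("windows branch", "win32"),
                new Case("apple branch", "macos", "ios"));

        System.out.println(select("ios", cases, "default branch")); // prints apple branch
        System.out.println(select("linux", cases, "default branch")); // prints default branch
    }
}
```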

Further improvements

The switch expression can also be improved to return a value. In this case the block may be omitted, and a returned value can be introduced with a separator colon (:), similarly to the foreach expression:

$value = switch $platform {
    case win32: 123
    case macos, ios: 456
    default: 999
}

It is not a syntactic error to omit the return value for a given label. In that case the expression returns no value, which results in an exception when it's dereferenced.

The result value can be combined with blocks:

$value = switch $platform {
    case win32 {
        print("platform is win32")
    }: 123
    case macos, ios: 456
    default {
        abort("Unrecognized platform { $platform }")
    }
}

Essentially, both the block and the result value are optional, but at least one of them must be present (similarly to foreach).
No local variables can be declared for switch expression blocks. (As they don't run in a loop, target-level variables can be used.)

Support rebuilding a subset of the outputs

Analyze if rerunning a subset of the tasks given a set of inputs is possible.

Example:

A previous run of a build gave multiple files as the output: A, B, C...
Run an incremental build by giving a subset of the files that should be rebuilt as a new build configuration option. E.g. B, C
Run only the necessary tasks to rebuild the files B and C, but possibly avoid rerunning tasks that are necessary to rebuild A.
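One way to think about the analysis is as a walk over the task dependency graph, starting from the tasks that produce the requested files. The Java sketch below shows that idea under strong assumptions: every output file has exactly one producing task, and dependencies between tasks are known up front. The class, the method, and the task names in the demo are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of finding the tasks needed to rebuild a subset of outputs.
final class SubsetRebuildSketch {
    static Set<String> requiredTasks(Map<String, String> producerOfOutput,
            Map<String, List<String>> dependenciesOfTask, Collection<String> requestedOutputs) {
        Set<String> required = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (String output : requestedOutputs) {
            String producer = producerOfOutput.get(output);
            if (producer == null) {
                throw new IllegalArgumentException("No task produces " + output);
            }
            pending.add(producer);
        }
        // Collect the producers and everything they transitively depend on.
        while (!pending.isEmpty()) {
            String task = pending.poll();
            if (required.add(task)) {
                pending.addAll(dependenciesOfTask.getOrDefault(task, Collections.emptyList()));
            }
        }
        return required;
    }

    public static void main(String[] args) {
        Map<String, String> producers = new HashMap<>();
        producers.put("A", "taskA");
        producers.put("B", "taskB");
        producers.put("C", "taskC");

        Map<String, List<String>> dependencies = new HashMap<>();
        dependencies.put("taskA", Arrays.asList("common", "heavy"));
        dependencies.put("taskB", Arrays.asList("common"));
        dependencies.put("taskC", Arrays.asList("taskB"));

        // prints [common, taskB, taskC]; taskA and heavy are skipped
        System.out.println(requiredTasks(producers, dependencies, Arrays.asList("B", "C")));
    }
}
```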

Build trace

This issue serves as a place of discussion for build trace related implementation.

A build trace is a collection of data about a build execution. It collects information about various aspects of the build and presents it to the user in a way that allows easier debugging of performance and build issues.

The build trace should provide the following information:

Tasks visualized on a timeline chart.
Task dependencies on other tasks.
Task environment and execution property dependencies.
Task inner task executions.
Task delta and incrementality information.
Task remote dispatching information.
- Where the tasks were dispatched. Why they were dispatched there, and why not if not.
Task file dependency information.
Report possible file dependency conflicts.
Other task related metadata.

Implementation

The build trace implementation should consist of two parts.

The tracing component that collects the build information and persists/provides them in a defined format.
The rendering component that takes the build information and presents it to the user in a specific way. (E.g. HTML webpage as that is easily portable)
- The build trace should be viewable offline and shouldn't require a server or internet connection.
- The rendering component should be pluggable.
