docker-images's Issues

Activate python/pyspark in spark image

As a user I would like to be able to use the spark cluster for python-based jobs. This requires that pyspark (and python) are made available in the docker image.

Summary

The current Spark docker image contains the full Spark distribution and can be used with Scala (spark-shell) or SQL (spark-sql). PySpark is also installed, but using it requires the following:

  • python3 (>=3.6)
  • pip (during image creation)
  • pyspark dependencies, as listed in this list
  • (plus other libs TBD)

Notes

Some things which I've quickly tested:

  • to install pip and python
RUN microdnf update && \
    ...
    microdnf install python3 python3-pip && \
    ...
  • setting paths etc.
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN ln -s /usr/bin/pip3 /usr/bin/pip

ENV SPARK_HOME=/stackable/spark
ENV PATH=$SPARK_HOME:$PATH:/bin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
ENV PYSPARK_PYTHON=/usr/bin/python
ENV PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
  • installing dependencies with pip: doing this as root triggers a warning, but it results in permission errors otherwise, and we are not using any virtualenvs anyway. Maybe there is a better way of doing this:
# pyspark dependencies
USER root
RUN pip install --upgrade pip
RUN pip install --no-cache-dir numpy pandas pyarrow

USER stackable
...

Rationale

In the context of MLOps, it is planned to provide one or more environments for the following tasks:

  • feature engineering
  • model training
    • distributed computations (in line or partitioned feature creation)
    • hyperparameter tuning (a grid-search type arrangement where the model + one combination of parameters is sent to a single worker)
  • model management (experimentation, versioning etc.)
  • model pipelines

Many of these use python libraries.

Add hadoop-fuse to the Hadoop HDFS image

The fuse binary is not built and shipped by default by the Hadoop project. That means we'll have to build it ourselves and include it in the image.

cd hadoop-hdfs-project/hadoop-hdfs-native-client
mvn clean package -P native

Results in:

[WARNING] make[2]: *** [main/native/libhdfspp/lib/fs/CMakeFiles/fs_obj.dir/build.make:104: main/native/libhdfspp/lib/fs/CMakeFiles/fs_obj.dir/filehandle.cc.o] Error 1
[WARNING] make[2]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/Makefile2:2629: main/native/libhdfspp/lib/fs/CMakeFiles/fs_obj.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs....
[WARNING] make: *** [Makefile:146: all] Error 2

plus more stuff before that.
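
For reference, once the native build succeeds, the packaging step could look roughly like the sketch below. This is only a sketch: the -Drequire.fuse flag and the location of the fuse_dfs binary are assumptions based on the Hadoop native build and would need to be verified.

# Sketch only: build the native client including fuse_dfs, then locate the binary
# so it can be copied into the image. Flag and output path are assumptions.
cd hadoop-hdfs-project/hadoop-hdfs-native-client
mvn clean package -P native -Drequire.fuse=true -DskipTests
find target -name fuse_dfs -type f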

Devise an LTS strategy for our Docker Images

Context: For our 22.11 release we wanted to upgrade base images for our older product images. This raised the question of how far back we should do this and whether we should do it for all images etc.

An outcome of this could also be to keep everything updated for now until we find a reason not to do this anymore.

Implement new image tagging scheme

Images should be tagged using the following scheme:

tag: version - dependencies - stackable_version
version: \d+(\.\d+)?
dependencies: dep (-dep)?
dep: name version
name: [a-z]+
stackable_version: "stackable" version

For example, the full name of the Kafka image should be: docker.stackable.tech/stackable/kafka:2.6.1-scala2.13-stackable0.0.1
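
As a rough illustration, a candidate tag (without the repository part) could be checked against this scheme with a regex; this is a sketch only, and the regex is an approximation of the grammar above:

# Sketch: check that an image tag follows the tagging scheme.
tag="2.6.1-scala2.13-stackable0.0.1"
if echo "${tag}" | grep -Eq '^[0-9]+(\.[0-9]+)*(-[a-z]+[0-9]+(\.[0-9]+)*)+-stackable[0-9]+(\.[0-9]+)*$'; then
    echo "tag matches the scheme"
else
    echo "tag does not match the scheme"
fi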

Refactoring Docker Image Versioning and Build System

The current purely tag based system for tracking product and image versions does not allow us to update older product images with newer base images, without also upgrading the product version.

Goal

We implement a system that supports upgrading base images for older product images. This system should be easy to use without much repetitive manual work.

New Branch System

In the new branching system, we will maintain a release branch per platform release, i.e. a branch release-22.09, release-22.11.
On these branches we will tag platform versions, i.e. 22.11.0-rc1, 22.11.0, 22.11.1.

As we now tag a whole platform release instead of individual products, the build system/GitHub actions should build all images from these tags.

Next Steps

Roadmap

As this issue turned out to be more complex than anticipated, we decided to divide it into multiple stages.

Keep our next Release current under the new Scheme

LTS Strategy and Upgrading older Images

Do we want to upgrade base images for all previous releases, all the time, indefinitely? This is to be decided. Upgrading older product images is not strictly necessary for the next release, so we decoupled this.

Lastly, manual steps should be automated as best we can.


Old Ticket Content

Description

As part of our release tracking epic we have an item to update the ubi8 base image version in a new release cycle.
As this affects all supported product images, this can potentially require quite a bit of manual effort, and we think it is worth creating an automation for.

Since releasing an image is done automatically upon pushing a tag, all this automation needs to do is retrieve existing tags from the repository, filter and mutate these tags to get a list of new tags that represent the updated images and push these tags.

It probably makes sense to write a python script for this instead of a bash script, as we'll need to parse semver at least in the filtering step.

Update

I have thought about this a bit more and I am afraid it is more complex than initially described in this ticket; I forgot to mention part of the issue.
When I write "push tags", we also need to identify the correct commit to push these tags to, and that cannot be HEAD, because then we would include all current changes to the image in older versions as well.

For example, we have kafka3.3.0-stackable0.7.0 as the current image and kafka3.3.0-stackable0.4.2 as an older version. If we now tag HEAD as kafka3.3.0-stackable0.4.3, that image will suddenly include all changes made between 0.4.x and 0.7.x as well and potentially break compatibility.

So we'll need release branches for image versions as well and then cherry-pick the commit that contains the base image update onto those branches.

This is done when

  • An action has been committed to the repository that allows re-releasing all images

Example

Step 1 - Get tags

To retrieve the tags, something like this might be done; it filters on a regex that essentially just requires a stackable version to be present.

➜  docker-images git:(main) ✗ git tag -l | grep -E ".*-stackable[0-9]+\.[0-9]+\.[0-9]+"
airflow-stackable0.2.0
antora-stackable0.1.0
hadoop3.3.3-stackable0.1.0
hbase-stackable0.4.0
hbase2.4.11-stackable0.7.0
hbase2.4.12-stackable0.1.0
hbase2.4.12-stackable0.2.0
hbase2.4.6-stackable0.7.0
hbase2.4.8-stackable0.7.0
hbase2.4.9-stackable0.7.0
hive2.3.9-stackable0.5.0
hive3.1.3-stackable0.1.0
java-base-stackable0.2.1
kafka-stackable0.3.0
kafka3.2.0-stackable0.1.0
nifi1.15.1-stackable0.1.0
nifi1.15.2-stackable0.1.0
nifi1.15.3-stackable0.1.0
nifi1.16.0-stackable0.1.0
nifi1.16.1-stackable0.1.0
nifi1.16.2-stackable0.1.0
nifi1.16.3-stackable0.1.0
opa0.41.0-stackable0.1.0
pyspark-k8s3.3.0-stackable0.1.0
pyspark-k8s3.3.0-stackable0.2.0
spark-k8s-stackable0.2.0
spark-k8s-stackable0.3.0
spark-k8s3.3.0-stackable0.1.0
spark-k8s3.3.0-stackable0.2.0
superset-stackable0.1.0
superset-stackable0.2.0
superset-stackable0.3.0
superset-stackable1.0.0
superset-stackable2.0.0
superset-stackable2.0.1
superset-stackable2.1.0
superset1.5.1-stackable0.1.0
superset1.5.1-stackable0.2.0
testing-tools0.1.0-stackable0.1.0
tools0.2.0-stackable0.3.0
trino387-stackable0.1.0
trino395-stackable0.1.0
trino396-stackable0.1.0
zookeeper-stackable0.4.0
zookeeper-stackable0.7.1
➜  docker-images git:(main) ✗

Step 2 - Filter tags

Taking the superset images as an example we now need to remove images for which a subsequent version already exists.

superset-stackable0.1.0
superset-stackable0.2.0
superset-stackable0.3.0
superset-stackable1.0.0
superset-stackable2.0.0
superset-stackable2.0.1
superset-stackable2.1.0

In this case that would be superset-stackable2.0.0, which is superseded by superset-stackable2.0.1.
So that would leave us with this list:

superset-stackable0.1.0
superset-stackable0.2.0
superset-stackable0.3.0
superset-stackable1.0.0
superset-stackable2.0.1
superset-stackable2.1.0

Step 3 - Create new tags

Now we increase the patch level for every tag:

superset-stackable0.1.1
superset-stackable0.2.1
superset-stackable0.3.1
superset-stackable1.0.1
superset-stackable2.0.2
superset-stackable2.1.1

Step 4 - Push tags

Push all tags that were just created.
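
Purely as an illustration of the four steps (the ticket suggests a Python script for the real implementation, and the new tags would still have to be created on the correct commits/branches rather than HEAD), a rough shell sketch could look like this:

# Sketch of steps 1-4: list tags, drop those superseded by a newer patch version,
# bump the patch level, and print what would be pushed.
git tag -l | grep -E ".*-stackable[0-9]+\.[0-9]+\.[0-9]+" \
  | sort -V \
  | awk -F'-stackable' '{ split($2, v, "."); key = $1 "-stackable" v[1] "." v[2]; latest[key] = $0 }
      END { for (k in latest) print latest[k] }' \
  | while read -r tag; do
      base="${tag%.*}"
      patch="${tag##*.}"
      new_tag="${base}.$((patch + 1))"
      # Step 4 would be: git tag "${new_tag}" <commit-on-the-right-branch> && git push origin "${new_tag}"
      echo "${tag} -> ${new_tag}"
    done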

Add --git-tag option to image builder python script

Description

String manipulation and especially regex matching are a pain in GitHub Actions (bash). Push tag validation down into the Python script.

@soenkeliebau :

(kafka|zookeeper|nifi|druid|opa|hbase|hadoop|trino|airflow|superset|spark-k8s|pyspark-k8s)([0-9.]+)-stackable([0-9]+.[0-9]+.[0-9]+)$
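
A hypothetical invocation once the option exists (the script name is taken from the build tooling mentioned elsewhere in this repository; the tag value is only an example):

# Hypothetical: the script validates the tag against the regex above and refuses to build otherwise.
./build_product_images.py --git-tag kafka3.2.0-stackable0.1.0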

Base Airflow image on the Red Hat UBI

All images must fulfill the requirements for OpenShift (see stackabletech/issues#207). One requirement is that all images must be based on a Red Hat Enterprise Linux or Red Hat Universal Base Image (see https://redhat-connect.gitbook.io/partner-guide-for-red-hat-openshift-and-container/program-on-boarding/technical-prerequisites#dockerfile-requirements). The Stackable Airflow image is based on the upstream Airflow image which in the end is based on Debian. Therefore the base image must be replaced with a Red Hat UBI.

The upstream Airflow image contains an argument PYTHON_BASE_IMAGE (see https://github.com/apache/airflow/blob/2.2.5/Dockerfile#L47). It should be evaluated if this can be set to an image which is based on a Red Hat UBI.
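
As a starting point for that evaluation, the build argument could be overridden when building the upstream Dockerfile. The UBI-based Python image name below is only an assumed example, not a verified choice:

# Sketch: override the upstream build argument with a UBI-based Python image.
docker build \
    --build-arg PYTHON_BASE_IMAGE=registry.access.redhat.com/ubi8/python-39 \
    -t airflow-ubi-test \
    .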

LTS Strategy and Upgrading older Images

Goal

We want to implement a system that supports upgrading base images for older product images. This system should be easy to use without much repetitive manual work.

LTS Strategy and Upgrading older Images

Do we want to upgrade base images for all previous releases, all the time, indefinitely? This is to be decided. Upgrading older product images is not strictly necessary for the next release, so we decoupled this.

Lastly, manual steps should be automated as best we can.

Supply hadoop compression libraries with HBase product image

To use HBase/Phoenix tables with compression (e.g. snappy) there have to be particular hadoop native libraries present in the product image. These have been provisionally added for all current supported HBase versions (2.4.6, 2.4.8, 2.4.9, 2.4.11, 2.4.12) in this branch: https://github.com/stackabletech/docker-images/tree/compression-native-libs

For HBase 2.5(.3) it is possible to use an HBase image compiled against Hadoop 3 (3.2.4). There is also a corresponding Phoenix library.

Using these libraries adds about 0.2 GB to the image size (from 1.2 GB to 1.4 GB), though this can be reduced if only the native libraries are retained. This ticket covers implementing a strategy for supplying compression capabilities for supported HBase versions.

Open issues

  • using the HBase archive that is bundled with Hadoop 3 results in compression libs (as .jars) being present in the /libs folder (e.g. snappy, lz4). These .jars, however, are not detected by HBase when creating a table with compression (classpath issue?)
  • most of the HBase documentation regarding this issue assumes that HBase and Hadoop are co-located, which is obviously not the case when running these products in Kubernetes. We have tried compiling Hadoop with the snappy flag and importing the libraries to HBase pods but this did not get around versioning issues.

Links

https://issues.apache.org/jira/browse/HADOOP-17125

Acceptance criteria

  • decide whether and how older (i.e. pre-2.5.3) versions of HBase should receive these supporting libraries (if so, that will probably mean copying native libs into the image, as done in the branch mentioned above, and will probably require some conditional logic for different HBase versions).
  • verify that HBase 2.5.3 (compiled with Hadoop 3) supports compression
    • Just using HBase-compiled-with-Hadoop may not be enough as the compression codecs are supplied as .jar files, but the hadoop native libraries are still assumed to be present (this must be done manually as described here)
  • decide if the Hadoop inclusions should be "trimmed" (i.e. just compression) or not

Document ways of deriving environment variables within a Dockerfile

As discussed in the architecture meeting on 04.05.2022.
Sometimes an argument is passed into a Dockerfile that needs to be used in different forms: for --build-arg PYTHON=3.8 we may need 3.8 in some places (e.g. product downloads) but 38 in others (microdnf commands). There are different ways of deriving and persisting one variable from another, and this issue covers documenting them with examples in the README (a sketch follows the list below), e.g.

  • using the moby approach of choosing a specific base image (see here)
  • use an inline script wrapped by a RUN so that the derived variable is available for the scope of that command
  • source the environment variables and echo the environment to a file
  • etc.
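
As an illustration of the second and third bullet, a sketch only (PYTHON_NODOT is a made-up name, and the echo commands stand in for the real package installs):

ARG PYTHON=3.8

# Derive the second form inside a RUN, so it is available for that command only:
RUN PYTHON_NODOT="$(echo "${PYTHON}" | tr -d '.')" && \
    echo "would run: microdnf install python${PYTHON_NODOT}"

# Or persist the derived value to a file and source it explicitly in later RUN instructions:
RUN echo "PYTHON_NODOT=$(echo "${PYTHON}" | tr -d '.')" > /tmp/derived.env
RUN . /tmp/derived.env && echo "derived value is ${PYTHON_NODOT}"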

Base Superset image on the Red Hat UBI

All images must fulfill the requirements for OpenShift (see stackabletech/issues#207). One requirement is that all images must be based on a Red Hat Enterprise Linux or Red Hat Universal Base Image (see https://redhat-connect.gitbook.io/partner-guide-for-red-hat-openshift-and-container/program-on-boarding/technical-prerequisites#dockerfile-requirements). The Stackable Superset image is based on the upstream Superset image which in the end is based on Debian. Therefore the base image must be replaced with a Red Hat UBI.

Request - Installing Trino Provider Dependencies

Description

I was exploring a use case where we want to use Trino in our DAGs. Looking into how Airflow runs and how the dependencies are handled, I have reached the conclusion that it might be quite easy to add, given the way you have created the docker images.

I have built and run it locally with the added dependency and was able to configure the Trino connection and run simple queries against a test Trino cluster, which, by the looks of it, works.

Add HBase 2.4.9

HBASE 2.4.9 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.


  • HBASE-26542 | Minor | Apply a `package` to test protobuf files

The protobuf structures used in test are all now scoped by the package name `hbase.test.pb`.


  • HBASE-26512 | Major | Make timestamp format configurable in HBase shell scan output

HBASE-23930 changed the formatting of the timestamp attribute on each Cell as displayed by the HBase shell to be formatted as an ISO-8601 string rather than milliseconds since the epoch. Some users may have logic which expects the timestamp to be displayed as milliseconds since the epoch. This change introduces the configuration property hbase.shell.timestamp.format.epoch which controls whether the shell will print an ISO-8601 formatted timestamp (the default "false") or milliseconds since the epoch ("true").

HBASE Changelog

Release 2.4.9 - 2021-12-23

IMPROVEMENTS:

JIRA | Summary | Priority | Component
HBASE-26601 | maven-gpg-plugin failing with "Inappropriate ioctl for device" | Major | build
HBASE-26556 | IT and Chaos Monkey improvements | Minor | integration tests
HBASE-26525 | Use unique thread name for group WALs | Major | wal
HBASE-26517 | Add auth method information to AccessChecker audit log | Trivial | security
HBASE-26512 | Make timestamp format configurable in HBase shell scan output | Major | shell
HBASE-26485 | Introduce a method to clean restore directory after Snapshot Scan | Minor | snapshots
HBASE-26475 | The flush and compact methods in HTU should skip processing secondary replicas | Major | test
HBASE-26267 | Master initialization fails if Master Region WAL dir is missing | Major | master
HBASE-26337 | Optimization for weighted random generators | Major | Balancer
HBASE-26309 | Balancer tends to move regions to the server at the end of list | Major | Balancer

BUG FIXES:

JIRA | Summary | Priority | Component
HBASE-26541 | hbase-protocol-shaded not buildable on M1 MacOSX | Major | .
HBASE-26527 | ArrayIndexOutOfBoundsException in KeyValueUtil.copyToNewKeyValue() | Major | wal
HBASE-26462 | Should persist restoreAcl flag in the procedure state for CloneSnapshotProcedure and RestoreSnapshotProcedure | Critical | proc-v2, snapshots
HBASE-26533 | KeyValueScanner might not be properly closed when using InternalScan.checkOnlyMemStore() | Minor | .
HBASE-26482 | HMaster may clean wals that is replicating in rare cases | Critical | Replication
HBASE-26468 | Region Server doesn't exit cleanly incase it crashes. | Major | regionserver
HBASE-25905 | Shutdown of WAL stuck at waitForSafePoint | Blocker | regionserver, wal
HBASE-26450 | Server configuration will overwrite HStore configuration after using shell command 'update_config' | Minor | Compaction, conf, regionserver
HBASE-26476 | Make DefaultMemStore extensible for HStore.memstore | Major | regionserver
HBASE-26465 | MemStoreLAB may be released early when its SegmentScanner is scanning | Critical | regionserver
HBASE-26467 | Wrong Cell Generated by MemStoreLABImpl.forceCopyOfBigCellInto when Cell size bigger than data chunk size | Critical | in-memory-compaction
HBASE-26463 | Unreadable table names after HBASE-24605 | Trivial | UI
HBASE-26438 | Fix flaky test TestHStore.testCompactingMemStoreCellExceedInmemoryFlushSize | Major | test
HBASE-26311 | Balancer gets stuck in cohosted replica distribution | Major | Balancer
HBASE-26384 | Segment already flushed to hfile may still be remained in CompactingMemStore | Major | in-memory-compaction
HBASE-26410 | Fix HBase TestCanaryTool for Java17 | Major | java
HBASE-26429 | HeapMemoryManager fails memstore flushes with NPE if enabled | Major | Operability, regionserver
HBASE-25322 | Redundant Reference file in bottom region of split | Minor | .
HBASE-26406 | Can not add peer replicating to non-HBase | Major | Replication
HBASE-26404 | Update javadoc for CellUtil#createCell with tags methods. | Major | .
HBASE-26398 | CellCounter fails for large tables filling up local disk | Minor | mapreduce

TESTS:

JIRA | Summary | Priority | Component
HBASE-26542 | Apply a `package` to test protobuf files | Minor | Protobufs, test

SUB-TASKS:

JIRA | Summary | Priority | Component
HBASE-24870 | Ignore TestAsyncTableRSCrashPublish | Major | .
HBASE-26470 | Use openlabtesting protoc on linux arm64 in HBASE 2.x | Major | build
HBASE-26327 | Replicas cohosted on a rack shouldn't keep triggering Balancer | Major | Balancer
HBASE-26308 | Sum of multiplier of cost functions is not populated properly when we have a shortcut for trigger | Critical | Balancer
HBASE-26319 | Make flaky find job track more builds | Major | flakies, jenkins

OTHER:

JIRA | Summary | Priority | Component
HBASE-26549 | hbaseprotoc plugin should initialize maven | Major | jenkins
HBASE-26444 | BucketCacheWriter should log only the BucketAllocatorException message, not the full stack trace | Major | logging, Operability
HBASE-26443 | Some BaseLoadBalancer log lines should be at DEBUG level | Major | logging, Operability

Airflow: Reduce image size by 200MB

Pulling the Airflow image, I noticed that the last layer (200 MB) is duplicated. This results in an unnecessarily big image and longer download times.
The command that duplicates the whole layer seems to be chown -R stackable:stackable /stackable here (one possible fix is sketched below).

DoD:

  • Remove 200MB from the airflow image
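
One way to avoid the duplicated layer, as a sketch only (it assumes the files are brought in via COPY; the source path is a placeholder), is to set ownership at copy time instead of in a separate RUN:

# Sketch: chown at COPY time, so the layer is not duplicated by a later chown -R.
COPY --chown=stackable:stackable ./airflow-dist /stackable
# If the files are created by a RUN instead, the chown has to happen in that same RUN.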

Fix RELEASE and VERSION labels in product images

Our docker images currently have a hard-coded label RELEASE=1, which was required for OpenShift certification. Another label is VERSION, which is currently set from the product version (e.g. 3.8.0 for a ZooKeeper 3.8.0).

For RELEASE we want to switch to our actual release naming. The next one would be 22.09. This should probably be parameterized in the build script instead of each product image.

The version label does not make sense currently. It will always be 3.8.0 (for a ZooKeeper 3.8.0) even if we adapt, add to or change that image.

The VERSION label should become the full tag we create from that image like zookeeper3.8.0-stackable0.1.0 to clearly identify the image.

This is done when:

  • The RELEASE label is set "globally" via a CLI parameter in the build script
  • The VERSION label is set to the actual name of the created docker image, e.g. zookeeper3.8.0-stackable0.1.0 (see the sketch below)
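
A sketch of how the build script could pass both labels (the values shown are illustrative only):

# Sketch: set RELEASE globally and VERSION per image from the build script.
docker build \
    --label RELEASE="22.09" \
    --label VERSION="zookeeper3.8.0-stackable0.1.0" \
    -t docker.stackable.tech/stackable/zookeeper:3.8.0-stackable0.1.0 \
    zookeeper/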

ubi8-minimal microdnf --enablerepo=ubi-8-appstream can't be used currently

The current situation, e.g. for ubi8-rust-builder:
we use FROM registry.access.redhat.com/ubi8/ubi-minimal AS builder as the base image, and to install packages we use:

microdnf install --disablerepo=* --enablerepo=ubi-8-appstream-rpms --enablerepo=ubi-8-baseos curl findutils gcc gcc-c++ make cmake openssl-devel pkg-config systemd-devel unzip pearl -y && \
    rm -rf /var/cache/yum

This leads to the error that ubi-8-appstream is not found

Ubi8-rust-builder: Enable multi-arch with cargo

To update ubi8-rust-builder we need the following changes to the Dockerfile:

  • rustup for target architecture:
    ONBUILD RUN \
    if [ $(arch) = "aarch64" ]; then \
    . $HOME/.cargo/env && rustup target add aarch64-unknown-linux-gnu; \
    else \
    . $HOME/.cargo/env && rustup target add x86_64-unknown-linux-gnu; \
    fi

  • set environment to target architecture:
    ONBUILD ENV \
    . $HOME/.cargo/env && \
    if [ $(arch) = "aarch64" ]; then \
    CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc \
    CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc \
    CXX_aarch64_unknown_linux_gnu=aarch64-linux-gnu-g++; \
    else \
    CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc \
    CC_X86_64_unknown_linux_gnu=x86_64-linux-gnu-gcc \
    CXX_X86_64_unknown_linux_gnu=x86_64-linux-gnu-g++; \
    fi

  • set --target for cargo
    if [ $(arch) = "aarch64" ]; then \
    . $HOME/.cargo/env && cargo build --release --target=aarch64-unknown-linux-gnu; \
    else \
    . $HOME/.cargo/env && cargo build --release --target=x86_64-unknown-linux-gnu; \
    fi

  • change search and copy of compiled binary:
    ONBUILD RUN find /src/target \
    -regextype egrep \
    # The interesting binaries are all directly in ${BUILD_DIR}.
    -maxdepth 3 \
    # Well, binaries are executable.
    -executable \
    # Well, binaries are files.
    -type f \
    # Filter out tests.
    ! -regex ".*\-[a-fA-F0-9]{16,16}$" \
    # Ignore some .so files
    ! -regex ".*\.so$" \
    # Copy the matching files into /app.
    -exec cp {} /app \;

This is the preparation for multi-arch operator images to work.

Sidenote: Working branch https://github.com/stackabletech/docker-images/tree/ubi8-rust-builder-multi-arch
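
Note that ENV cannot execute shell logic, so the architecture-dependent values above would not actually be evaluated there. A minimal sketch that keeps everything in a single ONBUILD RUN (same arch-based detection as above; whether the cross-compilation linker variables are needed depends on the final build setup) could look like this:

# Sketch: derive the Rust target once and use it for rustup and cargo in one RUN.
ONBUILD RUN . "$HOME/.cargo/env" && \
    if [ "$(arch)" = "aarch64" ]; then \
        RUST_TARGET="aarch64-unknown-linux-gnu"; \
    else \
        RUST_TARGET="x86_64-unknown-linux-gnu"; \
    fi && \
    rustup target add "${RUST_TARGET}" && \
    cargo build --release --target="${RUST_TARGET}"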

Cross-Platform building for Docker images

To enable cross-platform building for Docker images of products and operators, we agreed on using docker buildx build (see the example after the list below).

To make this happen we need some checks and steps:

  • Enable choice of platform in build_product_images.py
  • If no particular system is chosen, default to the current system (platform.machine())
  • Check availability of dependencies (java-base, tools, etc.); if they are missing, build them first
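
For reference, a cross-platform build invocation with buildx might look like this (image name and Dockerfile path are placeholders):

# Sketch: build an image for both amd64 and arm64 and push the multi-arch manifest.
docker buildx build \
    --platform linux/amd64,linux/arm64 \
    -t docker.stackable.tech/stackable/example-product:example-tag \
    -f example-product/Dockerfile \
    --push \
    .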

Use consistent versioning for Druid images with the authorizer plugin

As a user of druid images I want to be able to use images that apply image versioning consistently. Currently we have:

0.22.1-authorizer0.1.0-stackable0
0.22.1-authorizer0.1.0-stackable0.2.0
0.22.1-stackable0
0.22.1-stackable0.1.0

where 0.22.1-stackable0 does not resolve to 0.22.1-authorizer0.1.0-stackable0.2.0.

Add Superset 1.4.1

Release Notes for Superset 1.4

Superset 1.4 focuses heavily on continuing to polish the core Superset experience. This release has a very very long list of fixes from across the community.

User Facing Features

  • Charts and dashboards in Superset can now be certified! In addition, the Edit Dataset modal more accurately reflects state of Certification (especially for Calculated Columns). (#17335, #16454)


  • Parquet files can now be uploaded into an existing connected database that has Data Upload enabled. Eventually, the contributor hopes that this foundation can be used to accommodate feather and orc files. (#14449)

  • Tabs can now be added to Column elements in dashboards. (#16593)


  • The experience of using alerts and reports has improved in a few minor ways. (#16335, #16281)

  • Drag and drop now has a clickable ghost button for an improved user experience. (#16119)

Database Experience

  • Apache Drill: Superset can now connect to Apache Drill (thru ODBC / JDBC) and impersonate the currently logged in user. (#17353).

  • Firebolt: Superset now supports the cloud data warehouse Firebolt! (#16903).

  • Databricks: Superset now supports the new SQL Endpoints in Databricks. (#16862)

  • Apache Druid: Superset Explore now can take advantage of support for JOIN's in Druid (note: the DRUID_JOINS feature flag needs to be enabled). (#16770)

  • AWS Aurora: Superset now has a separate db_engine_spec for Amazon Aurora. (#16535)

  • Clickhouse: Superset now includes function names in the auto-complete for SQL Lab. (#16234)

  • Google Sheets: Better support for private Google Sheets was added. (#16228)

Developer Experience

  • The Makefile for Superset has gone through a number of improvements. (#16327, #16533)

  • Add Python instrumentation to pages, showing method calls used to build the page & how long each one took. This requires a configuration flag (see PR for more info). (#16136)


Breaking Changes and Full Changelog

Breaking Changes

  • 16660: The columns Jinja parameter has been renamed table_columns to make the columns query object parameter available in the Jinja context.
  • 16711: The url_param Jinja function will now by default escape the result. For instance, the value O'Brien will now be changed to O''Brien. To disable this behavior, call url_param with escape_result set to False: url_param("my_key", "my default", escape_result=False).

Changelog

To see the complete changelog in this release, head to CHANGELOG.MD. As mentioned earlier, this release has a MASSIVE amount of bug fixes. The full changelog lists all of them!

Implement new Branch System for Docker Images

In #208 we decided to use a new branching system, to be able to update old docker images to use new base images.

In this ticket we want to implement it, and create a branch for the upcoming release. This ticket does not deal with new base images for older releases. Any sort of automation is also not part of this ticket.

New Branching System

In the new branching system, we will maintain a release branch per platform release, i.e. a branch release-22.09, release-22.11.
On these branches we will tag platform versions, i.e. 22.11.0-rc1, 22.11.0, 22.11.1.

As we now tag a whole platform release instead of individual products, the build system/GitHub actions should build all images from these tags.
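
In git terms, the flow described above amounts to something like the following sketch (branch and tag names follow the scheme above; branching from main is an assumption):

# Sketch: create the release branch and tag a platform version on it.
git checkout -b release-22.11 main
git push origin release-22.11
git tag 22.11.0-rc1
git push origin 22.11.0-rc1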

Acceptance Criteria

  • A new branch has been made for the release 22.11
  • a new tag for 22.11 release candidate 1 exists
  • The GitHub action has been changed, to build images based on the new tag
  • We are ready to release the 22.11 images

Requirements

#236 Should be done first!

Build Docker images for all products

Description

We need to Dockerize all our products.
These Docker images will probably evolve over time, but initially we need them for all our supported versions; they need to include JMX where relevant and any authorizers needed.

We decided to keep the Dockerfiles in one central location (i.e. this repository) and not colocate them with the operators.

We would like to use the RedHat UBI images as a starting point (they do exist with JDKs as well).

One thing I've always been unsure about is a tagging strategy (e.g. when to use "latest") so that should be part of this.
Our "old" tar.gz packages could serve as a starting point but they were created manually and this might be a good time to make this workflow repeatable and automated.

Please also investigate "buildah" as a replacement for Docker.

Questions

Q: Should we add the packages to our Nexus before building the images ?
A: ??? (It would simplify the image pipeline a lot. For example, different NiFi versions have different download URLs.)

Q: Kafka - which Scala version: 2.12, 2.13 or both?
A: Images are now built with both and also tagged with the Scala version.

Q: Trino - should the images also contain the trino cli and trino jdbc ?
A: ???

Q: Tagging with latest is not possible with the current tagging scheme. See meeting notes on versions and tags below. Is this a problem ?
A: ??? (operators need to know exact versions when they create pods anyway)

Acceptance criteria

Docker images

  • OPA
  • Apache ZooKeeper
  • Apache NiFi
  • Apache Kafka
  • Trino
  • Apache Hadoop
  • Apache Spark
  • Apache HBase
  • Apache Hive
  • Apache Superset
  • (Maybe) Apache Druid - Coordinate with Felix

Operators adapted

moved to -> stackabletech/issues#136

Docs updated (thx Felix)

  • OPA
  • Apache ZooKeeper
  • Apache NiFi
  • Apache Kafka
  • Trino
  • Apache Hadoop
  • Apache Spark
  • Apache HBase
  • Apache Hive
  • Apache Superset
  • Apache Druid

Introduce DNS cache timeouts for Java in all products

As a user of our operators I'd like to have low Java DNS cache timeouts so the products don't crash on startup.

See these links for reference:

This is done when all Java-based products have a java.security file (for me it's in /etc/java/java-11-openjdk/java-11-openjdk-11.0.14.1.1-2.el8_5.x86_64/conf/security, but make sure the location is stable) that includes the setting networkaddress.cache.ttl with a value of 5.

For this it might be worth creating one or more Stackable Java base images (for different Java versions) that do this once, so we can reuse them for all our images; a sketch follows below.
See also stackabletech/hdfs-operator#147 for reference.
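
A sketch of what such a base image layer could do (the location of java.security differs between JDK packages, so it is discovered rather than hard-coded; the paths searched here are assumptions):

# Sketch: append a low DNS cache TTL to the JDK's java.security file.
RUN SECURITY_FILE="$(find /etc/java /usr/lib/jvm -name java.security 2>/dev/null | head -n 1)" && \
    echo "networkaddress.cache.ttl=5" >> "${SECURITY_FILE}"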

Superset version is hard coded in the Dockerfile

The Superset version is hard-coded in the Dockerfile, but it should be taken from the configuration file conf.py instead.

Superset is similar to Airflow because both are based on the same Python framework and their images can be built in the same way. The Dockerfile for Airflow already uses a parameterized approach, which should also be used in the Dockerfile for Superset.

Images are fairly large, can we shrink them a bit?

We received feedback that our product images are fairly large and have been asked to investigate if this can be reduced with reasonable effort.

The operators are fine, this is just in reference to the product images.

Specifically mentioned were:

  • NiFi
  • Druid
  • Trino

For this specific use case Trino would be the most important one to have a look at.

Reduce size of Docker images

The Docker images are much larger than the official ones:

$ docker images | grep trino
docker.stackable.tech/stackable/trino        362-stackable0                 4a6cc6f9f9dd   2 weeks ago     4GB
trinodb/trino                                362                            8c4c3cc8b6b5   2 months ago    1.33GB

The size of the images should be decreased to speed up the download. This would also make the integration tests faster and decrease the chance of hitting timeouts.

One possible optimization is to not rename the installation directory in a separate layer:

RUN mkdir /stackable/trino-server && \
    mv /stackable/trino-server-${PRODUCT}/* /stackable/trino-server && \
    rm -rf /stackable/trino-server-${PRODUCT}
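
One possible shape of that change, as a sketch (whether a symlink is acceptable for the Trino launcher would need to be verified):

# Sketch: avoid copying the unpacked distribution into a new directory in a
# separate layer; a symlink (or a rename done in the same layer that unpacks
# the archive) keeps only one copy of the files.
RUN ln -s /stackable/trino-server-${PRODUCT} /stackable/trino-server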

No linux/arm64/v8 images available

I am attempting to install and get an HBase cluster running using the Helm charts. The operators install fine, but when actually applying manifests to create the cluster objects (zookeepercluster, zookeepernode etc.) I am met with: Warning Failed 4s (x2 over 18s) kubelet Failed to pull image "docker.stackable.tech/stackable/zookeeper:3.8.0-stackable23.1.0": rpc error: code = Unknown desc = no matching manifest for linux/arm64/v8 in the manifest list entries. Is this a known issue?

spark-k8s and pyspark-k8s: Add s3a and abfs libs

S3 libs are needed for basically everything (e.g. reading/writing data and the history server), so we want to pre-ship them with our images.
Another reason is to put the correct versions in the image, as many users would otherwise be unsure which versions to use.

The implementation can be copied from the Hive image; a rough sketch follows the list below.

  • Must: Libs added to a new image version. Adding them only to Spark 3.3.0 is OK, as older Spark versions have older Hadoop versions and need other AWS libs
  • Must: kuttl test added
  • Could: kuttl tests using s3a libs via pvcs or similar are removed
  • Documentation & changelog updated to mention the inclusion of these libs
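
A rough sketch of what shipping the libs could look like (the jars directory, the jar coordinates, and the HADOOP/AWS_SDK version arguments are placeholders; the versions must match the Hadoop build bundled with Spark):

# Sketch: download the S3A and ABFS cloud connectors into Spark's jars directory.
ARG HADOOP
ARG AWS_SDK
RUN cd /stackable/spark/jars && \
    curl -fsSL -O "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP}/hadoop-aws-${HADOOP}.jar" && \
    curl -fsSL -O "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure/${HADOOP}/hadoop-azure-${HADOOP}.jar" && \
    curl -fsSL -O "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/${AWS_SDK}/aws-java-sdk-bundle-${AWS_SDK}.jar"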

Update all images on the main branch

This is about updating everything on main. This is not about updating base images for older product images.

Some products might need to be upgraded directly; others depend on the java-base image, which needs to be upgraded.

The java-base image will need to be split and versioned differently. Some products require Java 17, while e.g. Druid only supports Java 11. The Java base images need the latest ubi8 base image though, so we will have a java-11 base image and a java-17 base image (at least).

Acceptance criteria

  • Java base image has been split, and all new java base images use the latest ubi8 base
  • java based products are linked to their new respective java base image
  • all non-java products have been upgraded as well (airflow, opa, superset, others?)
  • Use newest image tags in:
    • Operator repository examples
    • Operator documentation
    • Getting started guides
    • Integration tests
    • Optional: tools image in init containers

Progress Tags

Products

  • Airflow
    • 2.2.3-stackable0.4.0
    • 2.2.4-stackable0.4.0
    • 2.2.5-stackable0.4.0
    • 2.4.1-stackable0.5.0
  • Druid
    • 0.23.0-stackable0.2.0
    • 24.0.0-stackable0.2.0
  • Hadoop HDFS
    • 3.2.2-stackable0.6.0
    • 3.3.1-stackable0.6.0
    • 3.3.3-stackable0.2.0
    • 3.3.4-stackable0.2.0
  • HBase
    • 2.4.6-stackable0.8.0
    • 2.4.8-stackable0.8.0
    • 2.4.9-stackable0.8.0
    • 2.4.11-stackable0.8.0
    • 2.4.12-stackable0.3.0
  • Hive
    • 2.3.9-stackable0.6.0
    • 3.1.3-stackable0.2.0
  • Kafka
    • 2.7.1-stackable0.6.0
    • 2.8.1-stackable0.6.0
    • 3.1.0-stackable0.6.0
    • 3.2.0-stackable0.2.0
    • 3.3.1-stackable0.2.0
  • NiFi
    • 1.13.2-stackable0.5.0
    • 1.15.0-stackable0.5.0
    • 1.15.1-stackable0.2.0
    • 1.15.2-stackable0.2.0
    • 1.15.3-stackable0.2.0
    • 1.16.0-stackable0.2.0
    • 1.16.1-stackable0.2.0
    • 1.16.2-stackable0.2.0
    • 1.16.3-stackable0.2.0
    • 1.18.0-stackable0.2.0
  • pyspark-k8s
    • 3.2.1-stackable0.7.0
    • 3.3.0-stackable0.3.0
  • spark-k8s
    • 3.2.1-stackable0.6.0
    • 3.3.0-stackable0.3.0
  • Superset
    • 1.3.2-stackable2.2.0
    • 1.4.1-stackable2.2.0
    • 1.5.1-stackable0.3.0
  • Trino
    • 377-stackable0.2.0
    • 387-stackable0.2.0
    • 395-stackable0.2.0
    • 396-stackable0.2.0
  • ZooKeeper
    • 3.5.8-stackable0.8.0
    • 3.6.3-stackable0.8.0
    • 3.7.0-stackable0.8.0
    • 3.8.0-stackable0.8.0

Internal

  • OPA
    • 0.27.1-stackable0.3.0
    • 0.28.0-stackable0.3.0
    • 0.37.2-stackable0.3.0
    • 0.41.0-stackable0.2.0
    • 0.45.0-stackable0.2.0
  • Tools
    • 0.2.0-stackable0.4.0

Progress Operator repository image adaptation

Products

Internal
