stackabletech / docker-images
License: Apache License 2.0
As a user I would like to be able to use the Spark cluster for Python-based jobs. This requires that pyspark (and Python) are made available in the Docker image.
The current Spark Docker image contains the entire Spark code and can be used with Scala (spark-shell) or SQL (spark-sql). pyspark is also installed, but using it requires the following:
Some things which I've quickly tested:
RUN microdnf update && \
...
microdnf install python3 python3-pip && \
...
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN ln -s /usr/bin/pip3 /usr/bin/pip
ENV SPARK_HOME=/stackable/spark
ENV PATH=$SPARK_HOME:$PATH:/bin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
ENV PYSPARK_PYTHON=/usr/bin/python
ENV PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
# pyspark dependencies
USER root
RUN pip install --upgrade pip
RUN pip install --no-cache-dir numpy pandas pyarrow
USER stackable
...
In the context of MLOps it is planned to provide one or more environments for the following tasks:
Many of these use python libraries.
The fuse binary is not built and shipped by default by the Hadoop project. That means we'll have to build it ourselves and include it in the image.
cd hadoop-hdfs-project/hadoop-hdfs-native-client
mvn clean package -P native
Results in:
[WARNING] make[2]: *** [main/native/libhdfspp/lib/fs/CMakeFiles/fs_obj.dir/build.make:104: main/native/libhdfspp/lib/fs/CMakeFiles/fs_obj.dir/filehandle.cc.o] Error 1
[WARNING] make[2]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/Makefile2:2629: main/native/libhdfspp/lib/fs/CMakeFiles/fs_obj.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs....
[WARNING] make: *** [Makefile:146: all] Error 2
along with further errors earlier in the output.
Context: For our 22.11 release we wanted to upgrade base images for our older product images. This raised the question of how far back we should do this and whether we should do it for all images etc.
An outcome of this could also be to keep everything updated for now, until we find a reason to stop doing so.
Images should be tagged using the following scheme:
tag: version - dependencies - stackable_version
version: \d+(\.\d+)*
dependencies: dep (-dep)?
dep: name version
name: [a-z]+
stackable_version: "stackable" version
For example, the full name of the Kafka image should be: docker.stackable.tech/stackable/kafka:2.6.1-scala2.13-stackable0.0.1
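To make the scheme concrete, the grammar above can be checked with a small Python sketch (the exact pattern is an assumption derived from the grammar, not an existing validator):

```python
import re

# Sketch of the tag grammar above (pattern details are an assumption):
# tag = version ("-" dep)* "-stackable" version,  dep = name version
VERSION = r"\d+(?:\.\d+)*"
DEP = rf"[a-z]+{VERSION}"
TAG = re.compile(rf"^{VERSION}(?:-{DEP})*-stackable{VERSION}$")

print(bool(TAG.match("2.6.1-scala2.13-stackable0.0.1")))  # True
print(bool(TAG.match("2.6.1-scala2.13")))                 # False: stackable version missing
```

Note that the product version needs `\d+(\.\d+)*` rather than a single optional dot group, since versions like 2.6.1 contain two dots.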
The current, purely tag-based system for tracking product and image versions does not allow us to update older product images with newer base images without also upgrading the product version.
We implement a system that supports upgrading base images for older product images. This system should be easy to use without much repetitive manual work.
In the new branching system, we will maintain a release branch per platform release, i.e. a branch release-22.09
, release-22.11
.
On these branches we will tag platform versions, i.e. 22.11.0-rc1
, 22.11.0
, 22.11.1
.
As we now tag a whole platform release instead of individual products, the build system/GitHub actions should build all images from these tags.
As this issue turned out to be more complex than anticipated, we decided to divide it into multiple stages.
Do we want to upgrade base images for all previous releases, all the time, indefinitely? This is still to be decided. Upgrading older product images is not strictly necessary for the next release, so we decoupled this.
Lastly, manual steps should be automated as best we can.
As part of our release tracking epic we have an item to update the ubi8 base image version in a new release cycle.
As this affects all supported product images, it can potentially require quite a bit of manual effort, and we think it is worth creating an automation for it.
Since releasing an image is done automatically upon pushing a tag, all this automation needs to do is retrieve existing tags from the repository, filter and mutate these tags to get a list of new tags that represent the updated images and push these tags.
It probably makes sense to write a python script for this instead of a bash script, as we'll need to parse semver at least in the filtering step.
I have thought about this a bit more and am afraid that it is more complex than I initially wrote in this ticket / I forgot to mention a part of the issue.
When I write "push tags" we of course also need to identify the correct commit to push these tags to, and that cannot be HEAD, because then we would include all current changes to the image in older versions as well.
For example, if we have kafka3.3.0-stackable0.7.0 as the current image and kafka3.3.0-stackable0.4.2 as an older version, and we now tag HEAD as kafka3.3.0-stackable0.4.3, then that image will suddenly include all changes that were made between 0.4.x and 0.7.x as well and potentially break compatibility.
So we'll need release branches for image versions as well, and then cherry-pick the commit that contains the base image update onto those branches.
To retrieve the tags, something like the following might be done; it filters on a regex that essentially just requires a stackable version to be present.
➜ docker-images git:(main) ✗ git tag -l | grep -E ".*-stackable[0-9]+\.[0-9]+\.[0-9]+"
airflow-stackable0.2.0
antora-stackable0.1.0
hadoop3.3.3-stackable0.1.0
hbase-stackable0.4.0
hbase2.4.11-stackable0.7.0
hbase2.4.12-stackable0.1.0
hbase2.4.12-stackable0.2.0
hbase2.4.6-stackable0.7.0
hbase2.4.8-stackable0.7.0
hbase2.4.9-stackable0.7.0
hive2.3.9-stackable0.5.0
hive3.1.3-stackable0.1.0
java-base-stackable0.2.1
kafka-stackable0.3.0
kafka3.2.0-stackable0.1.0
nifi1.15.1-stackable0.1.0
nifi1.15.2-stackable0.1.0
nifi1.15.3-stackable0.1.0
nifi1.16.0-stackable0.1.0
nifi1.16.1-stackable0.1.0
nifi1.16.2-stackable0.1.0
nifi1.16.3-stackable0.1.0
opa0.41.0-stackable0.1.0
pyspark-k8s3.3.0-stackable0.1.0
pyspark-k8s3.3.0-stackable0.2.0
spark-k8s-stackable0.2.0
spark-k8s-stackable0.3.0
spark-k8s3.3.0-stackable0.1.0
spark-k8s3.3.0-stackable0.2.0
superset-stackable0.1.0
superset-stackable0.2.0
superset-stackable0.3.0
superset-stackable1.0.0
superset-stackable2.0.0
superset-stackable2.0.1
superset-stackable2.1.0
superset1.5.1-stackable0.1.0
superset1.5.1-stackable0.2.0
testing-tools0.1.0-stackable0.1.0
tools0.2.0-stackable0.3.0
trino387-stackable0.1.0
trino395-stackable0.1.0
trino396-stackable0.1.0
zookeeper-stackable0.4.0
zookeeper-stackable0.7.1
➜ docker-images git:(main) ✗
Taking the superset images as an example we now need to remove images for which a subsequent version already exists.
superset-stackable0.1.0
superset-stackable0.2.0
superset-stackable0.3.0
superset-stackable1.0.0
superset-stackable2.0.0
superset-stackable2.0.1
superset-stackable2.1.0
In this case that would be superset-stackable2.0.0, which is superseded by superset-stackable2.0.1.
So that would leave us with this list:
superset-stackable0.1.0
superset-stackable0.2.0
superset-stackable0.3.0
superset-stackable1.0.0
superset-stackable2.0.1
superset-stackable2.1.0
Now we increase the patch level for every tag:
superset-stackable0.1.1
superset-stackable0.2.1
superset-stackable0.3.1
superset-stackable1.0.1
superset-stackable2.0.2
superset-stackable2.1.1
Push all tags that were just created.
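The filter-and-bump steps above could be sketched in Python like this (function and variable names are illustrative, not an existing script):

```python
import re

# Illustrative sketch of the steps above: keep only the newest stackable
# patch level per image and major.minor group, then bump that patch by one.
TAG_RE = re.compile(r"^(?P<image>.+)-stackable(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$")

def bump_tags(tags):
    latest = {}
    for tag in tags:
        m = TAG_RE.match(tag)
        if not m:
            continue  # skip tags without a stackable version
        key = (m["image"], int(m["major"]), int(m["minor"]))
        patch = int(m["patch"])
        if patch > latest.get(key, -1):
            latest[key] = patch  # superseded patches are dropped here
    return [f"{image}-stackable{major}.{minor}.{patch + 1}"
            for (image, major, minor), patch in sorted(latest.items())]

tags = ["superset-stackable2.0.0", "superset-stackable2.0.1", "superset-stackable2.1.0"]
print(bump_tags(tags))  # ['superset-stackable2.0.2', 'superset-stackable2.1.1']
```

Here "superseded" is interpreted as a higher patch level within the same major.minor, which matches the superset walkthrough above.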
The PR #226 changed all the Java base image versions. This should not have happened; e.g. Druid only supports Java 11.
String manipulation and especially regex matching is a pain in GH actions (bash). Push tag validation down into the Python script.
(kafka|zookeeper|nifi|druid|opa|hbase|hadoop|trino|airflow|superset|spark-k8s|pyspark-k8s)([0-9.]+)-stackable([0-9]+\.[0-9]+\.[0-9]+)$
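Moving the validation into the Python script could look like this (a sketch; the dots in the stackable version are escaped, and the names are only for illustration):

```python
import re

# Sketch: validate product image tags in Python instead of bash/GH actions.
PRODUCT_TAG = re.compile(
    r"^(kafka|zookeeper|nifi|druid|opa|hbase|hadoop|trino|airflow|superset"
    r"|spark-k8s|pyspark-k8s)([0-9.]+)-stackable([0-9]+\.[0-9]+\.[0-9]+)$"
)

m = PRODUCT_TAG.match("kafka3.3.0-stackable0.7.0")
print(m.groups())  # ('kafka', '3.3.0', '0.7.0')
print(PRODUCT_TAG.match("java-base-stackable0.2.1"))  # None: not a product image
```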
All images must fulfill the requirements for OpenShift (see stackabletech/issues#207). One requirement is that all images must be based on a Red Hat Enterprise Linux or Red Hat Universal Base Image (see https://redhat-connect.gitbook.io/partner-guide-for-red-hat-openshift-and-container/program-on-boarding/technical-prerequisites#dockerfile-requirements). The Stackable Airflow image is based on the upstream Airflow image which in the end is based on Debian. Therefore the base image must be replaced with a Red Hat UBI.
The upstream Airflow image contains an argument PYTHON_BASE_IMAGE (see https://github.com/apache/airflow/blob/2.2.5/Dockerfile#L47). It should be evaluated whether this can be set to an image which is based on a Red Hat UBI.
We want to implement a system that supports upgrading base images for older product images. This system should be easy to use without much repetitive manual work.
Do we want to upgrade base images for all previous releases, all the time, indefinitely? This is still to be decided. Upgrading older product images is not strictly necessary for the next release, so we decoupled this.
Lastly, manual steps should be automated as best we can.
https://github.com/apache/hbase-operator-tools
https://hbase.apache.org/downloads.html
Unfortunately, we need to make sure that the version the operator tools are compiled for matches the HBase version itself.
I think it makes sense to include this directly in the HBase image, but a separate client image would also work.
To use HBase/Phoenix tables with compression (e.g. snappy) there have to be particular hadoop native libraries present in the product image. These have been provisionally added for all current supported HBase versions (2.4.6, 2.4.8, 2.4.9, 2.4.11, 2.4.12) in this branch: https://github.com/stackabletech/docker-images/tree/compression-native-libs
For HBase 2.5(.3) it is possible to use an HBase image compiled against Hadoop 3 (3.2.4). There is also a corresponding Phoenix library.
Using these libraries adds about 0.2 GB to the image size (from 1.2 GB to 1.4 GB), though this can be reduced if only the native libraries are retained. This ticket covers implementing a strategy for supplying compression capabilities for supported HBase versions.
https://issues.apache.org/jira/browse/HADOOP-17125
As discussed in the architecture meeting on 04.05.2022.
Sometimes an argument passed into a Dockerfile needs to be used in different forms: for --build-arg PYTHON=3.8 we may need 3.8 in some places (e.g. product downloads) but 38 in others (microdnf commands). There are different ways of deriving and persisting one variable from another, and this issue covers documenting them with examples in the README, e.g. deriving the value inside a RUN instruction (for instance PYTHON_NODOT=$(echo "$PYTHON" | tr -d '.')) so that the derived variable is available for the scope of that command.
With Issue stackabletech/issues#301 we identified the operators using the tools image.
From this list we are going to distribute those tools into their corresponding product images.
Currently we do not properly set all necessary tags on all Docker images when they are pushed into the repository by the GitHub Actions.
We should extend the GitHub Actions to set all needed tags in all stages (PR, nightly, release) and remove Dependabot.
All images must fulfill the requirements for OpenShift (see stackabletech/issues#207). One requirement is that all images must be based on a Red Hat Enterprise Linux or Red Hat Universal Base Image (see https://redhat-connect.gitbook.io/partner-guide-for-red-hat-openshift-and-container/program-on-boarding/technical-prerequisites#dockerfile-requirements). The Stackable Superset image is based on the upstream Superset image which in the end is based on Debian. Therefore the base image must be replaced with a Red Hat UBI.
I was exploring a use case where we want to use Trino in our DAGs. Looking into how Airflow runs and how the dependencies are handled, I have reached the conclusion that it might be quite easy to add, given the way you have created the Docker images.
I have built and run it locally with the added dependency and was able to configure the Trino connection and run simple queries against a test Trino cluster, which, by the looks of it, works.
These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
The protobuf structures used in test are all now scoped by the package name `hbase.test.pb`.
HBASE-23930 changed the formatting of the timestamp attribute on each Cell as displayed by the HBase shell to be formatted as an ISO-8601 string rather than milliseconds since the epoch. Some users may have logic which expects the timestamp to be displayed as milliseconds since the epoch. This change introduces the configuration property hbase.shell.timestamp.format.epoch, which controls whether the shell will print an ISO-8601 formatted timestamp (the default, "false") or milliseconds since the epoch ("true").
JIRA | Summary | Priority | Component |
---|---|---|---|
HBASE-26601 | maven-gpg-plugin failing with "Inappropriate ioctl for device" | Major | build |
HBASE-26556 | IT and Chaos Monkey improvements | Minor | integration tests |
HBASE-26525 | Use unique thread name for group WALs | Major | wal |
HBASE-26517 | Add auth method information to AccessChecker audit log | Trivial | security |
HBASE-26512 | Make timestamp format configurable in HBase shell scan output | Major | shell |
HBASE-26485 | Introduce a method to clean restore directory after Snapshot Scan | Minor | snapshots |
HBASE-26475 | The flush and compact methods in HTU should skip processing secondary replicas | Major | test |
HBASE-26267 | Master initialization fails if Master Region WAL dir is missing | Major | master |
HBASE-26337 | Optimization for weighted random generators | Major | Balancer |
HBASE-26309 | Balancer tends to move regions to the server at the end of list | Major | Balancer |
JIRA | Summary | Priority | Component |
---|---|---|---|
HBASE-26541 | hbase-protocol-shaded not buildable on M1 MacOSX | Major | . |
HBASE-26527 | ArrayIndexOutOfBoundsException in KeyValueUtil.copyToNewKeyValue() | Major | wal |
HBASE-26462 | Should persist restoreAcl flag in the procedure state for CloneSnapshotProcedure and RestoreSnapshotProcedure | Critical | proc-v2, snapshots |
HBASE-26533 | KeyValueScanner might not be properly closed when using InternalScan.checkOnlyMemStore() | Minor | . |
HBASE-26482 | HMaster may clean wals that is replicating in rare cases | Critical | Replication |
HBASE-26468 | Region Server doesn't exit cleanly incase it crashes. | Major | regionserver |
HBASE-25905 | Shutdown of WAL stuck at waitForSafePoint | Blocker | regionserver, wal |
HBASE-26450 | Server configuration will overwrite HStore configuration after using shell command 'update_config' | Minor | Compaction, conf, regionserver |
HBASE-26476 | Make DefaultMemStore extensible for HStore.memstore | Major | regionserver |
HBASE-26465 | MemStoreLAB may be released early when its SegmentScanner is scanning | Critical | regionserver |
HBASE-26467 | Wrong Cell Generated by MemStoreLABImpl.forceCopyOfBigCellInto when Cell size bigger than data chunk size | Critical | in-memory-compaction |
HBASE-26463 | Unreadable table names after HBASE-24605 | Trivial | UI |
HBASE-26438 | Fix flaky test TestHStore.testCompactingMemStoreCellExceedInmemoryFlushSize | Major | test |
HBASE-26311 | Balancer gets stuck in cohosted replica distribution | Major | Balancer |
HBASE-26384 | Segment already flushed to hfile may still be remained in CompactingMemStore | Major | in-memory-compaction |
HBASE-26410 | Fix HBase TestCanaryTool for Java17 | Major | java |
HBASE-26429 | HeapMemoryManager fails memstore flushes with NPE if enabled | Major | Operability, regionserver |
HBASE-25322 | Redundant Reference file in bottom region of split | Minor | . |
HBASE-26406 | Can not add peer replicating to non-HBase | Major | Replication |
HBASE-26404 | Update javadoc for CellUtil#createCell with tags methods. | Major | . |
HBASE-26398 | CellCounter fails for large tables filling up local disk | Minor | mapreduce |
JIRA | Summary | Priority | Component |
---|---|---|---|
HBASE-26542 | Apply a `package` to test protobuf files | Minor | Protobufs, test |
JIRA | Summary | Priority | Component |
---|---|---|---|
HBASE-24870 | Ignore TestAsyncTableRSCrashPublish | Major | . |
HBASE-26470 | Use openlabtesting protoc on linux arm64 in HBASE 2.x | Major | build |
HBASE-26327 | Replicas cohosted on a rack shouldn't keep triggering Balancer | Major | Balancer |
HBASE-26308 | Sum of multiplier of cost functions is not populated properly when we have a shortcut for trigger | Critical | Balancer |
HBASE-26319 | Make flaky find job track more builds | Major | flakies, jenkins |
JIRA | Summary | Priority | Component |
---|---|---|---|
HBASE-26549 | hbaseprotoc plugin should initialize maven | Major | jenkins |
HBASE-26444 | BucketCacheWriter should log only the BucketAllocatorException message, not the full stack trace | Major | logging, Operability |
HBASE-26443 | Some BaseLoadBalancer log lines should be at DEBUG level | Major | logging, Operability |
Pulling the Airflow image, I noticed that the last layer (200 MB) is duplicated. This results in an unnecessarily big image and longer download times.
The command that duplicated the whole layer seems to be chown -R stackable:stackable /stackable.
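One common remedy (a sketch, not the actual fix in the repository; the source path is an assumption) is to set the ownership at COPY time instead of running a recursive chown in its own layer:

```dockerfile
# COPY --chown sets ownership while the files are written, so no second
# layer containing a full copy of /stackable is created. Paths assumed.
COPY --chown=stackable:stackable stackable/ /stackable/
```

A chown -R in a separate RUN rewrites the metadata of every file, which forces Docker to store the whole directory tree again in that layer.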
DoD:
Our Docker images currently have a hard-coded label RELEASE=1, which was required for OpenShift certification. Another label is VERSION, which is currently set from the product version (e.g. 3.8.0 for a ZooKeeper 3.8.0).
For RELEASE we want to switch to the actual naming of our releases; the next one would be 22.09. This should probably be parameterized in the build script instead of in each product image.
The VERSION label does not make sense currently: it will always be 3.8.0 (for a ZooKeeper 3.8.0) even if we adapt, add to, or change that image. The VERSION label should instead become the full tag we create for that image, like zookeeper3.8.0-stackable0.1.0, to clearly identify the image.
This is done when:
- the RELEASE label is set "globally" via a CLI parameter in the build script
- the VERSION label is set to the actual name of the created Docker image, like zookeeper3.8.0-stackable0.1.0
The current situation for e.g. ubi8-rust-builder:
we use FROM registry.access.redhat.com/ubi8/ubi-minimal AS builder as the base image; to install packages we use:
microdnf install --disablerepo=* --enablerepo=ubi-8-appstream-rpms --enablerepo=ubi-8-baseos curl findutils gcc gcc-c++ make cmake openssl-devel pkg-config systemd-devel unzip perl -y && rm -rf /var/cache/yum
This leads to the error that ubi-8-appstream is not found.
To update ubi8-rust-builder we need to have the following changes to the dockerfile:
rustup for target architecture:
ONBUILD RUN \
if [ $(arch) = "aarch64" ]; then \
. $HOME/.cargo/env && rustup target add aarch64-unknown-linux-gnu; \
else \
. $HOME/.cargo/env && rustup target add x86_64-unknown-linux-gnu; \
fi
set environment variables for the target architecture (note: ENV cannot execute shell code, so the linker variables for both targets are set unconditionally; Cargo only reads the ones matching the selected target):
ONBUILD ENV \
    CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc \
    CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc \
    CXX_aarch64_unknown_linux_gnu=aarch64-linux-gnu-g++ \
    CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc \
    CC_x86_64_unknown_linux_gnu=x86_64-linux-gnu-gcc \
    CXX_x86_64_unknown_linux_gnu=x86_64-linux-gnu-g++
set --target for cargo:
ONBUILD RUN \
    if [ "$(arch)" = "aarch64" ]; then \
        . $HOME/.cargo/env && cargo build --release --target=aarch64-unknown-linux-gnu; \
    else \
        . $HOME/.cargo/env && cargo build --release --target=x86_64-unknown-linux-gnu; \
    fi
change search and copy of compiled binary:
ONBUILD RUN find /src/target \
-regextype egrep \
# The interesting binaries are all directly in ${BUILD_DIR}.
-maxdepth 3 \
# Well, binaries are executable.
-executable \
# Well, binaries are files.
-type f \
# Filter out tests.
! -regex ".*\-[a-fA-F0-9]{16,16}$" \
# Ignore some .so files
! -regex ".*\.so$" \
# Copy the matching files into /app.
-exec cp {} /app \;
This is the preparation for multi-arch operator images to work.
Sidenote: Working branch https://github.com/stackabletech/docker-images/tree/ubi8-rust-builder-multi-arch
To enable cross-platform building for Docker Images of products and operators, we agreed on using docker buildx build.
To make this happen we need some checks and steps:
As a user of druid images I want to be able to use images that apply image versioning consistently. Currently we have:
0.22.1-authorizer0.1.0-stackable0
0.22.1-authorizer0.1.0-stackable0.2.0
0.22.1-stackable0
0.22.1-stackable0.1.0
where 0.22.1-stackable0 does not resolve to 0.22.1-authorizer0.1.0-stackable0.2.0.
For the sake of maintainability we should document where additional scripts and metrics exporters used in our docker images are coming from.
Superset 1.4 focuses heavily on continuing to polish the core Superset experience. This release has a very very long list of fixes from across the community.
Parquet files can now be uploaded into an existing connected database that has Data Upload enabled. Eventually, the contributor hopes that this foundation can be used to accommodate feather and orc files. (#14449)
Tabs can now be added to Column elements in dashboards. (#16593)
The experience of using alerts and reports has improved in a few minor ways. (#16335, #16281)
Drag and drop now has a clickable ghost button for an improved user experience. (#16119)
Apache Drill: Superset can now connect to Apache Drill (through ODBC/JDBC) and impersonate the currently logged-in user. (#17353)
Firebolt: Superset now supports the cloud data warehouse Firebolt! (#16903).
Databricks: Superset now supports the new SQL Endpoints in Databricks. (#16862)
Apache Druid: Superset Explore can now take advantage of support for JOINs in Druid (note: the DRUID_JOINS feature flag needs to be enabled). (#16770)
AWS Aurora: Superset now has a separate db_engine_spec for Amazon Aurora. (#16535)
Clickhouse: Superset now includes function names in the auto-complete for SQL Lab. (#16234)
Google Sheets: Better support for private Google Sheets was added. (#16228)
The Makefile for Superset has gone through a number of improvements. (#16327, #16533)
Add Python instrumentation to pages, showing method calls used to build the page & how long each one took. This requires a configuration flag (see PR for more info). (#16136)
Breaking Changes
The columns Jinja parameter has been renamed table_columns, to make the columns query object parameter available in the Jinja context.
Results of the url_param Jinja function are now escaped by default: O'Brien will now be changed to O''Brien. To disable this behavior, call url_param with escape_result set to False: url_param("my_key", "my default", escape_result=False).
Changelog
To see the complete changelog in this release, head to CHANGELOG.MD. As mentioned earlier, this release has a MASSIVE amount of bug fixes. The full changelog lists all of them!
In #208 we decided to use a new branching system, to be able to update old docker images to use new base images.
In this ticket we want to implement it, and create a branch for the upcoming release. This ticket does not deal with new base images for older releases. Any sort of automation is also not part of this ticket.
In the new branching system, we will maintain a release branch per platform release, i.e. a branch release-22.09
, release-22.11
.
On these branches we will tag platform versions, i.e. 22.11.0-rc1
, 22.11.0
, 22.11.1
.
As we now tag a whole platform release instead of individual products, the build system/GitHub actions should build all images from these tags.
#236 Should be done first!
We need to Dockerize all our products.
These Docker images will probably evolve over time but initially we need them for all our supported versions, they need to include JMX where relevant and any authorizers needed.
We decided to keep the Dockerfiles in one central location (i.e. this repository) and not colocate them with the operators.
We would like to use the RedHat UBI images as a starting point (they do exist with JDKs as well).
One thing I've always been unsure about is a tagging strategy (e.g. when to use "latest") so that should be part of this.
Our "old" tar.gz packages could serve as a starting point, but they were created manually and this might be a good time to make this workflow repeatable and automated.
Please also investigate "buildah" as a replacement for Docker.
Q: Should we add the packages to our Nexus before building the images ?
A: ??? (It would simplify the image pipeline a lot. For example different Nifi versions have different download urls)
Q: Kafka - which Scala version: 2.12, 2.13, or both?
A: Images are now built with both and also tagged with the Scala version.
Q: Trino - should the images also contain the trino cli and trino jdbc ?
A: ???
Q: Tagging with latest is not possible with the current tagging scheme. See meeting notes on versions and tags below. Is this a problem ?
A: ??? (operators need to know exact versions when they create pods anyway)
moved to -> stackabletech/issues#136
The annotation prometheus.io/scrape: "true" should be added to role services.
As a user of our operators I'd like to have low Java DNS cache timeouts so the products don't crash on startup.
See these links for reference:
This is done when all Java-based products have a java.security file (for me it's in /etc/java/java-11-openjdk/java-11-openjdk-11.0.14.1.1-2.el8_5.x86_64/conf/security, but make sure it's stable) that includes the setting networkaddress.cache.ttl with a value of 5.
For this it might be worth creating one or more (for different Java versions) Stackable Java base images that do this once and we can reuse it for all our images.
See also this for reference: stackabletech/hdfs-operator#147
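A minimal sketch of the change itself, assuming the JDK's java.security lives under a stable conf/security directory (the exact location inside the image is an assumption and must be verified, e.g. under JAVA_HOME):

```shell
# Append the DNS cache TTL setting to the JDK's java.security file so the
# JVM re-resolves addresses every 5 seconds. JDK_CONF_DIR stands in for the
# real conf/security directory inside the image (path is an assumption).
JDK_CONF_DIR="$(mktemp -d)"
echo "networkaddress.cache.ttl=5" >> "${JDK_CONF_DIR}/java.security"
grep "networkaddress.cache.ttl" "${JDK_CONF_DIR}/java.security"
```

Appending keeps any existing security settings intact; a base-image layer doing this once would let all Java product images inherit the setting.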
This is a patch upgrade; it fixes the log4j issue among other things.
The Superset version is hard-coded in the Dockerfile, but it should be taken from the configuration file conf.py instead.
Superset is similar to Airflow because both are based on the same Python framework, and their images can be built in the same way. The Dockerfile for Airflow already uses a parameterized approach which should also be used in the Dockerfile for Superset.
We received feedback that our product images are fairly large and have been asked to investigate if this can be reduced with reasonable effort.
The operators are fine, this is just in reference to the product images.
Specifically mentioned were:
For this specific use case Trino would be the most important one to have a look at.
This requires #237.
Once we know how, when and until when we want to keep older product versions maintained, we should implement this strategy for our previous platform releases.
The Docker images are much larger than the official ones:
$ docker images | grep trino
docker.stackable.tech/stackable/trino 362-stackable0 4a6cc6f9f9dd 2 weeks ago 4GB
trinodb/trino 362 8c4c3cc8b6b5 2 months ago 1.33GB
The size of the images should be decreased to speed up the download. This would also make the integration tests faster and decrease the chance of hitting timeouts.
One possible optimization is to not rename the installation directory in a separate layer:
RUN mkdir /stackable/trino-server && \
mv /stackable/trino-server-${PRODUCT}/* /stackable/trino-server && \
rm -rf /stackable/trino-server-${PRODUCT}
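An alternative sketch (assuming nothing in the image requires a real directory at the unversioned path) is to create a symlink instead of the mkdir/mv/rm sequence, so the files extracted in the earlier layer are never rewritten into a new one:

```shell
# Sketch: a relative symlink at the unversioned path avoids duplicating the
# extracted files in a new layer. Paths and the PRODUCT value are assumed;
# STACKABLE_DIR stands in for /stackable inside the image.
PRODUCT=396
STACKABLE_DIR="$(mktemp -d)"
mkdir -p "${STACKABLE_DIR}/trino-server-${PRODUCT}"   # stand-in for the extracted archive
ln -s "trino-server-${PRODUCT}" "${STACKABLE_DIR}/trino-server"
readlink "${STACKABLE_DIR}/trino-server"   # prints trino-server-396
```

Because mv of files from a previous layer stores a second full copy in the new layer, the symlink variant can roughly halve the product payload's contribution to the image size.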
I am attempting to install and get an HBase cluster running using the Helm charts. The operators install fine, but when actually applying manifests to create the cluster objects (zookeepercluster, zookeepernode, etc.) I am met with: Warning Failed 4s (x2 over 18s) kubelet Failed to pull image "docker.stackable.tech/stackable/zookeeper:3.8.0-stackable23.1.0": rpc error: code = Unknown desc = no matching manifest for linux/arm64/v8 in the manifest list entries. Is this a known issue?
S3 libraries are needed for basically everything (e.g. reading/writing data and the history server), therefore we want to pre-ship them with our images.
Another reason is to put the correct versions in the image, as many users will be confused about which versions to use.
The implementation can be copied from the Hive image.
This is about updating everything on main. This is not about updating base images for older product images.
Some products might need to be upgraded directly; others depend on the java-base image, which then needs to be upgraded.
The java-base image will need to be split and versioned differently. Some products require Java 17, while e.g. Druid only supports Java 11. The Java base image needs the latest ubi8 base image though, so we will have a java-11 base image and a java-17 base image (at least).
It is not yet clear if automation will be needed.
From: https://kubernetes.io/blog/2023/03/17/upcoming-changes-in-kubernetes-v1-27/ - any k8s.gcr.io images will be moved to registry.k8s.io. The git-sync implementation for Airflow uses k8s.gcr.io/git-sync/git-sync:v3.6.4 and this should be mirrored in our Nexus.