Comments (12)
@johnSchnake Thanks for pointing that out, I have found the root cause of the missing test cases: an unfortunate misconfiguration of our test case list file. I will fix it soon and re-upload the test results. Let me explain how these test results were obtained:
First, the description in sap-cp-aws of how to reproduce the test results is outdated. I will update it soon. Further below I also describe in detail how we currently run the conformance tests (and others).
We are continuously working on the test coverage of gardener, which is why README descriptions like the mentioned one get outdated pretty fast. At the beginning we were using Sonobuoy, which was fine for a while. But since we wanted to run additional k8s e2e test cases besides conformance, and we wanted to automate test runs to ensure quality, we introduced Test Machinery and a kubetest wrapper, which allow us to test clusters in a structured and consistent way. Currently we run over 260 e2e test suites a day (on different landscapes, k8s versions, cloud providers and operating systems), covering 1099 unique test cases and 113025 executed test cases in total per day. We upload conformance test results to testgrid (conformance, others) on a daily basis for k8s versions 1.10 to 1.16 and multiple cloud providers.
We were forced to build a kubetest wrapper to gain more control over test case grouping and test result evaluation. With it we can now:
- Define custom test case groups with individual tags like fast, slow, flaky, etc., because we sometimes want custom grouping and the existing tags in test case names are not reliable. E.g. some test cases are slow although they don't have a `[Slow]` tag, and the same goes for `[Flaky]`.
- Filter false positive test cases
- Parse resulting junit.xml files and e2e logs to our JSON format to ingest into Elasticsearch
- Annotate testcases to run only for dedicated cloud providers
- Merge junit.xml files of parallel kubetest executions (this is not used for conformance)
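As a rough illustration of the junit-to-JSON ingestion step above, here is a minimal sketch. The field names and schema are my own assumptions, not the actual gardener/test-infra format:

```python
import json
import xml.etree.ElementTree as ET

def junit_to_docs(junit_path):
    """Flatten a junit.xml file into one dict per test case."""
    root = ET.parse(junit_path).getroot()
    # kubetest may write either a bare <testsuite> or a <testsuites> wrapper
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    docs = []
    for suite in suites:
        for case in suite.findall("testcase"):
            status = "passed"
            if case.find("failure") is not None:
                status = "failed"
            elif case.find("skipped") is not None:
                status = "skipped"
            docs.append({
                "name": case.get("name"),
                "duration": float(case.get("time", 0)),
                "status": status,
            })
    return docs

def to_ndjson(docs):
    """Serialize as newline-delimited JSON, the shape Elasticsearch bulk APIs expect."""
    return "\n".join(json.dumps(d) for d in docs)
```

The same parse step is also the natural place to hook in the testcase annotation and merge features from the list above.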
Regarding this discussion here, it would be great to have some validation on the testgrid side, e.g. of test case counts, to avoid such incomplete executions.
You will be able to reproduce the test results if you run:
```sh
# first set KUBECONFIG to your cluster
docker run -ti --rm -v $KUBECONFIG:/mye2e/shoot.config golang:1.13 bash

# run all commands below within the container
export E2E_EXPORT_PATH=/tmp/export; export KUBECONFIG=/mye2e/shoot.config; export GINKGO_PARALLEL=false
go get github.com/gardener/test-infra; cd /go/src/github.com/gardener/test-infra
export GO111MODULE=on
go run -mod=vendor ./integration-tests/e2e -debug=true -k8sVersion=1.16.1 -cloudprovider=azure -testcasegroup="conformance"
echo "testsuite finished"
```
Which internally runs:

```sh
kubetest --provider=skeleton --extract=v1.16.2 --deployment=local --test --check-version-skew=false --test_args=--clean-start --ginkgo.dryRun=false --ginkgo.focus=\\[k8s\\.io\\]\\sSecurity\\sContext\\sWhen\\screating\\sa\\spod\\swith\\sprivileged\\sshould\\srun\\sthe\\scontain... -dump=/tmp/e2e.log
```
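The `--ginkgo.focus` value above is an escaped regex built from a literal test case name. A minimal sketch of that transformation (my assumption about the approach, not the actual kubetest wrapper code):

```python
import re

def to_ginkgo_focus(test_name):
    """Turn a literal e2e test name into a ginkgo focus regex."""
    # re.escape backslash-escapes regex metacharacters like [ ] . and,
    # on most Python versions, spaces; normalize spaces to \s either way
    escaped = re.escape(test_name)
    return escaped.replace("\\ ", "\\s").replace(" ", "\\s")
```

Matching spaces as `\s` keeps the regex robust when the name is passed through shells and config files that may mangle literal spaces.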
from k8s-conformance.
I don't have time to dig into why there is a delta. @johnSchnake, can you take a quick look to see what the delta is here?
Making notes as I review this:
- The line `Conformance test: not doing test setup.` comes from `hack/ginkgo_e2e.sh`, which would make me assume this wasn't generated via running Sonobuoy (which is fine, but just FYI). Sonobuoy is recommended but not required.
- The instructions for the first item in your list, sap-cp-aws, refer to a URL that gives me a 404. It indicates that they ran the tests via Sonobuoy using some stored YAML manifest. This, in theory, would be an acceptable way to run the tests (you don't have to use the CLI), but again the logs don't seem to indicate it was run via Sonobuoy (which doesn't use the ginkgo_e2e.sh script).
- I ran `sonobuoy run --plugin-env e2e.E2E_DRYRUN=true --kube-conformance-image-version=v1.15.2` to check the list of conformance tests for that version and confirmed (as stated in the ticket) that 215 tests are conformance tests, despite only 212 being run in these results.
- The missing tests in the first case (we'd have to confirm them for the others) are:
- [sig-apps] Job should delete a job [Conformance]
- [sig-apps] ReplicationController should surface a failure condition on a common issue like exceeded quota [Conformance]
- [sig-network] DNS should provide DNS for ExternalName services [Conformance]
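One way to automate finding such a delta is to diff the expected conformance test list (e.g. from a dry run like the one above) against the test case names in the submitted junit file. A hypothetical sketch:

```python
import xml.etree.ElementTree as ET

def missing_tests(expected_names, junit_path):
    """Return expected conformance tests absent from a junit results file."""
    root = ET.parse(junit_path).getroot()
    # handle both a bare <testsuite> root and a <testsuites> wrapper
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    ran = {case.get("name") for suite in suites for case in suite.findall("testcase")}
    return sorted(set(expected_names) - ran)
```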
I will take a look and see if I can find a reason why those wouldn't have been chosen, either by that hack script or for some other reason.
What needs to happen IMO going forward is:
- The logs should print the focus/skip values used for ginkgo. They are printed in the pod logs but not as part of the e2e.log output.
- A validation script should automate the checks that need to occur (the number of tests passed/skipped/etc. and that the server/test versions match).
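A validation script could look roughly like this. The expected-count table and the check details are illustrative assumptions, not an existing script:

```python
# Illustrative per-release counts; the real numbers come from the release's test list
EXPECTED_CONFORMANCE_TESTS = {"v1.15": 215}

def validate_submission(minor, executed, failed, server_version, test_version):
    """Return a list of human-readable validation errors (empty means OK)."""
    errors = []
    expected = EXPECTED_CONFORMANCE_TESTS.get(minor)
    if expected is not None and executed != expected:
        errors.append("expected %d conformance tests, found %d" % (expected, executed))
    if failed:
        errors.append("%d conformance tests failed" % failed)
    # server and e2e test binary should come from the same minor release
    if not (server_version.startswith(minor) and test_version.startswith(minor)):
        errors.append("server/test version mismatch for %s" % minor)
    return errors
```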
/assign @hh @johnSchnake
All 3 of those tests were also added to Conformance in v1.15; but the SHA in the logs for the test matches the SHA in my test run with 215 tests, so I don't think that could account for it.
@OlegLoewen You were the one to submit at least some of these; could you clarify how these runs were obtained? They indicate they used a Sonobuoy YAML script which hasn't existed for some time and also wouldn't have matched the version of tests used here.
@johnSchnake and I will take a closer look at this together tomorrow
> @johnSchnake and I will take a closer look at this together tomorrow

Let me know if I can be helpful in any way.
I looked through the docs for conformance tests and noticed that the Reviewing Guidelines only make loose statements about the number of test cases in item 5: https://github.com/cncf/k8s-conformance/blob/master/reviewing.md#technical-requirements
Would it make sense to fix the expected number of test cases? Changing the number of test cases would then also require an update to the reviewing guidelines. But some verification through CI, as you mentioned previously, would be superior.
Additionally I noticed that the wording around `sonobuoy` usage is quite vague. I understand from what @johnSchnake wrote that it is only recommended; however, the content description for a PR here makes it seem like `sonobuoy` is required (which was my understanding so far).
Some guidelines around which execution methods for the test cases are actually allowed would help in debugging situations like this, IMO. I am not sure how feasible that would be, as I have no overview of the currently used execution methods other than sonobuoy.
The config here: https://github.com/gardener/test-infra/blob/68c3d60171fcb7d36b39d86935d14f5ea55064d6/integration-tests/e2e/kubetest/description/1.16/working.json#L3 makes it look like this is the link between the `testcasegroup` you mention and how the focus/skip are set on the actual run; is that correct?
FWIW, for conformance certification runs you should just set the focus to `[Conformance]` and not set a `skip` value at all. I understand that in basic CI situations you may want to skip serial/slow tests though (Sonobuoy does something similar to avoid disruptive tests).
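For reference, ginkgo's focus/skip selection can be modeled simply: a test runs if its name matches the focus regex and does not match the skip regex. A small sketch of that semantics:

```python
import re

def selected(test_names, focus, skip=None):
    """Model ginkgo selection: a test runs iff it matches focus and not skip."""
    chosen = []
    for name in test_names:
        if focus and not re.search(focus, name):
            continue
        if skip and re.search(skip, name):
            continue
        chosen.append(name)
    return chosen
```

With focus `\[Conformance\]` and no skip, Serial/Slow conformance tests are included, which is what certification expects.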
Here is the dir for all the v1.15 tests: https://github.com/gardener/test-infra/tree/68c3d60171fcb7d36b39d86935d14f5ea55064d6/integration-tests/e2e/kubetest/description/1.15 and it seems like those tests aren't listed in any of the working/skip/false-positive files, so I must be misunderstanding a component of how the test list is built.
The mentioned test cases weren't conformance-tagged in the previous k8s release. Since I reused the description file of the previous release for 1.15, these test cases were already assigned to the fast/slow groups but not to conformance, which is why they were discarded during the test suite execution. This commit is the quick fix, but I will think of something to prevent this kind of issue in the future. The commit also helps somewhat in understanding the actual issue/bug.
Regarding the focus and skip fields in the description file:
Our main test case groups are fast, slow and conformance (which can be combined). Test cases of the fast group are executed in parallel with `--ginkgo-parallel=8` to get results faster. If we run only the fast group, we still want the fast conformance test cases to run as well, but we don't want any serial or slow tagged conformance tests. To accomplish this, we use the skip and focus fields.
If we run e2e with `-testcasegroup=conformance`, then these lines are taken into account to calculate the conformance test cases:
```json
{ "testcase": "[Conformance]", "focus": "Serial|Slow", "groups": ["slow", "conformance"]},
{ "testcase": "[Conformance]", "skip": "Serial|Slow", "groups": ["fast", "conformance"]},
```
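Under that description format, resolving a requested group presumably means selecting every entry whose `groups` field contains it, so `conformance` picks up both lines while `fast` picks up only the second. A sketch of that resolution (my assumption, not the actual test-infra code):

```python
# Mirrors the shape of the working.json entries quoted above
entries = [
    {"testcase": "[Conformance]", "focus": "Serial|Slow", "groups": ["slow", "conformance"]},
    {"testcase": "[Conformance]", "skip": "Serial|Slow", "groups": ["fast", "conformance"]},
]

def entries_for_group(entries, group):
    """Pick every description entry whose groups contain the requested group."""
    return [e for e in entries if group in e.get("groups", [])]
```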
> FWIW, for conformance certification runs you should be just setting the focus to [Conformance] and not setting a skip value at all [...]
If this is a requirement and using a list of test cases is not an alternative, we can do this as well.
I think this raises good questions about what is required for conformance certification.
As I read it, Sonobuoy is optional, but the conformance run should be executed as a single test run rather than multiple runs spliced together. I think that makes the most sense, as the instructions request a single log and a single junit result file.
You could argue that you want to merge the junit results together, but in that case you'd at least need to upload all of the log files separately.
In addition, allowing N runs at separate times, combined to meet certification, seems like a loosening of the requirements, since clusters could undergo changes between the different runs. Everything is much clearer and more standardized if we just require a single run (though again, parallelizing for CI makes sense).
I agree with @johnSchnake regarding enforcing a specific list of tests in a single test run.
I've created a ticket for creating a Prow Job to run before submissions can be accepted.
AFAICT test runs are now automatically checked for completeness. Should this issue be closed?