-
Notifications
You must be signed in to change notification settings - Fork 631
CI workflow with Github Actions #2192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4b29bf3
to
7081138
Compare
## Summary - initial PR for the series of the PRs - to migrate circleci to github actions ## Testing Done - TBD
0716d38
to
ac6f6af
Compare
Hi @mhratson , thanks for the change! Let's have more details included in the PR description. E.g. the issue link, what will happen after this merged, what will be the follow up steps, etc. This is a change that impacts every developer, so it would be better to be more clear. Thanks! |
This looks amazing. Thanks Maryan! We can ship this and improve things along the way when we find anything are missing. We probably need a few other PR check runs to validate this actually works fine. |
One more question: have we tested new tag publishing, or is that the plan when we merge this PR? I think we need to ensure the release publishing pipeline also works after this migration. |
This is tricky, as to test tag pushing we have to disable the circleCI. So won't be validated until cirlceCI workflow is removed. |
Got it. So it means that merging this PR itself won't trigger the switch from circleCI to github actions. We probably need to have another PR to disable it? |
Yup. Yeah, for now both pipelines run in parallel. While running tests in parallel is harmless running publishing in parallel is a race. So maybe i'll disable automatic publishing via GH actions until CircleCI is disabled. |
disable event handler until circleCI workflow is disabled to avoid race
This makes sense to me. Let's do it. |
Ran publishing1 for the latest tag, and looks like
CircleCI secrets[^3]
Footnotes |
## Summary ### Why 1. GIthub Actions workflow are native GH workflows 2. Github Actions do not require additional non-github accounts unlike CircleCI 3. plenty of compute resources[^0] available for OSS projects 4. unlike CircleCI resource limits (don't have details) [^0]:https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#availability ### What 1. creates CI workflow `ci.yaml` 2. creates Artifactory workflow: `artifactory.yaml` Workflow structure is documented in the spec[^1] [^1]:https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions ## Expected Behavior CI is expected to 1. execute unit tests 1. execute integration tests 1. execute hw platform unit tests 1. publish artifacts to the artifactory when a tag is published 1. provide ability to re-run tests on failures 1. report results to corresponding PR/branch which is to be used as quality gates for PR merging. ## Actual Behavior 1. current Circle CI integration provides [1] [2] [3] [4] from the expected behavior 4. but re-run-ing checks requires additional efforts like logging in into the Circle CI 5. which slows PR feedback loop as users may not have CircleCI credentials and knowledge of the system [1]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L51-L53 [2]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L51-L53 [3]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L5-L34 [4]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L94-L103 ## Steps to reproduce 1. see failed PR checks, ie linkedin#2133 ## Known Workarounds 1. asking PR authors to trigger build ## Migration Plan 1. add GH Actions integration along with CircleCI 2. confirm GH Actions provide equivalent or better functionality 3. remove CircleCI integration 4. ensure publishing via GH actions works ## Categorization - [x] refactor
## Summary ### Why 1. GIthub Actions workflow are native GH workflows 2. Github Actions do not require additional non-github accounts unlike CircleCI 3. plenty of compute resources[^0] available for OSS projects 4. unlike CircleCI resource limits (don't have details) [^0]:https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#availability ### What 1. creates CI workflow `ci.yaml` 2. creates Artifactory workflow: `artifactory.yaml` Workflow structure is documented in the spec[^1] [^1]:https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions ## Expected Behavior CI is expected to 1. execute unit tests 1. execute integration tests 1. execute hw platform unit tests 1. publish artifacts to the artifactory when a tag is published 1. provide ability to re-run tests on failures 1. report results to corresponding PR/branch which is to be used as quality gates for PR merging. ## Actual Behavior 1. current Circle CI integration provides [1] [2] [3] [4] from the expected behavior 4. but re-run-ing checks requires additional efforts like logging in into the Circle CI 5. which slows PR feedback loop as users may not have CircleCI credentials and knowledge of the system [1]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L51-L53 [2]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L51-L53 [3]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L5-L34 [4]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L94-L103 ## Steps to reproduce 1. see failed PR checks, ie linkedin#2133 ## Known Workarounds 1. asking PR authors to trigger build ## Migration Plan 1. add GH Actions integration along with CircleCI 2. confirm GH Actions provide equivalent or better functionality 3. remove CircleCI integration 4. ensure publishing via GH actions works ## Categorization - [x] refactor
* Upgrade simplekdc to 2.1.0 (linkedin#2186) This PR resolves linkedin#2178 Upgrading simplekdc version to "2.1.0" which supports a change that can correctly use security classes based on what version of IBM Semeru JDK(if applicable) is being used. There is no regression observed using Semeru, OpenJDK and Temurin JDKs. This newer version(released on 14 August 2024) also caters vulnerability in deps mentioned linkedin#2179 as **org.jboss.xnio:xnio-api** is updated to **3.8.16**[^1] [^1]:https://github.com/apache/directory-kerby/releases/tag/kerby-all-2.1.0#:~:text=Bump%20org.jboss.xnio%3Axnio%2Dapi%20from%203.8.15.Final%20to%203.8.16.Final). * remove unused KafkaSampleStore#_skipSampleStoreTopicRackAwarenessCheck (linkedin#2183) left over from linkedin#1572 (6ae3f41) * Test logging fix, by default log4j2 looks for log4j2.properties file (linkedin#2181) `log4j.properties` files are ignored in the test resources, after renamed, finally I was able to change the loglevels while unit/integration testing. I'm not sure if it was the issue on issue linkedin#2152, but this would be the fix for tests. Prod should work with the log4j.properties file as that is passed with -Dlog4j.configurationFile java opt * fix typo in comment (linkedin#2189) Fix 'the the' in the comments * new PR template (linkedin#2191) ## Summary Why: Improve PR quality and review-ability. What: modifies current PR template to be structured and require more details when submitting PRs. ## Expected Behavior PR must come with sufficient details to address or explain the issue. ## Actual Behavior PR template only requires link to the issue: ``` This PR resolves #<Replace-Me-With-The-Issue-Number-Addressed-By-This-PR>. ``` ## Steps to reproduce 1. either create a new PR or 2. see [the current template](https://github.com/linkedin/cruise-control/blob/c5545ef04618b5b42290edda2ee63eb6bfa2e1a6/docs/pull_request_template.md) ## Known Workarounds People voluntarily provide additional details ## Additional Evidence - n/a ## Categorization - [x] refactor * CI workflow with Github Actions (linkedin#2192) ## Summary ### Why 1. GIthub Actions workflow are native GH workflows 2. Github Actions do not require additional non-github accounts unlike CircleCI 3. plenty of compute resources[^0] available for OSS projects 4. unlike CircleCI resource limits (don't have details) [^0]:https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#availability ### What 1. creates CI workflow `ci.yaml` 2. creates Artifactory workflow: `artifactory.yaml` Workflow structure is documented in the spec[^1] [^1]:https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions ## Expected Behavior CI is expected to 1. execute unit tests 1. execute integration tests 1. execute hw platform unit tests 1. publish artifacts to the artifactory when a tag is published 1. provide ability to re-run tests on failures 1. report results to corresponding PR/branch which is to be used as quality gates for PR merging. ## Actual Behavior 1. current Circle CI integration provides [1] [2] [3] [4] from the expected behavior 4. but re-run-ing checks requires additional efforts like logging in into the Circle CI 5. which slows PR feedback loop as users may not have CircleCI credentials and knowledge of the system [1]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L51-L53 [2]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L51-L53 [3]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L5-L34 [4]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L94-L103 ## Steps to reproduce 1. see failed PR checks, ie linkedin#2133 ## Known Workarounds 1. asking PR authors to trigger build ## Migration Plan 1. add GH Actions integration along with CircleCI 2. confirm GH Actions provide equivalent or better functionality 3. remove CircleCI integration 4. ensure publishing via GH actions works ## Categorization - [x] refactor * Update README.md * Set Embedded Zookeeper listen on 127.0.0.1 (linkedin#2196) ## Summary 1. Why: when on VPN, I can't run Cruise Control tests as ZK is binding to local real ip address and local network is restricted. 2. What: changing to bind to 127.0.0.1 fixes it (got the idea from Kafka embedded ZK setup. I think it won't make any difference how automation or human would run the tests, pls correct me if I'm wrong. * Add "documentation" category to PR template (linkedin#2195) ## Summary 1. Why: to categorize documentation PRs 2. What: adds "documentation" category to the PR template ## Expected Behavior - when users make documentation changes - they should be able to specify documentation as a change category ## Actual Behavior - no documentation category to specify * Add missing documentation for minNumBrokersViolateMetricLimit ## Summary 1. Why: Documentation for **min.num.brokers.violate.metric.limit.to.decrease.cluster.concurrency** is missing. 2. What: document the setting * Add more logging to help debugging the time spent on goal based operation (linkedin#2202) * Add more logging to help debugging the time spent on goal based operation * Update cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/async/progress/OperationProgress.java Co-authored-by: Maryan Hratson <[email protected]> * Update cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/servlet/handler/async/runnable/GoalBasedOperationRunnable.java Co-authored-by: Maryan Hratson <[email protected]> * Update cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/servlet/handler/async/runnable/GoalBasedOperationRunnable.java Co-authored-by: Maryan Hratson <[email protected]> --------- Co-authored-by: Maryan Hratson <[email protected]> * Fix the Circle CI error by removing the test-multi-arch job (linkedin#2203) * Remove test-multi-arch job in Circle CI * Remove test-multi-arch definitions * Improve per task observability through additional logging (linkedin#2204) * Fix the issue that uuid is null in the log after execution * Add more logging to track the task with its UUID * Rename "task" to "User task" in logging * Reformat logging * Fixing Unexpected method calls: HttpSession.invalidate (linkedin#2201) ## Summary 1. Why: The test failed sometimes with unexpected method calls. 2. What: The fix is preparing the test to accept invalidate method call too ## Expected Behavior Tests are running without failure ## Actual Behavior Tests are failing sometimes with unexpected method call. ## Steps to Reproduce 1. setup repeated run on e.g. `testCreateUserTask` in IDE 2. observe failure after multiple successful runs (for me it was failing after around 250 successful runs) ## Additional evidence ``` java.lang.AssertionError: On mock #2 (zero indexed): Unexpected method calls: HttpSession.invalidate() at org.easymock.EasyMock.getAssertionError(EasyMock.java:2230) at org.easymock.EasyMock.verify(EasyMock.java:2058) at com.linkedin.kafka.cruisecontrol.servlet.UserTaskManagerTest.testCreateUserTask(UserTaskManagerTest.java:59) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) at com.sun.proxy.$Proxy5.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176) at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129) at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100) at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60) at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56) at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113) at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65) at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69) at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74) ``` ## Categorization - [x] bugfix - [ ] new feature - [ ] refactor - [ ] CVE - [ ] other * Update the README (linkedin#2216) This is a minor improvement to the README.md. * fix: Fix CVEs (linkedin#2220) Update dependencies to fix CVEs: Zookeeper, Netty, Jetty, Nimbus JOSE+JWT * Fix: intra.broker.goals cannot be configured as default.goals (linkedin#2221) * Kerberos auth to local rules support (linkedin#2043) * Expose AdminClient exception when failing to describe the cluster (linkedin#2222) * Fix PartitionSizeAnomalyFinder, to be able to handle custom SELF_HEALING_PARTITION_SIZE_THRESHOLD_MB values (linkedin#2212) * Upgrade Kafka to 3.8.0 (linkedin#2180) * Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies # Conflicts: # gradle.properties * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]> * Rectify docker run command for s390x (linkedin#2249) * Make startup more robust and prevent auto topic creation when using CruiseControlMetricsReporterSampler (linkedin#2211) * Update license to reflect the latest status (linkedin#2256) * Catch NoSuchFileException on load failed brokers list (linkedin#2255) * Replace deprecated methods to support Kafka 4.0.0 (linkedin#2254) * Upgrade Kafka to 3.8.0 (linkedin#2180) * Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]> * Use literal config name for listeners and broker.id config (linkedin#2169) * Disabling test and integration-test in github-workflows --------- Co-authored-by: yasiribmcon <[email protected]> Co-authored-by: Lee Dongjin <[email protected]> Co-authored-by: Andras Katona <[email protected]> Co-authored-by: wonjong-yoo <[email protected]> Co-authored-by: Maryan Hratson <[email protected]> Co-authored-by: ik <[email protected]> Co-authored-by: Allen Wang <[email protected]> Co-authored-by: Hao Geng <[email protected]> Co-authored-by: Aswin A <[email protected]> Co-authored-by: Kondrat Bertalan <[email protected]> Co-authored-by: Tamas Barnabas Egyed <[email protected]> Co-authored-by: harmadasg <[email protected]> Co-authored-by: David Simon <[email protected]> Co-authored-by: Paolo Patierno <[email protected]> Co-authored-by: Shubham Rawat <[email protected]> Co-authored-by: Henry Haiying Cai <[email protected]> Co-authored-by: Daniel Vaseekaran <[email protected]>
Summary
Why
What
ci.yaml
artifactory.yaml
Workflow structure is documented in the spec2
Expected Behavior
CI is expected to
which is to be used as quality gates for PR merging.
Actual Behavior
Steps to reproduce
Known Workarounds
Migration Plan
Categorization
Example
This PR has both GH Action and CircleCI checks running:
Note how CircleCI doesn't report timing, and is actually slower in current configuration, making GH action complete workflows in 32m vs 40m

Footnotes
https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#availability ↩
https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions ↩