Commit ac6d7c5

Merge pull request #28 from intel/2025Q1
Release 24Q1
2 parents bb8bfcc + f7a4056 commit ac6d7c5

File tree

137 files changed

+5918
-1022
lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,5 +12,6 @@
1212
*.sublime-project
1313
*.sublime-workspace
1414
*~
15-
15+
*.o
16+
*.so
1617
*.out

DEV.md

Lines changed: 46 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -23,20 +23,6 @@ version = 2
2323
cdi_specs_dir = ["/etc/cdi", "/var/run/cdi"]
2424
```
2525

26-
# Generated source code
27-
28-
Custom resource definitions are in `pkg/intel.com/resource/<accelerator>/<apiversion>/*.go`
29-
(except generated `zz_deepcopy`) and in `pkg/intel.com/resource/<accelerator>/<apiversion>/api/`.
30-
31-
When changing those CRDs, remember to re-generate the YAMLs and clientset by running:
32-
```bash
33-
make generate
34-
```
35-
36-
## Required tools
37-
38-
Above step needs `controller-gen` and `client-gen` tools to generate CRD YAMLs (in `deployments/gpu/static/crd/...`).
39-
4026
### Determine your go binaries location from `go install --help`, quote:
4127
> Executables are installed in the directory named by the GOBIN environment
4228
> variable, which defaults to $GOPATH/bin or $HOME/go/bin if the GOPATH
@@ -75,3 +61,49 @@ cp controller-tools/controller-gen code-generator/client-gen $HOME/go/bin
7561
# ensure it's in the path. You may want to add export to $HOME/.bashrc
7662
echo $PATH | grep -q $HOME/go/bin || export PATH=$HOME/go/bin:$PATH
7763
```
64+
# Running tests
65+
66+
Since Q2 '25, the Gaudi DRA driver uses `gohlml` to retrieve health-related information.
67+
The path to the HLML shared library is hardcoded, so `hack/fake_libhlml` was created based
68+
on the `hlml.h` from the `gohlml` project. It is effectively a stub / mock with flow control support.
69+
70+
When health-related tests call `gohlml`, it should in turn call the fake `libhlml` instead of the real
71+
one on nodes where no real Gaudi HW and SW are installed (e.g. CI). This means that to run the
72+
tests on your development machine, you should either deploy a fresh fake `libhlml.so` or
73+
run the tests in a `gaudi-dra-driver-test-image` container like CI does.
74+
75+
Deploying the fake `libhlml` instead of the real one should allow running tests in VSCode and other IDEs
76+
after `ldconfig` is [configured properly](hack/fake_libhlml/README.md).
77+
78+
## Deploying
79+
```shell
80+
$ cd hack/fake_libhlml
81+
$ make clean
82+
rm -f fake_libhlml.o fake_libhlml.so
83+
$ make
84+
gcc -O -Wall -Wextra -Wno-unused-parameter -fPIC -c fake_libhlml.c -o fake_libhlml.o
85+
gcc -shared -o fake_libhlml.so fake_libhlml.o
86+
$ sudo cp ./fake_libhlml.so /usr/lib/habanalabs/libhlml.so
87+
$ cat << EOF | sudo tee /etc/ld.so.conf.d/habanalabs.conf
88+
/usr/lib/habanalabs/
89+
EOF
90+
91+
$ sudo ldconfig
92+
```
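
After the steps above, it can help to confirm that the library is where the tests expect it. The following is a minimal sketch and not part of this commit: `check_hlml` and the `HLML_PATH` variable are hypothetical names, and the default path is taken from the `cp` command above. It only checks that the file exists and looks for its name in the `ldconfig -p` cache listing; it does not verify that the library is the fake one.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): report whether a library file
# exists at the given path and whether its name shows up in the ld.so cache.
HLML_PATH="${HLML_PATH:-/usr/lib/habanalabs/libhlml.so}"

check_hlml() {
    lib="$1"
    if [ ! -f "$lib" ]; then
        echo "missing: $lib"
        return 1
    fi
    echo "found: $lib"
    # 'ldconfig -p' prints the current linker cache on glibc systems.
    if command -v ldconfig >/dev/null 2>&1 &&
        ldconfig -p 2>/dev/null | grep -q "$(basename "$lib")"; then
        echo "registered in ld.so cache"
    else
        echo "not found in ld.so cache (or ldconfig unavailable)"
    fi
    return 0
}

# Check the default location; never abort the calling shell on failure.
check_hlml "$HLML_PATH" || true
```

If the second line reports that the library is not in the cache, rerunning `sudo ldconfig` after creating the `habanalabs.conf` file is the first thing to try.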
93+
94+
## Running tests in container
95+
96+
To use your own user ID inside the container and avoid access / permission issues, build a fresh
97+
container image, then run the tests. The CI uses its own user ID.
98+
99+
```shell
100+
$ make test-image
101+
$ make test-containerized
102+
```
103+
104+
Tests produce coverage data. To see a coverage report, run the Make target for the component
105+
you need, e.g.
106+
107+
```
108+
make gaudi-coverage
109+
```

Dockerfile.device-faker

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
FROM golang:1.23.4@sha256:70031844b8c225351d0bb63e2c383f80db85d92ba894e3da7e13bcf80efa9a37 as build
2+
ARG LOCAL_LICENSES
3+
WORKDIR /build
4+
COPY . .
5+
6+
RUN make bin/device-faker && \
7+
mkdir -p /install_root && \
8+
if [ -z "$LOCAL_LICENSES" ]; then \
9+
make licenses; \
10+
fi && \
11+
cp -r licenses /install_root/ && \
12+
cp bin/device-faker /install_root/
13+
14+
15+
FROM alpine AS template
16+
COPY --from=build /install_root/device-faker /device-faker
17+
18+
19+
RUN mkdir -p /opt/templates && \
20+
/device-faker gpu -n && \
21+
mv /tmp/gpu-template-*.json /opt/templates/gpu-template.json && \
22+
/device-faker gaudi -n && \
23+
mv /tmp/gaudi-template-*.json /opt/templates/gaudi-template.json && \
24+
chmod 644 /opt/templates/*.json
25+
26+
FROM scratch
27+
LABEL description="Intel Device Faker"
28+
COPY --from=build /install_root/device-faker /device-faker
29+
COPY --from=template /opt/templates /opt/templates
30+
ENTRYPOINT ["/device-faker"]

Dockerfile.gaudi

Lines changed: 39 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,22 +12,59 @@
1212
# See the License for the specific language governing permissions and
1313
# limitations under the License.
1414

15-
FROM golang:1.23.4@sha256:70031844b8c225351d0bb63e2c383f80db85d92ba894e3da7e13bcf80efa9a37 as build
15+
ARG HTTP_PROXY
16+
ARG HTTPS_PROXY
17+
ARG NO_PROXY
18+
19+
FROM golang:1.23.4@sha256:ccdca3b3bde3bfee2518a087b467f2b452fad9ba3e378d3c1578db494c8cb13b as build
1620
ARG LOCAL_LICENSES
1721
WORKDIR /build
1822
COPY . .
1923

24+
# install libhlml.so
25+
RUN \
26+
export http_proxy=${HTTP_PROXY} https_proxy=${HTTPS_PROXY} no_proxy=${NO_PROXY} && \
27+
curl -fsSL https://vault.habana.ai/artifactory/api/gpg/key/public | gpg --dearmor | tee /etc/apt/trusted.gpg.d/habanalabs.gpg > /dev/null && \
28+
wget -q -O /etc/apt/sources.list.d/habanalabs_synapseai.list "https://vault.habana.ai/artifactory/gaudi-installer/repos/1.16.2/debian10.10/habanalabs_synapseai.list" > /dev/null && \
29+
apt-get update && \
30+
apt-get download habanalabs-firmware-tools && \
31+
ls -al && \
32+
dpkg --force-all -i *.deb
33+
2034
RUN make gaudi && \
2135
mkdir -p /install_root && \
2236
if [ -z "$LOCAL_LICENSES" ]; then \
2337
make licenses; \
2438
fi && \
2539
cp -r licenses /install_root/ && \
40+
mkdir /install_root/licenses/habanalabs && \
41+
cp /usr/share/doc/habanalabs-firmware-tools/* /install_root/licenses/habanalabs/ && \
2642
cp bin/kubelet-gaudi-plugin /install_root/
2743

44+
# Get libc and sources from Ubuntu24, libhlml needs GLIBC_2.38
45+
FROM ubuntu:24.04@sha256:80dd3c3b9c6cecb9f1667e9290b3bc61b78c2678c02cbdae5f0fea92cc6734ab as ubuntu
46+
RUN \
47+
cat /etc/apt/sources.list.d/ubuntu.sources && \
48+
sed -i 's/^Types: deb$/Types: deb deb-src/' /etc/apt/sources.list.d/ubuntu.sources && \
49+
apt-get update && \
50+
apt-get install -y dpkg-dev && \
51+
mkdir /tmp/src && \
52+
cd /tmp/src && \
53+
apt-get source libc6 coreutils dash
2854

2955
FROM scratch
30-
WORKDIR /
3156
LABEL description="Intel Gaudi resource driver for Kubernetes"
3257

3358
COPY --from=build /install_root /
59+
COPY --from=build /usr/lib/habanalabs/libhlml.so /usr/lib/habanalabs/libhlml.so
60+
COPY --from=ubuntu /lib/x86_64-linux-gnu/libc.so.6 /lib/x86_64-linux-gnu/libc.so.6
61+
COPY --from=ubuntu /lib64/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2
62+
COPY --from=ubuntu /usr/lib/x86_64-linux-gnu/libm.so.6 /usr/lib/x86_64-linux-gnu/libm.so.6
63+
COPY --from=ubuntu /usr/lib/x86_64-linux-gnu/libdl.so.2 /usr/lib/x86_64-linux-gnu/libdl.so.2
64+
COPY --from=ubuntu /usr/lib/x86_64-linux-gnu/libz.so.1 /usr/lib/x86_64-linux-gnu/libz.so.1
65+
COPY --from=ubuntu /bin/cat /bin/cat
66+
COPY --from=ubuntu /bin/sh /bin/sh
67+
COPY --from=ubuntu /tmp/src/*tar.xz /src/
68+
69+
ENV LD_LIBRARY_PATH=/usr/lib/habanalabs:/lib/x86_64-linux-gnu:/lib64:/usr/lib/x86_64-linux-gnu
70+
ENV PATH=/bin

Dockerfile.gaudi-test

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
# Copyright (c) 2025, Intel Corporation. All Rights Reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
FROM golang:1.23.4@sha256:ccdca3b3bde3bfee2518a087b467f2b452fad9ba3e378d3c1578db494c8cb13b as build
15+
WORKDIR /build
16+
COPY . .
17+
18+
RUN cd hack/fake_libhlml && \
19+
make clean && make
20+
21+
FROM golang:1.23.4@sha256:ccdca3b3bde3bfee2518a087b467f2b452fad9ba3e378d3c1578db494c8cb13b
22+
ARG UID=1001
23+
ARG GID=1001
24+
25+
COPY --from=build /build/hack/fake_libhlml/fake_libhlml.so /usr/lib/habanalabs/libhlml.so
26+
27+
RUN \
28+
echo "existing user: $(id $UID)" && \
29+
groupadd -g ${GID} ubuntu && \
30+
useradd -m -g ${GID} -u ${UID} -s /bin/bash ubuntu && \
31+
mkdir /github && \
32+
chmod 777 /github
33+
34+
RUN \
35+
mkdir -m 755 /home/ubuntu/.cache/ && \
36+
mkdir -m 755 /home/ubuntu/.cache/go-build && \
37+
mkdir -m 755 /home/ubuntu/.cache/go-mod && \
38+
chown -R ubuntu:ubuntu /home/ubuntu/.cache && \
39+
mkdir /home/ubuntu/src && \
40+
git config --global --add safe.directory /home/ubuntu/src
41+
42+
ENV GOCACHE=/home/ubuntu/.cache/go-build
43+
ENV GOMODCACHE=/home/ubuntu/.cache/go-mod
44+
45+
USER ubuntu
46+
WORKDIR /home/ubuntu

Makefile

Lines changed: 61 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -27,9 +27,11 @@ GIT_COMMIT = $(shell git rev-parse HEAD)
2727
BUILD_DATE = $(shell date -u +"%Y-%m-%dT%H:%M:%SZ")
2828
GIT_BRANCH ?= $(shell git branch --show-current)
2929

30+
TEST_IMAGE ?= gaudi-dra-driver-test-image:latest
31+
3032
EXT_LDFLAGS = -static
3133
LDFLAGS = \
32-
-s -w -extldflags $(EXT_LDFLAGS) \
34+
-s -w \
3335
-X ${PKG}/pkg/version.gitCommit=${GIT_COMMIT} \
3436
-X ${PKG}/pkg/version.buildDate=${BUILD_DATE}
3537

@@ -69,13 +71,16 @@ include $(CURDIR)/qat.mk
6971
build: gpu gaudi qat bin/intel-cdi-specs-generator bin/device-faker
7072

7173

74+
7275
bin/intel-cdi-specs-generator: cmd/cdi-specs-generator/*.go $(GPU_COMMON_SRC)
7376
CGO_ENABLED=0 GOOS=linux GOARCH=${ARCH} \
74-
go build -a -ldflags "${LDFLAGS}" -mod vendor -o $@ ./cmd/cdi-specs-generator
77+
go build -a -ldflags "${LDFLAGS} -extldflags $(EXT_LDFLAGS)" \
78+
-mod vendor -o $@ ./cmd/cdi-specs-generator
7579

7680
bin/device-faker: cmd/device-faker/*.go
7781
CGO_ENABLED=0 GOOS=linux GOARCH=${ARCH} \
78-
go build -a -ldflags "${LDFLAGS}" -mod vendor -o $@ ./cmd/device-faker
82+
go build -a -ldflags "${LDFLAGS} -extldflags ${EXT_LDFLAGS}" \
83+
-mod vendor -o $@ ./cmd/device-faker
7984

8085

8186
.PHONY: branch-build
@@ -109,9 +114,6 @@ cleanall: clean
109114
.PHONY: rm-clientsets
110115
rm-clientsets: rm-gpu-clientset rm-gaudi-clientset
111116

112-
.PHONY: generate
113-
generate: generate-gpu-crd generate-gaudi-crd
114-
115117
.PHONY: generate-deepcopy
116118
generate-deepcopy: generate-gpu-deepcopy generate-gaudi-deepcopy
117119

@@ -159,7 +161,7 @@ licenses: clean-licenses
159161
# linting targets for Go and other code
160162
.PHONY: lint format cilint vet shellcheck yamllint
161163

162-
lint: format cilint vet klogformat shellcheck yamllint
164+
lint: vendor format cilint vet klogformat shellcheck yamllint
163165

164166
format:
165167
gofmt -w -s -l ./
@@ -187,18 +189,62 @@ yamllint:
187189
@echo -e "\nyamllint: lint non-templated YAML files:"
188190
git ls-files '*.yaml' | xargs grep -L '^ *{{-' | xargs yamllint -d relaxed --no-warnings
189191

192+
.PHONY: test-image test-image-push
193+
test-image: vendor
194+
@echo "Building container image with fake HLML for Gaudi tests with user $(shell id -u):$(shell id -g)"
195+
$(DOCKER) build \
196+
--build-arg UID=$(shell id -u) --build-arg GID=$(shell id -g) \
197+
--platform="linux/$(ARCH)" \
198+
-t "$(TEST_IMAGE)" -f Dockerfile.gaudi-test .
199+
200+
test-image-push: test-image
201+
$(DOCKER) push "$(TEST_IMAGE)"
190202

191-
.PHONY: test coverage
203+
.PHONY: test html-coverage test-containerized
192204
COVERAGE_FILE := coverage.out
205+
# Gaudi tests expect fake HLML library to be present at /usr/lib/habanalabs/libhlml.so
206+
# Dependency comes from gohlml package hardcoded LD_LIBRARY_PATH pointing to it.
193207
test:
194-
go test -v -coverprofile=$(COVERAGE_FILE) $(shell go list ./... | grep -v "test/e2e")
208+
ifeq ("$(container)","yes")
209+
@echo setting safe directory
210+
go test -buildvcs=false -v -coverprofile=$(COVERAGE_FILE) $(shell go list ./... | grep -v "test/e2e")
211+
else
212+
@echo running tests
213+
go test -v -coverprofile=$(COVERAGE_FILE) $(shell go list ./... | grep -v "test/e2e")
214+
endif
215+
216+
test-containerized:
217+
$(DOCKER) run \
218+
-it -e container=yes \
219+
--user 1000:1000 \
220+
-v "$(shell pwd)":/home/ubuntu/src:rw \
221+
"$(TEST_IMAGE)" \
222+
bash -c "cd src && make test"
195223

196-
coverage: test
224+
html-coverage: $(COVERAGE_FILE)
197225
go tool cover -html=$(COVERAGE_FILE) -o coverage.html
198226
@echo coverage file: coverage.html
199-
@echo "average coverage (except main.go files)"
200-
grep '<option value=' coverage.html | grep -v 'main.go' | grep -o '(.*)' | tr -d '()%' | awk 'BEGIN{s=0;}{s+=$$1;}END{print s/NR;}'
201227

202-
.PHONY: e2e-qat
203-
e2e-qat:
204-
go test -v ./test/e2e/... --clean-start=true -ginkgo.v -ginkgo.trace -ginkgo.show-node-events
228+
$(COVERAGE_FILE): $(shell find cmd pkg -name '*.go')
229+
go test -v -coverprofile=$(COVERAGE_FILE) $(shell go list ./... | grep -v "test/e2e")
230+
231+
.PHONY: gpu-coverage gaudi-coverage qat-coverage cdispecsgen-coverage excluded-coverage
232+
233+
gpu-coverage: COVERAGE_EXCLUDE="cdi-specs-generator|device-faker|kubelet-gaudi-plugin|kubelet-qat-plugin|qat-showdevice|pkg/qat|pkg/gaudi|pkg/fakesysfs|plugintesthelpers"
234+
gpu-coverage: excluded-coverage
235+
# See: https://www.gnu.org/software/make/manual/html_node/Target_002dspecific.html
236+
237+
gaudi-coverage: COVERAGE_EXCLUDE="cdi-specs-generator|device-faker|kubelet-gpu-plugin|kubelet-qat-plugin|qat-showdevice|pkg/qat|pkg/gpu|pkg/fakesysfs|plugintesthelpers"
238+
gaudi-coverage: excluded-coverage
239+
240+
qat-coverage: COVERAGE_EXCLUDE="cdi-specs-generator|device-faker|kubelet-gpu-plugin|kubelet-gaudi-plugin|pkg/gpu|pkg/gaudi|pkg/fakesysfs|plugintesthelpers"
241+
qat-coverage: excluded-coverage
242+
243+
cdispecsgen-coverage: COVERAGE_EXCLUDE="device-faker|kubelet-gpu-plugin|kubelet-gaudi-plugin|kubelet-qat-plugin|qat-showdevice|pkg/qat|pkg/gpu|pkg/gaudi|pkg/fakesysfs|plugintesthelpers"
244+
cdispecsgen-coverage: excluded-coverage
245+
246+
COVERAGE_EXCLUDE ?= "$^"
247+
excluded-coverage: $(COVERAGE_FILE)
248+
@grep -v -E $(COVERAGE_EXCLUDE) $(COVERAGE_FILE) > $(COVERAGE_FILE).tmp && \
249+
go tool cover -func=$(COVERAGE_FILE).tmp && \
250+
rm $(COVERAGE_FILE).tmp
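
The `excluded-coverage` recipe above boils down to dropping unwanted lines from the coverage profile before passing it to `go tool cover -func`. A small sketch of the filtering step alone, assuming a hand-written profile with made-up package paths and counts (`filter_coverage` is a hypothetical name, not a target in this Makefile):

```shell
#!/bin/sh
# filter_coverage EXCLUDE_REGEX PROFILE
# Print PROFILE without the lines matching EXCLUDE_REGEX (extended regex),
# mirroring the 'grep -v -E' step of the excluded-coverage target.
filter_coverage() {
    exclude="$1"
    profile="$2"
    grep -v -E "$exclude" "$profile"
}

# Tiny hand-written profile; package paths and counts are illustrative only.
profile=$(mktemp)
cat > "$profile" <<'EOF'
mode: set
example.com/drv/pkg/gaudi/device.go:10.2,12.3 2 1
example.com/drv/pkg/gpu/device.go:10.2,12.3 2 0
example.com/drv/cmd/device-faker/main.go:5.1,6.2 1 0
EOF

# Keep only the Gaudi-related entry plus the 'mode:' header.
filter_coverage "device-faker|pkg/gpu" "$profile"
rm -f "$profile"
```

Because the `mode:` header line does not match any of the exclude patterns, it survives the filter, which `go tool cover` needs in order to parse the result.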

NOTICE

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
These contents may have been developed with support from one or more Intel-operated generative artificial intelligence solutions.
Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
# Patterns to ignore when building packages.
2+
# This supports shell glob matching, relative path matching, and
3+
# negation (prefixed with !). Only one pattern per line.
4+
.DS_Store
5+
# Common VCS dirs
6+
.git/
7+
.gitignore
8+
# Common backup files
9+
*.swp
10+
*.bak
11+
*.tmp
12+
*.orig
13+
*~
14+
# Various IDEs
15+
.project
16+
.idea/
17+
*.tmproj
18+
.vscode/
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
apiVersion: v2
2+
name: intel-gaudi-resource-driver
3+
description: A Helm chart for a Dynamic Resource Allocation (DRA) Intel Gaudi Resource Driver
4+
5+
type: application
6+
version: 0.3.0
7+
appVersion: "v0.3.0"
8+
home: https://github.com/intel/intel-resource-drivers-for-kubernetes/charts
9+
10+
dependencies:
11+
- name: node-feature-discovery
12+
alias: nfd
13+
version: "0.17.1"
14+
condition: nfd.enabled
15+
repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
16+
17+
annotations:
18+
org.opencontainers.image.url: "https://github.com/intel/intel-resource-drivers-for-kubernetes"
19+
org.opencontainers.image.source: "https://github.com/intel/intel-resource-drivers-for-kubernetes"
20+
org.opencontainers.image.version: "0.3.0"
21+
org.opencontainers.image.title: "Intel Gaudi Resource Driver"
22+
org.opencontainers.image.description: "This chart installs the Intel Gaudi resource driver on Kubernetes."
