-
Notifications
You must be signed in to change notification settings - Fork 67
providers/base : add tests for Intel QAT (New) #1795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c762ecf
to
6cacc6d
Compare
076bda5
to
5650fda
Compare
I made a couple inline comments... I'm interested in including this in the server suite. Mostly my questions are around the dependency on the detect job, which simply checks that someone ticked Y on a manifest entry (server suite doesn't use manifests, it uses resources). Would it be reasonable for the detect job to pass if either condition is there (e.g. the resource job returns a and add that extra bit to the resource similar to other resource constraints in use? so... something like this for the detect job:
And keep the rest of it as-is with the dependency on the detect job? |
8432a90
to
a889a1d
Compare
Thanks @bladernr for your feedback ! Based on your comments, I re-designed the tests, please take a look and let me know things you would want to be improved. |
d462f2b
to
0561f5c
Compare
@bladernr I just saw your previous comment :
Manifest entries can be entered manually when running Checkbox, but they can also be pre-filled (it's just a filke stored in Also, manifest entries and resources do not serve exactly the same purpose. With a manifest, you tell Checkbox that this device does have a given piece of hardware, or a specific feature enabled. Resources retrieve the information automatically from the system, which may lead in jobs being skipped if the resource script fails, or if the driver is not properly loaded and therefore the feature is not exposed to the user. A typical example is The tutorial has a whole page about how manifests work, you can check it out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long time to provide feedback! Please check my comments to see if they make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide unit tests for this file?
unit: template | ||
template-resource: qat | ||
template-engine: jinja2 | ||
template-unit: job | ||
id: intel-qat-common/{{ available }}-attach-devices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit: template | |
template-resource: qat | |
template-engine: jinja2 | |
template-unit: job | |
id: intel-qat-common/{{ available }}-attach-devices | |
unit: template | |
template-resource: qat | |
template-unit: job | |
id: intel-qat-common/{available}-attach-devices |
Template jobs use python string formatting by default. I don't think jinja2 is needed here (nor in any of the following template jobs in this file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks ! you are right but I tend to use jinja as the default every time I have to deal with template job, I do not see any strong reason to switch to a less powerful template engine, can we consider using it even if it is not mandatory right now ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have a plan to remove the jinja2 templating option in the future, because it creates more problems than it solves.
In your template job, the only use of templates are to replace variables, so it's really a matter of replacing {{ variable }}
with {variable}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S.: you're not the only one to get tricked into using jinja2 template engine, so we created a document with a list of dos and don'ts about Checkbox test authoring, and one section covers this:
https://github.com/canonical/checkbox/tree/main/providers#checkbox-templates
package.name == 'qatlib-examples' | ||
package.name == 'qatlib-service' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these package made available on the system? Is it a package to pull from the official repos? Is it something that needs building?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These packages are available in the archive (universe) from Ubuntu 24.04 onwards
# switch all devices to crypto sym mode | ||
printf "POLICY=0\nServicesEnabled=sym\n" | tee /etc/sysconfig/qat | ||
systemctl restart qat | ||
cpa_sample_code runTests=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is cpa_sample_code
made available? (I assume it's part of the packages mentioned above?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tool is part of the qatlib-examples
package
rmmod vfio-pci || true | ||
nb_vfio=$(qatctl.py status --devices {{ pf }} --vfio | wc -l) | ||
[ "$nb_vfio" -le 0 ] || (echo "nb vfio devices should be <= 0" && exit 1) | ||
# we have to pass the VF device ids | ||
modprobe vfio-pci ids=8086:4941,8086:4943,8086:4945,8086:4947 | ||
nb_vfio=$(qatctl.py status --devices {{ pf }} --vfio | wc -l) | ||
[ "$nb_vfio" -gt 0 ] || (echo "nb vfio devices should be > 0" && exit 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several suggestions (some apply to other jobs in this PR too):
- Put these into a separate bash script in
bin/
, and start it withset -e
so that it fails as soon as possible if anything goes wrong. - a
check
flag could be added to theqatctl.py
script to avoid relying on additional bash commands (such as[ "$nb_vfio" -le 0 ] || (echo "nb vfio devices should be <= 0" && exit 1)
) - maybe this test could be split in 2 (the second test would depend on the first):
- check that no VFIO files are present if the
vfio-pci
module is removed - check that VFIO are there when the module is reloaded
- if something goes wrong before you reload the
vfio-pci
module, all the tests running after will fail, so you probably have to make sure the modules are reloaded regardless of the outcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 - I would agree to put the command steps in a bash script if it can be reused some where else (via script arguments) or the command section is quite long, but in this case, there are only 6 steps, I m wondering if it is a good idea,
2 - Good idea !
3 - I m not totally convinced by this suggestion, even if we split this test into 2 tests, I don't think we want the second one to depend on the first one because that means the first one will unload the vfio_pci module and the second one will load it. In that case, can checkbox make sure that there is no test that is interleaved between those two tests and this test might need the vfio_pci to be loaded ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It's up to you. However, adding
set -e
is required, otherwise you may run things after failed commands, which is probably not desired. - 👍
- You're right, it's probably better to keep it coupled then.
for pf in ${PFS}; do | ||
driver_path=$(readlink /sys/bus/pci/devices/0000:"${pf}"/driver) | ||
driver=$(basename "${driver_path}") | ||
if [ "${driver}" == "4xxx" ] || [ "${driver}" == "420xx" ]; then | ||
echo "pf: ${pf}" | ||
echo "driver: ${driver}" | ||
echo "available: qat" | ||
echo "" | ||
break | ||
fi | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be put into the python script directly (maybe as a qatctl.py resource
command). Easier to test and to maintain.
# Hector Cao <[email protected]> | ||
|
||
unit: job | ||
id: qat_pf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This resource job looks like it's doing exactly the same thing as qat
below, except it doesn't add the available
field. Consider removing this and rely on qat
only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is quite subtle but the 2 resources are not doing the samething, the qat
resource only output the first QAT PF (physical device) (see the break in the control loop)
bde46ad
to
e1cb4a0
Compare
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (88.92%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #1795 +/- ##
==========================================
+ Coverage 51.66% 51.95% +0.29%
==========================================
Files 386 387 +1
Lines 41492 41830 +338
Branches 7711 7714 +3
==========================================
+ Hits 21435 21732 +297
- Misses 19293 19346 +53
+ Partials 764 752 -12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
QAT (Intel QuickAssist Technology) is an accelerator for crypto/compression operations. The hardward is available on recent Intel Xeon processors.
Co-authored-by: Pierre Equoy <[email protected]>
Co-authored-by: Pierre Equoy <[email protected]>
Co-authored-by: Pierre Equoy <[email protected]>
9eb7d51
to
3e4b307
Compare
3e4b307
to
a71d9a7
Compare
@hector-cao can you check the tox error?
|
Description
This MR adds tests for Intel Intel QuickAssist Technology (QAT) devices. QAT is an accelerator for crypto/compression operations, available on Intel server-class hardware equipped with XEON processors.
Resolved issues
Documentation
Tests
I run these tests on Intel hardwares with/without QAT devices.
On hardware without QAT support, no test will be run (just 1 test is skipped, no tests are generated from the job templates).
On a machine with
402xx
QAT device (402xx
driver), here is the sample output of the results: