providers/base : add tests for Intel QAT (New) #1795

Open: wants to merge 1 commit into base: main

Collaborator

@hector-cao hector-cao commented Mar 14, 2025

Description

This MR adds tests for Intel QuickAssist Technology (QAT) devices. QAT is an accelerator for crypto/compression operations, available on Intel server-class hardware equipped with Xeon processors.

Resolved issues

Documentation

Tests

I ran these tests on Intel hardware with and without QAT devices.
On hardware without QAT support, no tests are run: one test is skipped and no tests are generated from the job templates.

On a machine with a 402xx QAT device (402xx driver), here is sample output of the results:

==================================[ Results ]===================================
 ☑ : Hardware Manifest
 ☑ : Detect Intel QuickAssist Technology device (>= Gen4)
 ☑ : Check PF sysfs for 01:00.0
 ☑ : Check PF sysfs for 0b:00.0
 ☑ : Check PF sysfs for 81:00.0
 ☑ : Check PF sysfs for 8b:00.0
 ☑ : Check SR-IOV support 01:00.0
 ☑ : Check SR-IOV support 0b:00.0
 ☑ : Check SR-IOV support 81:00.0
 ☑ : Check SR-IOV support 8b:00.0
 ☑ : Check telemetry data in debugfs for 01:00.0
 ☑ : Check telemetry data in debugfs for 0b:00.0
 ☑ : Check telemetry data in debugfs for 81:00.0
 ☑ : Check telemetry data in debugfs for 8b:00.0
 ☑ : Check VFIO-PCI support 01:00.0
 ☑ : Check VFIO-PCI support 0b:00.0
 ☑ : Check VFIO-PCI support 81:00.0
 ☑ : Check VFIO-PCI support 8b:00.0
 ☑ : Bring up and down device 01:00.0
 ☑ : Bring up and down device 0b:00.0
 ☑ : Bring up and down device 81:00.0
 ☑ : Bring up and down device 8b:00.0
 ☑ : Attach devices list
 ☑ : Collect information about installed software packages
 ☑ : Run CPA symmetric crypto tests
 ☑ : Run CPA RSA tests
 ☑ : Run compression tests
 ☑ : Run CPA symmetric crypto tests (in standalone mode)
 ☑ : Run CPA RSA tests (in standalone mode)
 ☑ : Run compression tests (in standalone mode)

@hector-cao hector-cao force-pushed the dev-add-qat-tests branch 2 times, most recently from c762ecf to 6cacc6d Compare March 14, 2025 16:09
@hector-cao hector-cao changed the title providers/base : add tests for Intel QAT providers/base : add tests for Intel QAT (New) Mar 14, 2025
@hector-cao hector-cao force-pushed the dev-add-qat-tests branch 2 times, most recently from 076bda5 to 5650fda Compare March 14, 2025 16:19
@bladernr
Collaborator

I made a couple inline comments... I'm interested in including this in the server suite. Mostly my questions are around the dependency on the detect job, which simply checks that someone ticked Y on a manifest entry (server suite doesn't use manifests, it uses resources).

Would it be reasonable for the detect job to pass if either condition is there (e.g. the resource job returns a pf: present or whatever if the QAT PFs are detected by qatctl.py, OR if the manifest is true)?

and add that extra bit to the resource similar to other resource constraints in use? so... something like this for the detect job:

requires: manifest.has_intel_qat == 'True' or pf.present == 'True'

And keep the rest of it as-is with the dependency on the detect job?

@hector-cao hector-cao force-pushed the dev-add-qat-tests branch 5 times, most recently from 8432a90 to a889a1d Compare March 17, 2025 13:08
@hector-cao
Collaborator Author

I made a couple inline comments... I'm interested in including this in the server suite. Mostly my questions are around the dependency on the detect job, which simply checks that someone ticked Y on a manifest entry (server suite doesn't use manifests, it uses resources).

Would it be reasonable for the detect job to pass if either condition is there (e.g. the resource job returns a pf: present or whatever if the QAT PFs are detected by qatctl.py, OR if the manifest is true)?

and add that extra bit to the resource similar to other resource constraints in use? so... something like this for the detect job:

requires: manifest.has_intel_qat == 'True' or pf.present == 'True'

And keep the rest of it as-is with the dependency on the detect job?

Thanks @bladernr for your feedback !

Based on your comments, I redesigned the tests. Please take a look and let me know what you would like improved.

QAT (Intel QuickAssist Technology) is an accelerator
for crypto/compression operations. The hardware is available
on recent Intel Xeon processors.
@pieqq
Collaborator

pieqq commented May 12, 2025

@bladernr I just saw your previous comment :

Mostly my questions are around the dependency on the detect job, which simply checks that someone ticked Y on a manifest entry (server suite doesn't use manifests, it uses resources).

Manifest entries can be entered manually when running Checkbox, but they can also be pre-filled (it's just a file stored in /var/tmp/checkbox-ng/machine-manifest.json).

Also, manifest entries and resources do not serve exactly the same purpose. With a manifest, you tell Checkbox that this device does have a given piece of hardware, or a specific feature enabled. Resources retrieve the information automatically from the system, which may lead to jobs being skipped if the resource script fails, or if the driver is not properly loaded and therefore the feature is not exposed to the user.

A typical example is has_wlan_adapter, which tells Checkbox whether or not the device has WiFi. The wireless/detect test will then try to find a wireless interface only if this manifest is set to True. This makes it possible to catch situations where we know a device should have WiFi, but the driver failed to load.

The tutorial has a whole page about how manifests work, you can check it out.
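
As a rough illustration of pre-filling, here is a minimal Python sketch that merges an entry into a manifest JSON file. The helper name `prefill_manifest` is hypothetical, and the namespaced key format (`<namespace>::<id>`) and the namespace value are assumptions to verify against the Checkbox documentation. The demo writes to a temporary directory rather than the real path.

```python
import json
import tempfile
from pathlib import Path

# Assumption: manifest ids are stored with a namespace prefix. The value
# below is illustrative and may differ depending on the provider.
NAMESPACE = "com.canonical.certification"


def prefill_manifest(path: Path, entries: dict) -> dict:
    """Merge boolean manifest entries into the JSON file at `path`."""
    manifest = json.loads(path.read_text()) if path.exists() else {}
    for entry_id, value in entries.items():
        manifest[f"{NAMESPACE}::{entry_id}"] = bool(value)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2))
    return manifest


# Demo on a temporary directory instead of the real
# /var/tmp/checkbox-ng/machine-manifest.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "machine-manifest.json"
    result = prefill_manifest(demo_path, {"has_intel_qat": True})
    reloaded = json.loads(demo_path.read_text())
```

Existing entries in the file are preserved, so the helper can be run repeatedly without losing other manifest answers.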

Collaborator

@pieqq pieqq left a comment

Sorry for the long time to provide feedback! Please check my comments to see if they make sense.

@@ -0,0 +1,3 @@
unit: category
id: intel-qat
_name: Intel Quick-Assist Technology
Collaborator

Suggested change
_name: Intel Quick-Assist Technology
_name: Intel QuickAssist Technology

As per the Intel page.

command:
PFS=$(qatctl.py list --short | wc -l)
if [ "${PFS}" -le 0 ]; then
echo "manifest.has_intel_qat is set to True but no device found !"
Collaborator

Suggested change
echo "manifest.has_intel_qat is set to True but no device found !"
echo "This system is supposed to support Intel QuickAssist Technology, but no Intel QAT devices were found!"

Collaborator

Can you provide unit tests for this file?

Comment on lines +24 to +28
unit: template
template-resource: qat
template-engine: jinja2
template-unit: job
id: intel-qat-common/{{ available }}-attach-devices
Collaborator

Suggested change
unit: template
template-resource: qat
template-engine: jinja2
template-unit: job
id: intel-qat-common/{{ available }}-attach-devices
unit: template
template-resource: qat
template-unit: job
id: intel-qat-common/{available}-attach-devices

Template jobs use python string formatting by default. I don't think jinja2 is needed here (nor in any of the following template jobs in this file).
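
To illustrate the difference in syntax, here is a small sketch of how a `{available}` placeholder expands with plain Python string formatting. This only approximates what the template engine does and is not Checkbox's actual code; the record values are illustrative.

```python
# Values a `qat` resource job might emit for one record (illustrative).
record = {"available": "qat", "pf": "01:00.0", "driver": "4xxx"}

# Default template syntax: single braces, no jinja2 needed.
template_id = "intel-qat-common/{available}-attach-devices"

# str.format ignores unused keyword arguments, so the whole record can be passed.
job_id = template_id.format(**record)
print(job_id)  # intel-qat-common/qat-attach-devices
```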

Comment on lines +137 to +138
package.name == 'qatlib-examples'
package.name == 'qatlib-service'
Collaborator

How are these packages made available on the system? Are they pulled from the official repos, or do they need to be built?

# switch all devices to crypto sym mode
printf "POLICY=0\nServicesEnabled=sym\n" | tee /etc/sysconfig/qat
systemctl restart qat
cpa_sample_code runTests=1
Collaborator

How is cpa_sample_code made available? (I assume it's part of the packages mentioned above?)

Comment on lines +104 to +110
rmmod vfio-pci || true
nb_vfio=$(qatctl.py status --devices {{ pf }} --vfio | wc -l)
[ "$nb_vfio" -le 0 ] || (echo "nb vfio devices should be <= 0" && exit 1)
# we have to pass the VF device ids
modprobe vfio-pci ids=8086:4941,8086:4943,8086:4945,8086:4947
nb_vfio=$(qatctl.py status --devices {{ pf }} --vfio | wc -l)
[ "$nb_vfio" -gt 0 ] || (echo "nb vfio devices should be > 0" && exit 1)
Collaborator

Several suggestions (some apply to other jobs in this PR too):

  1. Put these into a separate bash script in bin/, and start it with set -e so that it fails as soon as possible if anything goes wrong.
  2. a check flag could be added to the qatctl.py script to avoid relying on additional bash commands (such as [ "$nb_vfio" -le 0 ] || (echo "nb vfio devices should be <= 0" && exit 1))
  3. maybe this test could be split in 2 (the second test would depend on the first):
     • check that no VFIO files are present if the vfio-pci module is removed
     • check that VFIO are there when the module is reloaded
  4. if something goes wrong before you reload the vfio-pci module, all the tests running after will fail, so you probably have to make sure the modules are reloaded regardless of the outcome.
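
A possible shape for suggestions 2 and 4, sketched in Python rather than bash: the check lives in one function and a `finally` block makes sure vfio-pci is loaded again whatever happens. All function names here (`count_vfio`, `unload_vfio`, `load_vfio`, `check_vfio`) are hypothetical, not existing `qatctl.py` code; only the `qatctl.py status --devices ... --vfio` invocation and the VF ids are taken from the job above.

```python
import subprocess

# VF device ids, copied from the modprobe line in the job under review.
VF_IDS = "8086:4941,8086:4943,8086:4945,8086:4947"


def count_vfio(pf: str) -> int:
    """Count lines printed by `qatctl.py status --devices PF --vfio`."""
    out = subprocess.run(
        ["qatctl.py", "status", "--devices", pf, "--vfio"],
        check=True, capture_output=True, text=True,
    ).stdout
    return len(out.splitlines())


def unload_vfio() -> None:
    # Mirrors `rmmod vfio-pci || true`: a failure here is not fatal.
    subprocess.run(["rmmod", "vfio-pci"], check=False)


def load_vfio() -> None:
    subprocess.run(["modprobe", "vfio-pci", f"ids={VF_IDS}"], check=True)


def check_vfio(pf, count=count_vfio, unload=unload_vfio, load=load_vfio):
    """Raise SystemExit on failure; try to leave vfio-pci loaded either way."""
    loaded = False
    try:
        unload()
        if count(pf) > 0:
            raise SystemExit("VFIO devices still present after module removal")
        load()
        loaded = True
        if count(pf) <= 0:
            raise SystemExit("no VFIO devices after module reload")
    finally:
        # Reload the module if the test stopped before reaching the reload.
        if not loaded:
            load()
```

The `count`, `unload` and `load` parameters are injectable so the logic can be unit-tested with fakes, without touching kernel modules.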

@@ -0,0 +1,4 @@
unit: manifest entry
id: has_intel_qat
_name: A Intel Quick-Assist Technology (QAT) device
Collaborator

Suggested change
_name: A Intel Quick-Assist Technology (QAT) device
_name: An Intel QuickAssist Technology (QAT) device

Comment on lines +54 to +64
for pf in ${PFS}; do
driver_path=$(readlink /sys/bus/pci/devices/0000:"${pf}"/driver)
driver=$(basename "${driver_path}")
if [ "${driver}" == "4xxx" ] || [ "${driver}" == "420xx" ]; then
echo "pf: ${pf}"
echo "driver: ${driver}"
echo "available: qat"
echo ""
break
fi
done
Collaborator

This should probably be put into the python script directly (maybe as a qatctl.py resource command). Easier to test and to maintain.
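
A minimal sketch of what that could look like, assuming a hypothetical `resource` subcommand built on the two helpers below (neither exists in `qatctl.py` as far as this thread shows). It mirrors the shell loop: it reads the driver symlink in sysfs and emits a record for the first PF bound to a supported driver.

```python
import os
from typing import Callable, Iterable, List

SUPPORTED_DRIVERS = ("4xxx", "420xx")


def sysfs_driver(pf: str) -> str:
    """Return the name of the driver bound to a PF, or "" if none."""
    link = f"/sys/bus/pci/devices/0000:{pf}/driver"
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return ""


def resource_records(
    pfs: Iterable[str],
    driver_of: Callable[[str], str] = sysfs_driver,
) -> List[str]:
    """Build resource output lines for the first PF with a supported driver."""
    for pf in pfs:
        driver = driver_of(pf)
        if driver in SUPPORTED_DRIVERS:
            # Same fields as the shell version, ending with a blank line.
            return [f"pf: {pf}", f"driver: {driver}", "available: qat", ""]
    return []
```

Because the driver lookup is passed in as `driver_of`, a unit test can supply a plain dictionary lookup instead of reading sysfs.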

# Hector Cao <[email protected]>

unit: job
id: qat_pf
Collaborator

This resource job looks like it's doing exactly the same thing as qat below, except it doesn't add the available field. Consider removing this and relying on qat only.
