Run Neon NTT+iNTT through SLOTHY #221

Open
wants to merge 3 commits into
base: main

mkannwischer
Contributor

This adds a Makefile that runs the Neon NTT through SLOTHY. To accommodate this, the clean assembly is moved to dev/aarch64_clean/, while mldsa/native/aarch64 contains the optimized assembly.

The main difference to mlkem-native is that we need to set an explicit timeout, as optimizing the second loop doesn't result in reasonable performance, but a good solution is found within one minute on my Apple M4. I set the timeout to 2 minutes in the hope that it works on most platforms. We may have to increase that later.

For now the clean backend is not tested in CI - that's left for a follow-up PR. SLOTHY is also not run in CI, yet.
We probably want to put the assembly simplification scripts in place so we can follow the same structure as in mlkem-native.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 104025 cycles 103896 cycles 1.00
ML-DSA-44 sign 293235 cycles 293742 cycles 1.00
ML-DSA-44 verify 108371 cycles 108864 cycles 1.00
ML-DSA-65 keypair 183589 cycles 183054 cycles 1.00
ML-DSA-65 sign 469983 cycles 468458 cycles 1.00
ML-DSA-65 verify 174784 cycles 174225 cycles 1.00
ML-DSA-87 keypair 293799 cycles 293562 cycles 1.00
ML-DSA-87 sign 608497 cycles 605677 cycles 1.00
ML-DSA-87 verify 290924 cycles 291217 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 174942 cycles 174975 cycles 1.00
ML-DSA-44 sign 487617 cycles 487457 cycles 1.00
ML-DSA-44 verify 183603 cycles 183626 cycles 1.00
ML-DSA-65 keypair 298865 cycles 299128 cycles 1.00
ML-DSA-65 sign 774860 cycles 775038 cycles 1.00
ML-DSA-65 verify 297537 cycles 297742 cycles 1.00
ML-DSA-87 keypair 501562 cycles 501247 cycles 1.00
ML-DSA-87 sign 1022568 cycles 1021912 cycles 1.00
ML-DSA-87 verify 505856 cycles 506188 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 159961 cycles 161395 cycles 0.99
ML-DSA-44 sign 480983 cycles 484020 cycles 0.99
ML-DSA-44 verify 169863 cycles 171545 cycles 0.99
ML-DSA-65 keypair 272003 cycles 274720 cycles 0.99
ML-DSA-65 sign 771201 cycles 779308 cycles 0.99
ML-DSA-65 verify 274297 cycles 276934 cycles 0.99
ML-DSA-87 keypair 457865 cycles 461864 cycles 0.99
ML-DSA-87 sign 1011293 cycles 1017017 cycles 0.99
ML-DSA-87 verify 463530 cycles 467785 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 0cc3dd6 Previous: 9927cdd Ratio
ML-DSA-44 sign 499133 cycles 461021 cycles 1.08

This comment was automatically generated by workflow using github-action-benchmark.
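
For readers checking the alert by hand, here is a minimal sketch of how the Ratio column and the threshold comparison appear to work. It assumes the ratio is current cycles divided by previous cycles and that the alert fires when this ratio is strictly greater than the threshold; the function names are hypothetical and not part of github-action-benchmark.

```python
# Hypothetical helpers; the names and the strict ">" comparison are assumptions.
def regression_ratio(current_cycles: int, previous_cycles: int) -> float:
    """Current divided by previous cycle count; values above 1.0 mean slower."""
    return current_cycles / previous_cycles


def exceeds_threshold(current_cycles: int, previous_cycles: int,
                      threshold: float = 1.03) -> bool:
    """True if the slowdown ratio is above the alert threshold."""
    return regression_ratio(current_cycles, previous_cycles) > threshold


# Figures from the alert above: ML-DSA-44 sign on Intel Xeon 3rd gen (c6i).
ratio = regression_ratio(499133, 461021)
print(f"{ratio:.2f}")                      # 1.08
print(exceeds_threshold(499133, 461021))   # True
```

Under the same assumptions, a row such as ML-DSA-44 keypair on c6i (174942 vs. 174975 cycles) gives a ratio just below 1.00 and would not trigger an alert.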

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 454424 cycles 463407 cycles 0.98
ML-DSA-44 sign 1074356 cycles 1160741 cycles 0.93
ML-DSA-44 verify 458764 cycles 476938 cycles 0.96
ML-DSA-65 keypair 803945 cycles 823272 cycles 0.98
ML-DSA-65 sign 1806708 cycles 1937786 cycles 0.93
ML-DSA-65 verify 762459 cycles 785894 cycles 0.97
ML-DSA-87 keypair 1358752 cycles 1375667 cycles 0.99
ML-DSA-87 sign 2463895 cycles 2636221 cycles 0.93
ML-DSA-87 verify 1325055 cycles 1358739 cycles 0.98

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 103480 cycles 103825 cycles 1.00
ML-DSA-44 sign 292299 cycles 292489 cycles 1.00
ML-DSA-44 verify 108711 cycles 108604 cycles 1.00
ML-DSA-65 keypair 183291 cycles 183902 cycles 1.00
ML-DSA-65 sign 467554 cycles 470033 cycles 0.99
ML-DSA-65 verify 174684 cycles 174356 cycles 1.00
ML-DSA-87 keypair 293857 cycles 293887 cycles 1.00
ML-DSA-87 sign 605955 cycles 606228 cycles 1.00
ML-DSA-87 verify 290839 cycles 290916 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 159952 cycles 161516 cycles 0.99
ML-DSA-44 sign 480567 cycles 485670 cycles 0.99
ML-DSA-44 verify 169966 cycles 172036 cycles 0.99
ML-DSA-65 keypair 271877 cycles 274603 cycles 0.99
ML-DSA-65 sign 771229 cycles 779646 cycles 0.99
ML-DSA-65 verify 274254 cycles 276686 cycles 0.99
ML-DSA-87 keypair 457067 cycles 462210 cycles 0.99
ML-DSA-87 sign 1011249 cycles 1019380 cycles 0.99
ML-DSA-87 verify 463152 cycles 468098 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 174929 cycles 174970 cycles 1.00
ML-DSA-44 sign 488307 cycles 487240 cycles 1.00
ML-DSA-44 verify 183599 cycles 183589 cycles 1.00
ML-DSA-65 keypair 298564 cycles 298817 cycles 1.00
ML-DSA-65 sign 774925 cycles 774918 cycles 1.00
ML-DSA-65 verify 297315 cycles 297321 cycles 1.00
ML-DSA-87 keypair 501636 cycles 501797 cycles 1.00
ML-DSA-87 sign 1021722 cycles 1022029 cycles 1.00
ML-DSA-87 verify 505837 cycles 506359 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 138119 cycles 136021 cycles 1.02
ML-DSA-44 sign 405605 cycles 397587 cycles 1.02
ML-DSA-44 verify 146404 cycles 144004 cycles 1.02
ML-DSA-65 keypair 236231 cycles 233188 cycles 1.01
ML-DSA-65 sign 635347 cycles 621964 cycles 1.02
ML-DSA-65 verify 235401 cycles 232088 cycles 1.01
ML-DSA-87 keypair 391829 cycles 385890 cycles 1.02
ML-DSA-87 sign 806749 cycles 804243 cycles 1.00
ML-DSA-87 verify 393634 cycles 389513 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 142779 cycles 142973 cycles 1.00
ML-DSA-44 sign 307171 cycles 310072 cycles 0.99
ML-DSA-44 verify 142264 cycles 142899 cycles 1.00
ML-DSA-65 keypair 251952 cycles 252461 cycles 1.00
ML-DSA-65 sign 508510 cycles 512842 cycles 0.99
ML-DSA-65 verify 239261 cycles 240173 cycles 1.00
ML-DSA-87 keypair 429729 cycles 430243 cycles 1.00
ML-DSA-87 sign 698579 cycles 704438 cycles 0.99
ML-DSA-87 verify 420389 cycles 418947 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 237435 cycles 238388 cycles 1.00
ML-DSA-44 sign 534267 cycles 541250 cycles 0.99
ML-DSA-44 verify 238266 cycles 238965 cycles 1.00
ML-DSA-65 keypair 430945 cycles 432718 cycles 1.00
ML-DSA-65 sign 894406 cycles 908756 cycles 0.98
ML-DSA-65 verify 402455 cycles 404028 cycles 1.00
ML-DSA-87 keypair 716739 cycles 718922 cycles 1.00
ML-DSA-87 sign 1226948 cycles 1241966 cycles 0.99
ML-DSA-87 verify 701976 cycles 702572 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 137665 cycles 136245 cycles 1.01
ML-DSA-44 sign 406247 cycles 396646 cycles 1.02
ML-DSA-44 verify 145492 cycles 145158 cycles 1.00
ML-DSA-65 keypair 235836 cycles 233107 cycles 1.01
ML-DSA-65 sign 624009 cycles 622684 cycles 1.00
ML-DSA-65 verify 233961 cycles 232163 cycles 1.01
ML-DSA-87 keypair 390721 cycles 385947 cycles 1.01
ML-DSA-87 sign 808075 cycles 805580 cycles 1.00
ML-DSA-87 verify 392453 cycles 389798 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 155946 cycles 155932 cycles 1.00
ML-DSA-44 sign 427800 cycles 427980 cycles 1.00
ML-DSA-44 verify 163686 cycles 163699 cycles 1.00
ML-DSA-65 keypair 271752 cycles 271648 cycles 1.00
ML-DSA-65 sign 709427 cycles 709755 cycles 1.00
ML-DSA-65 verify 270832 cycles 270968 cycles 1.00
ML-DSA-87 keypair 454458 cycles 454389 cycles 1.00
ML-DSA-87 sign 918345 cycles 918870 cycles 1.00
ML-DSA-87 verify 458684 cycles 456027 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 153649 cycles 154270 cycles 1.00
ML-DSA-44 sign 327627 cycles 333860 cycles 0.98
ML-DSA-44 verify 153268 cycles 154493 cycles 0.99
ML-DSA-65 keypair 272834 cycles 275916 cycles 0.99
ML-DSA-65 sign 548853 cycles 557800 cycles 0.98
ML-DSA-65 verify 259520 cycles 260722 cycles 1.00
ML-DSA-87 keypair 465048 cycles 466597 cycles 1.00
ML-DSA-87 sign 762374 cycles 773846 cycles 0.99
ML-DSA-87 verify 452607 cycles 454706 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 257347 cycles 257449 cycles 1.00
ML-DSA-44 sign 705598 cycles 704775 cycles 1.00
ML-DSA-44 verify 269436 cycles 269334 cycles 1.00
ML-DSA-65 keypair 459726 cycles 460061 cycles 1.00
ML-DSA-65 sign 1160502 cycles 1159424 cycles 1.00
ML-DSA-65 verify 448353 cycles 448470 cycles 1.00
ML-DSA-87 keypair 754572 cycles 755423 cycles 1.00
ML-DSA-87 sign 1528198 cycles 1529030 cycles 1.00
ML-DSA-87 verify 761668 cycles 760763 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 1099004 cycles 1097834 cycles 1.00
ML-DSA-44 sign 4011403 cycles 4005934 cycles 1.00
ML-DSA-44 verify 1226891 cycles 1226139 cycles 1.00
ML-DSA-65 keypair 1872216 cycles 1871426 cycles 1.00
ML-DSA-65 sign 6556257 cycles 6554158 cycles 1.00
ML-DSA-65 verify 1982594 cycles 1982241 cycles 1.00
ML-DSA-87 keypair 3085068 cycles 3079280 cycles 1.00
ML-DSA-87 sign 8284467 cycles 8268658 cycles 1.00
ML-DSA-87 verify 3260678 cycles 3257505 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 166731 cycles 166638 cycles 1.00
ML-DSA-44 sign 439847 cycles 440090 cycles 1.00
ML-DSA-44 verify 173511 cycles 173579 cycles 1.00
ML-DSA-65 keypair 298120 cycles 293264 cycles 1.02
ML-DSA-65 sign 720692 cycles 720327 cycles 1.00
ML-DSA-65 verify 287773 cycles 287395 cycles 1.00
ML-DSA-87 keypair 491282 cycles 491577 cycles 1.00
ML-DSA-87 sign 960558 cycles 961835 cycles 1.00
ML-DSA-87 verify 491954 cycles 492189 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 237189 cycles 237959 cycles 1.00
ML-DSA-44 sign 533594 cycles 540773 cycles 0.99
ML-DSA-44 verify 237395 cycles 238436 cycles 1.00
ML-DSA-65 keypair 430679 cycles 431571 cycles 1.00
ML-DSA-65 sign 894053 cycles 907070 cycles 0.99
ML-DSA-65 verify 401531 cycles 402771 cycles 1.00
ML-DSA-87 keypair 716575 cycles 717699 cycles 1.00
ML-DSA-87 sign 1224440 cycles 1239192 cycles 0.99
ML-DSA-87 verify 699770 cycles 701138 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 555374 cycles 553931 cycles 1.00
ML-DSA-44 sign 1926012 cycles 1930812 cycles 1.00
ML-DSA-44 verify 617863 cycles 617741 cycles 1.00
ML-DSA-65 keypair 944757 cycles 943855 cycles 1.00
ML-DSA-65 sign 3135706 cycles 3138968 cycles 1.00
ML-DSA-65 verify 983324 cycles 983004 cycles 1.00
ML-DSA-87 keypair 1553050 cycles 1554438 cycles 1.00
ML-DSA-87 sign 3989443 cycles 3994563 cycles 1.00
ML-DSA-87 verify 1622527 cycles 1621599 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 257043 cycles 256619 cycles 1.00
ML-DSA-44 sign 704660 cycles 704711 cycles 1.00
ML-DSA-44 verify 268735 cycles 268744 cycles 1.00
ML-DSA-65 keypair 459184 cycles 458842 cycles 1.00
ML-DSA-65 sign 1159718 cycles 1158917 cycles 1.00
ML-DSA-65 verify 447225 cycles 447191 cycles 1.00
ML-DSA-87 keypair 754759 cycles 754208 cycles 1.00
ML-DSA-87 sign 1526515 cycles 1525738 cycles 1.00
ML-DSA-87 verify 758588 cycles 759570 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 314101 cycles 316562 cycles 0.99
ML-DSA-44 sign 830509 cycles 846454 cycles 0.98
ML-DSA-44 verify 313917 cycles 316682 cycles 0.99
ML-DSA-65 keypair 594114 cycles 596152 cycles 1.00
ML-DSA-65 sign 1259130 cycles 1249138 cycles 1.01
ML-DSA-65 verify 533157 cycles 536006 cycles 0.99
ML-DSA-87 keypair 939143 cycles 952431 cycles 0.99
ML-DSA-87 sign 1686994 cycles 1751049 cycles 0.96
ML-DSA-87 verify 921794 cycles 934432 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 349232 cycles 349752 cycles 1.00
ML-DSA-44 sign 1150738 cycles 1043462 cycles 1.10
ML-DSA-44 verify 368178 cycles 368630 cycles 1.00
ML-DSA-65 keypair 641818 cycles 641867 cycles 1.00
ML-DSA-65 sign 1688151 cycles 1685247 cycles 1.00
ML-DSA-65 verify 608231 cycles 609773 cycles 1.00
ML-DSA-87 keypair 1007145 cycles 1006865 cycles 1.00
ML-DSA-87 sign 2214903 cycles 2210773 cycles 1.00
ML-DSA-87 verify 1027941 cycles 1022137 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 122559 cycles 122514 cycles 1.00
ML-DSA-44 sign 277345 cycles 277666 cycles 1.00
ML-DSA-44 verify 123259 cycles 123322 cycles 1.00
ML-DSA-65 keypair 220363 cycles 220412 cycles 1.00
ML-DSA-65 sign 474696 cycles 475347 cycles 1.00
ML-DSA-65 verify 207491 cycles 207552 cycles 1.00
ML-DSA-87 keypair 372605 cycles 373260 cycles 1.00
ML-DSA-87 sign 657410 cycles 660107 cycles 1.00
ML-DSA-87 verify 367468 cycles 368696 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 keypair 139127 cycles 139137 cycles 1.00
ML-DSA-44 sign 421411 cycles 421295 cycles 1.00
ML-DSA-44 verify 148295 cycles 148285 cycles 1.00
ML-DSA-65 keypair 243948 cycles 243935 cycles 1.00
ML-DSA-65 sign 697776 cycles 697760 cycles 1.00
ML-DSA-65 verify 242884 cycles 242924 cycles 1.00
ML-DSA-87 keypair 403934 cycles 403934 cycles 1
ML-DSA-87 sign 906264 cycles 906420 cycles 1.00
ML-DSA-87 verify 412382 cycles 412419 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer mkannwischer force-pushed the slothy-ntt branch 3 times, most recently from ac1a01f to 2efa129 Compare May 15, 2025 10:04
@mkannwischer mkannwischer marked this pull request as ready for review May 15, 2025 10:16
@mkannwischer mkannwischer requested a review from a team as a code owner May 15, 2025 10:16
@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 2efa129 Previous: 5e28164 Ratio
ML-DSA-44 sign 1150738 cycles 1043462 cycles 1.10

This comment was automatically generated by workflow using github-action-benchmark.

@mkannwischer mkannwischer requested a review from hanno-becker May 17, 2025 12:47
This adds a Makefile that runs the Neon NTT through SLOTHY. To accommodate this,
the clean assembly is moved to dev/aarch64_clean/, while mldsa/native/aarch64
contains the optimized assembly.

The main difference to mlkem-native is that we need to set an explicit timeout,
as optimizing the second loop doesn't result in reasonable performance, but a good
solution is found within one minute on my Apple M4. I set the timeout to 2
minutes in the hope that it works on most platforms. We may have to
increase that later.

For now the clean backend is not tested in CI - that's left for a follow-up PR.
SLOTHY is also not run in CI, yet.
We probably want to put the assembly simplification scripts in place so we
can follow the same structure as in mlkem-native.

Signed-off-by: Matthias J. Kannwischer <[email protected]>
Successfully merging this pull request may close these issues.

Run Neon NTT/iNTT through SLOTHY