Run Neon NTT+iNTT through SLOTHY #221
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 104025 cycles | 103896 cycles | 1.00 |
| ML-DSA-44 sign | 293235 cycles | 293742 cycles | 1.00 |
| ML-DSA-44 verify | 108371 cycles | 108864 cycles | 1.00 |
| ML-DSA-65 keypair | 183589 cycles | 183054 cycles | 1.00 |
| ML-DSA-65 sign | 469983 cycles | 468458 cycles | 1.00 |
| ML-DSA-65 verify | 174784 cycles | 174225 cycles | 1.00 |
| ML-DSA-87 keypair | 293799 cycles | 293562 cycles | 1.00 |
| ML-DSA-87 sign | 608497 cycles | 605677 cycles | 1.00 |
| ML-DSA-87 verify | 290924 cycles | 291217 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 174942 cycles | 174975 cycles | 1.00 |
| ML-DSA-44 sign | 487617 cycles | 487457 cycles | 1.00 |
| ML-DSA-44 verify | 183603 cycles | 183626 cycles | 1.00 |
| ML-DSA-65 keypair | 298865 cycles | 299128 cycles | 1.00 |
| ML-DSA-65 sign | 774860 cycles | 775038 cycles | 1.00 |
| ML-DSA-65 verify | 297537 cycles | 297742 cycles | 1.00 |
| ML-DSA-87 keypair | 501562 cycles | 501247 cycles | 1.00 |
| ML-DSA-87 sign | 1022568 cycles | 1021912 cycles | 1.00 |
| ML-DSA-87 verify | 505856 cycles | 506188 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 159961 cycles | 161395 cycles | 0.99 |
| ML-DSA-44 sign | 480983 cycles | 484020 cycles | 0.99 |
| ML-DSA-44 verify | 169863 cycles | 171545 cycles | 0.99 |
| ML-DSA-65 keypair | 272003 cycles | 274720 cycles | 0.99 |
| ML-DSA-65 sign | 771201 cycles | 779308 cycles | 0.99 |
| ML-DSA-65 verify | 274297 cycles | 276934 cycles | 0.99 |
| ML-DSA-87 keypair | 457865 cycles | 461864 cycles | 0.99 |
| ML-DSA-87 sign | 1011293 cycles | 1017017 cycles | 0.99 |
| ML-DSA-87 verify | 463530 cycles | 467785 cycles | 0.99 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 0cc3dd6 | Previous: 9927cdd | Ratio |
|---|---|---|---|
| ML-DSA-44 sign | 499133 cycles | 461021 cycles | 1.08 |
This comment was automatically generated by workflow using github-action-benchmark.
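The Ratio column and the 1.03 alert threshold can be reproduced from the cycle counts in the tables. The following is a minimal sketch, assuming that the ratio is simply current cycles divided by previous cycles (which matches the figures reported here) and that an alert fires when the ratio exceeds the threshold. The function names are hypothetical and this is not the code of github-action-benchmark itself.

```python
# Hypothetical helpers: they reproduce the Ratio column from the cycle counts
# shown in the benchmark comments. Not taken from github-action-benchmark.

ALERT_THRESHOLD = 1.03  # threshold quoted in the performance alert


def cycle_ratio(current_cycles: int, previous_cycles: int) -> float:
    """Ratio of current to previous cycle count; above 1.0 means slower."""
    return current_cycles / previous_cycles


def is_regression(current_cycles: int, previous_cycles: int,
                  threshold: float = ALERT_THRESHOLD) -> bool:
    """Assumed alert rule: flag the result when the ratio exceeds the threshold."""
    return cycle_ratio(current_cycles, previous_cycles) > threshold


if __name__ == "__main__":
    # Cycle counts copied from the tables above (c6i alert row, c7i keypair row).
    rows = [
        ("ML-DSA-44 sign (c6i, 0cc3dd6 vs 9927cdd)", 499133, 461021),
        ("ML-DSA-44 keypair (c7i, 2efa129 vs 5e28164)", 104025, 103896),
    ]
    for name, current, previous in rows:
        ratio = cycle_ratio(current, previous)
        status = "ALERT" if is_regression(current, previous) else "ok"
        print(f"{name}: ratio {ratio:.2f} ({status})")
```

A ratio below 1.00, as in the optimized Arm results, means the current commit needs fewer cycles than the previous one.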
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 454424 cycles | 463407 cycles | 0.98 |
| ML-DSA-44 sign | 1074356 cycles | 1160741 cycles | 0.93 |
| ML-DSA-44 verify | 458764 cycles | 476938 cycles | 0.96 |
| ML-DSA-65 keypair | 803945 cycles | 823272 cycles | 0.98 |
| ML-DSA-65 sign | 1806708 cycles | 1937786 cycles | 0.93 |
| ML-DSA-65 verify | 762459 cycles | 785894 cycles | 0.97 |
| ML-DSA-87 keypair | 1358752 cycles | 1375667 cycles | 0.99 |
| ML-DSA-87 sign | 2463895 cycles | 2636221 cycles | 0.93 |
| ML-DSA-87 verify | 1325055 cycles | 1358739 cycles | 0.98 |
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 103480 cycles | 103825 cycles | 1.00 |
| ML-DSA-44 sign | 292299 cycles | 292489 cycles | 1.00 |
| ML-DSA-44 verify | 108711 cycles | 108604 cycles | 1.00 |
| ML-DSA-65 keypair | 183291 cycles | 183902 cycles | 1.00 |
| ML-DSA-65 sign | 467554 cycles | 470033 cycles | 0.99 |
| ML-DSA-65 verify | 174684 cycles | 174356 cycles | 1.00 |
| ML-DSA-87 keypair | 293857 cycles | 293887 cycles | 1.00 |
| ML-DSA-87 sign | 605955 cycles | 606228 cycles | 1.00 |
| ML-DSA-87 verify | 290839 cycles | 290916 cycles | 1.00 |
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 159952 cycles | 161516 cycles | 0.99 |
| ML-DSA-44 sign | 480567 cycles | 485670 cycles | 0.99 |
| ML-DSA-44 verify | 169966 cycles | 172036 cycles | 0.99 |
| ML-DSA-65 keypair | 271877 cycles | 274603 cycles | 0.99 |
| ML-DSA-65 sign | 771229 cycles | 779646 cycles | 0.99 |
| ML-DSA-65 verify | 274254 cycles | 276686 cycles | 0.99 |
| ML-DSA-87 keypair | 457067 cycles | 462210 cycles | 0.99 |
| ML-DSA-87 sign | 1011249 cycles | 1019380 cycles | 0.99 |
| ML-DSA-87 verify | 463152 cycles | 468098 cycles | 0.99 |
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 174929 cycles | 174970 cycles | 1.00 |
| ML-DSA-44 sign | 488307 cycles | 487240 cycles | 1.00 |
| ML-DSA-44 verify | 183599 cycles | 183589 cycles | 1.00 |
| ML-DSA-65 keypair | 298564 cycles | 298817 cycles | 1.00 |
| ML-DSA-65 sign | 774925 cycles | 774918 cycles | 1.00 |
| ML-DSA-65 verify | 297315 cycles | 297321 cycles | 1.00 |
| ML-DSA-87 keypair | 501636 cycles | 501797 cycles | 1.00 |
| ML-DSA-87 sign | 1021722 cycles | 1022029 cycles | 1.00 |
| ML-DSA-87 verify | 505837 cycles | 506359 cycles | 1.00 |
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138119 cycles | 136021 cycles | 1.02 |
| ML-DSA-44 sign | 405605 cycles | 397587 cycles | 1.02 |
| ML-DSA-44 verify | 146404 cycles | 144004 cycles | 1.02 |
| ML-DSA-65 keypair | 236231 cycles | 233188 cycles | 1.01 |
| ML-DSA-65 sign | 635347 cycles | 621964 cycles | 1.02 |
| ML-DSA-65 verify | 235401 cycles | 232088 cycles | 1.01 |
| ML-DSA-87 keypair | 391829 cycles | 385890 cycles | 1.02 |
| ML-DSA-87 sign | 806749 cycles | 804243 cycles | 1.00 |
| ML-DSA-87 verify | 393634 cycles | 389513 cycles | 1.01 |
Graviton4
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 142779 cycles | 142973 cycles | 1.00 |
| ML-DSA-44 sign | 307171 cycles | 310072 cycles | 0.99 |
| ML-DSA-44 verify | 142264 cycles | 142899 cycles | 1.00 |
| ML-DSA-65 keypair | 251952 cycles | 252461 cycles | 1.00 |
| ML-DSA-65 sign | 508510 cycles | 512842 cycles | 0.99 |
| ML-DSA-65 verify | 239261 cycles | 240173 cycles | 1.00 |
| ML-DSA-87 keypair | 429729 cycles | 430243 cycles | 1.00 |
| ML-DSA-87 sign | 698579 cycles | 704438 cycles | 0.99 |
| ML-DSA-87 verify | 420389 cycles | 418947 cycles | 1.00 |
Graviton2
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 237435 cycles | 238388 cycles | 1.00 |
| ML-DSA-44 sign | 534267 cycles | 541250 cycles | 0.99 |
| ML-DSA-44 verify | 238266 cycles | 238965 cycles | 1.00 |
| ML-DSA-65 keypair | 430945 cycles | 432718 cycles | 1.00 |
| ML-DSA-65 sign | 894406 cycles | 908756 cycles | 0.98 |
| ML-DSA-65 verify | 402455 cycles | 404028 cycles | 1.00 |
| ML-DSA-87 keypair | 716739 cycles | 718922 cycles | 1.00 |
| ML-DSA-87 sign | 1226948 cycles | 1241966 cycles | 0.99 |
| ML-DSA-87 verify | 701976 cycles | 702572 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 137665 cycles | 136245 cycles | 1.01 |
| ML-DSA-44 sign | 406247 cycles | 396646 cycles | 1.02 |
| ML-DSA-44 verify | 145492 cycles | 145158 cycles | 1.00 |
| ML-DSA-65 keypair | 235836 cycles | 233107 cycles | 1.01 |
| ML-DSA-65 sign | 624009 cycles | 622684 cycles | 1.00 |
| ML-DSA-65 verify | 233961 cycles | 232163 cycles | 1.01 |
| ML-DSA-87 keypair | 390721 cycles | 385947 cycles | 1.01 |
| ML-DSA-87 sign | 808075 cycles | 805580 cycles | 1.00 |
| ML-DSA-87 verify | 392453 cycles | 389798 cycles | 1.01 |
Graviton4 (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 155946 cycles | 155932 cycles | 1.00 |
| ML-DSA-44 sign | 427800 cycles | 427980 cycles | 1.00 |
| ML-DSA-44 verify | 163686 cycles | 163699 cycles | 1.00 |
| ML-DSA-65 keypair | 271752 cycles | 271648 cycles | 1.00 |
| ML-DSA-65 sign | 709427 cycles | 709755 cycles | 1.00 |
| ML-DSA-65 verify | 270832 cycles | 270968 cycles | 1.00 |
| ML-DSA-87 keypair | 454458 cycles | 454389 cycles | 1.00 |
| ML-DSA-87 sign | 918345 cycles | 918870 cycles | 1.00 |
| ML-DSA-87 verify | 458684 cycles | 456027 cycles | 1.01 |
Graviton3
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 153649 cycles | 154270 cycles | 1.00 |
| ML-DSA-44 sign | 327627 cycles | 333860 cycles | 0.98 |
| ML-DSA-44 verify | 153268 cycles | 154493 cycles | 0.99 |
| ML-DSA-65 keypair | 272834 cycles | 275916 cycles | 0.99 |
| ML-DSA-65 sign | 548853 cycles | 557800 cycles | 0.98 |
| ML-DSA-65 verify | 259520 cycles | 260722 cycles | 1.00 |
| ML-DSA-87 keypair | 465048 cycles | 466597 cycles | 1.00 |
| ML-DSA-87 sign | 762374 cycles | 773846 cycles | 0.99 |
| ML-DSA-87 verify | 452607 cycles | 454706 cycles | 1.00 |
Graviton2 (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 257347 cycles | 257449 cycles | 1.00 |
| ML-DSA-44 sign | 705598 cycles | 704775 cycles | 1.00 |
| ML-DSA-44 verify | 269436 cycles | 269334 cycles | 1.00 |
| ML-DSA-65 keypair | 459726 cycles | 460061 cycles | 1.00 |
| ML-DSA-65 sign | 1160502 cycles | 1159424 cycles | 1.00 |
| ML-DSA-65 verify | 448353 cycles | 448470 cycles | 1.00 |
| ML-DSA-87 keypair | 754572 cycles | 755423 cycles | 1.00 |
| ML-DSA-87 sign | 1528198 cycles | 1529030 cycles | 1.00 |
| ML-DSA-87 verify | 761668 cycles | 760763 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 1099004 cycles | 1097834 cycles | 1.00 |
| ML-DSA-44 sign | 4011403 cycles | 4005934 cycles | 1.00 |
| ML-DSA-44 verify | 1226891 cycles | 1226139 cycles | 1.00 |
| ML-DSA-65 keypair | 1872216 cycles | 1871426 cycles | 1.00 |
| ML-DSA-65 sign | 6556257 cycles | 6554158 cycles | 1.00 |
| ML-DSA-65 verify | 1982594 cycles | 1982241 cycles | 1.00 |
| ML-DSA-87 keypair | 3085068 cycles | 3079280 cycles | 1.00 |
| ML-DSA-87 sign | 8284467 cycles | 8268658 cycles | 1.00 |
| ML-DSA-87 verify | 3260678 cycles | 3257505 cycles | 1.00 |
Graviton3 (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 166731 cycles | 166638 cycles | 1.00 |
| ML-DSA-44 sign | 439847 cycles | 440090 cycles | 1.00 |
| ML-DSA-44 verify | 173511 cycles | 173579 cycles | 1.00 |
| ML-DSA-65 keypair | 298120 cycles | 293264 cycles | 1.02 |
| ML-DSA-65 sign | 720692 cycles | 720327 cycles | 1.00 |
| ML-DSA-65 verify | 287773 cycles | 287395 cycles | 1.00 |
| ML-DSA-87 keypair | 491282 cycles | 491577 cycles | 1.00 |
| ML-DSA-87 sign | 960558 cycles | 961835 cycles | 1.00 |
| ML-DSA-87 verify | 491954 cycles | 492189 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 237189 cycles | 237959 cycles | 1.00 |
| ML-DSA-44 sign | 533594 cycles | 540773 cycles | 0.99 |
| ML-DSA-44 verify | 237395 cycles | 238436 cycles | 1.00 |
| ML-DSA-65 keypair | 430679 cycles | 431571 cycles | 1.00 |
| ML-DSA-65 sign | 894053 cycles | 907070 cycles | 0.99 |
| ML-DSA-65 verify | 401531 cycles | 402771 cycles | 1.00 |
| ML-DSA-87 keypair | 716575 cycles | 717699 cycles | 1.00 |
| ML-DSA-87 sign | 1224440 cycles | 1239192 cycles | 0.99 |
| ML-DSA-87 verify | 699770 cycles | 701138 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 555374 cycles | 553931 cycles | 1.00 |
| ML-DSA-44 sign | 1926012 cycles | 1930812 cycles | 1.00 |
| ML-DSA-44 verify | 617863 cycles | 617741 cycles | 1.00 |
| ML-DSA-65 keypair | 944757 cycles | 943855 cycles | 1.00 |
| ML-DSA-65 sign | 3135706 cycles | 3138968 cycles | 1.00 |
| ML-DSA-65 verify | 983324 cycles | 983004 cycles | 1.00 |
| ML-DSA-87 keypair | 1553050 cycles | 1554438 cycles | 1.00 |
| ML-DSA-87 sign | 3989443 cycles | 3994563 cycles | 1.00 |
| ML-DSA-87 verify | 1622527 cycles | 1621599 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 257043 cycles | 256619 cycles | 1.00 |
| ML-DSA-44 sign | 704660 cycles | 704711 cycles | 1.00 |
| ML-DSA-44 verify | 268735 cycles | 268744 cycles | 1.00 |
| ML-DSA-65 keypair | 459184 cycles | 458842 cycles | 1.00 |
| ML-DSA-65 sign | 1159718 cycles | 1158917 cycles | 1.00 |
| ML-DSA-65 verify | 447225 cycles | 447191 cycles | 1.00 |
| ML-DSA-87 keypair | 754759 cycles | 754208 cycles | 1.00 |
| ML-DSA-87 sign | 1526515 cycles | 1525738 cycles | 1.00 |
| ML-DSA-87 verify | 758588 cycles | 759570 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 314101 cycles | 316562 cycles | 0.99 |
| ML-DSA-44 sign | 830509 cycles | 846454 cycles | 0.98 |
| ML-DSA-44 verify | 313917 cycles | 316682 cycles | 0.99 |
| ML-DSA-65 keypair | 594114 cycles | 596152 cycles | 1.00 |
| ML-DSA-65 sign | 1259130 cycles | 1249138 cycles | 1.01 |
| ML-DSA-65 verify | 533157 cycles | 536006 cycles | 0.99 |
| ML-DSA-87 keypair | 939143 cycles | 952431 cycles | 0.99 |
| ML-DSA-87 sign | 1686994 cycles | 1751049 cycles | 0.96 |
| ML-DSA-87 verify | 921794 cycles | 934432 cycles | 0.99 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 349232 cycles | 349752 cycles | 1.00 |
| ML-DSA-44 sign | 1150738 cycles | 1043462 cycles | 1.10 |
| ML-DSA-44 verify | 368178 cycles | 368630 cycles | 1.00 |
| ML-DSA-65 keypair | 641818 cycles | 641867 cycles | 1.00 |
| ML-DSA-65 sign | 1688151 cycles | 1685247 cycles | 1.00 |
| ML-DSA-65 verify | 608231 cycles | 609773 cycles | 1.00 |
| ML-DSA-87 keypair | 1007145 cycles | 1006865 cycles | 1.00 |
| ML-DSA-87 sign | 2214903 cycles | 2210773 cycles | 1.00 |
| ML-DSA-87 verify | 1027941 cycles | 1022137 cycles | 1.01 |
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 122559 cycles | 122514 cycles | 1.00 |
| ML-DSA-44 sign | 277345 cycles | 277666 cycles | 1.00 |
| ML-DSA-44 verify | 123259 cycles | 123322 cycles | 1.00 |
| ML-DSA-65 keypair | 220363 cycles | 220412 cycles | 1.00 |
| ML-DSA-65 sign | 474696 cycles | 475347 cycles | 1.00 |
| ML-DSA-65 verify | 207491 cycles | 207552 cycles | 1.00 |
| ML-DSA-87 keypair | 372605 cycles | 373260 cycles | 1.00 |
| ML-DSA-87 sign | 657410 cycles | 660107 cycles | 1.00 |
| ML-DSA-87 verify | 367468 cycles | 368696 cycles | 1.00 |
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 139127 cycles | 139137 cycles | 1.00 |
| ML-DSA-44 sign | 421411 cycles | 421295 cycles | 1.00 |
| ML-DSA-44 verify | 148295 cycles | 148285 cycles | 1.00 |
| ML-DSA-65 keypair | 243948 cycles | 243935 cycles | 1.00 |
| ML-DSA-65 sign | 697776 cycles | 697760 cycles | 1.00 |
| ML-DSA-65 verify | 242884 cycles | 242924 cycles | 1.00 |
| ML-DSA-87 keypair | 403934 cycles | 403934 cycles | 1 |
| ML-DSA-87 sign | 906264 cycles | 906420 cycles | 1.00 |
| ML-DSA-87 verify | 412382 cycles | 412419 cycles | 1.00 |
ac1a01f to 2efa129
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 2efa129 | Previous: 5e28164 | Ratio |
|---|---|---|---|
| ML-DSA-44 sign | 1150738 cycles | 1043462 cycles | 1.10 |
This adds a Makefile that runs the Neon NTT through SLOTHY. To accommodate this, the clean assembly is moved to dev/aarch64_clean/, while mldsa/native/aarch64 contains the optimized assembly.
The main difference to mlkem-native is that we need to set an explicit timeout, as optimizing the second loop does not result in reasonable performance, but a good solution is found within one minute on my Apple M4. I set the timeout to 2 minutes in the hope that it works on most platforms. We may have to increase that later.
For now the clean backend is not tested in CI; that's left for a follow-up PR. SLOTHY is also not yet run in CI.
We probably want to put the assembly simplification scripts in place so we can follow the same structure as in mlkem-native.
Signed-off-by: Matthias J. Kannwischer <[email protected]>
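The role of the explicit timeout can be illustrated with a toy search that keeps the best solution found so far and returns it once a time budget expires. This is only a sketch of the general idea of optimizing under a wall-clock budget. It does not use SLOTHY, its configuration options, or the Makefile from this PR, and every name in it (`optimize_with_time_budget`, `displacement`, `swap_two`) is hypothetical.

```python
import random
import time


def optimize_with_time_budget(initial, cost, neighbor, budget_s, seed=0):
    """Toy local search: improve until the time budget expires.

    Returns the best candidate seen and its cost, even though the search
    was cut off by the deadline rather than finishing on its own.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_s
    best, best_cost = initial, cost(initial)
    while time.monotonic() < deadline:
        candidate = neighbor(best, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:  # keep strict improvements only
            best, best_cost = candidate, candidate_cost
    return best, best_cost


def displacement(order):
    """Toy cost: total distance of each value from its sorted position."""
    return sum(abs(value - index) for index, value in enumerate(order))


def swap_two(order, rng):
    """Toy neighbor: copy the list and swap two randomly chosen positions."""
    i, j = rng.sample(range(len(order)), 2)
    swapped = list(order)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


if __name__ == "__main__":
    start = list(range(15, -1, -1))  # reversed list as a poor starting point
    best, best_cost = optimize_with_time_budget(
        start, displacement, swap_two, budget_s=0.05)
    print("initial cost:", displacement(start), "cost at deadline:", best_cost)
```

A larger budget gives the search more chances to improve the result, which is the trade-off behind choosing 2 minutes rather than 1 for slower machines.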