Skip to content

netkvm: Add a case for packet loss when slowly reusing memory buffers #4247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

heywji
Copy link
Contributor

@heywji heywji commented Jan 7, 2025

Under various conditions, when the host-to-device packet rate is high, we lose packets in QEMU due to a lack of guest-allocated buffers. Look also at virtio-win/kvm-guest-drivers-windows#1012

ID: 2405

Signed-off-by: wji [email protected]

@heywji heywji changed the title Add a case for packet loss when slowly reusing memory buffers netkvm: Add a case for packet loss when slowly reusing memory buffers Jan 7, 2025
@heywji
Copy link
Contributor Author

heywji commented Jan 8, 2025

depends on avocado-framework/avocado-vt#4045

@heywji heywji force-pushed the KVMAUTOMA-2405 branch 3 times, most recently from d1b7c9e to 0b9815b Compare January 14, 2025 05:27
@heywji
Copy link
Contributor Author

heywji commented Jan 14, 2025

@leidwang, call you to review this patch again. Thanks in advance.

Test result:

1-Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win2019.x86_64.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35 Finshed 2025-01-14 00:31:42 2025-01-14 00:44:10 PASS 747.58115
2-Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win2019.x86_64.io-github-autotest-qemu.netkvm_buffer_shortage.q35 Finshed 2025-01-14 00:44:10 2025-01-14 01:05:00 PASS 1250.003384
Summary: 
Finshed=2, PASS=2

@leidwang
Copy link
Contributor

Hi @heywji Please remember to sign the commits, thanks.

@heywji heywji force-pushed the KVMAUTOMA-2405 branch 2 times, most recently from d410739 to b2fea54 Compare January 22, 2025 02:15
@heywji
Copy link
Contributor Author

heywji commented Jan 22, 2025

The commits are signed now. Please help review it again when you are free. @leidwang

@leidwang
Copy link
Contributor

Hi @heywji I will review this MR once avocado-vt avocado-framework/avocado-vt#4045 done, thanks.

@heywji
Copy link
Contributor Author

heywji commented Feb 18, 2025

Hi @leidwang, I have removed the "avocado-vt avocado-framework/avocado-vt#4045" related codes.

Could you help review it again since this patch can be passed by applying Houqi's patch first?

Thanks,
Wecom

start_vm_vm2 = no
smp = 2
queues = ${smp}
vectors = 1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a must for this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This part comes from our product bug: "RHEL-24992 BSOD occurs when the vector=1024 is set"., we need to set vectors = 1024.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks.

:param env: Dictionary of test environment details.
"""

def analyze_ping_results(session, count, timeout):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def analyze_ping_results(session, count, timeout):
def analyze_ping_results(dest, session, count, timeout):

"""

status, output = utils_net.ping(
s_vm_ip, session=c_session, count=count, timeout=timeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s_vm_ip, session=c_session, count=count, timeout=timeout
dest, session=session, count=count, timeout=timeout

Comment on lines 38 to 40
pattern = r"(\d+)% loss"
match = re.search(pattern, output)
if match:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pattern = r"(\d+)% loss"
match = re.search(pattern, output)
if match:
if re.search(r"(\d+)% loss", output):

Comment on lines 50 to 52
param vm: the selected vm
param netkvmco_name: the netkvm driver parameter to modify
param value: the value to set to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remember to keep a same format for the docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @leidwang, but what's the same format meaning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

if match:
return match.group(1)

def modify_and_analyze_params_result(vm, netkvmco_name, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def modify_and_analyze_params_result(vm, netkvmco_name, value):
def modify_and_analyze_params_result(vm, param_name, value):

I guess param_name should be more clear to describe it's meaning.Because it's a netkvm parameter's name instead netkvmco name.

utils_net.set_netkvm_param_value(vm, netkvmco_name, value)
cur_value = utils_net.get_netkvm_param_value(vm, netkvmco_name)
if cur_value != value:
test.fail(f"Current value '{cur_value}' was not equires '{value}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test.fail(f"Current value '{cur_value}' was not equires '{value}'")
test.fail(f"Failed to set '{param_name}' to '{value}'")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Leidong. I agree with you, and this explanation is better than mine.

Comment on lines 65 to 67
param session: session to execute commands on the target machine.
port: the port number to monitor.
script_to_run: the path to the Python script to execute.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the docstring formatting issue above.

Comment on lines 72 to 85
c_pip_copy_cmd = params.get("c_pip_copy_cmd")
c_pip_cmd = params.get("c_pip_cmd")
c_py_copy_cmd = params.get("c_py_copy_cmd")
s_py_copy_cmd = params.get("s_py_copy_cmd")
status, output = session.cmd_status_output(check_live_python, timeout=1200)
if status == 0:
return
session.cmd(dest_location)
if "server" in script_to_run:
error_context.context(
"Run python3 code runs on the server node", test.log.info
)
s_py_copy_cmd = utils_misc.set_winutils_letter(session, s_py_copy_cmd)
session.cmd(s_py_copy_cmd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can move the script and whl file copy part to run function or write a function for it, it's a test env preparation step.I do not think we need to copy them every time when we want to run server.py or client.py.

Comment on lines 156 to 157
for value in param_values.split(" "):
modify_and_analyze_params_result(vm=c_vm, netkvmco_name=param_name, value=value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please clarify why we need to modify the parameter's value before the end?Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @leidwang,
This step comes from our product bug: "RHEL-24992 BSOD occurs when the vector=1024 is set". So here, I want to simulate the operation on the client side, open the NIC properties, and modify the MinRxBufferPercent parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I am just curious why we modify the parameter's value but do not perform any testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emm. you gave me some inspiration about this operation. I need to add a ping function here to check whether BSOD happens.

"""
conduct a ping test to check the packet loss on slow memory buffer reallocation

:param session: Local executon hint or session to execute the ping command.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param session: Local executon hint or session to execute the ping command.
:param session: Local execution hint or session to execute the ping command.

Comment on lines 50 to 52
param vm: the selected vm
param netkvmco_name: the netkvm driver parameter to modify
param value: the value to set to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@heywji heywji force-pushed the KVMAUTOMA-2405 branch 2 times, most recently from f52c96a to 8d9d3db Compare March 7, 2025 07:59
@heywji
Copy link
Contributor Author

heywji commented Mar 7, 2025

@leidwang Hi Leidong, Could you help review this patch again?

Test result: PASS

 (1/1) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netkvm_buffer_shortage.q35: PASS (1486.81 s)

Thanks!

@leidwang
Copy link
Contributor

Hi @heywji Please sign the commit, thanks.

@heywji
Copy link
Contributor Author

heywji commented Mar 13, 2025

@leidwang Done. Please help review other problems. Thank you!

if cur_value != value:
test.fail(f"Failed to set '{param_name}' to '{value}'")

def check_and_restart_port(session, port, script_to_run):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

port is not called in this function

Suggested change
def check_and_restart_port(session, port, script_to_run):
def check_and_restart_port(session, script_to_run):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified. Thank leidong for reviewing me.

Comment on lines 27 to 32
c_pip_copy_cmd = 'xcopy "WIN_UTILS:\packet_loss_scripts\${psutil_whl}" ${copy_dest}'
c_pip_cmd = "py -m pip install ${psutil_whl}"
s_py_copy_cmd = 'xcopy "WIN_UTILS:\packet_loss_scripts\${s_py}" ${copy_dest}'
s_py_cmd = "start cmd /c py ${s_py} ${port_num}"
c_py_copy_cmd = 'xcopy "WIN_UTILS:\packet_loss_scripts\${c_py}" ${copy_dest}'
c_py_cmd = "start cmd /c py ${c_py} 99999 %s ${port_num}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
c_pip_copy_cmd = 'xcopy "WIN_UTILS:\packet_loss_scripts\${psutil_whl}" ${copy_dest}'
c_pip_cmd = "py -m pip install ${psutil_whl}"
s_py_copy_cmd = 'xcopy "WIN_UTILS:\packet_loss_scripts\${s_py}" ${copy_dest}'
s_py_cmd = "start cmd /c py ${s_py} ${port_num}"
c_py_copy_cmd = 'xcopy "WIN_UTILS:\packet_loss_scripts\${c_py}" ${copy_dest}'
c_py_cmd = "start cmd /c py ${c_py} 99999 %s ${port_num}"
copy_file_cmd = 'xcopy "WIN_UTILS:\packet_loss_scripts\%s" ${copy_dest}'
install_psutil_cmd = "py -m pip install ${psutil_whl}"
server_cmd = "start cmd /c py ${s_py} ${port_num}"
client_cmd = "start cmd /c py ${c_py} 99999 %s ${port_num}"

We can try to merge some similar code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard. Let me do more tries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can copy the files together since they are small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Leidong. I am running the test results now and will let you and Qianqian review them together.

@heywji heywji force-pushed the KVMAUTOMA-2405 branch 5 times, most recently from c08ee32 to 5ae0e1e Compare April 1, 2025 06:55
Under various conditions, when the host-to-device packet rate is high, we lose
packets in QEMU due to a lack of guest-allocated buffers. Look also at
virtio-win/kvm-guest-drivers-windows#1012

Signed-off-by: wji <[email protected]>
@heywji
Copy link
Contributor Author

heywji commented Apr 2, 2025

@leidwang @vivianQizhu Could you help me review this patch?

Test Result: PASS

(1/2) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: STARTED 
(1/2) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: PASS (1496.10 s) 
(2/2) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netkvm_buffer_shortage.q35: STARTED 
(2/2) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netkvm_buffer_shortage.q35: PASS (1479.14 s)

x86_64:
psutil_whl = "psutil-6.1.1-cp37-abi3-win_amd64.whl"
pip_cmd = "py -m pip install ${psutil_whl}"
dest_location = "pushd ${copy_dest}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the absolute path directly so that we don't have to switch directories again and again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, but the Python script under winutils reports some errors. This script was provided to us by the upstream reporter, and it will take extra time to modify it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, if so, that's fine with me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants