# SPDX-FileCopyrightText: 2006-2021 Istituto Italiano di Tecnologia (IIT)
# SPDX-FileCopyrightText: 2006-2010 RobotCub Consortium
# SPDX-License-Identifier: BSD-3-Clause

Lorenzo Natale's bug hunting:

====================================================================

In $YARP_ROOT/example/stressTests you will find some code and a couple of
scripts that produce some undesired behavior. It is basically a controlboard
accessed concurrently by a number of remote_controlboard clients
(stressrpc.cpp).

The code and scripts are simple, so I won't go into much detail. I just
summarize here the behavior I see:

./stress.sh   works fine
./stress2.sh  works fine, but at the end fails to quit yarpdev; just have a
              look at the output of ps (this is the undesired behavior I see)
./stress3.sh  works fine

====================================================================

Lorenzo:

I have uploaded a stressrpcMD.cpp (MD is for motion done).

If you run the fakeMotor device like this:

  yarpdev --device controlboard --subdevice fakeMotor --delay 80

and then two instances of stressrpcMD.cpp as:

  console1: stressrpcMD --id 0
  console2: stressrpcMD --id 1

you *should* hopefully see that sometimes one of the two (the second?) hangs
within the "checkMotionDone" call (but sometimes even in the port "open"
call). The problem is more frequent if you kill and restart one of the two
clients. When this happens the fakeMotor device does not close properly
unless the clients are killed (similarly to the previous bugs I have recently
reported).

I believe the problem has to do with timing. In fact it took me a while to
replicate it in the fakeMotor device. On the robot the problem was triggered
very reliably in the getPid function, which took 70-90 ms to complete. For
this reason I have added a Time::delay() in the checkMotionDone function of
the fakeMotor device (this delay can be changed with the --delay parameter).

Caveats: the problem happens often but with variable likelihood. The
--delay 80 setting appeared to trigger it more often, but I'm no longer sure.
Maybe adding the sleep was enough to make the problem more probable and I got
fooled into thinking the "80" number was more important than it is... I don't
know. Anyway, let's first see if you can reproduce the problem... it might be
machine dependent.

====================================================================

Paul:

I've added a stress test that doesn't use the controlboard: "smallrpc"

  smallrpc --server
  smallrpc --client --name /client0
  smallrpc --client --name /client1
  smallrpc --client --name /client2
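
====================================================================

Because the hangs described above are intermittent, it can help to run each
client under a time limit so that a stuck run is reported instead of silently
blocking a console. The following is only a sketch: run_limited is a
hypothetical helper that is not part of the stress tests, it relies on the
`timeout` command from GNU coreutils (which exits with status 124 when the
limit expires), and it assumes the command under test normally finishes
within the chosen limit.

```shell
#!/bin/sh
# Hypothetical helper, not part of $YARP_ROOT/example/stressTests.
# Usage: run_limited SECONDS COMMAND [ARGS...]
# Runs COMMAND under a time limit and prints one line saying whether it
# finished on its own or had to be stopped by `timeout` (GNU coreutils).
run_limited() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # 124 is the exit status GNU timeout uses when the limit expired.
        echo "HUNG: $*"
    else
        echo "DONE($status): $*"
    fi
    return "$status"
}

# Example (commented out, needs a running server; the 60 s limit is arbitrary):
#   run_limited 60 stressrpcMD --id 0 &
#   run_limited 60 stressrpcMD --id 1 &
#   wait
```

A "HUNG" line only says that the client was still running when the limit
expired. Whether that is a real hang in "checkMotionDone" or "open" still has
to be confirmed by hand, and leftover yarpdev processes can be checked with
ps as described above.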