Ticket #70 (closed defect: fixed)

Opened 3 years ago

Last modified 18 months ago

device removal breaks internal state, making later libusb_open calls always fail

Reported by: hagg Owned by: stuge
Milestone: Component: libusb-1.0
Keywords: linux Cc:
Blocked By: Blocks:

Description

I am using libusb-1.0.8 on a linux 2.6.35-22 (Ubuntu 10.10).

When libusb_open_device_with_vid_pid(..) is called in a loop,
it tries to open the usb device until it is powered up (or connected).

When I power down (or disconnect) the device later, there is a 50:50 chance
that libusb "crashes", and it is not possible to open the device from that
loop again. The error message libusb_open_device_with_vid_pid(..) gives is:

libusb:error [__open_sysfs_attr] open /sys/bus/usb/devices/(null)/descriptors failed ret=-1 errno=2

When a 1-second-sleep is appended to the loop the error does not occur and
it is possible to power up/down the device several times without problems.
(race condition somewhere?)

The error occurs when I run the program as an unprivileged user (w/o permissions to open the device) as well as when I run it as root.

A minimal program to trigger the error, its output and dmesg output are appended.

Attachments

libusb-problem.c (815 bytes) - added by hagg 3 years ago.
source code of minimal program to trigger the error
libusb-problem.works (1.2 KB) - added by hagg 3 years ago.
output of working program (w/ sleep(1) inside the loop)
libusb-problem.error (2.2 KB) - added by hagg 3 years ago.
output of erroneous program (w/o sleep)
libusb-problem.dmesg (1.1 KB) - added by hagg 3 years ago.
dmesg output - nothing conspicious - just "new full speed USB device using uhci_hcd"s and "USB disconnect"s
libusb-fulllog.txt (1.5 MB) - added by hagg 3 years ago.
full libusb debug log. error seems to occur around line 16035 and following.
libusb-shortened-fulllog.txt (11.3 KB) - added by hagg 3 years ago.
full log was to big to view online, so this si an excerpt: lines 15935-16134. ca 100 lines before after something breaks.
0001-Fix-Race-Condition-in-sysfs_get_device_list.patch (8.3 KB) - added by stuge 2 years ago.
Alan's latest patch
0001-Linux-Fix-70-race-condition-in-sysfs_get_device_list.patch (8.4 KB) - added by stuge 2 years ago.
Alan's latest patch against libusb-stuge with fix for opendir() bug

Download all attachments as: .zip

Change History

Changed 3 years ago by hagg

source code of minimal program to trigger the error

Changed 3 years ago by hagg

output of working program (w/ sleep(1) inside the loop)

Changed 3 years ago by hagg

output of erroneous program (w/o sleep)

Changed 3 years ago by hagg

dmesg output - nothing conspicious - just "new full speed USB device using uhci_hcd"s and "USB disconnect"s

comment:1 follow-up: Changed 3 years ago by xiaofan

I believe similar problem has been discussed before. The problem is currently libusb-1.0 does not support hotplug. You may have to wait for libusb-1.1 or try the non-yet-integrated Nokia hotplug patch.

http://libusb.6.n5.nabble.com/libusb-get-device-list-fails-with-No-such-file-of-directory-td510637.html

comment:2 in reply to: ↑ 1 Changed 3 years ago by stuge

  • Keywords linux added
  • Summary changed from not possible to re-open a device that was switched off (while opened) to device removal breaks internal state, making later libusb_open calls always fail

Replying to xiaofan:

The problem is currently libusb-1.0 does not support hotplug.

Yes and no. Regardless of hotplug support libusb clearly has some problem with internal state here. When studying the parts of libusb exercised here it is clear that there is room for improvement.

libusb should not behave this way, with or without hotplugging.

The linked email thread suggests that the problem is in __open_sysfs_attr() but it doesn't look like that's the whole story. That function is part of the exercised code path, but not the only place where I think there may be a problem.

Since this is a backend problem I would very much appreciate if someone could test for this problem also on MacOS X and Windows.

hagg, I would appreciate if you could also provide a full libusb debug log from a failure run. You may need to recompile libusb with the --enable-debug-log configure flag. Output will be lengthy. It will help debug this. Thanks!

Last edited 3 years ago by stuge (previous) (diff)

Changed 3 years ago by hagg

full libusb debug log. error seems to occur around line 16035 and following.

Changed 3 years ago by hagg

full log was to big to view online, so this si an excerpt: lines 15935-16134. ca 100 lines before after something breaks.

comment:3 follow-up: Changed 3 years ago by xiaofan

I am not that sure if the program is valid as it does not close the device handle before opening it again.

comment:4 in reply to: ↑ 3 Changed 3 years ago by hagg

Replying to xiaofan:

I am not that sure if the program is valid as it does not close the device handle before opening it again.

indeed there doesn't seem to be a difference if you close the device again after opening it successfully, and when opening fails and libusb_open_device_with_vid_pid(..) returns NULL there is no handle to close.

using this code in the loop does not change anything for me:

        m_device = libusb_open_device_with_vid_pid(NULL,
                                                   VENDOR_ID, PRODUCT_ID);
        if (m_device == NULL)
        {
            printf("could not open device
");
        } else
        {
            printf("device opened, close it again
");
            libusb_close(m_device);
        }

did you have different results with closing the device again? then please let me know (or attach your example) as it may help me to solve (or at least work around) the problem.

Last edited 3 years ago by hagg (previous) (diff)

comment:5 Changed 3 years ago by xiaofan

If you can, please subscribe to the mailing list. In the discussion, Pete already mentioned the problem with your test program.

http://libusb.6.n5.nabble.com/libusb-70-not-possible-to-re-open-a-device-that-was-switched-off-while-opened-td3235177.html

As for the result under Linux, at least it works at normal state. Without sleeep(1) line, when you unplug the device and plug it back in (for me, I press the reset button), sometimes it will not work (no long be able to open the device). Sometime it works (especially if I only press the button shortly). So indeed there is a problem with libusb in this situation under Linux. Please refer to the thread I mentioned before.
http://libusb.6.n5.nabble.com/libusb-get-device-list-fails-with-No-such-file-of-directory-td510637.html

With the sleep(1), the code works under Linux (tested with latest libusb.git and Ubuntu 10.10 32bit).

What is the real problem you want to solve?

#include <libusb-1.0/libusb.h>
#include <stdio.h>
#include <unistd.h>

#define VENDOR_ID 0x0925
#define PRODUCT_ID 0x7001

libusb_device_handle *m_device;

int main(void)
{

if (libusb_init(NULL) != 0)
{

printf("could not init libusb context

");

return -1;

}
libusb_set_debug(NULL, 0);

while (1)
{

printf("

");

printf("try to open device...

");

m_device = libusb_open_device_with_vid_pid(NULL, VENDOR_ID, PRODUCT_ID);
if (m_device == NULL)

printf("could not open device

");

else {

printf("device opened, close it again

");

libusb_close(m_device);

}

sleep(1);

}

}

comment:7 Changed 2 years ago by alan.ott

Posted a new patch to the mailing list:

http://libusb.6.n5.nabble.com/Race-Condition-in-libusb-tp3321808p3328267.html

Please disregard the one I posted previously. We decided that it wouldn't work in all cases.

Entire Email Thread:

http://libusb.6.n5.nabble.com/Race-Condition-in-libusb-td3321808.html

Changed 2 years ago by stuge

Alan's latest patch

comment:8 follow-up: Changed 2 years ago by arne

The patch has a minor bug in it.
In op_init() it calls

DIR *devices = opendir(SYSFS_DEVICE_PATH); 

but does not close the directory afterwards.

comment:9 in reply to: ↑ 8 Changed 2 years ago by stuge

Replying to arne:

The patch has a minor bug in it.
.. does not close the directory

Nice find! I've added a fix for this.

Changed 2 years ago by stuge

Alan's latest patch against libusb-stuge with fix for opendir() bug

comment:10 Changed 2 years ago by xiaofan

  • Owner set to stuge
  • Status changed from new to assigned

Alan Ott's latest patch is now integrated in libusb-stuge.

comment:11 Changed 21 months ago by alan.ott

In [ee411c51951c80d269ddc3efff7d4d1178c59b3d/libusb-pbatard]:

Fix #70 race condition in sysfs_get_device_list()

Change the way libusb chooses between using sysfs and usbfs for information
about the attached devies. Using the old method, a race condition could
occur if a device was unplugged just before (or during) the call to
libusb_get_device_list(), corrupting the internal sysfs_can_relate_devices
and sysfs_has_descriptors variables and preventing libusb_get_device_list()
from working in future calls.

The old method was based on the assumption that if certain sysfs files
(eg: busnum) could not be opened, that indicated an inadequacy of sysfs
(ie: the running kernel's sysfs version did not contain those files),
when in reality those files couldn't be opened because the device had
been unplugged.

The new method checks the adequacy of sysfs during libusb_init()
(op_init()) and if a sysfs file cannot be opened, it is now assumed that
it is because the device has been unplugged, not because sysfs is
inadequate.

Signed-off-by: Alan Ott <alan@…>
[stuge: Include closedir() bugfix posted in ticket by Arne Laansoo]
[stuge: Remove dead code in sysfs_scan_device() found by Hans de Goede]

comment:12 Changed 18 months ago by alan.ott

  • Resolution set to fixed
  • Status changed from assigned to closed

In [5f30c81f66e8dd61f8eae16de548697708f9bd18/libusb]:

Linux: Fix #70 race condition in sysfs_get_device_list()

Change the way libusb chooses between using sysfs and usbfs for information
about the attached devies. Using the old method, a race condition could
occur if a device was unplugged just before (or during) the call to
libusb_get_device_list(), corrupting the internal sysfs_can_relate_devices
and sysfs_has_descriptors variables and preventing libusb_get_device_list()
from working in future calls.

The old method was based on the assumption that if certain sysfs files
(eg: busnum) could not be opened, that indicated an inadequacy of sysfs
(ie: the running kernel's sysfs version did not contain those files),
when in reality those files couldn't be opened because the device had
been unplugged.

The new method checks the adequacy of sysfs during libusb_init()
(op_init()) and if a sysfs file cannot be opened, it is now assumed that
it is because the device has been unplugged, not because sysfs is
inadequate.

Signed-off-by: Alan Ott <alan@…>
[stuge: Include closedir() bugfix posted in ticket by Arne Laansoo]
[stuge: Remove dead code in sysfs_scan_device() found by Hans de Goede]

Note: See TracTickets for help on using tickets.