Ticket #6 (new defect)

Opened 5 years ago

Last modified 19 months ago

Zero Length Packet issue

Reported by: nikias Owned by:
Milestone: 1.0.17 Component: libusb-1.0
Keywords: ZLP Cc: hadess@…, hector@…, chris2k01@…, headch@…
Blocked By: Blocks:

Description

This patch fixes the Zero Length Packet issue by allowing the client
program to specify the LIBUSB_TRANSFER_ZERO_PACKET flag when calling
libusb_submit_transfer. This will pass the URB_ZERO_PACKET flag to the
appropriate ioctl and thus triggers zero-length packet processing
inside the kernel. This patch will also pass the URB_SHORT_NOT_OK flag
if LIBUSB_TRANSFER_SHORT_NOT_OK is specified.

Change History

comment:1 Changed 5 years ago by nikias

  • Type changed from defect to enhancement

comment:2 follow-up: Changed 5 years ago by dsd

Thanks for the patch.

The USBFS_URB_SHORT_NOT_OK change will break the meaning of LIBUSB_TRANSFER_SHORT_NOT_OK - please leave that part out.

The new flag needs better documentation, please. To me, "zero-terminate a packet" seems to suggest that libusb will change the last byte of the last packet in a transfer to 0.

I've actually forgotten what this patch is trying to do. your implementation will cause a zero length packet to be sent after every URB in a transfer. Instead, don't we want only to be only sending a zlp at the end of the transfer?

comment:3 in reply to: ↑ 2 Changed 5 years ago by M.S.

  • Type changed from enhancement to defect

Replying to dsd:

Thanks for the patch.

The USBFS_URB_SHORT_NOT_OK change will break the meaning of LIBUSB_TRANSFER_SHORT_NOT_OK - please leave that part out.

I don't get what you mean with this? Please mind that without this patch, the LIBUSB_TRANSFER_SHORT_NOT_OK flag afaik has no effect and is never ever used.

The patch fixes this and if you had a different meaning in mind, please rename this flag as it will collide with the already existing kernel flag.

The new flag needs better documentation, please. To me, "zero-terminate a packet" seems to suggest that libusb will change the last byte of the last packet in a transfer to 0.

Yes, it should be possible to fix the comment and make it a bit more clear.

I've actually forgotten what this patch is trying to do. your implementation will cause a zero length packet to be sent after every URB in a transfer. Instead, don't we want only to be only sending a zlp at the end of the transfer?

The USB spec defines that if the last packet of a bulk transfer has the exact size of the endpoint max packet size, the whole transfer must be terminated by a zero length urb.

If apps don't sent this in such a situation libusb times out and the initial urb is never sent resulting in a broken application.

All Kernel drivers use the URB_ZERO_PACKET to comply to the spec correctly.

Only the application using libusb can determine if the transfer is meant to be the last one.

That is why this flag is introduced by this patch so the right urb flag is set and the kernel send queue transparently generates a zero urb after the one set fixing the timeout and broken application communication.

It is a very explicit spec definition and a lot of people don't know about it and end up with thinking it is some kind of usb error of their hardware/host controller.

Also mind that some commits ago a patch attempted to fix this situation and allow zero length urbs to be generated by libusb. However it does not appear to work for everyone due to the async nature of libusb 1.0+ and this patch works for both usb 1.1 and usb 2.0.

The bug was discovered during the implementation of the native usb protocol (usbmux) of the iPhone and causes communication to hang only in rare situations when the sent packet has the exact endpoint packet size.

This problem is blocking the whole chain of apps to allow seamless iPhone support for Linux.

Thanks for looking into it.

comment:4 Changed 5 years ago by nikias

Here some new patches.

comment:5 follow-up: Changed 5 years ago by dsd

If LIBUSB_TRANSFER_SHORT_NOT_OK doesn't work as documented, please file another bug. I would be surprised though. Your patch will make a short URB fall into an "unrecognised urb status" error path, resulting in the user being notified of transfer failure too early and any further URBs in that transfer remaining pending. if we want to detect this through usbfs flags then more work is needed on the libusb side, but I would say that the currently implemented approach is easier, and also there is discussion of changing the meaning of that particular usbfs flag.

The fix made recently (15f18e64e96ae4ecc8e43d0de00933059a5209a) allows you to send a zero-length packet just by creating a transfer of length 0. It works for synchronous and asynchronous. USB1.1 and USB2. as discussed on the mailing list, any flag we add here is for convenience, rather than actually adding something that couldn't be done with libusb before that fix (right?)

Thanks for the the more detailed documentation on the new flag, but we really need to describe the effect of the flag, not it's implementation. usbfs is not used on Darwin, and libusb is designed to distance people a bit from usbfs, so we shouldn't ask the users to understand what an URB is, or the semantics of the ioctls.

Am I right in understanding that the usbfs ZERO_PACKET flag can be set on any URB of any length, and it will cause an additional packet of zero length to be tacked onto the end of the data transfer, immediately after the final packet used to carry the data?

Assuming the above is correct, if I apply your patch and submit a host-to-device bulk transfer of 17408 bytes on an endpoint with max packet size 512, the following sequence of packets will be transferred: 32 512-byte data packets, one ZLP, 2 512-byte data packets, one ZLP. Is that what we want?

If you refer to the USB specs in the documentation, please reference the exact section. I would prefer to tone down the advice though -- even if the specs are written as you say, I have yet to encounter a device that requires a ZLP in this fashion. And given that libusb has been around for years and has only had the ability to send ZLPs for about a month in latest git and latest kernel, I'd say it is a rare requirement at most.

final question for now, what happens if I set this flag on an interrupt transfer? how about isochronous? control?

Thanks for your efforts and patience!

comment:6 in reply to: ↑ 5 Changed 5 years ago by M.S.

Replying to dsd:

If LIBUSB_TRANSFER_SHORT_NOT_OK doesn't work as documented, please file another bug. I would be surprised though. Your patch will make a short URB fall into an "unrecognised urb status" error path, resulting in the user being notified of transfer failure too early and any further URBs in that transfer remaining pending. if we want to detect this through usbfs flags then more work is needed on the libusb side, but I would say that the currently implemented approach is easier, and also there is discussion of changing the meaning of that particular usbfs flag.

Ok, forget about discussing that flag. It doesn't solve the issue this ticket is based on anyways and appears to be just named the same as a kernel flag for urbs doing something different.

The fix made recently (15f18e64e96ae4ecc8e43d0de00933059a5209a) allows you to send a zero-length packet just by creating a transfer of length 0. It works for synchronous and asynchronous. USB1.1 and USB2.

Unfortunately, it does not work. While it allows you to pass a zero packet to the function, it will generate a second transfer.

Reading closely, I wrote that the end of a (single!) transfer must be terminated by a zero packet, what the current implementation allows is that you must create two transfers, one with your "last data packet" and a second one with the ZLP. This does not work. The usb communication already stalls on the first packet as it was not ended with a zero urb.

This patch here was also discussed with Hector Martin who initially requested the fix for the ZLP packets.

Perhaps the circumstances regarding sending it within one transfer is due to some hardware limitations as it appears to work for Hector whereas it does not for myself and Nikias.

as discussed on the mailing list, any flag we add here is for convenience, rather than actually adding something that couldn't be done with libusb before that fix (right?)

No. With libusb 0.1 it was possible to send the last data packet with a size of wMaxPacketSize and signal the host controller the end of the transfer using a second send of a ZLP solving this situation.

The same implementation in libusb 1.0 fails and there is no way to send a ZLP within the same transaction afaik.

So, without this fix (kernel drivers do this), all libusb 1.0 consuming software will encounter this issue sooner or later and is affected.

Example:

  1. must send a single 512 byte packet, thus use libusb_fill_bulk_transfer() for that buffer
  2. submit the packet with libusb_submit_transfer()

-> stalls, controller does not send packet, it is discarded after a timeout

Now what Hector did was to add this afterwards:

  1. The code recognizes (packetsize % 512 == 0) -> it must terminate with a ZLP
  2. It does libusb_fill_bulk_transfer() with a zero length empty buffer
  3. submit the packet with libusb_submit_transfer()

-> this fails. It always hangs already at 2. because we have two transfers here.

The solution is to send a ZERO length URB directly after the last packet within the first transfer. The API does not allow us to do this before a libusb_submit_transfer(), right?

That's what the flag is for. Solution is then like this:

  1. must send a 512 byte packet, thus use libusb_fill_bulk_transfer() for that buffer
  2. The code recognizes (packetsize % 512 == 0) -> it must terminate with a ZLP
  3. Sets libusb_transfer->flags |= LIBUSB_TRANSFER_ZERO_PACKET;
  4. Kernel does the right thing now and terminates the transfer with a ZLP urb following the data urb

Thanks for the the more detailed documentation on the new flag, but we really need to describe the effect of the flag, not it's implementation. usbfs is not used on Darwin, and libusb is designed to distance people a bit from usbfs, so we shouldn't ask the users to understand what an URB is, or the semantics of the ioctls.

For darwin, the effect must be replicated for usb bulk sends (since this "feature" is described in the usb spec and applies to all usb platforms), if there is no such functionality provided (I doubt it) on that platform it must be implemented by hand.

Am I right in understanding that the usbfs ZERO_PACKET flag can be set on any URB of any length, and it will cause an additional packet of zero length to be tacked onto the end of the data transfer, immediately after the final packet used to carry the data?

It can be set on any URB, but the kernel will only immediately generate a ZLP if the actual packet has the exact endpoint size, see the ohci-q.c ehci-q.c kernel sources (search for the flag).

Assuming the above is correct, if I apply your patch and submit a host-to-device bulk transfer of 17408 bytes on an endpoint with max packet size 512, the following sequence of packets will be transferred: 32 512-byte data packets, one ZLP, 2 512-byte data packets, one ZLP. Is that what we want?

No, there situation would more be like 32 512-byte data urbs, 2 512-byte data urbs, one "terminating" ZLP urb submitted to the host controller.

If you pass a buffer of size 17408, it will automatically be split correctly into the wMaxPacketSize urbs, however due to the overall size being (17408 % 512 == 0) you must set the ZLP flag or send a ZLP urb within the same transaction (which as noted does not work with libusb 1.0) or all your data will not be sent and the transfer will timeout.

If you refer to the USB specs in the documentation, please reference the exact section.

Ok, let's dig into this... Section 5.8.3 of the usb 2.0 specification states that:

"[A bulk] endpoint must always transmit data payloads with a data field less
than or equal to the endpoint’s reported wMaxPacketSize value. When a bulk
IRP involves more data than can fit in one maximum-sized data payload, all
data payloads are required to be maximum size except for the last data
payload, which will contain the remaining data. A bulk transfer is complete
when the endpoint does one of the following:

  • Has transferred exactly the amount of data expected
  • Transfers a packet with a payload size less than wMaxPacketSize or transfers

a zero-length packet

When a bulk transfer is complete, the Host Controller retires the current IRP and advances to the next IRP."

As you see this is all documented and must be supported by libusb 1.0.

I would prefer to tone down the advice though -- even if the specs are written as you say, I have yet to encounter a device that requires a ZLP in this fashion.

It is common for usb bulk transfers in general and not some random mutation of a specific usb device (like the iPhone).

And given that libusb has been around for years and has only had the ability to send ZLPs for about a month in latest git and latest kernel, I'd say it is a rare requirement at most.

The same issue was possible to handle using libusb 0.1 just fine. It fails with libusb 1.0. It happens rarely and only if you send those specific packet sizes.

So the kernel drivers use it, it is a documented "issue/feature", all libusb 1.0 bulk sends will suffer from this and the effects appear randomly being really really hard for a novice libusb/usb programmer to understand. Seriously, it is important to have this situation resolved and not skip over it.

final question for now, what happens if I set this flag on an interrupt transfer? how about isochronous? control?

The flag only has an effect on bulk transfers.

If you have a different implementation in mind to solve the issue we are all ears and willing to try asap, since as stated the whole chain of iPhone for Linux apps depend on this being fixed.

comment:7 follow-up: Changed 5 years ago by dsd

OK. If libusb-0.1 works where 1.0 does not, that sounds like a pretty significant issue. Can you help me dig into that? It will probably increase both of our understandings.

How did you send ZLPs using libusb-0.1? Can you show a test program?

What does the same code look like for libusb-1.0?

Can you capture usbmon traces for both? Then we can dig into the precise differences.

comment:8 Changed 5 years ago by nikias

  • Summary changed from [PATCH] Fix Zero Length Packet issue, new LIBUSB_TRANSFER_ZERO_PACKET flag. to Zero Length Packet issue

OK, here's another approach (see patch):
linux_usbfs.c: submit_bulk_transfer now itself checks if the packet requires sending an additional zero-length packet, and sets the URB_ZERO_PACKET flag to notify the kernel about it.
This way, the application doesn't have to think at all about wMaxPacketSize or flagging or producing a zero length packet.
With this patch it would also make sense to revert 615f18e64e96ae4ecc8e43d0de00933059a5209a. And as stated in the commit message, it didn't even work depending on the kernel.
The patch #0002 from my last post should still follow up this new #0001, and #0003 should be dropped.

comment:9 Changed 5 years ago by nikias

Ah, forgot something. The disadvantage is that each time a bulk transfer is about to be submitted, libusb (with my patch) reads the wMaxPacketSize of the endpoint. Due to sysfs file access, this takes about 200 usecs to do this (libusb_get_active_config_descriptor is the one that takes most of the time). We would need a caching of the device's active config descriptor to speed things up. Any idea about this?

comment:11 in reply to: ↑ 10 ; follow-up: Changed 5 years ago by nikias

Ok, you are right. This flags URB_ZERO_PACKET too often.

Replying to dsd:

http://article.gmane.org/gmane.comp.lib.libusb.devel.general/6210

"You can work around the limitation to some extent by queuing an extra 0-length transfer when necessary."
Which does not work... kernel issue? I don't think it's an option to patch the kernel for this.

"A better option would be to give users a way to specify that this flag should be set."
Isn't this what my first patch does? It gives the user exactly this possibility.

http://article.gmane.org/gmane.comp.lib.libusb.devel.general/6211

"If you want add a separate API that says "send a ZLP now", or a flag that says "send a ZLP at the end of this transfer", that would work OK, but you cannot determine this heuristically."

OK, LIBUSB_TRANSFER_ZERO_PACKET says "send a ZLP now".

So what is the problem with having this flag? If this flag is not acceptable due to other systems not having this 'feature', they need to implement it too or just ignore this flag (like darwin just ignores ANY flag).

comment:12 in reply to: ↑ 11 ; follow-up: Changed 5 years ago by dsd

Replying to nikias:

Which does not work... kernel issue?

If you know of any outstanding kernel issues, please go into detail.

"A better option would be to give users a way to specify that this flag should be set."
Isn't this what my first patch does? It gives the user exactly this possibility.

Yes. I have no problem with adding the flag, but I believe the patches posted above to be incorrect. I might be missing something though. Having good documentation on the effects of the flag would be helpful.

I figured if we first attack the libusb-1.0 regression reported above (where 0.1 could send ZLPs through standard APIs but 1.0 not - I'd certainly appreciate your help there) then that would clean up any holes in our understandings and help our discussion surrounding adding this flag and some user documentation for it.

Or would you prefer that we ignore the regression for now and move straight onto adding the new flag?

comment:13 in reply to: ↑ 12 Changed 5 years ago by nikias

Replying to dsd:

Replying to nikias:

Which does not work... kernel issue?

If you know of any outstanding kernel issues, please go into detail.

You yourself wrote in the commit message "Note that there are is a kernel bug preventing this from working properly at the moment, even after this fix." so I thought _you_ know what this is about as I think the kernel is working correctly for what I see.

Yes. I have no problem with adding the flag, but I believe the patches posted above to be incorrect. I might be missing something though. Having good documentation on the effects of the flag would be helpful.

What else needs to be documented? This flag has the effect of telling the kernel not to wait for another packet if the last packet has N*wMaxPacketSize but to send it immediately. And the application is aware of this situation and has to flag if (size % wMaxPacketSize) == 0.

I figured if we first attack the libusb-1.0 regression reported above (where 0.1 could send ZLPs through standard APIs but 1.0 not - I'd certainly appreciate your help there) then that would clean up any holes in our understandings and help our discussion surrounding adding this flag and some user documentation for it.

OK, but libusb-0.1 does not offer a way to the user to specify flags, so you just send a packet with a buffer size of 0 and a pointer to NULL as buffer. But in libusb-1.0 why do we need to do it with an explicit zero-length submit (which might fail due to some async issue or whatever) when we just can use a flag since the API allows us to set flags?

Or would you prefer that we ignore the regression for now and move straight onto adding the new flag?

I think not adding the flag is a regression and the issue wit 0-length transfers could be looked into later. But I think once the flag is avaiable, we don't need to support 0-length transfers anymore as flagging is easier to understand and to implement.

comment:14 Changed 5 years ago by nikias

OK, one thing that I have to rework: the flag needs only be set on the _last_ URB not every on inside the loop.

comment:15 Changed 5 years ago by dsd

Replying to nikias:

You yourself wrote in the commit message "Note that there are is a kernel bug preventing this from working properly at the moment, even after this fix." so I thought _you_ know what this is about as I think the kernel is working correctly for what I see.

The kernel was fixed soon after that commit was made, so this is no longer an outstanding issue.

Yes. I have no problem with adding the flag, but I believe the patches posted above to be incorrect. I might be missing something though. Having good documentation on the effects of the flag would be helpful.

What else needs to be documented? This flag has the effect of telling the kernel not to wait for another packet if the last packet has N*wMaxPacketSize but to send it immediately. And the application is aware of this situation and has to flag if (size % wMaxPacketSize) == 0.

The documentation in your 0001-New-flag-LIBUSB_TRANSFER_ZERO_PACKET-for-terminating.patch patch describes the implementation, and not so much the effects. This documentation needs to be aimed at users to be published on the API documentation website. In other words, you can't refer to URBs because an URB is something internal to Linux, and libusb users do not need to have any understanding of this (and hence the documentation does not even explain what an URB is). You shouldn't talk about ioctls, because libusb users do not need to call ioctl() on anything or understand how libusb works internally.

I would like to see it briefly note that this flag is ignored for control, interrupt and isochronous transfers. Another question for you to answer in the docs, what happens if I set this flag on a bulk device-to-host transfer?

If you refer to the specifications, you should also reference the specific section. You seem to be referring to section 5.8.3, and I think you have a slight misunderstanding here. It says:

A bulk transfer is complete when the endpoint does one of the following:
- Has transferred exactly the amount of data expected
- Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet

Note that it says *one* of the following, not both. So, if a device protocol has been designed where the size of the message is always known in advance, then a ZLP is never needed. For example, my device has a bulk out endpoint with max packet size 32. The documentation says that the only thing I can legally send on this endpoint is a command structure of 32 bytes. Even though each logical communication over this endpoint is the max packet size, I don't have to send a ZLP - the purpose of the ZLP would be to tell the device "this is the end of my logical communication" but infact the device already knows (everything individual communication is 32 bytes long). In fact, it would be illegal for me to send a ZLP because the docs say that the only thing I can transfer is a 32-byte command, so I should expect an endpoint stall if I were to send a ZLP.

So, there *are* situations where ZLPs are not needed even though you are ending your logical communication on a packet boundary. This needs to be explained in the docs (which right now say a ZLP is a must in this situation). I would even go as far as to say that the device protocols which require ZLPs like this are rare - but if you don't agree I'm happy as long as both situations are discussed in the documentation.

The docs also need to explain that you can set this flag on a bulk-out transfer on any size, but it will only have an effect if the transfer size results in the final packet being a multiple of the max packet size. Therefore, if you are working with a device protocol that has this ZLP requirement, it is safe and sensible to set this flag on all your transfers without considering their length.

I figured if we first attack the libusb-1.0 regression reported above (where 0.1 could send ZLPs through standard APIs but 1.0 not - I'd certainly appreciate your help there) then that would clean up any holes in our understandings and help our discussion surrounding adding this flag and some user documentation for it.

OK, but libusb-0.1 does not offer a way to the user to specify flags, so you just send a packet with a buffer size of 0 and a pointer to NULL as buffer.

In other words, you are using the following libusb-0.1 call:

usb_bulk_write(dev, ep, NULL, 0, 0);

and it is working?

this would translate to

libusb_bulk_transfer(handle, ep, NULL, 0, &actual_length, 0);

And I think it's pretty important that if the former works, the latter should do too. Firstly, this is the logical thing to do if you're porting your app from 0.1 to 1.0, and I'd rather not add complications where they can be avoided. Secondly, as you have pointed out, ZLPs are a pretty important part of the USB spec, so it would kinda suck if our main easy-to-use function for sending bulk messages (libusb_bulk_transfer) was unable to send them. Thirdly, even after your flag, there is no way that libusb-1.0's sync API can use it, so making the libusb_bulk_transfer() call is the obvious approach to use for people in such situations as converting to async is trickier (maybe we can figure out a way to add flags into the sync API later on). Fourthly, the libusb project includes a libusb-compat layer which would translate the above libusb-0.1 call into the above libusb-1.0 call, so it's even more important that it works for compatibility's sake.

So, pretty please help me understand and fix this, in addition to adding the new flag. Moving back to that topic, I think there is a slight gap in your understanding.

libusb deals with transfers. The kernel deals with URBs. The bus and the device deal with packets.

A transfer is a collection of URBs. An URB is a collection of packets. The bus and device only know about packets -- not URBs or transfers. The kernel only knows about URBs, not about transfers.

Your patch enables the flag on every URB in a transfer, and I still believe this would result in the results on the bus being "32 data packets, 1 ZLP, 2 data packets, 1 ZLP" in the example I gave above. You cannot argue that the kernel would know that the first URB in the transfer is not the end of the transfer, because the kernel has no concept of transfer. I think you only want to enable the flag on the final URB in the transfer.

Finally, what do you think about the naming of the flag? I know it's named "ZERO_PACKET" in the kernel, but I think we can do a little better. Maybe LIBUSB_TRANSFER_ADD_ZERO_PACKET, or LIBUSB_TRANSFER_PROTO_NEEDS_ZLP, not sure that I like any of them that much!

And I know this is going outside the scope of your patch, but section 5.7.3 indicates that we might run into interrupt out endpoints with the same constraint. Would you be interested in looking into what needs to be done there?

comment:16 Changed 5 years ago by nikias

Hi, maybe to a recent kernel update on my system (2009-08-18) it seems to work now. I wanted to create usbmon logs and then it suddenly worked, see for yourself:

http://pims.selfip.net/~nikias/usbmon-log-libusb0.1-explicit_zlp.txt
http://pims.selfip.net/~nikias/usbmon-log-libusb1.0-explicit_zlp.txt
http://pims.selfip.net/~nikias/usbmon-log-libusb1.0-with-flag.txt

So the workaround is working but I think we should go the different approach. I know it has been discussed on the list, but I think that libusb-1.0 should set this flag automatically. This new patch only flags if the transfer direction is out (means writing) and if tranfer->length is > 0, which means that if an explicit zlp is used it still works. And it definitely only flags the _last_ URB in question (if transfer->length % max_packet_size == 0). As I said before we need a caching of the active_config_descriptor here to prevent performance dropping due to the reading of wMaxPacketSize. Perhaps this automatic ZLP processing should be switchable on/off by a flag?

comment:17 in reply to: ↑ 7 Changed 5 years ago by M.S.

Replying to dsd:

OK. If libusb-0.1 works where 1.0 does not, that sounds like a pretty significant issue. Can you help me dig into that? It will probably increase both of our understandings.

Yes, it is important you understand what happens.

How did you send ZLPs using libusb-0.1? Can you show a test program?

A test program is hard to do as you would need a corresponding device to send stuff through.

Let's start of with this simple example with libusb-0.1 solving this "issue" in partly pseudo code:

int send_to_device(device_t device, char *data, int datalen) {
  int bytes = 0;
  do {
    bytes = usb_bulk_write(device->usbdev, BULKOUT, data, datalen, 1000);
    if ((bytes % device->wMaxPacketSize) == 0) {
      char nullp = 0;
      int res = usb_bulk_write(device->usbdev, BULKOUT, &nullp, 0, 1000);
  } while(0);
  return bytes;
}

What does the same code look like for libusb-1.0?

Earlier attempt which fails (on some systems):

int send_to_device(device_t device, char *data, int datalen) {
  struct libusb_transfer *xfer = libusb_alloc_transfer(0);
  libusb_fill_bulk_transfer(xfer, device->usbdev, BULK_OUT, (void*)data, datalen, tx_callback, device, 0);
  xfer->flags = LIBUSB_TRANSFER_SHORT_NOT_OK;
  if((res = libusb_submit_transfer(xfer)) < 0) {
    libusb_free_transfer(xfer);
    return res;
  }
  if (datalen % dev->wMaxPacketSize == 0) {
    // Send Zero Length Packet
    xfer = libusb_alloc_transfer(0);
    void *buffer = malloc(1);
    libusb_fill_bulk_transfer(xfer, device->usbdev, BULK_OUT, buffer, 0, tx_callback, device, 0);
    if((res = libusb_submit_transfer(xfer)) < 0) {
      libusb_free_transfer(xfer);
      return res;
    }
  }
  return res;
}

Approach using the patch and new flag which works:

int send_to_device(device_t device, char *data, int datalen) {
  struct libusb_transfer *xfer = libusb_alloc_transfer(0);
  libusb_fill_bulk_transfer(xfer, dev->usbdev, BULK_OUT, (void*)data, datalen, tx_callback, dev, 0);
  xfer->flags = LIBUSB_TRANSFER_SHORT_NOT_OK;
  if (datalen % dev->wMaxPacketSize == 0) {
    xfer->flags |= LIBUSB_TRANSFER_ZERO_PACKET;
  }
  return res;
}

Can you capture usbmon traces for both? Then we can dig into the precise differences.

For libusb-0.1 an example usbmon trace for this situation looks like this:

...
f1020a80 411.668917 S Bo:3:024:4 - 512 = 00000006 00000200 7568cc45 00109cf0 00182a84 50100200 00000200 
0a000000 00040000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
f1020a80 411.669002 C Bo:3:024:4 0 512 >
f1020a80 xxx.xxxxxx C Bo:3:024:4 0 0
...

(Note: Didn't have a log at hand now so reconstructed last line from what I remember. Should be obvious now though.)

Same output on libusb-1.0. Except two things. On libusb-1.0 the connection already freezes after the 512 bytes packet is logged. Once the timeout is over and communication continues, the ZLP like with libusb-0.1 is never send nor appears in the usbmon trace.

I think this is clearly due to having two calls to libusb_submit_transfer(xfer) whereas the ZLP should already be sent in the "special situation" with the first transfer. If you already submit the stuff to the host controller without compling to the spec and terminating with the ZLP, it is then obvious why it times out and never sends the data over to the device.

Mind that using the newly introduced ZLP flag, the generated ZLP urb will not appear in the usbmon trace as it is generated in the lower level in the kernel queue and not logged anymore (afaik). You would just see that there is no more timeout and the communication is fine.

comment:18 follow-ups: Changed 5 years ago by dsd

Nikias, please re-read my latest comment, and then again :)

I tried to make 2 things clear:
# If you enable this flag all the time in this fashion, you are going to break things for a lot of users. I have a device with a bulk out endpoint, max packet size 32, and each command I send on this endpoint is 32 bytes. If I send 31 bytes, I get a stall. If I send 0 bytes, I get a stall. Commands are always exactly 32 bytes, as documented in the device specifications. If you implement a patch as you describe, every time I send a 32-byte command your patch will cause an illegal ZLP to be sent after, which will cause an endpoint stall.
# If you do find yourself working with a device where ZLPs are needed, then it makes sense for the flag to be appended to the final URB of *every* transfer on that protocol. This is because the flag only has an effect if the final packet is a multiple of the max packet size. Therefore you do not run into the complications of having to lookup the max packet size each time around.

or am I missing some things?

M.S., thanks for the info. It sounds like you are onto something. We need to dig into that freeze. You say that the usbmon output is the same in both cases, and the usbmon output clearly shows the 512 bytes being transferred successfully, so there is no obvious reason that other packets should fail to be sent afterwards.

If you are interested in helping further, the next steps would be to capture those 2 usbmon sessions again, without relying on your recollection, so that we can triple-check that they are identical. Then, could you please compile libusb-1.0 with debug messages (the ./configure option) and capture full output from a session? And also clarify exactly what you mean by "freeze" and at which point in the code that occurs.

Thanks!

comment:19 Changed 5 years ago by dsd

and, M.S, to expand on my thinking some more:

you say that "it is then obvious why it times out and never sends the data over to the device" but actually I don't see why.

Transfers are a libusb concept. They do not make it down to the bus or the device. The device works with packets, and packets only. It reads packets from the endpoint that your software is transmitting on. Since it does not know the length of the data being transferred, it must simply wait for a packet, acknowledge it, and then wait for the next one. It must do this indefinitely, until it recieves a ZLP in which case it knows that's the end of the data. It doesn't know that a ZLP is coming until it has acknowledged each and every preceding packet.

Therefore, in an example where an endpoint has max packet size 32, there is no difference at all between the following 2 situations:

# submit 1 transfer of 32 bytes, another transfer of 32 bytes, another transfer of 32 bytes, a transfer of 0 bytes
# submit 1 transfer of 96 bytes, then a transfer of 0 bytes

Each situation looks identical to the device. That is, it sees: full-size packet, full-size packet, full-size packet, zero-length packet

comment:20 in reply to: ↑ 18 Changed 5 years ago by nikias

OK I think I got it now. I would say my first approach is the right way to do it so that the application sets the flag if it is required for the device.
I thought of changing the documentation to this:

/** If this flag is set, a zero-length packet will be appended to
 * the transfer indicating that the transfer is complete.
 * Some devices require to be notified when a transfer is complete.
 * Usually a device knows about the end of the transfer if the last
 * packet's size is less than wMaxPacketSize.
 * But if the last packet of a transfer has _exactly_ the size of
 * wMaxPacketSize, the device does not know that it is the last
 * packet. This is where a zero-length packet is required, which
 * will let the device know that the transfer is complete.
 */

If I get the USB specification right, this flag could not only be useful for BULK transfers, but also for INTERRUPT and CONTROL transfers, but I don't know how we can test this, we would need a device that requires it. For linux usbfs INTERRUPT and BULK transfers are handled identically. For ISO transfers, this flag makes no sense for what I see.
We should add some lines in the documentation for what kind of transfers this flag is valid.

So we need to agree about the flag's name, I think LIBUSB_TRANSFER_ZERO_PACKET says what it does, but even LIBUSB_TRANSFER_ADD_ZERO_PACKET is not bad.

comment:21 in reply to: ↑ 18 Changed 5 years ago by M.S.

Replying to dsd:

M.S., thanks for the info. It sounds like you are onto something. We need to dig into that freeze. You say that the usbmon output is the same in both cases, and the usbmon output clearly shows the 512 bytes being transferred successfully, so there is no obvious reason that other packets should fail to be sent afterwards.

Well, usbmon shows the 512 bytes packet in both cases, however in the case without a ZLP, you can see the usb communication being frozen (usbmon output stops after that 512-packet) until it times out and the next packets are sent. Following packets do not fail (except if it is a 512 packet again ofc.).

Now the device reports that it did never see this 512 bytes packet, but it sees the following packets. If we send a ZLP directly afterwards with libusb-0.1 or use the ZERO_PACKET flag with libusb-1.0 so the zero urb is generated, it receives that 512 bytes packet without issues and the communication never stalls.

If you are interested in helping further, the next steps would be to capture those 2 usbmon sessions again, without relying on your recollection, so that we can triple-check that they are identical. Then, could you please compile libusb-1.0 with debug messages (the ./configure option) and capture full output from a session? And also clarify exactly what you mean by "freeze" and at which point in the code that occurs.

For libusb-0.1 it freezes after the first usb_bulk_write() following the write of the 512 bytes packet. For libusb-1.0 it freezes with the first libusb_submit_transfer().

By freeze I mean that the codepath does not proceed for like 3 seconds or so (probably some timeout value somewhere).

I'll see to come up with mass debug logs asap.

comment:22 Changed 5 years ago by hadess

  • Cc hadess@… added

comment:23 Changed 5 years ago by M.S.

Ok, so I took some time to investigate and surprisingly could no longer reproduce the issue...
Compared to Nikias who upgraded kernels and saw the issue fixed for him, I did not alter any system libs.

The solution sending the explicit ZLP suddenly works for me now and usbmon agrees:

...
f298dd80 726.913384 S Bo:3:011:4 - 512 = 00000006 00000200 0003c1d7 00001785 00002d6f 50100200 00000000 43464136
f298d980 726.913418 S Bo:3:011:4 - 0
f298dd80 726.913484 C Bo:3:011:4 0 512 >
f298d980 726.913492 C Bo:3:011:4 0 0
...

In above case libusb debugging shows that it generates the ZLP fine (usbmuxd output):

...
[12:16:53.895][6] client_read fd 10 buf 0x80799b0 len 49124
[12:16:53.895][5] [OUT] dev=1 sport=2 dport=49638 seq=6953 ack=14348 flags=0x10 window=131072[512] len=484
[12:16:53.896][6] send_packet(1, 0x6, 0xbfbc1f68, 0x80799b0, 484)
[12:16:53.896][4] usb_send with 512 len
libusb:debug [submit_bulk_transfer] need 1 urbs for new transfer with length 512
[12:16:53.896][4] Send ZLP
libusb:debug [submit_bulk_transfer] need 1 urbs for new transfer with length 0
[12:16:53.896][6] update_connection: sendable 49124, events 1, flags 0
[12:16:53.896][7] main_loop iteration
...

Nevertheless, for the case no explicit ZLP is sent by any code, usbmon shows the output below and hangs after the 512 byte packet (it is never received by the device) as expected:

...
f10a8d80 23.885018 S Bo:3:011:4 - 512 = 00000006 00000200 0002c1e3 0000141b 000026e8 50100200 00000000 43464136
f10a8d80 23.885178 C Bo:3:011:4 0 512 >
...
STALL
...
f1037480 28.886482 S Bo:3:011:4 - 76 = 00000006 0000004c 0002c1e3 000015ff 000026e8 50100200 00000000 43464136
f1037480 28.887562 C Bo:3:011:4 0 76 >
f1097580 62.962943 S Bo:3:011:4 - 70 = 00000006 00000046 0002c1e3 0000162f 000026e8 50100200 00000000 43464136
f1097580 62.963574 C Bo:3:011:4 0 70 >
...

Since I encountered this issue even when using the code to send the explicit ZLP early on, I still think it might be an issue with how the libusb transfers (yes, I get this is a libusb concept) are passed to the kernel queue.

So perhaps in this case both submit_transfer()s (the 512 + ZLP) are kind of "kernel queued" and submitted to the host controller at once.

Since it uses a ZLP to terminate, the host controller then sends out the stuff without issues immediately.

Whereas when the issue is encountered, the kernel might queue in a way (due to async stuff) that it already submits the 512 byte packet to the host controller, which sees no ZLP immediately and thus does not sent the data packet to the device causing this timeout effect.

Well, so perhaps we can move this into resolved state.

If someone else will be able to reproduce this again or if "threading has a bad day" for my system so I encounter the bug again myself, I'll reopen.

comment:24 follow-up: Changed 5 years ago by dsd

Great, thanks for following up.

Your "no explicit ZLP" usbmon output shows:

  1. 512 bytes being sent on bulk-out (S for submit)
  2. Those 512 bytes completing sucessfully very shortly after (0 for success, C for completion)
  3. A 5 second delay where there are no pending transfers (or did you snip something out of the logs?)
  4. 76 bytes being sent on bulk out
  5. Those 76 bytes completing successfully very shortly after
  6. A long delay with no active transfers

so if there are any bugs, it seems they are either in libusb (not reporting transfer completion after the few milliseconds that usbmon shows for the completion event), or in your code (not submitting any more transfers for a long time inbetween each one). would be interesting to investigate.

comment:25 in reply to: ↑ 24 ; follow-up: Changed 5 years ago by M.S.

Replying to dsd:

Great, thanks for following up.

Your "no explicit ZLP" usbmon output shows:

...

  1. Those 512 bytes completing sucessfully very shortly after (0 for success, C for completion)

While 2. obviously signals a success, it probably is just a success indication that the transfer got sent to the host controller. It doesn't show if it went out to the device.

  1. A 5 second delay where there are no pending transfers (or did you snip something out of the logs?)

Did not snip out nothing, just made it a bit more obvious where it stalls for 5 seconds.

The device reports that it does not receive the 512 bytes packet. It does see those before and the 76 bytes one though. As internally in the communication with the device all packets have a sequence index increased by the size of the bytes sent, the device can verify if it did not received packets inbetween.

  1. 76 bytes being sent on bulk out

This and following events are not important, just show that there is a 5 second delay.

so if there are any bugs, it seems they are either in libusb (not reporting transfer completion after the few milliseconds that usbmon shows for the completion event), or in your code (not submitting any more transfers for a long time inbetween each one). would be interesting to investigate.

Mind that if sending the ZLP, this issue does not happen.

The code used here in "no explicit" ZLP mode:

int send_to_device(device_t device, char *data, int datalen) {
  struct libusb_transfer *xfer = libusb_alloc_transfer(0);
  libusb_fill_bulk_transfer(xfer, device->usbdev, BULK_OUT, (void*)data, datalen, tx_callback, device, 0);
  xfer->flags = LIBUSB_TRANSFER_SHORT_NOT_OK;
  if((res = libusb_submit_transfer(xfer)) < 0) {
    libusb_free_transfer(xfer);
    return res;
  }
  return res;
}

The code apparently (need to verify) stalls when it reaches libusb_fill_bulk_transfer() again to send the next data packet (the 76 bytes one for instance).

comment:26 follow-up: Changed 5 years ago by marcan

  • Cc hector@… added

I'm pretty sure the device will (barring any host controller/kernel bugs) always get 512-byte packets that are sent out, ZLP or no ZLP - as far as the hardware and the USB bus is concerned. However, the kernel driver on the device sticks together transfers with no separating ZLP, which causes them to be rejected because the (upper layer) length header doesn't match up. I do support this kind of "stickiness" in usbmuxd (for reception) for sanity's sake, but the device doesn't. It is this that causes the 512-byte packet to be lost to the upper layers.

So if you send a 512-byte ZLPless transfer followed by a 76-byte transfer, the device will see a 512-byte packet followed by a 76-byte packet and recombine them into a 588-byte transfer. Then it will look at the header in the upper layer, see the 512 byte size, and throw the whole thing away because it doesn't match up. If in some cases the 76-byte transfer does go through as dsd claims, it might just be because the original 512-byte packet (as part of an unterminated transfer) timed out after a long wait and the device had a "clean slate" when it received the 76-byte packet.

So we have multiple issues here:

  • Old libusb-1.0 bug that caused 0-byte transfers to completely stall forever and never submit to the kernel (this was fixed)
  • Potential kernel usbcore or hcd bug with ZLP-less 512-byte transfers or 0-byte transfers (fixed with kernel update?)
  • The device will throw away transfers that are stuck together due to a missing ZLP

And the question is whether, in addition to all of this, there is still a tangible bug in either libusb or the kernel (or both).

The iPhone isn't very useful to determine whether the 512-byte packet is actually on the wire. If there is strong suspicion that it doesn't (which would definitely be a host bug), then we need to check that with a lower level device (such as a USB microcontroller) or a real USB protocol analyzer.

In any case, all of the following situations should work as far as raw USB is concerned, and if anything here doesn't work then it is most definitely a bug (all of the following with the ZLP flag, if any, turned off):

  • MAX transfer, short wait (to prevent device timeouts if any), SHORT transfer = MAX,SHORT byte packets on the wire (with wait in between), MAX+SHORT byte transfer reconstructed at the device (if expected)
  • MAX transfer = MAX byte packet on the wire immediately, and seen by the device as long as it internally is expecting MAX bytes with no ZLP.
  • MAX+SHORT transfer (single transfer) = MAX,SHORT packets on the wire, MAX+SHORT transfer reconstructed at device (if expected).
  • n*[n*MAX transfer] + [n*MAX+SHORT transfer] = a bunch of MAX packets and then a SHORT packet on the wire, everything reconstructed at the device as one single transfer.
  • n*[n*MAX transfer] + 0-byte transfer = a bunch of MAX packets and then a ZLP on the wire, everything reconstructed at the device as one transfer.
  • 0-byte transfer = ZLP on wire and 0-byte transfer "reconstructed" at device (unless deliberately filtered or whatever, or unless it follows an unterminated n*MAX transfer of course)

If all of the above works fine, then there is no need for a ZLP flag for transfers. However, since it is exceedingly common for many devices and more efficient than using separate transfers (and easier to code for), I strongly support adding it. It can be passed to the kernel on OSes that support the concept, and emulated with a separate transfer on those that don't. But it should not be a requirement for correct operation, and if it is, that's a bug.

comment:27 in reply to: ↑ 25 Changed 5 years ago by dsd

Replying to M.S.:

Replying to dsd:

Great, thanks for following up.

Your "no explicit ZLP" usbmon output shows:

...

  1. Those 512 bytes completing sucessfully very shortly after (0 for success, C for completion)

While 2. obviously signals a success, it probably is just a success indication that the transfer got sent to the host controller. It doesn't show if it went out to the device.

No, it means that all data was transferred to the device *and* acknowledged. (of course, it does not mean that the device actually processed it). At this precise point, libusb should indicate transfer completion, and normally that's a cue for your app to send the next transfer (which will instantly cause another S line to appear in the usbmon logs).

Either libusb is not reporting completion at the 'C' moment above (and waiting til a timeout before indicating completion), or something in your code is taking a while and causing a delay before the next libusb transfer is submitted. Can you help me figure out which of these scenarios is happening?

comment:28 in reply to: ↑ 26 Changed 5 years ago by dsd

Replying to marcan:

If all of the above works fine, then there is no need for a ZLP flag for transfers. However, since it is exceedingly common for many devices and more efficient than using separate transfers (and easier to code for), I strongly support adding it. It can be passed to the kernel on OSes that support the concept, and emulated with a separate transfer on those that don't. But it should not be a requirement for correct operation, and if it is, that's a bug.

We seem to have reached agreement above that some protocols require ZLPs and some do not.

Also, we determined above that libusb can correctly send ZLPs, by submitting a transfer of size 0. It seems that there is no *need* for a separate ZLP flag to be added.

However, I'm still happy to add a transfer flag to make this convenient, but so far I have not found time to work on this myself, and my feedback (posted above) on the patches has yet to be addressed.

Changed 5 years ago by dsd

my attempt

comment:29 Changed 5 years ago by dsd

Just posted my attempt at the patch and corresponding documentation. if someone can confirm it's working then i'll commit it for future releases. Sorry for the delay on getting to this.

also I'm still interested in attacking the other problem described above, where it seems like libusb's internal processing is not lining up with usbmon -- this is a bug. first step would be to measure the delay between the 'C' line in usbfs and the time when libusb invokes the callback. it should be immediate...

comment:30 Changed 5 years ago by marcan

Do we have confirmation that 2.6.31 is the first kernel release that reliably sends ZLPs? I had the feeling it was earlier (.29 or .30 at the latest).

The implementation on Linux looks fine (haven't tested it yet though), but I don't agree with the Darwin handling. IMO, the best solution would be to queue an explicit manual ZLP if needed (so the apps don't have to worry about it), followed by providing some up-front way of querying whether automatic ZLPs are supported (so the app can decide whether it needs to send ZLP transfers explicitly in advance). Returning an error for Darwin means apps have to deal with detecting it and retrying the transaction without the ZLP flag.

comment:31 Changed 5 years ago by dsd

I looked at the git logs and 2.6.30 doesn't have the ZLP fix.
As for Darwin, I agree and will happily accept patches for an actual implementation. I'll also point out this new flag to the darwin port maintainer. However I wouldn't make it a blocker for the linux functionality.

comment:32 Changed 4 years ago by www.google.com/accounts/o8/id?id=aitoawnx6bhsco20utb3w_lji0s4ph0xraz9pjy

  • Cc chris2k01@… added

I'd like to point out my 2¢ here.

I believe this flag should exist. As a user of libusb, I shouldn't need to know or care what an “URB” is. That string never appears in the USB 2.0 specification. All USB defines are packets, transactions (~3 packets, token+data+handshake), and transfers (1 or more transactions).

Various people have discussed above the quote from 5.7.3 that a transfer completes when the endpoint “Has transferred exactly the amount of data expected”, OR “Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet”.

The point is, the functions exposed by libusb are all defined in terms of transfers. libusb_submit_transfer, struct libusb_transfer, libusb_interrupt_transfer, they all talk about transfers. To me, as a user, that means I ought not to have to mess around poking at configuration descriptors and such in my application and special-casing the ZLP case. This should all be handled for me. It's all part of a single transfer, after all. The only thing I should have to do is tell libusb whether my device is expecting the exact size of transfer I submitted (current behaviour) or whether the device is expecting a variable-sized transfer and hence needs proper termination (what would be provided with the special flag, noting that merely describing the flag as “adds a ZLP” is slightly misleading since it should really only do so if the transfer to which its attached is a multiple of max packet).

So I vote in favour the transfer flag. It's been 17 months since the last comment; is there any word on when this will get into git (I checked out the tree and it looks like its not there despite some patches above)? For now I'll use the ugly workaround of manually checking the max packet size and adding a second transfer if necessary, but this feels like a huge break of abstraction.

comment:33 Changed 4 years ago by www.google.com/accounts/o8/id?id=aitoawnx6bhsco20utb3w_lji0s4ph0xraz9pjy

  • Cc headch@… added

CCing my new e-mail address (was chris2k01@…)

comment:34 Changed 3 years ago by stuge

In e3d0a4cb9e2f9872c9fdbb22d7ded169e111fc8f/libusb:

Add LIBUSB_TRANSFER_ADD_ZERO_PACKET flag to indicate need for ZLP

Some protocols which use USB require an extra zero length data packet
to signal end-of-transfer on bulk endpoints, if the last data packet
is exactly wMaxPacketSize bytes long.

This flag allows applications to inform libusb about this requirement,
so that libusb can handle the issue transparently.

At the moment the new flag is only supported on Linux, and submitting
a transfer with the flag set returns an error at submit time on other
systems. Hopefully implementations will soon follow for other systems.

References #6.

comment:35 Changed 19 months ago by hjelmn

  • Milestone set to 1.0.16

I added support for an explicit ZLP to the Darwin backend to the upcoming 1.0.16 release. This will leave Windows as the only platform without this feature. Is it needed for Windows?

comment:36 Changed 19 months ago by xiaofan

Based on the above discussion, I think it is good to add this flag to all OS, including Windows. On the other hand, it can of course wait.

comment:37 Changed 19 months ago by hjelmn

I agree. I want to get ZLP support in for Windows and NetBSD for 1.0.17. Has this been implemented in libusbx and if not do you think Pete would be willing to implement this? If it is quick and simple I could get it in for 1.0.16 and push the final release back a week or so.

comment:38 Changed 19 months ago by xiaofan

This is not implemented in libusbx yet and I think it is a low priority now for libusbx.

Ref: https://github.com/libusbx/libusbx/issues/3

comment:39 Changed 19 months ago by hjelmn

  • Milestone changed from 1.0.16 to 1.0.17

Ok, I will push this ticket off to 1.0.17. Will do the same for hotplug until all platforms are supported.

Note: See TracTickets for help on using tickets.