25ms delay in WriteToPipeIn

Under certain, not yet fully understood conditions, the following can happen (Linux usbmon output):

ffff8b32769c3e40 1199806266 S Bo:2:004:2 -115 12304 = 292b4030 40000000 01c82b00 4b2d4030 40000000 02c82b00 4e344030 40
ffff8b32769c3e40 1199806370 C Bo:2:004:2 0 12304 >
ffff8b32769c3e40 1199806378 S Ci:2:004:0 s c0 c7 0000 0000 0020 32 <
ffff8b32769c3e40 1199806454 C Ci:2:004:0 0 32 = 10300000 10300000 01000000 00000000 00310000 00310000 09000000 00000000
ffff8b32769c3e40 1199831544 S Ci:2:004:0 s c0 c7 0000 0000 0020 32 <
ffff8b32769c3e40 1199831711 C Ci:2:004:0 0 32 = 10300000 10300000 03000000 00000000 00310000 00310000 09000000 00000000

You can see that there is a 25 ms delay between the first and second control transfer, and that the third word of the device's response changes from 01000000 to 03000000.
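For reference, the usbmon timestamp field is in microseconds, so the gap between the two control-transfer submissions works out to:

    1199831544 - 1199806378 = 25166 µs ≈ 25 ms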

If for some reason these delays happen in a session, they happen often, but not for every bulk OUT transfer. Some of those packets are followed by a single control transfer, and the third word in the reply is then always 03000000.

I kid you not: if the process is run under strace, then this delay never happens.

There may be a 25 ms sleep somewhere in the FrontPanel code base, or in the Cypress code base, that does not belong there.

I'd really love to get back to solving my own bugs, and I am not sure I can come up with a workaround here.

The control transfer is a vendor-specific request (bmRequestType 0xc0, bRequest 0xc7 in the trace above). Since Cypress reserves 0xa0-0xaf, I suspect this is an Opal Kelly specific request, and I would love to know what it is all about.

Thanks for your time and attention.

The device in question is a XEM7350-K410T; the FrontPanel version is 5.1.0 x64 on Ubuntu 18.04.

The bug (?) also happens with FrontPanel 4.5.0.

It would be helpful if you could include the following:

  1. What you are doing. (FrontPanel API calls, etc)
  2. What you are expecting the results to be.
  3. What the results are and how they differ from your expectations.
  1. Write to pipe in, then read from pipe out, then process the buffer, in a long-running loop. That is all I am doing (see the sketch after this list).

  2. I expect there to be no occasional 25 ms delay. It does not happen with every transfer, but it depresses my transfer rate to less than 10% of its normal value. 25 ms is a long time for a gigabit serial protocol.

  3. A 25 ms delay, where I expect about 1 ms for all transfers.
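For concreteness, the loop body is essentially the following (a minimal sketch, not my real code; the endpoint addresses 0x80/0xa0, the header name, and the missing error handling are placeholders):

    #include <vector>
    #include "okFrontPanelDLL.h"

    void transferLoop(okCFrontPanel &dev)
    {
        std::vector<unsigned char> buf(12304);  // same size as the bulk OUT in the trace above
        for (;;) {
            // host -> FPGA; this is the call that occasionally stalls for 25 ms
            dev.WriteToPipeIn(0x80, (long)buf.size(), buf.data());
            // FPGA -> host
            dev.ReadFromPipeOut(0xa0, (long)buf.size(), buf.data());
            // ... process the buffer ...
        }
    }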

The program works if I attach to the thread that handles the transfers with ptrace, without doing anything besides allowing each syscall to finish normally.
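In case it helps reproduction, the attach logic is roughly this (a minimal sketch; signal delivery is not forwarded, and tid stands for the id of the transfer thread, which I look up separately):

    #include <sys/types.h>
    #include <sys/ptrace.h>
    #include <sys/wait.h>

    // Attach to the transfer thread and let every syscall run to completion,
    // doing nothing else. This alone makes the 25 ms delay disappear.
    void babysit(pid_t tid)
    {
        ptrace(PTRACE_ATTACH, tid, nullptr, nullptr);
        waitpid(tid, nullptr, __WALL);          // reap the attach stop
        for (;;) {
            // run until the next syscall entry/exit, then just continue
            if (ptrace(PTRACE_SYSCALL, tid, nullptr, nullptr) < 0)
                break;
            if (waitpid(tid, nullptr, __WALL) < 0)
                break;
        }
    }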

The delay is always 25 ms, and there is a control transfer being repeated. It always happens inside WriteToPipeIn. The control transfer uses request type 0xc0. That should really be enough to diagnose the problem if you have access to the source code.

Since the delay is always 25 ms, there just has to be an explicit 25 ms delay somewhere in the function, between the two ioctls that issue the 0xc0 control transfers.

Please see SetBTPipePollingInterval:
https://library.opalkelly.com/library/FrontPanelAPI/classokCFrontPanel.html#ab5441f0e58ebcee9b14d6a6e935207f1
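That method takes the polling interval in milliseconds, so the call is simply (assuming an opened okCFrontPanel instance named dev):

    dev.SetBTPipePollingInterval(1);   // poll every 1 ms; per the rest of this thread, 1 is the lower bound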

Yes, that's it. Thanks.

On the other hand, attaching to the process with ptrace increases my host-to-device bandwidth from 30 MB/s (polling = 1 ms) to at least 40 MB/s (the polling setting no longer matters).

It seems that the delay ptrace introduces hits a sweet spot: it does not delay the ack transfers too much, yet it prevents any need to retransmit them.

I think there is an opportunity to improve bandwidth by more than 20% with an adaptive timer that guesses the best time to try to acknowledge a transfer (possibly taking transfer size into account), so as to avoid extra polling. Or simply by allowing the polling interval to be set to less than 1 ms. I think this limitation is an API flaw; bandwidth can be precious. A rough sketch of the adaptive idea follows.
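Purely as an illustration of the idea (this is not FrontPanel API; the class, the seed rate, and the 0.9 factors are made up):

    #include <chrono>

    // Hypothetical helper: estimate how long a transfer of `bytes` should take
    // at the recently observed rate, and delay the first status poll until just
    // before the expected completion instead of polling on a fixed 1 ms grid.
    class AckTimer {
        double bytesPerSec_ = 30e6;   // seeded from the ~30 MB/s observed above
    public:
        std::chrono::microseconds firstPollDelay(long bytes) const {
            return std::chrono::microseconds(
                (long)(0.9 * 1e6 * bytes / bytesPerSec_));  // aim just short of completion
        }
        void observe(long bytes, std::chrono::microseconds took) {
            if (took.count() <= 0) return;
            double rate = 1e6 * bytes / (double)took.count();
            bytesPerSec_ = 0.9 * bytesPerSec_ + 0.1 * rate;  // exponential moving average
        }
    };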

Anyway, thanks for your help. I will try LD_LIBRARY_PATH overloading to fine-tune my ioctl timing and avoid the CPU and context-switching overhead of ptrace.
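The usual mechanism for that is an interposer library loaded via LD_PRELOAD (shadowing a library on LD_LIBRARY_PATH works the same way). A minimal sketch, with the actual timing tweak left as a stub:

    // ioctl_shim.cpp -- build: g++ -shared -fPIC -o ioctl_shim.so ioctl_shim.cpp -ldl
    // use:                     LD_PRELOAD=./ioctl_shim.so ./my_program
    #include <dlfcn.h>
    #include <cstdarg>

    extern "C" int ioctl(int fd, unsigned long request, ...)
    {
        static int (*real)(int, unsigned long, void *) =
            (int (*)(int, unsigned long, void *))dlsym(RTLD_NEXT, "ioctl");

        va_list ap;
        va_start(ap, request);
        void *arg = va_arg(ap, void *);
        va_end(ap);

        // hypothetical hook: match `request` against the usbdevfs codes seen in
        // the trace and insert a tuned delay here, instead of relying on ptrace
        return real(fd, request, arg);
    }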

The best way to achieve high bandwidth is with larger transfers. A well-designed system can easily achieve 350 MB/s or better depending on the host. This almost always requires use of DRAM as an elastic buffer.

1 ms is an arbitrary limitation. 5 MT (five million transfers, at USB 3.0's 5 GT/s signalling rate) happen on the bus in that time span.

A 100 MHz clock can pull in 100,000 32-bit words from the USB chip in 1 ms.

Either you wait 1 ms, or you execute 10 polling requests in that span. The bus is not being used for anything else anyway.

Of course you could never reach the kind of utilization that huge buffers make possible. But if there is 20% to be harvested, it should be harvested.