Why is timeout so bad, how to avoid it and how to recover from it?

Fcona · April 12, 2023, 8:59am

Hello,
I think the title is self-explanatory, but I would like to review what is known about the “Timeout” condition when using BTPipes (I use an XEM7310).
The point is that the topic has been addressed in several threads (see e.g. BlockThrottledPipe timeout issue, Timeout behavior, Stream data from XEM7310), but I think I’m still missing something, so I would like some very clear answers to these questions:

1. Why is timeout so bad when using BTPipes?
2. What’s the best way to avoid timeout conditions?
2a) The common suggestion is to start a transfer “only when you know it will complete”, but what’s the best way to know it will complete?
2b) Is timeout unavoidable when the total amount of data to be transferred is not an integer multiple of the transfer size, i.e. when the last transaction can never complete? (This is very common with streaming devices, which can stop generating data at any point.)
3. How do I make my system recover if a timeout condition happens? Is there an error flush method, a soft reset or something similar that avoids a complete power cycle?

Thanks for any help in figuring all of this out!

Filippo

okSupport · April 12, 2023, 9:44pm

Timeout is a condition that isn’t generally appropriate for USB devices. In a well-behaved, well-designed system, it simply shouldn’t happen. If it does, it’s representative of a “catastrophic” issue. The operating system can handle it, because any good operating system should… but it might not be pretty.

USB is not like ethernet. There shouldn’t be connectivity issues. There shouldn’t be unexpected delays. There aren’t reroutes due to outages.

This is not to say that a system involving USB or even a system involving BTPipes cannot time out. It just means that the protocols implemented at the USB layer and the BTPipe layer shouldn’t. There can be layered implementations on top of this that can expect systemic timeouts, if appropriate.

The best way to avoid a timeout is to only start a transfer when you know it will complete. That is, ask the device “do you have 1,000 bytes for me to read?” If the answer is yes, then read 1,000 bytes. Or, ask the device “How many bytes do you have for me to read?” And then read the answer.

This is, actually, exactly how streaming devices operate when the stream is transferred over a block device or a packetized system. Think about how “streaming” a Netflix video works. It’s not like the bits flow like a river from Netflix to your television. The source sends a “stream” of packets (say 1024 bytes at a time). The packets arrive, are buffered, and “streamed” to the screen (note that even in this case, they still arrive at the screen in packets – one frame at a time).

If the buffer fills at the TV, back pressure is applied via the protocol to Netflix to tell it to slow down. Netflix doesn’t just keep sending you “The Real Housewives” hoping that you’ll eventually catch up. If the buffer empties, then the protocol has some fallback mechanisms.

But here’s where we get into some details… There are layers to these protocols and lots of devices involved. At some point, data “streams” into a memory chip on the TV. But this particular transfer isn’t allowed the convenience of a timeout. It either happens or it doesn’t. The timeout is handled at a higher protocol layer. You can’t split the block transfer to the memory any more than you can split a byte halfway through its transfer over a UART.

How do you make a system that can recover… First, we should note that not all systems are engineered the same. Some don’t need to handle this sort of situation at all. This is why there are so many protocols out there.

But here’s one option… Make your protocol always operate on blocks of 65,536 bytes. Let the first two bytes of each block tell the recipient how many payload bytes in the block are valid. The remaining 65,534 bytes are payload. Here’s how this would work…

GreedyConsumer: Give me a block! NOW!
Producer: Ok. Here you go. I filled this with 65,534 bytes. Payload = 65,534 bytes.
GreedyConsumer: Give me a block! NOW!
Producer: Ok. Here you go. I filled this with 1,000 bytes. Payload = 1,000 bytes + 64,534 bytes of garbage. (or 0’s)
GreedyConsumer: Give me a block! NOW!
Producer: Ok. Here you go. I filled this with 0 bytes. Payload = 0 bytes + 65,534 bytes of garbage.
GreedyConsumer: Woah. Did he just time me out? Brutal.

Now the GreedyConsumer tells HER greedy consumer that there was a timeout. But the underlying protocol that Producer and GreedyConsumer have been operating on goes on merrily without crashing.
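
The exchange above can be sketched on the host side as a pair of helpers that pack and unpack such a block. This is only an illustration of the idea, not part of the FrontPanel API: the names pack_block and unpack_block are hypothetical, and the little-endian byte order of the two-byte length field is an assumption that would have to match whatever the FPGA design actually writes.

```python
import struct

BLOCK_SIZE = 65536                       # fixed transfer size used by the protocol
HEADER_SIZE = 2                          # two-byte "valid payload length" field
MAX_PAYLOAD = BLOCK_SIZE - HEADER_SIZE   # 65,534 payload bytes per block


def pack_block(payload: bytes) -> bytes:
    """Producer side: build one full block, zero-padding the unused payload area."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in one block")
    # Assumption: the length field is an unsigned 16-bit little-endian integer.
    header = struct.pack("<H", len(payload))
    padding = bytes(MAX_PAYLOAD - len(payload))
    return header + payload + padding


def unpack_block(block: bytes) -> bytes:
    """Consumer side: return only the valid payload bytes of a received block."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected exactly one full block")
    (valid,) = struct.unpack_from("<H", block, 0)
    if valid > MAX_PAYLOAD:
        raise ValueError("length field exceeds the payload area")
    return bytes(block[HEADER_SIZE:HEADER_SIZE + valid])
```

A block whose length field is 0 unpacks to an empty payload. The consumer can treat that as "no data right now" and report a timeout to its own caller, while the transfer itself still completed normally.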

Fcona · April 13, 2023, 7:38am

Dear support,
thanks for the quick and detailed reply!
Just a couple more clarifications.
I totally agree that the system receiving the data should ask the sending system how much data is currently available before performing a transfer request. However, I wasn’t able to find in the API a method like “int getAvailableData()”, so I was wondering if I’m just not spotting this method or if I have to rely on some other strategy, like the 2 “length bytes” at the beginning of each block as you suggested, or maybe sharing a lower bound of the available bytes with a wireout.
Secondly, I want to make sure I correctly understood your proposed solution. Basically, your suggestion to always provide data on each transaction, filling any gaps with garbage, is a way to replace the hard timeout (error code -2) with a soft timeout, in which the software realizes that no data is available without ever triggering the -2 error code. Is my understanding correct?

Thanks again,

Filippo

okSupport · April 13, 2023, 8:04pm

There is not a “getAvailableData” method - this would need to be implemented by your architecture using the other low-level components such as wires and triggers.

And yes, that is one possible approach that could be taken to avoid timeouts within the FrontPanel API.
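
As a rough illustration of the first point, a “getAvailableData”-style check could be built on a wire-out. The sketch below assumes the FPGA design publishes a lower bound of the readable byte count on a wire-out; the endpoint addresses 0x20 and 0xA0, the block size, and the helper name read_available are all hypothetical. The dev argument is expected to be an opened okCFrontPanel object, and only its UpdateWireOuts, GetWireOutValue and ReadFromBlockPipeOut calls are used.

```python
def read_available(dev, count_wire=0x20, pipe_addr=0xA0, block_size=1024):
    """Read only as many whole blocks as the FPGA reports it can deliver.

    Assumptions (not from the FrontPanel API itself): the HDL drives wire-out
    `count_wire` with a lower bound of the bytes ready on block pipe `pipe_addr`.
    """
    dev.UpdateWireOuts()                          # latch current wire-out values
    available = dev.GetWireOutValue(count_wire)   # bytes the FPGA promises to have

    # Request whole blocks only, so the transfer is guaranteed to complete.
    length = (available // block_size) * block_size
    if length == 0:
        return b""                                # nothing to read yet, no transfer started

    buf = bytearray(length)
    result = dev.ReadFromBlockPipeOut(pipe_addr, block_size, buf)
    if result < 0:
        raise IOError(f"block pipe read failed with error code {result}")
    return bytes(buf)
```

Because the reported count is only a lower bound, the host may read less than is really available, but it should never request more than the FPGA can supply. Any leftover partial block still needs something like the length-prefixed scheme described earlier in the thread.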

Fcona · April 14, 2023, 8:21am

Ok, thank you very much for your time and support!

Best regards,

Filippo