Wire|trigger to pipe crossover

I have been reading the FrontPanel UM and noticed that the latency of wire|trigger transactions is roughly 1 millisecond (plus or minus).

One type of access I need to the device is basically programmed I/O. I need to read and write several 16-bit registers each millisecond. It varies, but for example let’s just say I need to perform 8 register writes and 64 register reads each millisecond. All registers are 16 bits.

I understand that the latency on the USB is not guaranteed, and that this really won’t work. But I am actually just using the FPGA to simulate the real design. So in my “simulation” I will gate the clock to turn it off and allow the software to keep up. But I would like my “simulation” to run as fast as possible.

I understand that the pipes are faster for bulk transfer, but that they have more setup overhead.

So here is my question:

Has someone determined the crossover point (in bytes for example) from wire|trigger to pipe?

In other words, can we say that for fewer than “X” bytes the wire|trigger is fastest due to lower overhead, and for more than “X” bytes the pipe is fastest due to burst transfers?

Cheers,
Gil

Gil–

We at Opal Kelly have not done such a test. Primarily this is because we view them as fundamentally different ways to communicate with the hardware design, but also because the results will vary depending on platform (Mac, PC, Linux) as well as the specific machine.
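Since the numbers are so machine-dependent, the most reliable answer is to measure the crossover on your own setup. Below is a rough sketch of such a measurement using the C++ FrontPanel API; the PipeOut address (0xA0), the transfer sizes, and the iteration count are illustrative assumptions only.

```cpp
// Rough crossover measurement: average time per UpdateWireOuts() call vs.
// average time per ReadFromPipeOut() call at various transfer sizes.
// (DLL loading / FPGA configuration omitted for brevity.)
#include <chrono>
#include <cstdio>
#include <vector>
#include "okFrontPanelDLL.h"

int main()
{
    okCFrontPanel dev;
    if (okCFrontPanel::NoError != dev.OpenBySerial(""))
        return 1;

    const int iterations = 100;

    // One UpdateWireOuts() transaction refreshes every WireOut endpoint.
    auto w0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        dev.UpdateWireOuts();
    auto w1 = std::chrono::steady_clock::now();
    double wireMs =
        std::chrono::duration<double, std::milli>(w1 - w0).count() / iterations;
    std::printf("wire: %.3f ms per transaction\n", wireMs);

    // Pipe reads of increasing size; the crossover is where the per-transfer
    // time first beats the wire transaction for the payload you care about.
    for (long bytes = 16; bytes <= 4096; bytes *= 2) {
        std::vector<unsigned char> buf(bytes);
        auto p0 = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i)
            dev.ReadFromPipeOut(0xA0, bytes, buf.data());  // hypothetical PipeOut
        auto p1 = std::chrono::steady_clock::now();
        double pipeMs =
            std::chrono::duration<double, std::milli>(p1 - p0).count() / iterations;
        std::printf("pipe: %5ld bytes, %.3f ms per transfer\n", bytes, pipeMs);
    }
    return 0;
}
```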

One thing to note about wires and triggers is that when you do an UpdateWireIns, UpdateWireOuts, or UpdateTriggerOuts (but NOT an ActivateTriggerIn), ALL wires or triggers are updated in the same transaction.
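To make that concrete, here is a minimal sketch (assuming the C++ FrontPanel API, with hypothetical WireIn addresses 0x00–0x07 and WireOut addresses 0x20–0x2F) showing how the 8 register writes can share a single UpdateWireIns transaction and a set of register reads can share a single UpdateWireOuts transaction:

```cpp
// Stage several 16-bit register values locally, then push/pull them all in
// one USB transaction each. Endpoint addresses are illustrative assumptions.
#include "okFrontPanelDLL.h"

void write_control_regs(okCFrontPanel& dev, const unsigned short regs[8])
{
    for (int i = 0; i < 8; ++i)
        dev.SetWireInValue(0x00 + i, regs[i], 0xffff);  // staged on the PC only
    dev.UpdateWireIns();                                // one USB transaction
}

void read_status_regs(okCFrontPanel& dev, unsigned short regs[16])
{
    dev.UpdateWireOuts();                               // one USB transaction
    for (int i = 0; i < 16; ++i)
        regs[i] = (unsigned short)dev.GetWireOutValue(0x20 + i);
}
```

So the 8 writes cost one roughly-1 ms transaction rather than eight, though whether all 64 reads fit in one update depends on how many WireOut endpoints your design exposes.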

You might achieve the closest results to what you’re looking for by using pipes. Use the “block throttled” pipes and configure them to transfer one block at a time – the FPGA would then “throttle” the stream to one block every millisecond. Since pipe transfers are blocking on the PC side, you’ll want to set them up in small (3–5 second?) chunks.

You could also set up the throttling to be a little faster than 1 ms (say 0.9 ms) to buy yourself some time during each transfer for setup.
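For example, here is a sketch of the PC side of such a throttled read, assuming the C++ FrontPanel API; the BTPipeOut address (0xA0), the block size (128 bytes = 64 16-bit words), and the chunk length are illustrative assumptions:

```cpp
// Read a few seconds' worth of throttled blocks in one blocking call.
// The FPGA asserts its ready signal once per (simulated) millisecond,
// so the host receives one block per millisecond.
#include <vector>
#include "okFrontPanelDLL.h"

bool read_status_chunk(okCFrontPanel& dev, std::vector<unsigned char>& out)
{
    const int  blockSize = 128;                 // bytes per throttled block (64 x 16-bit words)
    const long blocks    = 3000;                // ~3 seconds of 1 ms blocks per call
    const long length    = blockSize * blocks;  // keeps each blocking call reasonably short

    out.resize(length);
    long ret = dev.ReadFromBlockPipeOut(0xA0, blockSize, length, out.data());
    return ret == length;                       // negative return values indicate an error
}
```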

Thanks. I did notice that aspect of the three endpoints that you listed. But at about 1 ms per transaction I’m still not sure whether I can ever use a wire or trigger. I’ll continue to read the UM and pay close attention to the pipes.

I noticed the block throttled pipes, and I am wondering: can I have multiple concurrent block throttled pipes, or does each transfer need to complete before another transfer is set up?

I could use three concurrent block throttled pipes in my application. Two PC–>FPGA and one FPGA–>PC as follows:

PC–>FPGA “signal” data
Something on the order of 1 GB that is sent to the FPGA for processing – the FPGA would throttle this in.

PC–>FPGA control register writes
This would be the register writes to control the processing. It would be a small number of 16-bit words each millisecond and could be throttled by the FPGA.

FPGA–>PC result and status register reads
This would be the register reads from the FPGA to monitor the status and evaluate the results. It would be a small number of 16-bit words each millisecond.

I am assuming that I can’t operate these pipes “concurrently”, but rather that each pipe needs to complete its transaction before another transaction is set up on another pipe. If I can operate multiple concurrent throttled pipes, that would be a very elegant solution.

Cheers,
Gil

Nope, you cannot have multiple concurrent pipes in this manner.
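Given that constraint, one way to approximate the three-pipe scheme is to run the transfers back-to-back inside a loop, one chunk at a time. This is only a sketch; the BTPipeIn/BTPipeOut addresses (0x80, 0x81, 0xA0), the block size, and the chunk sizes are assumptions for illustration:

```cpp
// Since the pipes cannot run concurrently, sequence them: signal data in,
// control writes in, then status reads out, once per chunk.
#include <vector>
#include "okFrontPanelDLL.h"

void run_chunk(okCFrontPanel& dev,
               std::vector<unsigned char>& signalData,   // slice of the ~1 GB input
               std::vector<unsigned char>& controlWords, // 16-bit control writes
               std::vector<unsigned char>& statusWords)  // 16-bit status reads
{
    const int blockSize = 64;  // FPGA throttles one block per simulated millisecond

    // 1) PC->FPGA "signal" data for this chunk.
    dev.WriteToBlockPipeIn(0x80, blockSize, (long)signalData.size(), signalData.data());

    // 2) PC->FPGA control-register writes for the same interval.
    dev.WriteToBlockPipeIn(0x81, blockSize, (long)controlWords.size(), controlWords.data());

    // 3) FPGA->PC result/status register reads; blocks until the chunk completes.
    dev.ReadFromBlockPipeOut(0xA0, blockSize, (long)statusWords.size(), statusWords.data());
}
```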

I should mention that USB was not really designed to be as low-latency as you’re looking for. It was designed more to move bulk data, with some provisions (isochronous mode) for guaranteed throughput.

No problem. I knew in advance that this was a square-peg, round-hole situation. But your footprint compels me to work around the issues.

I should point out that I am coming from a full size PCI board as my current “simulation” platform. Honestly, I was not happy with the latency there either :). But in any case I will try to squeeze out whatever performance I can get from any platform.

That PCI board has some performance advantages in my situation, but it is tied to my desktop machine and is therefore not portable. My new XEM-3050 will be able to go with me on my next trip to Japan and I will be able to continue with R&D. It’s a good trade-off overall.