Two critical questions about Block Pipe Out and its problems


#1

Hello, support team.

I am using the XEM6310 board with the SDK.
Everything has gone well with all my projects thanks to your product and support.
However, this time I have run into what seems to be a serious performance limitation of your product.
It concerns continuous data acquisition with the Block Pipe Out module.
I need it to be fast enough to receive all sensor data, which is generated at up to 100 MHz.
From several tests, it seems that data continuity cannot be guaranteed: data is lost in the FIFO on the FPGA side.
In the end, I found that the data is lost during the gaps between iterations of my C# infinite while loop.

To make it short, I have two questions:

  1. Is there any strategy to receive all sampled data at 100 MHz or less on the C# SDK side?
    (I am using a while loop with techniques such as multi-threading and multi-core parallelism, but the library function “readfromblockpipeout(…)” leaves quite a large time gap between loop iterations.)

  2. Once the same library function “readfromblockpipeout” (or just readfrompipeout) returns a Timeout error, it does not seem to recover. So every time it occurs, we need to re-upload the FPGA bit file all over again.

These two problems are quite a big barrier for me.

I just wonder whether this is possible or not.

Thanks for your support, as always.


#2

Yes, the strategy is two-fold:

  1. Use the DDR2 memory on the XEM6310 as a buffer memory. This will allow data to be buffered during inevitable gaps that occur on non-realtime systems such as Windows, Mac OS X, and Linux.

  2. Use long transfers (> 1 MB) to assure maximum throughput.

Generally, we don’t recommend using Block Throttled Pipes unless your application specifically requires them. An example of something that would require the use of BTPipes is one where a low-latency requirement rules out the use of a larger buffer memory.
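
To illustrate why a large buffer helps with host-side gaps, here is a minimal Python sketch of the arithmetic. Every number in it is an assumption chosen for illustration (the sensor rate, the length of the gap, the FIFO size, and the DDR2 capacity); check the board's user manual and measure your own system before relying on any of them.

```python
# Sketch: how much data piles up on the FPGA while the host is not reading.
# All numbers below are illustrative assumptions, not measured values.

def backlog_bytes(rate_bytes_per_s: int, gap_us: int) -> int:
    """Bytes produced by the sensor during a host-side gap of gap_us microseconds."""
    return rate_bytes_per_s * gap_us // 1_000_000


def fits(backlog: int, capacity_bytes: int) -> bool:
    """True if the buffer can hold the whole backlog without dropping data."""
    return backlog <= capacity_bytes


SENSOR_RATE = 200_000_000        # assumed: 200 MB/s sensor stream
HOST_GAP_US = 10_000             # assumed: 10 ms scheduling gap on the PC
SMALL_FIFO = 64 * 1024           # assumed: 64 KiB block-RAM FIFO
DDR2_BUFFER = 128 * 1024 * 1024  # assumed: 128 MiB DDR2 buffer

backlog = backlog_bytes(SENSOR_RATE, HOST_GAP_US)
print(f"backlog during gap: {backlog} bytes")
print(f"fits in small FIFO: {fits(backlog, SMALL_FIFO)}")
print(f"fits in DDR2 buffer: {fits(backlog, DDR2_BUFFER)}")
```

Under these assumed numbers the backlog from a single gap is far larger than a small on-chip FIFO but a small fraction of an external memory buffer, which is the point of item 1 above.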


#3

Ok, thanks for your comment.
Your kind reply gives us a very helpful idea for getting through this.

However, I have some follow-up questions about your reply.

  1. From your advice, I am a little uncertain. Even if I place a FIFO block (not DDR2 yet) before the block pipe-out VHDL module, the fundamental cause of the data loss, in my opinion, is that the transfer speed from okHost to the PC is not high enough compared with the sensor's data generation rate (100 MHz) while a block pipe-out transfer is in progress on the PC side.
    (I hope I am hitting the core of the problem; if not, sorry in advance.)

So, is your point that the DDR2 memory interface will allow users to drive the FPGA from the PC through okHost with a faster clock, I mean, much higher than the roughly 103 MHz given in your official VHDL samples?

  2. Could you please briefly explain why the Timeout condition does not clear, even when the sensor side has recovered from the stall by the next iteration? I clearly checked that, on the next iteration, the sensor supplies data to the FIFO fast enough and the FIFO is ready, but the block pipe-out still returns a Timeout error to the PC. We are already trying to use long transfers for maximum throughput.
    We are testing the worst case of the sensor module, just in case.
    So what we most need from you is a way to recover from the Timeout and the freeze.

We always appreciate your help.
Best regards.


#4

  1. Transfer rate from okHost to PC can be measured using the PipeTest sample. See how your setup performs. Most modern setups should get 300 MB/s or better. We typically see 340 - 360 MB/s.

  2. You keep talking in MHz. The host interface does, indeed, run at 100.8 MHz, but it is 32 bits (4 bytes) wide. When discussing throughput, it is customary to express it in bits or bytes per second. If you want to stay in MHz, we just need to know how many bits or bytes per cycle your sensor generates.
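
The conversion from a clock rate to a byte rate is a single multiplication, sketched below in Python. The 100.8 MHz, 32-bit host interface comes from the point above; the 16-bit sensor case is a hypothetical example, since the sample width has not been stated at this point in the thread.

```python
# Sketch: convert a clock rate and a word width into a throughput figure.

def throughput_mb_per_s(clock_mhz: float, bytes_per_cycle: int) -> float:
    """Peak throughput in MB/s (1 MB = 10**6 bytes) for one word per clock cycle."""
    return clock_mhz * bytes_per_cycle


# Host interface from the thread: 100.8 MHz, 32 bits (4 bytes) wide.
host_peak = throughput_mb_per_s(100.8, 4)
print(f"host interface peak: {host_peak:.1f} MB/s")

# Hypothetical sensor: 100 MHz sample clock, 16-bit (2-byte) samples.
sensor_16bit = throughput_mb_per_s(100.0, 2)
print(f"16-bit sensor at 100 MHz: {sensor_16bit:.1f} MB/s")
```

The first figure is a theoretical ceiling for the interface clock, not the rate a real transfer achieves; the measured numbers in item 1 are the ones to design against.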


#5

Thanks for your nice comment.

Ok, I apologize that I didn’t mention that the MHz figure is to be multiplied by 4 bytes (because the basic width is 32 bits). We use one sample per 32-bit word.
(For your information, we are also considering packing multiple samples into each 32-bit word.)

Nevertheless, the practical throughput when receiving from okHost on the PC appears to be around 300 MB/s.
Theoretically, as I understand it, since the okHost clock speed is 100.8 MHz, the throughput should reach 403.2 MB/s. Our sensor data rate was therefore designed to be about 400 MB/s, because it samples at 100 MHz.

All our data loss seems to be caused by the gap between the 400 MB/s on the sensor side and the practical throughput of 300 MB/s on the okHost side.

So, based on your advice, I wonder: if we use the DDR2 memory interface, is it reasonable to expect much higher throughput between okHost and the PC?

Because DDR2 memory might allow us to use a faster clock, as your official document says: “300MHz clock seems to be available” (in XEM6310-UM.pdf). So would that be set via TS_okHostClk in the UCF file?

For example:
NET "okUH[0]" TNM_NET = "okHostClk";
TIMESPEC "TS_okHostClk" = PERIOD "okHostClk" 9.92 ns HIGH 50%;

I just need a brief comment on how to assign a faster clock to the receiving clock pin, okHostClk.

Best Regards
Sincerely


#6

Simply put, you will not be able to achieve 400 MB/s.
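
A rough Python sketch of why buffering alone cannot close this gap: when the sustained input rate exceeds the sustained output rate, any buffer only delays the overflow. The 400 MB/s and 300 to 360 MB/s figures come from earlier in the thread; the 128 MiB buffer capacity is an assumption made for illustration, not a confirmed board specification.

```python
# Sketch: time until a buffer overflows when data arrives faster than it drains.
from typing import Optional


def seconds_until_overflow(in_mb_s: float, out_mb_s: float,
                           buffer_bytes: int) -> Optional[float]:
    """Seconds until the buffer is full, or None if it never fills."""
    net_bytes_per_s = (in_mb_s - out_mb_s) * 1_000_000
    if net_bytes_per_s <= 0:
        return None  # the output keeps up, so the buffer never overflows
    return buffer_bytes / net_bytes_per_s


BUFFER = 128 * 1024 * 1024  # assumed buffer capacity: 128 MiB

for out_rate in (300.0, 360.0):
    t = seconds_until_overflow(400.0, out_rate, BUFFER)
    print(f"400 MB/s in, {out_rate:.0f} MB/s out: overflow after {t:.2f} s")

# A sensor rate below the measured host rate never overflows.
print(seconds_until_overflow(300.0, 340.0, BUFFER))
```

Under these assumptions the buffer fills within a few seconds at 400 MB/s, so continuous acquisition needs a sustained sensor rate below what the host link actually delivers, with the buffer absorbing only the short gaps.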