Pipe Operation Questions

I have been trying to get buffered pipe operation working for a data acquisition application. I now note that the example pipeTest routines do not exercise the buffered pipe operation very thoroughly, and the pipe operations have no flow-control I/O. It also appears that ActivateTriggerIn takes a while to complete. After hooking a counter up to the okBufferedPipeOut on the hardware side, clocked by clk1 (the same clock as the BPO ep_clk), I have observed the following:

If I run the counter clock at 5 MHz, I start getting duplicated values after ~5000 counts. It appears that a FIFO empty has occurred, but the pipe just continues to send whatever is there…

If I run the counter at 50 MHz, I get a big old glitch in the data after 2048 counts. It appears that a FIFO full has occurred while the ActivateTriggerIn is happening and the transfer hasn't started yet…

For my current application to work, I need to take data in chunks of 1 MB, in less than 0.5 seconds, once per second, with a clock varying between 5 MHz and 50 MHz. Additionally, there will need to be external-trigger capability. Do you have a workaround for these problems, such as code for another type of pipe that uses the onboard FX2 FIFOs in slave mode to take advantage of FX2 flow control, and maybe a modified ReadFromPipeOut that handles triggering by itself?

Thanks,

Bill

Bill-

We have worked around these issues in a couple different ways:

  1. Polling – Using the FPGA BRAMs as buffers, we can poll the status frequently enough to determine how much to read, then read that much (see the sketch after this list). This incurs the overhead of polling but has worked quite effectively. I don't have any solid numbers on the bandwidth achievable this way, since we were only trying to accomplish about 500 kB/s.

  2. SDRAM buffering – With an external SDRAM (or the XEM3010), the SDRAM can quite easily keep pace with the FPGA->PC transfer rates. We wait until a reasonably large block is available and pull it out. This technique can work at the maximum transfer rate, which is about 19 MB/s from FPGA to PC. The same idea works at the max transfer rate in the other direction (about 32 MB/s).

The latter technique is used in our RAMTester sample app.
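To make the polling idea concrete, here is a minimal, untested host-side sketch using the okCFrontPanel class from okFrontPanelDLL.h (adjust the class name to your API version). The WireOut address (0x20), the PipeOut address (0xA0), the assumption that the FPGA publishes the FIFO occupancy in 16-bit words, and the 1 kB minimum chunk are all placeholders to adapt to your own design:

```cpp
#include <vector>
#include "okFrontPanelDLL.h"

// Poll the FIFO occupancy, then read exactly what the FIFO is known to
// hold, so the pipe can never run dry mid-transfer.
void pollAndDrain(okCFrontPanel *dev, long totalBytes)
{
    std::vector<unsigned char> buf(totalBytes);
    long got = 0;
    while (got < totalBytes) {
        dev->UpdateWireOuts();                       // refresh all WireOut values
        long words = dev->GetWireOutValue(0x20);     // FIFO occupancy, 16-bit words
        long avail = 2 * words;                      // convert to bytes
        if (avail < 1024)                            // wait for a worthwhile chunk
            continue;
        if (avail > totalBytes - got)
            avail = totalBytes - got;
        avail &= ~1L;                                // keep the read length even
        dev->ReadFromPipeOut(0xA0, avail, &buf[got]);
        got += avail;
    }
}
```

The tradeoff is the wireout round trip on every iteration, which is what limits the achievable bandwidth.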

Is there some reason that you have not developed code to utilize the on-board FX2 FIFOs? How much time and effort would this take? Is there interest?

Certainly the polling method should work at 5 MHz; it's pretty doubtful for 50 MHz. And the SDRAM would just be an extended case of the polling, and would tie up lots of I/O… guess I'll have to do the poll for now… do you have a more complete example?

Thanks,

Bill

PS – heading out for the Super Bowl :)

Hello,

In keeping with trying to get some performance out of the workaround, can you clarify the operation of ActivateTriggerIn(), as referred to in the FE Manual: "Trigger Ins are not transferred immediately with the call to ActivateTriggerIn()." What will transfer immediately, i.e., what is the best single-bit transfer mechanism (minimum latency)?

Also can you please respond to my last post?

Thanks,

Bill

In answer to your first question – there are tradeoffs with any design that affect the final product. After evaluating the tradeoffs involved in implementing what we wanted with FrontPanel, we chose not to use the slave FIFO as you suggest. In the end, we have a very simple but richly capable interface between FPGA and PC that has been effective for most applications we have thrown at it.

Actually, Trigger Ins are transferred immediately with a call to ActivateTriggerIn. That documentation is flawed and will be fixed.
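To answer the latency question directly: a Trigger In is the best single-bit, minimum-latency mechanism, since it takes a single call with no separate update step. A quick sketch, where the endpoint addresses and function names around the API calls are placeholders:

```cpp
#include "okFrontPanelDLL.h"

// Pulse bit 0 of Trigger In 0x40; the FPGA sees a one-cycle strobe.
void fireTrigger(okCFrontPanel *dev)
{
    dev->ActivateTriggerIn(0x40, 0);
}

// By contrast, a Wire In takes two calls and updates all wires together:
void setGoBit(okCFrontPanel *dev)
{
    dev->SetWireInValue(0x00, 0x0001, 0x0001);   // set only bit 0
    dev->UpdateWireIns();
}
```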

I should also clarify that the current design does use the FX2 FIFOs.

If the pipes are using the FX2 FIFOs, how come there is no outbound flow control happening when the pipe empties out? Is there a bug, or maybe a minor change that could be made to get this capability? Certainly the unbuffered output pipe doesn't have a signal to gate pipe-out reads, and the buffered pipe out seems to ignore it…

Thanks,

Bill

We are not using the flow control of the FX2 FIFOs. Unfortunately, it is not a minor change to do this.

I think I can manage for now; however, it would make a big difference for my other apps if you could get this feature working…

Thanks,

Bill

Well I tried it, and it doesn’t look very promising.

I built an 8K-deep FIFO with CoreGen and set a programmable-full threshold at 2048. I fed this flag to a wireout that I monitor in a for loop, and every time the FIFO gets full I ask for 8K bytes (= 4K shorts) of data. That appears to have solved the underflow problem by slowing down the transfers, but it is way too throttled back: I start getting FIFO overflows about halfway through the 1 MB transfer. I'm still hacking at it… not what I wanted to spend my time doing… creating exotic work-arounds…
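For reference, the host loop is essentially this (a rough, untested sketch; I've put prog_full on bit 0 of WireOut 0x21 and the data on PipeOut 0xA0, so substitute your own addresses). In the sketch the chunk matches the 2048-word threshold rather than my 8K bytes, since a read no larger than the threshold can never outrun what the flag guarantees is in the FIFO:

```cpp
#include <vector>
#include "okFrontPanelDLL.h"

void fullFlagLoop(okCFrontPanel *dev)
{
    const long CHUNK = 2048 * 2;                  // threshold in words -> bytes
    std::vector<unsigned char> buf(1024 * 1024);  // the 1 MB acquisition
    long got = 0;
    while (got + CHUNK <= (long)buf.size()) {
        dev->UpdateWireOuts();
        if (dev->GetWireOutValue(0x21) & 0x0001) {    // prog_full asserted?
            dev->ReadFromPipeOut(0xA0, CHUNK, &buf[got]);
            got += CHUNK;
        }
    }
}
```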

You sure you can't get the built-in FX2 flow control to work?

Bill

For anyone who is interested, I finally achieved repeatable sustained performance of ~8 MB/s. I got there with an 8K x 16 FIFO, with the wr_count output, the almost-empty flag, and first-word fall-through enabled.

The FIFO read clock had to be set to ti_clk to keep everything happy, and ep_write had to be connected to bpo1_write_connect… ;)

Bill

Hi you guys…

Well, you keep peeling the onion… do you have a solution for control interrupts from other devices? I've got data dropouts happening after other devices on the bus are opened and talked to… closing these device drivers doesn't help… sometimes the interrupts are so long or so frequent that they just about shut down the bus… And how far off are the isochronous pipes? Also, do you have BIG FIFOs developed for the 3010 RAM? (Note that even this solution could fail in high traffic.)

Thanks–

Hi Bill-

We have several projects now that are using the XEM3010 with the SDRAM as a larger FIFO-type memory. We have developed these using the stock FrontPanel API as well as the SDRAM controller we have provided for free – with tweaks depending on the project.

You can certainly make a distributed-RAM FIFO using the Xilinx FIFO Core Generator and just tie it to the PipeIn/PipeOut. In fact, since ISE 8.1i WebPack now includes the Core Generator, this is the recommended way to build a buffered pipe. The buffered pipes we included just used the FIFO HDL provided by Xilinx, because the Core Generator was not available in WebPack at the time. The Core Generator is much more flexible than our buffered pipes.

In particular, you can use the programmable thresholds to do better flow control and can build a FIFO tuned to the size you need.

Unfortunately, we can't do anything about the other devices on your bus. A sledgehammer might clear up any traffic problems you have. :)

Hello,

So, are you discarding all plans for isochronous pipes? 'Cause as I see it, you can't guarantee ANY transfer rate using bulk endpoints… at least not the way it is presently implemented… OK for disk drives, but terrible for streams like video or any real-time work… and it IS affecting my customer right now…

Bill –

We have not discarded any plans for isochronous. But at the moment, we do not have any plans for it.

Unfortunately, USB does not provide any guarantees for bulk transfers other than that they will get there and they will be error-free. USB can guarantee bandwidth for isochronous transfers, but will not guarantee they will get there.

If you can share more details about your application, we may be able to offer some suggestions on design choices you may not have considered.

Remember that Windows is not a real-time OS. Even if your interconnect (PCI Express, USB, FireWire, etc.) can handle the streaming, the OS may not. This is not a new problem. It is why audio cards drop samples, cameras drop frames, and CD/DVD burners make coasters. The solution has always been to add additional buffer memory and sacrifice latency for reliability.
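As a back-of-the-envelope illustration (both numbers below are assumed, not measured), the buffer simply has to ride out the worst service gap you expect:

```cpp
// Illustrative sizing only; plug in your own rates and stall times.
double inflowBytesPerSec = 10.0e6;  // data produced by the FPGA (assumed 10 MB/s)
double worstStallSec     = 0.05;    // longest gap before the host services the pipe
double minBufferBytes    = inflowBytesPerSec * worstStallSec;   // = 500 kB
```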

I have a question concerning efficiency as well. I was wondering if it is possible to run two pipe transmissions in two separate threads? I think that will not work, but I have to be sure…

No, you may not do this. If you are creating a multithreaded application, you must ensure that accesses to your device (via the FrontPanel API) are mutexed in some way.
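A minimal sketch of what we mean, assuming a single shared device instance (the wrapper and mutex names are ours):

```cpp
#include <mutex>
#include "okFrontPanelDLL.h"

std::mutex g_okMutex;   // guards every access to the shared okCFrontPanel

long lockedPipeRead(okCFrontPanel *dev, int ep, long len, unsigned char *data)
{
    std::lock_guard<std::mutex> guard(g_okMutex);   // one API call at a time
    return dev->ReadFromPipeOut(ep, len, data);
}
```

The two transfers still end up serialized on the bus; the mutex just keeps the API calls from interleaving.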

I just wanted to concur that it would be outstanding to be able to hold off the bulk data transfer until we have "valid" data on the XEM side. Right now I have to stream constantly: good, bad, and ugly data. It would be a whole lot more efficient if I could stream over only the good stuff. My understanding of the Cypress part is that this should be conceptually doable.

Thanks.

Just for everyone's info, after much work I can only achieve reliable 128 KB sustained transfers at ~1.6 MB/s. And don't even try using a USB mouse, keyboard, or disk drive on the same port at the same time… the result is what we call "data dropouts," where the on-board FIFO (I'm using the max… 16K words) overflows, because the USB 2.0 link can't unload it fast enough. My surmise, using a bus snooper, is that these devices start emitting a lot of control packets, which have a higher priority than bulk packets. Anyway, just get them on a different hub/port and they are not as bothersome… However, I wish you guys would give us the isochronous pipes, for those of us that don't care if an occasional bit is corrupted but would like that 22 MB/s throughput…

Bill

Bill-

Are you using the SDRAM for buffering? Our designs achieve well over 15 MB/s with flow control. But there are many factors that need to be considered. If you need to use only small bursts (for example, when low latency is a requirement), then your bandwidth will suffer.