DMA for PCI Express

Captions
Welcome to an UltraScale Quick Take video featuring Xilinx's DMA Subsystem for PCI Express. Starting with Vivado 2016.1, Xilinx is making available a full-featured DMA built specifically to work with PCI Express. In this video we'll take a very quick look at typical PCIe DMA use cases, talk about the features of the DMA Subsystem for PCIe, walk through a Vivado implementation, and finally put it in hardware and see how it works.

A typical DMA operation for PCI Express is all about moving data between the system memory that the host has access to and the PCI Express endpoint. FPGAs are often used for data acquisition or for acceleration, and in both cases we want to move data back and forth between the endpoint and system memory.

The DMA is available on a large number of devices, starting with the Virtex-7 devices that are capable of Gen3 PCI Express. It is also available for Kintex and Virtex UltraScale devices, as well as the entire UltraScale+ family. It works with what we call the PCIe 3.0 block and beyond, which is what all of the highlighted devices contain. The other nice thing about this DMA is that the price is right: it is free to use and widely available in Vivado 2016.1.

When using the DMA, the first choice to make is which type of interface you are going to use. An AXI4-Stream interface typically means you're doing an RTL design. An AXI4 memory-mapped interface means you're building some kind of AXI subsystem. If you choose the stream interface, you get as many stream interfaces as the number of channels you've specified. If you choose the AXI4 memory-mapped interface, you get one interface regardless of the number of channels.

The DMA engine is a scatter-gather block DMA engine with up to four read and four write channels. It has a 64-, 128-, or 256-bit data bus running at up to 250 MHz, and which one you get depends on the link width and speed you've selected for your PCIe design. Additional features include a 256 MB transfer size and no limit on the number of descriptors, so you can have as many descriptors as you like. There is an option for contiguous descriptor prefetch: if descriptors sit next to each other in local memory, they are read with a single read instead of one read per descriptor. Finally, it supports MSI, MSI-X, and legacy INTx interrupts.

There are also a couple of sideband interfaces that are useful for designers. The first is the AXI4-Lite master interface. It allows single-DWORD PCIe reads and writes to be sent over the PCIe link and translated into AXI memory-mapped transactions; these are any transactions that hit BAR0. There is also an AXI4-Lite control slave interface, a memory-mapped Lite interface that supports single-DWORD reads and writes to the control and status registers. Finally, there is the AXI4 bypass master. It is for high-performance applications where, for example, the FPGA is the target of a DMA transfer and needs to accept very high-performance reads and writes. It is mapped to BAR2.

So without further ado, let's get started and show you how to build this in Vivado 2016.1. First, create a new project, give it any name you like, and advance until you get to the part selection page. Select the Boards tab, which gives us easy access to the KCU105.

While this is generating, one option would be to generate an IP core for the DMA engine from the IP catalog, right-click the generated .xci file, and create the example design. Instead, I'm going to show you another way, which is to create a block diagram, to show how easy it is to get up and running. Create the block diagram, give it any name you want, and select the Boards tab. Here we select the DIP switches and the rotary switch, which are just GPIO that we can access through the AXI4-Lite master interface we talked about earlier. Then select PCI Express. It gives us an option, and we want the DMA. We set the lane width to x8 (we'll double-check that in a minute) and generate the design. Once that comes up, we select DDR4 memory, because in this design the DMA is connected to DDR4 memory as well as to the GPIO.

Once those are in place, we run Block Automation. Here we can quickly select some of the options available for the DMA engine. We'll set one channel for both upstream and downstream, or read and write in this case, and let that generate. Once that's done, we can push into the IP and see that the options we set in the quick menu have indeed been transferred: the AXI-Lite BAR translation is set to zero, we're not using the bypass (high-performance) interface, and we have one read and one write channel, just as we specified. We'll cancel out of that.

Now let's run Connection Automation. This connects up a lot of things automatically and really simplifies the process. Once that's populated and we regenerate the layout, you can see I've got some color coding on the clock and reset nets. To do that, go to the options, select colors, and change them. That sometimes really helps you see what's going on in your design.

With those few simple steps, our design is pretty much complete. Looking at the address map, the DDR memory is mapped to address 0x80000000, and we add our GPIO at address 0. That matches because we left the address translation register for the AXI-Lite interface at zero, so anything that hits that BAR goes to address 0 plus whatever offset has been set. Now we save the block diagram and validate it to make sure it looks OK. It validates with no critical errors, so we can exit, and that's really all it took to get our hardware design in place.

We do need to add a top-level RTL file to this project. We go to our block design, right-click it, and create the HDL wrapper. Because we don't want to make any customizations to this wrapper, we let Vivado manage it, and it wraps our design up in a Verilog file. Now all we have to do is generate a bitstream. It's as easy as that. We'll speed this up so we don't have to wait through the entire process.

Now we open the implemented design to make sure timing has been met and everything looks good. There are a couple of Check Timing items highlighted at the bottom left. If we open them up, they're just missing input delays and output delays on the DIP switches and the reset. If we cared about timing on those paths, we could add constraints and rerun the design. In this case we'll leave them as is, since they're not critical.

I have this design hooked up to a KCU105, so we connect to it and program it with the new design. Before we switch windows, just to give you an idea, we can look at the System Monitor dashboard and see what the temperature is. If we had some ChipScope debug cores in here we could bring up those waveforms, but for now we'll leave it as is. Next we switch over to the Linux system that has the KCU105 enclosed in it.

Here we are on our Linux box, and I've already pulled up Answer Record 65444, which is where we keep our DMA driver and software. This is an example driver. It comes with some documentation, and there's a zip file you can download. We expect customers to take this driver and integrate it into their complete driver and application as they choose.

I've downloaded the zip file. We make a directory for the DMA driver, copy the zip file into it, and unzip it. At the very top level there's a readme file. I'm not going to go through everything, but there's a usage section, and we'll follow it in this demo to get our design up and running with data being transferred back and forth between the host and the DDR memory in our design.

We push into the driver directory and type make to compile the driver, then go up one directory to tests and compile our tests. In this case the .sh script files were not set as executable when I unzipped them, so I make sure all of the .sh files are executable. Then we copy some of the files this driver requires into the rules.d directory (I've already done it once, so we'll just overwrite them), run the load-driver script, and see that it worked correctly.

In the example design, remember, the DDR memory sits at offset 0x80000000. In the script files, the run-test script actually runs a DMA memory-mapped test, and we need to change its offset, because right now it is set to address 0. This address offset is the AXI address offset, so we take it and add 0x80000000, which in decimal turns out to be a less memorable number, as you'll see on the screen. We need to do this both for the write location, where we write the data, and for what we call a read, or card-to-host transfer.

Once that's in place, we run our test and see that it transfers data back and forth between system memory and the DDR4. That completes our walkthrough. Remember, the DMA Subsystem for PCI Express is available in Vivado 2016.1. We hope you'll check it out for your next design.
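The talk says the AXI data bus is 64, 128, or 256 bits wide at up to 250 MHz, with the width depending on link width and speed. A back-of-envelope check shows why. The sketch below compares the raw PCIe link rate (line rate times lanes, after 8b/10b or 128b/130b encoding) with AXI bus throughput at 250 MHz. It ignores TLP and DLLP overhead. The "narrowest width that keeps up" rule and the helper names are assumptions for illustration, not the IP's actual configuration logic.

```python
# Per-lane line rate (GT/s) and line-encoding efficiency for PCIe Gen1..Gen3.
GT_PER_LANE = {1: 2.5, 2: 5.0, 3: 8.0}
ENCODING_EFFICIENCY = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}

AXI_WIDTHS = (64, 128, 256)  # data-bus widths mentioned in the talk


def link_gbps(gen: int, lanes: int) -> float:
    """Raw per-direction link bandwidth in Gb/s, after encoding, before protocol overhead."""
    return GT_PER_LANE[gen] * ENCODING_EFFICIENCY[gen] * lanes


def axi_gbps(width_bits: int, clk_mhz: float) -> float:
    """AXI data-bus throughput in Gb/s, assuming one data beat per clock."""
    return width_bits * clk_mhz / 1000.0


def smallest_axi_width(gen: int, lanes: int, clk_mhz: float = 250.0) -> int:
    """Hypothetical helper: narrowest listed bus width whose throughput matches the link."""
    need = link_gbps(gen, lanes)
    for width in AXI_WIDTHS:
        if axi_gbps(width, clk_mhz) >= need:
            return width
    raise ValueError(f"no listed width keeps up with Gen{gen} x{lanes} at {clk_mhz} MHz")


if __name__ == "__main__":
    for gen, lanes in [(3, 1), (3, 4), (3, 8)]:
        print(f"Gen{gen} x{lanes}: {link_gbps(gen, lanes):.2f} Gb/s -> "
              f"{smallest_axi_width(gen, lanes)}-bit @ 250 MHz")
```

For a Gen3 x8 link this works out to about 63 Gb/s. Only the 256-bit bus at 250 MHz (64 Gb/s) can match that.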
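The scatter-gather engine walks a chain of descriptors held in memory. The talk describes contiguous descriptor prefetch as reading adjacent descriptors with a single read instead of one read per descriptor. The sketch below models that behavior. It builds one descriptor per host buffer fragment, then lists the descriptor-fetch reads with and without prefetch. The 32-byte descriptor size, the field names, and the absence of any cap on how many descriptors one read can cover are assumptions for illustration. Check the product guide for the real descriptor format and fetch limits.

```python
from dataclasses import dataclass
from typing import List, Tuple

DESC_BYTES = 32  # assumed size of one descriptor in memory


@dataclass
class Descriptor:
    desc_addr: int  # address where this descriptor itself is stored
    src_addr: int   # host (source) address of the data for a host-to-card transfer
    dst_addr: int   # card-side AXI address
    length: int     # bytes moved by this descriptor


def build_chain(fragments: List[Tuple[int, int]], card_addr: int,
                table_addrs: List[int]) -> List[Descriptor]:
    """One descriptor per (host_addr, length) fragment; table_addrs says where each one lives."""
    if len(fragments) != len(table_addrs):
        raise ValueError("need one descriptor slot per fragment")
    chain = []
    for (host_addr, length), slot in zip(fragments, table_addrs):
        chain.append(Descriptor(slot, host_addr, card_addr, length))
        card_addr += length  # the card-side destination is one contiguous region
    return chain


def descriptor_fetches(chain: List[Descriptor], prefetch: bool) -> List[Tuple[int, int]]:
    """(start_address, byte_count) reads issued to fetch the whole chain."""
    reads: List[Tuple[int, int]] = []
    for d in chain:
        if prefetch and reads:
            start, size = reads[-1]
            if start + size == d.desc_addr:  # stored right after the previous descriptor
                reads[-1] = (start, size + DESC_BYTES)
                continue
        reads.append((d.desc_addr, DESC_BYTES))
    return reads
```

Keeping the descriptor table in one contiguous buffer is what lets prefetch collapse the chain into fewer reads. In this model, descriptors stored at 0x9000 and 0x9020 are fetched together, and one stored at 0xA000 needs its own read.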
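In the walkthrough, BAR0 is routed to the AXI4-Lite master. That path carries single-DWORD accesses, offset by an address-translation register that the demo leaves at zero. BAR2 is routed to the bypass master. The sketch below models that routing from the host's point of view. The BAR numbers and zero translation offsets mirror this particular demo and are assumptions, not fixed properties of the IP. The names BarRoute, DEMO_ROUTES, and route_access are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class BarRoute:
    interface: str    # AXI interface that receives accesses to this BAR
    axi_base: int     # PCIe-to-AXI translation offset for the BAR
    dword_only: bool  # True if only single, aligned 32-bit accesses are forwarded


# Routing as configured in the demo (assumed): translation left at 0 for both BARs.
DEMO_ROUTES: Dict[int, BarRoute] = {
    0: BarRoute("AXI4-Lite master", 0x0, dword_only=True),
    2: BarRoute("AXI bypass master", 0x0, dword_only=False),
}


def route_access(bar: int, offset: int, size: int,
                 routes: Dict[int, BarRoute] = DEMO_ROUTES) -> Tuple[str, int]:
    """Return (interface, AXI address) for a host access of `size` bytes at `offset` in `bar`."""
    if bar not in routes:
        raise ValueError(f"BAR{bar} is not routed in this sketch")
    route = routes[bar]
    if route.dword_only and (size != 4 or offset % 4 != 0):
        raise ValueError("the AXI4-Lite path only carries single, aligned DWORD accesses")
    return route.interface, route.axi_base + offset
```

With the GPIO placed at AXI address 0 in the Address Editor, a DWORD access at BAR0 offset 0 lands on the GPIO. A nonzero translation value would shift every BAR0 access by that amount.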
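The last step of the demo edits the test script so that both the write and the card-to-host read target the DDR4 region instead of AXI address 0. The sketch below shows the same idea as a standalone loopback check. It assumes the driver exposes one character device per channel and treats the file offset as the AXI address. The device names /dev/xdma0_h2c_0 and /dev/xdma0_c2h_0 and the 0x8000_0000 base are assumptions to verify against the driver's readme and your Address Editor.

```python
import os

# Assumed values: verify against the driver readme and the Vivado Address Editor.
H2C_DEVICE = "/dev/xdma0_h2c_0"  # host-to-card channel 0 (hypothetical node name)
C2H_DEVICE = "/dev/xdma0_c2h_0"  # card-to-host channel 0 (hypothetical node name)
DDR4_BASE = 0x8000_0000          # DDR4 base address in the block design (assumed)


def loopback(h2c_path: str, c2h_path: str, axi_addr: int, payload: bytes) -> bool:
    """Write payload to axi_addr through the H2C node, read it back through C2H, compare."""
    fd = os.open(h2c_path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, payload, axi_addr)  # file offset selects the AXI address
    finally:
        os.close(fd)
    if written != len(payload):
        raise OSError(f"short write: {written} of {len(payload)} bytes")

    fd = os.open(c2h_path, os.O_RDONLY)
    try:
        readback = os.pread(fd, len(payload), axi_addr)
    finally:
        os.close(fd)
    return readback == payload


if __name__ == "__main__":
    # The demo enters the offset in decimal, so show both forms.
    print(f"DDR4 offset: {DDR4_BASE:#x} = {DDR4_BASE} decimal")
    if os.path.exists(H2C_DEVICE) and os.path.exists(C2H_DEVICE):
        data = os.urandom(4096)
        ok = loopback(H2C_DEVICE, C2H_DEVICE, DDR4_BASE, data)
        print("loopback ok" if ok else "loopback MISMATCH")
```

Because loopback only uses file paths and offsets, it can be exercised with an ordinary temporary file before pointing it at real device nodes.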
Info
Channel: XilinxInc
Views: 42,930
Rating: 4.8461537 out of 5
Keywords: PCI Express, PCIe, DMA, IP Subsystem, UltraScale, UltraScale+, Virtex-7, driver
Id: TzzzM97L4HI
Length: 13min 57sec (837 seconds)
Published: Thu May 26 2016