ZYNQ AXI Interfaces Part 1 (Lesson 3)

Captions

Hi, how are you doing? I am Mohammad Sadri, postdoctoral researcher at the Microelectronic Systems Design Research Group, TU Kaiserslautern. This is one of our educational Zynq training videos, and it is titled AXI Interfaces. Since AXI interfaces in the Zynq device are of very high importance, we have dedicated several videos just to the AXI interfaces inside the Zynq device, specifically the AXI interfaces through which the PS and the PL talk to each other. We will describe each of these interfaces in detail, and we will talk about how you can use them to transfer data between the PS and the PL inside the Xilinx Zynq device.

So here is the block diagram that we saw together in previous videos. It is a brief overall block diagram of the Xilinx Zynq device. As we described, and as the block diagram shows, the Zynq architecture is divided into two parts: the programmable logic (PL) and the processing system (PS). Today we begin to talk about these interfaces, the interfaces which allow you to transfer data from the FPGA part of the Xilinx Zynq device to the CPU part and vice versa.

Before going into further detail, let's make sure that we briefly know what AXI is, what an AXI interface is, what an AXI master is and what an AXI slave is. First, in today's world, as you can see, we have chips containing a large number of different types of IP blocks, and these IP blocks need to talk to each other and transfer data between each other. For this we need a standard based on which all of these IP blocks can talk to each other. Every IP block cannot introduce its own way of talking to the outside world; all of the IP blocks that everybody designs should obey one specific set of rules for receiving data from the outside world and transferring data to the outside world. Towards this, we have seen several standards introduced in the design community by big companies to solve this
problem. AXI is one of them. Very basically, AXI is a kind of method, a standard, through which different components inside your multi-component chip can talk to each other. Every AXI link necessarily contains two parts: an AXI master and an AXI slave. The AXI master is the one that initiates transactions; it begins read or write transactions. The AXI slave is the one that responds to the transactions initiated by the AXI master. The transactions can be either read or write, and they are always initiated by the AXI master. However, for read transactions the data flows from the AXI slave towards the AXI master, and for write transactions the data flows from the AXI master towards the AXI slave. In both cases the one initiating the transaction is the AXI master, but reads and writes have different data directions.

Now, we have two types of AXI interfaces: AXI memory mapped interfaces and AXI stream interfaces. The AXI stream interface is basically a reduced, very simplified version of the AXI memory mapped interface.

If we look in more detail at one AXI memory mapped connection, what do we see? We see one set of signals for read transactions and another set of signals for write transactions. For read transactions we basically have two groups of signals involved. First, when the AXI master wants to initiate a read transaction to an AXI slave, it first needs to provide the slave with the address from which it wants to read. So there is a set of signals that we call the read address channel, going from the AXI master to the AXI slave: a set of signals between the AXI master and the AXI slave dedicated to transferring the read address to the AXI slave. Then the AXI slave answers the AXI master with the required data, so we have a read data channel. Through the read data channel, the data at the address specified by the AXI master will be transferred
to the AXI master, to the requesting component.

Then for write transactions we have another set of signals. First, obviously, the AXI master needs to provide the AXI slave with the required write address. Second, the AXI master should send the AXI slave the data that it wants to be written to that address on the slave side. And third, the AXI slave should respond to the AXI master, indicating whether the data was transferred and written correctly or not, so we have a write response channel. Each memory mapped AXI interface, or each memory mapped AXI connection, has these five channels.

Now let's look at the contents, the internal signals, of one of these channels in slightly more detail. I take the write data channel and have a more detailed look at its contents. We can see that the basic signals that make up a channel are the following. First, from the AXI slave there is a ready signal coming to the AXI master. This ready signal tells the AXI master: hey, I am ready, you can begin your operation, you can begin sending the data. Second, the AXI master sends a valid signal to the AXI slave, indicating that the data I am sending you is now valid. So whenever the AXI master is transferring valid data to the AXI slave, it also asserts the valid signal. Practically, these three signals, ready, valid and data, are the main signals which make up each of these AXI channels.

There are other sets of signals inside each AXI channel as well. For example, each time you want to transfer a set of data, you indicate the size of the data, the width of the data being transferred. You can transfer data in single beats, for example one word of data, one byte of data or sixteen bits of data each time, or you can transfer data in bursts; each time you can transfer, for example, 256 beats of data continuously without interruption. In that case we need the burst
length signals. Then sometimes, for a specific set of data that you want to transfer between the AXI master and slave interfaces, you need some quality of service measures. For example, some data is more important than the rest and you need to indicate this somehow. So we have a set of QoS signals passing from the AXI master to the AXI slave indicating these points. There is also a set of custom signals that the user can drive as they wish; in fact the programmer, the hardware designer, can decide what to send on this set of signals. But the three signals that I am showing you here are the main signals building each of the channels.

Now, in addition to AXI memory mapped interfaces, we also have AXI stream interfaces, or AXI stream connections. In many cases, in the hardware that we implement on ASICs, on FPGAs and on different types of logic, we have blocks that receive data, perform some kind of processing on it and then pass it to the next step. For example, the data comes from an A-to-D converter, goes into a filter and then gets passed to the next module. We have a stream of data flowing inside our hardware. In these cases, when a module wants to transfer data to another module, it does not need to provide an address for that data, and furthermore the direction of the data is always the same: it is always from this module to the next module. It never happens that you need to transfer data from the next module back to this one. So practically, in an AXI stream interface, this module is always writing to that module, and no address is required for these data transfers. These are the principles of AXI stream interfaces. Basically, if you look at one AXI stream interface, you can see that it is the write channel. So if I look at one AXI memory mapped interface, extract its write channel and look at its signals, basically I am seeing an AXI stream
interface. So again, the important signals which make up an AXI stream interface are the following: the ready signal from the slave to the master, and the valid and data signals from the master to the slave.

Look at this naming convention. For all of the components, the names of the ports usually begin as follows: if the port is an AXI slave port you have S underscore, and if the port is a master port you have M underscore. For all of the memory mapped interfaces the name of the port then usually continues with AXI, underscore, the rest of the name; for AXI streaming interfaces it continues with AXIS, underscore, the rest of the name. As we progress through our training videos, you will see that this naming standard is obeyed in most of the blocks and modules that we use for our designs.

Now, having these definitions of AXI interfaces in mind, we will go ahead and have another look at the architecture, at the structure, of the Xilinx Zynq device. So let's go back to the architecture that we were showing for the Xilinx Zynq and have a more detailed look at the AXI interfaces that we have between the PS and the PL. We are going to focus only on these AXI interfaces.

First, I want to look at these ports here, HP0, HP1, HP2 and HP3, which are the high-performance AXI interfaces. These high-performance AXI interfaces are slaves for the logic that you develop inside the PL, so practically they are AXI slave ports of the PS. All of the ports here, as I described before, are memory mapped ports; there is no streaming port here. In future videos I will describe what you should do to translate a stream AXI interface into a memory mapped AXI interface, but for now let's focus on these AXI memory mapped interfaces. For these HP ports, each port has a width of 64 bits. The logic, the hardware accelerator, the block that you develop here should contain an AXI master block or AXI
master interface, and through this AXI master interface you can connect to this AXI slave port. This AXI slave port redirects you to the address space of the PS, so the logic that you have here can initiate read and write transactions to the DRAM memory that you have here, or to the OCM, the on-chip memory, that you have here.

Then we have these two AXI master ports. They are called M_GP0 and M_GP1, and the width of each port is 32 bits. These ports are practically very important because they are the only master ports of the PS. They are master ports of the PS, meaning that I need to develop AXI slave interfaces, or AXI slave blocks, for the module that I implement on the PL, so that the logic that I have in the PS can transfer data to the logic that I have in the PL through these AXI interfaces. For example, the ARM CPU cores that you have here can initiate read and write transactions to the logic that you have implemented on the PL. Whenever an ARM CPU core initiates a read or write transaction to a specific address range, which I will show you later, the transaction will appear on these AXI master ports and will then be transferred to the AXI slave blocks that you have developed on the PL. Your logic on the PL should then provide a suitable response, and the data will be transferred through these AXI master ports to your CPU cores.

These two ports are the main means for your ARM host to control the hardware that you have developed on the PL, so they are very important. Furthermore, these two ports are the main means for the DMA engine that you have here on the PS to perform reads and writes to the logic that you have developed on the PL part of the Zynq. So these two AXI master ports are very important, and they allow your CPU cores to initiate read and write transactions to the blocks that you create on the PL.

Next we have the accelerator coherency
port. The accelerator coherency port (ACP) is very similar to the HP ports. Its width is 64 bits and it is an AXI slave port of the PS, so we need to develop AXI master blocks and connect them to this ACP port so that we can initiate read and write transactions to the PS. The main difference between the ACP port and the HP ports is that, as you can see, the ACP port is directly connected to this snoop control unit, and the snoop control unit is connected to the caches of the CPU, the L1 cache and the L2 cache. So whenever a transaction is initiated by the master that you have here, the caches of the CPU are checked first for that data. The caches are searched for the specific physical address of the transaction that you have initiated, and if an instance of the data is available, then, if all of the conditions are met, the transaction is answered with the data in the caches. This allows faster and more energy-efficient data transfers from the ARM subsystem to the hardware accelerator, or to the logic that you have developed on the PL.

If, for the AXI transaction that has been initiated by the AXI master, there is no corresponding copy or instance of the data inside the caches, then the transaction is redirected through this port to the DRAM memory, and it is answered with the data stored in the DRAM memory. This of course introduces additional latency, which may degrade the performance of your system. As we will see in future videos, the ACP port should be used with extreme caution.

Whenever you have a task in which you want your ARM host CPU to cooperate with a hardware accelerator implemented on the PL, the ACP port that you have here can be an extremely useful means. These two components can use the ACP port, and in fact the caches of the CPU, for sharing data.
This allows the ARM host and the accelerator to share data very fast with low energy consumption. But as I will show you in future videos, if you do not design your structure, algorithm and memory allocation well and efficiently, then this may not improve your performance at all; it can in fact increase runtime and increase energy consumption.

Finally, we have these slave general-purpose ports here. Again, these are slave ports of the PS, so in the PL you need to develop AXI master logic, and the AXI master logic will drive transactions through these general-purpose slave ports to the peripherals and the different components inside the PS. Again, each of these ports is 32 bits.

OK, now I want to have a more detailed look at the internal architecture of the PS and at how each of these AXI blocks is connected to the rest of the components inside the PS. For this purpose I use the Zynq Technical Reference Manual, which is coded as UG585 on the Xilinx website, and you can download it very easily. It is the most complete reference for the Xilinx Zynq and the Zynq architecture.

So we go ahead with a system-level view of the Zynq device. First, at the top, we have the PL fabric. As you can see, in the PL fabric we first have the high-performance AXI controllers. Then we have the cache-coherent ACP port, or rather the AXI controller which will be connected to the cache-coherent ACP port. Then we have the general-purpose AXI masters; these are the ones that drive the S_GP0 and S_GP1 ports. And finally we have the general-purpose AXI slave ports; this is the hardware logic that we develop and that will be driven by the GP1 and GP0 AXI master ports.

I begin here, go down through the architecture and try to track each of these signals to see where they go and where they get connected. What I have here is a set of AXI master blocks. These AXI master blocks are implemented inside
my logic, the logic that I have developed and implemented inside the PL of the Zynq. These AXI master blocks are connected to the AXI slave ports, the HP0 to HP3 ports of the PS, and then they go inside the PS. First they pass through an async block, meaning that the clock domain can change: the clock which drives the rest of the logic here is not necessarily the same clock that drives this AXI interface here. Then we have a FIFO which can buffer the data related to the transactions passing through, which can result in improved performance. Finally we have a kind of AXI interconnect, through which the different ports coming from the HP0 to HP3 blocks are connected to the rest of the architecture.

Now let's scroll this map down a little and see what happens to these signals. As you can see, at the next step after this AXI interconnect we have three ports: two ports, each of them 64 bits, and then another port. Practically, I would say all of these can access each of these ports here. Then we have this component here, which is called the OCM interconnect, the on-chip memory interconnect. If I follow this component, you will see later that it is connected to the on-chip memory. The interconnect is driven first by this AXI HP interconnect, meaning that every module that you have connected to any of the HP0 to HP3 ports can access the OCM through this OCM interconnect. Furthermore, as we will see, the rest of the components inside the system can also access the OCM.

So let's scroll down the map a little more and see what happens to these two signals here. If I go down slightly, I can see that these two signals go to the DDR controller; they are going to the DRAM memory. So the AXI masters that you have implemented on the PL and have connected to
the PS through any of the HP0 to HP3 ports can directly access the DRAM memory, writing data to the DRAM memory and reading data from it. Then, if I look, I can see that my ARM CPU host, which is here, can also access the DRAM memory. We have our CPU host here, then it has its own caches, then we have this snoop control unit, then we have the L2 cache, and then we have this M0 port. If we follow the M0 port, it goes through an async block to the DRAM memory. So the AXI masters that we implement on the PL and connect to any of the HP0 to HP3 ports can share data with the CPU over the DRAM memory. They access the data through these two ports, and the CPU is able to access the data through this port.

Now let's go back to the top of the map and see where the accelerator coherency port is connected. Here we have an AXI master block that we have implemented on the Zynq PL, and this AXI master block is connected to the accelerator coherency port, to the ACP port. Then, through an async block, it is directly connected to this snoop control unit, and the snoop control unit is connected to the L2 cache, or generally speaking the caches of the CPU. This allows the AXI block that you have here to perform coherent transactions to the memory space of the CPU, and if an instance of the data is available here in the L2 cache, or the caches of the CPU, then the transaction is answered directly with the data available here.

This snoop control unit, as you can see, has two ports. One port goes to the level 2 cache, the shared level 2 cache of the ARM cores, and if we follow the other port, we can see that it goes to the OCM memory. So the AXI master that we have implemented on the PL and connected to the ACP port can, through this snoop control unit, reach the on-chip memory. If we follow it, we can see this part here: we have the snoop control unit and we have the on-chip RAM, in fact 256 kilobytes,
and it has two ports. One port is directly connected to this snoop control unit and another port is connected to the OCM interconnect. The OCM interconnect, as I described, can be driven by each of the HP0 to HP3 ports and also by the rest of the components which are available inside the Zynq PS.

Then, if we look at the level 2 cache of the ARM subsystem, we can see that it has two ports. One port, as we described, goes directly to the DRAM, and the other port, if we follow it, goes to what is called the slave interconnect. So I have the port of the level 2 cache coming down and getting connected to the slave interconnect. This slave interconnect has four AXI master ports, as you can see, M0 to M3 here, and each of these AXI master ports is connected to a part of the system. Through this interface, the ARM host and the program running on it can access each of the peripherals. As I described, the Zynq device provides a set of peripherals to the user, such as an SD card interface, a USB interface and a network interface, and all of these interfaces can be accessed by the CPU through this port.

Furthermore, we have a DMA; in fact we have what I would call a central DMA controller here, which can be programmed by the CPU, and this DMA controller can also perform transactions and data transfers for us. The DMA controller is a master on this central interconnect. If we follow the slave ports of this central interconnect, we can see that one of them goes to the OCM interconnect, another goes to the DRAM and another goes to the slave interconnect.

Finally, if I look at the M0 and M1 ports that I have here and follow these two signals, I see that they reach these two interfaces here, which are practically the GP0 and GP1 ports
of the PS. These two AXI master ports can be connected to the slaves, to the AXI slaves that I implement on the Zynq PL. So on the Zynq PL I create hardware containing AXI slave blocks, and these AXI slave blocks can be connected to the GP1 and GP0 ports. Practically, as I showed you, these two ports are driven through the slave interconnect, and since the slave interconnect is driven by the CPU, by the DMA and by the peripherals, each of them can also access the logic that you develop in the PL through these ports.

OK, don't worry if this looks confusing. The goal was to have a more detailed overview of the internal architecture. For actually designing systems with the Zynq you do not really need to know everything in detail; that is absolutely not necessary. A rough knowledge of what these AXI ports are, and of how and where I should connect the hardware that I have developed in the PL, is completely enough.

Obviously, the CPU system that we described in the PS has a memory map, and each component in this system has an address range. For example, you may ask me: how shall I initiate transactions to the GP0 and GP1 ports? Suppose that I want to write a program running on the ARM host, and inside this program I want to read a piece of data from, or write data to, the GP0 and GP1 ports. How shall I do that? The answer is here in this table, the address map of the device. Inside this table you can see that every component in the system has a specifically defined address range. In the next video I will describe this address map in more detail, and we will see in more detail how, by programming the ARM host, we can access the GP0 and GP1 ports and consequently the hardware that we implement on the PL.

This is the end of the current video. Thanks for watching, and we will meet you in our future videos.
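
Editor's note: the address-map idea at the end of the video can be sketched in a few lines of Python. This is only a rough model, not code from the video. The two address windows are the ones listed for the general-purpose master ports in the Zynq-7000 address map of UG585; the video does not quote them, so treat them as assumptions to verify against the manual. The names `GP_WINDOWS`, `route` and `MY_SLAVE_BASE`, and the example base address, are hypothetical and chosen purely for illustration.

```python
# Address windows for the two PS master ports, as listed in the Zynq-7000
# address map (UG585). Assumption: these values are not quoted in the video,
# so verify them against the manual before relying on them.
GP_WINDOWS = {
    "M_AXI_GP0": (0x4000_0000, 0x7FFF_FFFF),
    "M_AXI_GP1": (0x8000_0000, 0xBFFF_FFFF),
}


def route(addr):
    """Return the GP master port whose window contains addr, else None."""
    for port, (low, high) in GP_WINDOWS.items():
        if low <= addr <= high:
            return port
    return None


# Hypothetical base address of an AXI slave block implemented in the PL.
MY_SLAVE_BASE = 0x43C0_0000

print(route(MY_SLAVE_BASE))  # M_AXI_GP0
print(route(0x8000_0010))    # M_AXI_GP1
print(route(0x0010_0000))    # None: outside both GP windows
```

In this model, a CPU read or write whose address falls inside one of the two windows is the kind of access that would appear on the corresponding GP master port and reach an AXI slave in the PL; any other address is served elsewhere in the PS memory map.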

Info

Channel: Microelectronic Systems Design Research Group

Views: 55,825

Rating: 4.9196429 out of 5

Keywords: Norbert Wehn, MohammadSadegh Sadri, Sadri, Mohammad, AXI, Zynq, Xilinx, University of Kaiserslautern, Universität Kaiserslautern, Microelectronic Systems Design Research Group, Computer Hardware (Industry), Advanced Microcontroller Bus Architecture

Id: nAycgPUOiAI

Length: 39min 9sec (2349 seconds)

Published: Mon Aug 25 2014