Eight benefits of using FPGAs with on-chip high-speed networking

Introduction

Since FPGAs were first introduced decades ago, each new architecture has continued to use bit-wise routing structures. This approach has been successful, but the rise of high-speed communication standards has forced the on-chip bus width to keep growing to support the new data rates. As a result, designers often spend a great deal of development time trying to achieve timing closure, and frequently sacrifice performance just to get their designs placed and routed.

Traditional FPGA routing consists of many independent interconnect segments running horizontally and vertically across the entire FPGA. A switch box at each intersection of horizontal and vertical routing connects the channels. Using these segments and switch boxes, a path can be built from any source to any destination on the FPGA. This uniform routing structure provides great flexibility for implementing any logic function and accommodates any data path width in the FPGA logic array.

Bit-wise routing is very flexible, but each segment adds delay to a signal path. A signal that must travel a long distance across the FPGA passes through many segments and accumulates delay at each connection, which reduces the performance of the function. Congestion is another challenge of bit-wise routing. Signal paths must detour around congested regions, which adds still more delay and degrades performance further.

Achronix saw this challenge as an opportunity to develop a new architecture that removes these design obstacles of traditional FPGAs and improves system performance. Its solution is a revolutionary two-dimensional (2D) high-speed network on chip (NoC). In the new Speedster7t FPGA family, this NoC is built on top of the traditional segmented FPGA routing structure. The Speedster7t NoC connects to all of the device's high-speed interfaces: 400G Ethernet, PCIe Gen5, GDDR6, and multiple DDR4/5 ports.

Internally, the NoC is made up of a set of rows and columns that distribute network traffic horizontally and vertically across the entire FPGA logic array. Each intersection of a NoC row and column has a primary and a secondary NoC access point (NAP). These NAPs are the sources and destinations for data passing between the NoC and the programmable logic array.

Figure 1: Speedster7t’s network on chip (NoC) and interfaces

At first glance, the Speedster7t NoC might seem to do no more than help route buses inside the FPGA. In fact, this new architecture can significantly improve designer productivity, enable entirely new design capabilities, and make data-intensive applications easy to implement. The eight most significant benefits in productivity, design methodology, and performance are described below.

Simplify high-speed data distribution across the entire FPGA logic array

In traditional FPGA architectures, read and write traffic between the FPGA and attached off-chip memory or external high-speed data sources must travel along long, segmented routes through the FPGA logic fabric. This limits bandwidth. It also consumes routing resources that the user design needs in the logic array. Both effects make timing closure harder for FPGA designers, especially as other logic functions push device utilization up.

Moving data between external sources, the FPGA, and memory is much easier over the Speedster7t NoC than in a traditional FPGA architecture. The NoC augments the traditional programmable interconnect in the FPGA array, much like a highway network superimposed on a city street system. The traditional programmable interconnect in Speedster7t FPGAs remains well suited to slower, local data traffic. The NoC handles the more demanding high-speed data streams.

Each NoC row or column is implemented as two 256-bit unidirectional data channels running at a fixed clock rate of 2 GHz. Rows have east and west channels, and columns have north and south channels. Each NoC row or column can therefore carry 512 Gbps of data traffic in each direction simultaneously. Designers can move large amounts of data across the FPGA array over these channels with simple Verilog or VHDL code that connects their logic to a NAP and, through it, to the NoC highway network.
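The 512 Gbps figure is the channel width multiplied by the clock rate. The minimal Python sketch below reproduces that arithmetic. It assumes each channel moves one full 256-bit word per NoC clock and ignores any protocol overhead.

```python
# Raw NoC bandwidth from the channel width and clock rate quoted above.
# Assumption: each channel moves one full 256-bit word per NoC clock.
CHANNEL_WIDTH_BITS = 256
NOC_CLOCK_HZ = 2_000_000_000  # fixed 2 GHz NoC clock


def channel_bandwidth_bps(width_bits: int, clock_hz: int) -> int:
    """Raw bandwidth of one unidirectional channel, in bits per second."""
    return width_bits * clock_hz


per_direction_bps = channel_bandwidth_bps(CHANNEL_WIDTH_BITS, NOC_CLOCK_HZ)
# A row (east + west) or column (north + south) has one channel per direction.
per_row_or_column_bps = 2 * per_direction_bps

print(f"Per direction: {per_direction_bps / 1e9:.0f} Gbps "
      f"({per_direction_bps / 8 / 1e9:.0f} GB/s)")        # 512 Gbps (64 GB/s)
print(f"Row or column, both directions: "
      f"{per_row_or_column_bps / 1e9:.0f} Gbps")          # 1024 Gbps
```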

Figure 2 shows data moving between several points on the NoC. The logic at point 1 and the logic at point 2 each instantiate a horizontal NAP. A NAP can both send and receive data, but each individual data stream travels in only one direction. Similarly, the logic at point 3 and the logic at point 4 each instantiate a vertical NAP, and the two can send data streams to each other.
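The rule that each stream travels in one direction can be sketched as a tiny routing model. Everything in it is hypothetical. This includes the `Nap` class, the coordinate convention (row index grows northward, column index grows eastward), and the restriction to same-row or same-column transfers. That restriction mirrors the two NAP pairs in Figure 2 and does not describe the full NoC routing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nap:
    """Hypothetical model of a NAP placed at (row, col) on the NoC grid."""
    row: int          # assumed to increase from south to north
    col: int          # assumed to increase from west to east
    orientation: str  # "horizontal" (row NoC) or "vertical" (column NoC)


def stream_direction(src: Nap, dst: Nap) -> str:
    """Return the one-way channel a src -> dst stream would use.

    Only same-row transfers between horizontal NAPs and same-column
    transfers between vertical NAPs are modeled, as in Figure 2.
    """
    if src == dst:
        raise ValueError("source and destination are the same NAP")
    if src.orientation != dst.orientation:
        raise ValueError("mixed horizontal/vertical transfers are not modeled")
    if src.orientation == "horizontal":
        if src.row != dst.row:
            raise ValueError("horizontal NAPs must share a NoC row")
        return "east" if dst.col > src.col else "west"
    if src.orientation == "vertical":
        if src.col != dst.col:
            raise ValueError("vertical NAPs must share a NoC column")
        return "north" if dst.row > src.row else "south"
    raise ValueError(f"unknown orientation: {src.orientation!r}")


# Points 1/2 (horizontal) and points 3/4 (vertical); coordinates are made up.
p1, p2 = Nap(2, 1, "horizontal"), Nap(2, 5, "horizontal")
p3, p4 = Nap(1, 3, "vertical"), Nap(6, 3, "vertical")
print(stream_direction(p1, p2), stream_direction(p2, p1))  # east west
print(stream_direction(p3, p4), stream_direction(p4, p3))  # north south
```

A stream and its reply use opposite channels. This is why a bidirectional exchange between two NAPs occupies both channels of the row or column.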

Figure 2: Data flow across device logic arrays on NoC

Automatically connect the PCIe interface to memory

In current FPGAs, designers who connect a high-speed interface to memory attached to the FPGA must account for delays from connection logic, routing, and the placement of input and output signals. As a result, even a simple memory interface that provides basic functionality usually takes a great deal of design time to build.

In the Speedster7t architecture, the peripheral NoC automatically connects the embedded PCIe Gen5 interface to the attached GDDR6 or DDR4 memory. The designer does not need to write any RTL to establish these connections. Because the NoC connects to all peripheral IP interfaces, designers can connect PCIe to any GDDR6 or DDR4 memory interface. In the example in Figure 3, the NoC provides enough bandwidth to sustain PCIe Gen5 traffic to any two GDDR6 memory channels. This high-bandwidth connection consumes no FPGA logic array resources and takes almost no design time. Users only need to enable the PCIe and GDDR6 interfaces to send transactions over the NoC.
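For a rough sense of scale, the sketch below compares raw line rates. PCIe Gen5 signals at 32 GT/s per lane with 128b/130b encoding. The lane count is not stated here, so a x16 link is assumed. Packet-level overhead is ignored, so real PCIe throughput is lower than the computed figure.

```python
# Raw line-rate comparison; PCIe protocol overhead (TLP/DLLP headers,
# flow control) is deliberately ignored.
PCIE_GEN5_GT_PER_LANE = 32e9     # PCIe Gen5 signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line coding
LANES = 16                       # assumption: x16 link (not stated in the text)

NOC_CHANNEL_BPS = 256 * 2e9      # one 256-bit NoC channel at 2 GHz

pcie_bps = PCIE_GEN5_GT_PER_LANE * LANES * ENCODING_EFFICIENCY

print(f"PCIe Gen5 x{LANES}, one direction: {pcie_bps / 1e9:.1f} Gbps")  # 504.1
print(f"One NoC channel: {NOC_CHANNEL_BPS / 1e9:.1f} Gbps")             # 512.0
print(f"NoC channel / PCIe link: {NOC_CHANNEL_BPS / pcie_bps:.3f}")     # 1.016
```

Under these assumptions, a single NoC channel slightly exceeds the raw rate of one direction of a x16 Gen5 link. The arithmetic does not model how traffic is spread across the two GDDR6 channels in the example.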

Figure 3: Connect PCIe directly to the GDDR6 interface

Enable safe partial reconfiguration of individual FPGA logic array modules

Like other FPGAs based on static random-access memory (SRAM), Speedster7t FPGAs must be configured at power-up. A Speedster7t FPGA has an on-chip FPGA configuration unit (FCU) that manages the initial configuration of the FPGA and any later partial reconfiguration. The FCU is also connected to the NoC, which adds flexibility when configuring the FPGA. Because the configuration bitstream can reach the FCU over the NoC, the FPGA can be configured in ways that were not previously possible.

Even before the device is configured, the Speedster7t NoC can carry certain read/write transactions: PCIe to GDDR6, PCIe to DDR4, and PCIe to the FCU. Once the PCIe interface is up, the FPGA can receive the configuration bitstream over PCIe and pass it to the FCU. The FCU then writes the bitstream into the programmable logic to configure the rest of the device. After the device is configured, designers can reconfigure selected parts of the FPGA (partial reconfiguration) to add new functions or improve acceleration performance without shutting the FPGA down.
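The configuration sequence above can be sketched as a small state model. `DeviceModel`, its methods, and the region names are hypothetical illustrations, not an Achronix API. Only two facts come from the text: the three pre-configuration paths, and delivery of the bitstream from PCIe to the FCU.

```python
from enum import Enum, auto

# NoC paths the text says are usable before the fabric is configured.
PRE_CONFIG_PATHS = {("PCIe", "GDDR6"), ("PCIe", "DDR4"), ("PCIe", "FCU")}


class State(Enum):
    UNCONFIGURED = auto()
    CONFIGURED = auto()


class DeviceModel:
    """Hypothetical, highly simplified model of the PCIe -> NoC -> FCU flow."""

    def __init__(self) -> None:
        self.state = State.UNCONFIGURED
        self.pcie_up = False
        self.regions: dict[str, str] = {}  # fabric region -> loaded design

    def bring_up_pcie(self) -> None:
        self.pcie_up = True

    def route_allowed(self, src: str, dst: str) -> bool:
        # Before configuration only the peripheral paths above exist; after
        # configuration this sketch simply allows every path (a simplification).
        return self.state is State.CONFIGURED or (src, dst) in PRE_CONFIG_PATHS

    def load_bitstream(self, designs: dict[str, str]) -> None:
        """Full configuration: the bitstream travels PCIe -> NoC -> FCU."""
        if not (self.pcie_up and self.route_allowed("PCIe", "FCU")):
            raise RuntimeError("PCIe must be up before the FCU can be reached")
        self.regions = dict(designs)
        self.state = State.CONFIGURED

    def partial_reconfigure(self, region: str, design: str) -> None:
        """Swap one region's design; every other region keeps running."""
        if self.state is not State.CONFIGURED:
            raise RuntimeError("partial reconfiguration needs a configured device")
        self.regions[region] = design


dev = DeviceModel()
dev.bring_up_pcie()
dev.load_bitstream({"region_a": "compress", "region_b": "filter"})
dev.partial_reconfigure("region_a", "encrypt")
print(dev.regions)  # {'region_a': 'encrypt', 'region_b': 'filter'}
```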

A new partial reconfiguration bitstream can be sent to the FCU over the PCIe interface to reconfigure any part of the device. When part of a Speedster7t AC7t1500 device is reconfigured, the designer instantiates a NAP in that region to communicate with the NoC. Any data entering or leaving the newly configured area is then easy to access. The NoC removes the complexity of traditional FPGA partial reconfiguration. Users do not have to route around existing logic functions at the cost of performance. They also do not have to worry that existing logic in the area will block access to certain device pins. This saves designer time and makes partial reconfiguration more flexible.

Partial reconfiguration also lets designers adapt the logic in the device as workloads change. For example, suppose the FPGA is running a compression algorithm on incoming data and compression is no longer needed. The host CPU can instruct the FPGA to reconfigure and load a new design optimized for the next workload. Partial reconfiguration can be performed independently at the level of logic array clusters while the rest of the device keeps running. One clever use case is a self-aware FPGA. In this design, a soft CPU monitors device operation and initiates partial reconfiguration in real time. It can shut down logic to save power, or add accelerator modules to the FPGA fabric to temporarily handle a surge of input data. These capabilities give designers unprecedented configuration flexibility.
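The self-aware FPGA idea comes down to a control policy running on the soft CPU. The sketch below shows one hypothetical policy. It assumes the CPU can read the incoming data rate and knows how much throughput one accelerator module sustains. `Telemetry` and `plan_reconfiguration` are illustrative names. Loading or unloading modules would go through partial reconfiguration as described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Telemetry:
    """Hypothetical readings a soft CPU might collect from the fabric."""
    input_rate_gbps: float          # current incoming data rate
    per_accel_capacity_gbps: float  # throughput one accelerator module sustains
    active_accels: int              # accelerator modules currently loaded


def plan_reconfiguration(t: Telemetry, max_accels: int) -> tuple[str, int]:
    """Decide how many accelerator modules to add or remove."""
    needed = math.ceil(t.input_rate_gbps / t.per_accel_capacity_gbps)
    # Keep at least one module loaded and never exceed what the fabric holds.
    needed = max(1, min(needed, max_accels))
    if needed > t.active_accels:
        return ("add", needed - t.active_accels)
    if needed < t.active_accels:
        # Surplus modules can be reconfigured away to save power.
        return ("remove", t.active_accels - needed)
    return ("hold", 0)


print(plan_reconfiguration(Telemetry(300.0, 100.0, 2), max_accels=8))  # ('add', 1)
print(plan_reconfiguration(Telemetry(50.0, 100.0, 4), max_accels=8))   # ('remove', 3)
```

A real controller would probably add hysteresis, so that a brief spike does not trigger a reconfiguration every time the rate crosses a threshold.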

Easily support hardware virtualization

Through its NAPs and their AXI interfaces, the Speedster7t NoC gives designers the unique ability to create virtualized, secure hardware within a single FPGA. To connect a programmable logic design directly to the NoC, the design only needs to instantiate a NAP and its AXI4 interface. Each NAP also has an associated address translation table (ATT) that converts logical addresses at the NAP into physical addresses on the NoC. With the ATT, programmable logic modules can use local addresses while their NoC transactions are mapped onto addresses allocated in the NoC global memory map. This remapping can be used in many ways. For example, identical copies of an acceleration engine can all use zero-based virtual addressing while each engine's traffic goes to a different physical memory location.

Each ATT entry also contains an access-protection bit that prevents the node from accessing a forbidden address range. This provides an important inter-process safety mechanism. It keeps multiple applications or tasks running on a Speedster7t FPGA from interfering with memory assigned to other applications or tasks. It also helps prevent system crashes caused by accidental or even deliberate memory address conflicts. Designers can also use this scheme to block a logic function from accessing an entire memory device.
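The ATT behavior in the last two paragraphs can be modeled in a few lines: zero-based local addresses are remapped per engine, and each entry carries a protection bit. This is a hypothetical software model. The entry fields, window sizes, and the choice to reject unmapped addresses are assumptions for illustration, not the hardware's actual table format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttEntry:
    """Hypothetical ATT entry mapping a local window onto the NoC address map."""
    local_base: int
    size: int
    noc_base: int
    allowed: bool  # access-protection bit: False blocks the whole window


class AddressTranslationTable:
    def __init__(self, entries: list[AttEntry]) -> None:
        self.entries = entries

    def translate(self, local_addr: int) -> int:
        """Map a NAP-local address to a NoC address, enforcing protection."""
        for e in self.entries:
            if e.local_base <= local_addr < e.local_base + e.size:
                if not e.allowed:
                    raise PermissionError(f"access to 0x{local_addr:x} is blocked")
                return e.noc_base + (local_addr - e.local_base)
        # Assumption of this sketch: unmapped addresses are rejected as well.
        raise PermissionError(f"0x{local_addr:x} is not mapped")


WINDOW = 0x1000_0000  # 256 MiB per engine (illustrative size)

# Two identical engines both use zero-based addresses but land in different memory.
engine0 = AddressTranslationTable([AttEntry(0, WINDOW, 0x0000_0000, True)])
engine1 = AddressTranslationTable([
    AttEntry(0, WINDOW, 0x1000_0000, True),
    # Engine 0's region is mapped for engine 1 but marked forbidden.
    AttEntry(WINDOW, WINDOW, 0x0000_0000, False),
])

print(hex(engine0.translate(0x40)))  # 0x40
print(hex(engine1.translate(0x40)))  # 0x10000040
```

Both engines issue the same local address, 0x40, yet their traffic lands in different physical regions. Engine 1 gets an error if it touches the window marked forbidden.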

Figure 4: Using Speedster7t NoC to implement hardware virtualization

Simplify collaborative team design

Team-based collaborative FPGA design is not a new concept. In practice, though, the architecture and routing of each part of a design depend on other parts of the FPGA, which makes this simple concept very hard to carry out. Once one team has completed its part of the design, another team working on other parts often struggles to reach resources at the far end of the device, because its routing must pass through the completed part. Likewise, changing the area or size of a part of the FPGA that has already been designed and routed can have knock-on effects on every other FPGA design module.

With the Speedster7t NoC, design modules can be mapped to any part of the FPGA, and resource allocations can change without affecting the timing, placement, or routing of other FPGA modules. Every NAP in the device gives its design module unrestricted access to the NoC for communication, and this makes team-based design practical. If one part of a design grows, the NoC manages the data flow automatically, as long as enough FPGA resources are available. The designer does not have to worry about meeting timing, or about knock-on effects on the parts of the design that other team members are working on.

Figure 5: Multiple design teams developing the same FPGA

Speed up design through independent interface and logic verification

Another unique feature of the Speedster7t NoC is that designers can configure and verify I/O connections independently of user logic. For example, one design team can verify the PCIe-to-GDDR6 interface while another team independently verifies the internal logic functions. This is possible because the peripheral portion of the NoC connects PCIe, GDDR6, DDR4, and the FCU without consuming any FPGA resources. These connections can be tested without any HDL code, so the interfaces and the logic can be verified independently and in parallel. This removes the dependency between verification steps and makes overall verification faster than with traditional FPGA architectures.

Figure 6: Independent I/O and logic verification (Design Team 1: I/O verification; Design Team 2: logic verification)

Use packet mode to simplify 400 Gbps Ethernet applications

The challenge in implementing a 400 Gbps Ethernet data path in an FPGA is finding a bus width and clock rate that the FPGA fabric can actually support. For 400G Ethernet, the only viable options for full-bandwidth operation are a 1,024-bit bus running at 724 MHz or a 2,048-bit bus running at 642 MHz. Buses this wide are difficult to route because they consume a large share of the FPGA's logic resources. Even the most advanced FPGAs have trouble closing timing at these speeds.

In the Speedster7t architecture, however, designers can use a new processing mode called packet mode. In packet mode, the incoming Ethernet stream is rearranged onto four narrower 32-byte (256-bit) buses that run independently at 506 MHz. This mode has two advantages. Fewer bytes are wasted at the end of each packet, and packets can be transferred in parallel rather than waiting for the first packet to finish before the second one starts. The Speedster7t FPGA architecture supports packet mode by connecting the Ethernet MAC directly to a specific NoC column. A user-instantiated NAP then connects that NoC column to the logic array. Through the NoC column, data can be sent to any position along the column in the FPGA fabric for further processing. Configuring packet mode with the ACE design tool greatly simplifies the user design and improves efficiency when processing 400 Gbps Ethernet data streams.
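The byte waste at the end of a packet is easy to quantify with a simple model. In this model, every packet starts at the beginning of a fresh bus word, and the unused tail of its last word is wasted. The sketch compares a single 1,024-bit bus with one 256-bit packet-mode bus. It ignores preamble, inter-packet gap, and any scheme for packing two packets into one wide word. It illustrates the effect the text describes and is not a model of the Speedster7t Ethernet subsystem.

```python
import math


def bus_efficiency(packet_bytes: int, bus_bytes: int) -> float:
    """Fraction of bus bytes carrying payload when each packet starts on a fresh word."""
    beats = math.ceil(packet_bytes / bus_bytes)
    return packet_bytes / (beats * bus_bytes)


WIDE_BUS_BYTES = 1024 // 8  # single 1,024-bit bus
SEGMENT_BYTES = 256 // 8    # one 256-bit (32-byte) packet-mode bus

for size in (64, 65, 129, 1500):
    wide = bus_efficiency(size, WIDE_BUS_BYTES)
    seg = bus_efficiency(size, SEGMENT_BYTES)
    print(f"{size:5d}-byte packet: 1,024-bit bus {wide:6.1%}, "
          f"256-bit bus {seg:6.1%}")

# Aggregate raw capacity of the two arrangements quoted in the text.
wide_bps = 1024 * 724e6
packet_mode_bps = 4 * 256 * 506e6
print(f"1,024-bit @ 724 MHz: {wide_bps / 1e9:.1f} Gbps")             # 741.4
print(f"4 x 256-bit @ 506 MHz: {packet_mode_bps / 1e9:.1f} Gbps")    # 518.1
```

Packets just larger than a bus word are the worst case. A 129-byte packet fills only about half of two 128-byte words, while five 32-byte beats carry it at roughly 81% efficiency. This is why the narrower buses can sustain 400 Gbps at a lower clock rate.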

Figure 7: Data bus rearrangement in packet mode

Figure 8: 400 Gbps Ethernet using packet mode

Reduce logic footprint and improve overall FPGA performance

Compared with earlier, traditional FPGAs, the Speedster7t NoC offers greater flexibility and a simpler design methodology. As a result, the NoC automatically reduces the amount of logic a given design requires, because designs can use the NoC instead of the FPGA logic array for routing between modules. The ACE design tool automatically manages the complexity of connecting design units to the Speedster7t NoC, so designers gain productivity without writing HDL code for these connections. This approach eases the time-consuming task of timing closure. It also avoids the loss of overall application performance that routing congestion causes in the FPGA logic array. The NoC can also raise device utilization without sacrificing FPGA performance, which significantly increases the number of lookup tables (LUTs) available for computation.

To illustrate this advantage, we created an example design that performs two-dimensional convolution on an input image. Each convolution module uses Speedster7t machine learning processors (MLPs) and BRAM blocks, and each MLP performs 12 int8 multiplications per cycle. Forty of these two-dimensional convolution modules were linked together to use almost all of the BRAM and MLP resources in the device. The 40 instances run in parallel and use 94% of the MLPs and 97% of the BRAMs, but only 8% of the LUTs. The remaining 92% of LUTs are still available for other functions.
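To make the MLP figure concrete, the sketch below performs a valid-mode 2D convolution on int8 data. It also counts how many 12-multiply groups one MLP would need. It assumes a single MLP computes every output in sequence with no pipelining. The text does not describe how the example design actually maps its convolution onto MLPs and BRAM, and the function name is hypothetical.

```python
MULTS_PER_MLP_CYCLE = 12  # int8 multiplies per MLP per cycle, from the text
INT8_MIN, INT8_MAX = -128, 127


def conv2d_int8(image: list[list[int]], kernel: list[list[int]]):
    """Valid-mode 2D convolution (ML-style, i.e. no kernel flip) on int8 data.

    Returns (output, mlp_cycles), where mlp_cycles counts groups of up to
    12 multiplies as if a single MLP computed every output in sequence.
    """
    for row in image + kernel:
        if any(v < INT8_MIN or v > INT8_MAX for v in row):
            raise ValueError("inputs must fit in int8")
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    taps = [kernel[r][c] for r in range(kh) for c in range(kw)]
    output, mlp_cycles = [], 0
    for y in range(ih - kh + 1):
        out_row = []
        for x in range(iw - kw + 1):
            window = [image[y + r][x + c] for r in range(kh) for c in range(kw)]
            acc = 0  # wide accumulator: sums of int8 products exceed 8 bits
            for start in range(0, len(taps), MULTS_PER_MLP_CYCLE):
                stop = start + MULTS_PER_MLP_CYCLE
                # One MLP cycle: up to 12 products summed into the accumulator.
                acc += sum(a * b for a, b in zip(window[start:stop], taps[start:stop]))
                mlp_cycles += 1
            out_row.append(acc)
        output.append(out_row)
    return output, mlp_cycles


img = [[1] * 4 for _ in range(4)]
k3 = [[1] * 3 for _ in range(3)]
print(conv2d_int8(img, k3))  # ([[9, 9], [9, 9]], 4)
```

A 3x3 kernel (9 taps) fits in one MLP cycle per output, while a 5x5 kernel (25 taps) needs three.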

As more instances are added to the device, the maximum operating frequency (fMAX) of each individual module does not decrease. The design keeps its performance because data entering and leaving each two-dimensional convolution module reaches GDDR6 memory directly through a NAP attached to the NoC, without routing through the FPGA logic array.

Figure 9: A Speedster7t device with 40 instances of the two-dimensional convolution module

Conclusion

The Speedster7t NoC brings a fundamental change to the FPGA design process. Achronix is the first FPGA company to implement a two-dimensional network on chip (2D NoC) that connects all system interfaces and the FPGA logic array. This new architecture makes Achronix FPGAs particularly well suited to high-bandwidth applications while significantly increasing designer productivity. The NoC handles all data movement between the accelerators implemented in the FPGA and the high-speed data interfaces. Designers therefore only need to design their data accelerators and connect them to NAP primitives, and ACE and the NoC take care of everything else. By using the NoC, FPGA designers can:

• Simplify high-speed data distribution across the entire FPGA logic array
• Automatically connect the PCIe interface to memory
• Enable safe partial reconfiguration of individual FPGA logic array modules
• Easily support hardware virtualization
• Simplify collaborative team design
• Speed up design through independent interface and logic verification
• Use packet mode to simplify 400 Gbps Ethernet applications
• Reduce logic footprint and improve overall FPGA performance
