Enyx 40G MAC/PCS ULL Performance Report

Version

1.0.0

This document was automatically generated and includes measurements obtained from behavioral simulations of the Enyx 40G MAC/PCS ULL core.

Latency Testing

The latency of the Enyx 40G MAC/PCS ULL core is measured according to the following methodology:

  • A packet generator, hosted in the FPGA user logic, generates payloads of a size ranging from 1 to 64 bytes with random data content.

  • The generated payloads are forwarded to the Media Access Control (MAC) mac_tx interface where they are time-stamped at Start of Packet (SoP).

  • The resulting Ethernet packets (Ethernet header + generated payload) are sent through the FPGA vendor pma_tx simulation model, looped back, and returned through the FPGA vendor pma_rx simulation model.

  • The looped-back Ethernet packets are forwarded to the MAC, which outputs two events on the mac_rx interface, both time-stamped by the user logic:

    • The Start of Frame (SoF) signal, raised as soon as the MAC receives the Ethernet preamble.

    • The looped-back payload, forwarded to the user logic via the mac_rx interface.

  • The two receive timestamps and the transmit timestamp are then used to calculate the following two latency metrics for each payload size:

    • A SoP-to-SoF latency

    • A SoP-to-SoP latency

_images/MACPCS_perfs.svg

Figure 1 Enyx 40G MAC/PCS ULL Latency Testing Scenario
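The two metrics described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Enyx test bench: the class and field names are hypothetical, the timestamps are made up, and taking the minimum and maximum over all payloads is an assumption about how table-style summaries could be produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopbackSample:
    """Timestamps (ns) captured by the user logic for one payload.

    All names here are hypothetical, chosen for this sketch only.
    """
    payload_bytes: int
    tx_sop_ns: float  # SoP time-stamped on the mac_tx interface
    rx_sof_ns: float  # SoF raised on mac_rx when the preamble is received
    rx_sop_ns: float  # looped-back payload delivered on mac_rx

    @property
    def sop_to_sof_ns(self):
        return self.rx_sof_ns - self.tx_sop_ns

    @property
    def sop_to_sop_ns(self):
        return self.rx_sop_ns - self.tx_sop_ns


def summarize(samples):
    """Return {metric: (min, max)} over all samples (assumed aggregation)."""
    sof = [s.sop_to_sof_ns for s in samples]
    sop = [s.sop_to_sop_ns for s in samples]
    return {
        "sop_to_sof": (min(sof), max(sof)),
        "sop_to_sop": (min(sop), max(sop)),
    }


# Made-up timestamps, for illustration only (not simulation results).
samples = [
    LoopbackSample(payload_bytes=1, tx_sop_ns=0.0, rx_sof_ns=56.0, rx_sop_ns=62.0),
    LoopbackSample(payload_bytes=64, tx_sop_ns=100.0, rx_sof_ns=159.0, rx_sop_ns=165.0),
]
print(summarize(samples))
```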

Virtex UltraScale+ targets

The following measurements include the latency of the Xilinx Virtex UltraScale+ GTY transceivers.

Table 1 Xilinx UltraScale+ GTY Transceiver latency

Item        Absolute Minimum
RX Path     133 + 0.7 * (10 - 2) UI = 13.44 ns
TX Path     141 + 0.3 * (10 - 2) UI = 13.91 ns
Total RTT   282 UI = 27.345 ns

Note

The GTY transceiver RX and TX latencies used in these measurements are the ones detailed by Xilinx in this document

The GTY transceivers are configured as follows: 32-bit internal data width, 322.265625 MHz clock frequency, 10.3125 Gbps line rate.

The RX and TX values are taken from the lines under “Total – absolute minimum” in that document.

For the TX path, add the terms given by TX notes 2) and 1).

For the RX path, add the terms given by RX notes 3) and 1).

Lines can be added together to get the RTT in UI

1 UI = 1/10.3125 ns
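As a cross-check of Table 1, the UI-to-nanosecond conversion can be reproduced with a short Python sketch. The constants come from the note above; the function and variable names are illustrative only.

```python
LINE_RATE_GBPS = 10.3125        # per-lane line rate, from the note above
UI_NS = 1.0 / LINE_RATE_GBPS    # 1 UI = 1/10.3125 ns


def ui_to_ns(ui):
    """Convert a duration in unit intervals (UI) to nanoseconds."""
    return ui * UI_NS


rx_ui = 133 + 0.7 * (10 - 2)    # RX path, in UI
tx_ui = 141 + 0.3 * (10 - 2)    # TX path, in UI
rtt_ui = rx_ui + tx_ui          # the two lines added together

print(f"RX  {ui_to_ns(rx_ui):.2f} ns")
print(f"TX  {ui_to_ns(tx_ui):.2f} ns")
print(f"RTT {ui_to_ns(rtt_ui):.3f} ns")

# Consistency check on the transceiver configuration:
# 32-bit data width at 322.265625 MHz gives the 10.3125 Gbps line rate.
lane_rate_gbps = 32 * 322.265625 / 1000
```

The same relation explains the clock frequency: 10.3125 Gbps divided by a 32-bit internal data width is 322.265625 MHz.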

Table 2 Enyx 40G MAC/PCS ULL RTT Latency for Virtex UltraScale+ targets - SoP-to-SoP

Configuration                                                 Minimum (ns)   Maximum (ns)
No FIFO - DATA_WIDTH = 128b - 322 MHz PMA Rx and Tx clocks    61.41          64.52
Rx FIFO - DATA_WIDTH = 128b - 322 MHz PMA Tx clock            71.37          74.47
Tx & Rx FIFO - DATA_WIDTH = 128b - 350 MHz user clock         80.0           85.71

Table 3 Enyx 40G MAC/PCS ULL RTT Latency for Virtex UltraScale+ targets - SoP-to-SoF

Configuration                                                 Minimum (ns)   Maximum (ns)
No FIFO - DATA_WIDTH = 128b - 322 MHz PMA Rx and Tx clocks    55.2           58.32
Rx FIFO - DATA_WIDTH = 128b - 322 MHz PMA Tx clock            62.06          65.16
Tx & Rx FIFO - DATA_WIDTH = 128b - 350 MHz user clock         71.43          77.14

_images/MACPCS_40G_vusp_Latency.svg

Figure 2 Enyx 40G MAC/PCS ULL RTT Latency diagram for Virtex UltraScale+ targets - SoP-to-SoP
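Because the reported round-trip times include the GTY transceivers, a rough estimate of the share that is not attributable to the transceivers can be obtained by subtracting the Table 1 total. The Python sketch below does this for the SoP-to-SoP figures. It assumes that the simulated transceivers contribute exactly the absolute-minimum RTT of Table 1, which this report does not state, so the results are indicative only.

```python
GTY_RTT_NS = 27.345  # Table 1, total RTT (absolute minimum)

# (minimum, maximum) SoP-to-SoP RTT in ns, copied from Table 2
SOP_TO_SOP_NS = {
    "No FIFO": (61.41, 64.52),
    "Rx FIFO": (71.37, 74.47),
    "Tx & Rx FIFO": (80.0, 85.71),
}


def without_transceivers(rtt_ns):
    """Estimated RTT (ns) once the GTY minimum RTT is subtracted.

    Assumption: the transceiver models add exactly GTY_RTT_NS.
    """
    return round(rtt_ns - GTY_RTT_NS, 3)


for name, (rtt_min, rtt_max) in SOP_TO_SOP_NS.items():
    est_min = without_transceivers(rtt_min)
    est_max = without_transceivers(rtt_max)
    print(f"{name}: {est_min} ns to {est_max} ns")
```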

Resources & Frequencies

The table below shows the Enyx 40G MAC/PCS ULL resource usage and timing margins. It does not include the logic used by the transceivers (PMA).

Table 4 Resources and timing summary.

Family      Device  Speed Grade  Scenario      Data width  Slack (ns)  Logic  Logic %  Registers  Registers %  Memory  Memory %
Virtex US+  VU9P    2            No FIFO       128b        0.207       11482  1        6792       1            0       0
Virtex US+  VU9P    2            Rx FIFO       128b        0.165       11508  1        7055       1            0       0
Virtex US+  VU9P    2            Tx & Rx FIFO  128b        0.135       11609  1        7295       1            0       0
Virtex US+  VU9P    3            No FIFO       128b        0.22        11452  1        6794       1            0       0
Virtex US+  VU9P    3            Rx FIFO       128b        0.192       11447  1        7052       1            0       0
Virtex US+  VU9P    3            Tx & Rx FIFO  128b        0.171       11589  1        7295       1            0       0

Note

  • Logic is expressed in ALM for Intel and in LUT for Xilinx

  • Memory as block memory usage is expressed in M20k for Intel and in 36k for Xilinx

  • These resource utilization and slack metrics are the averages of the values obtained from 10 synthesis and place-and-route runs.

The constraints applied to the FPGA toolchain are provided in this section: FPGA compiler constraints.

The configuration of the Enyx 40G MAC/PCS ULL used for these tests is detailed in the section: Enyx 40G MAC/PCS ULL core configuration.

Testing scenarios description

The performance of the Enyx 40G MAC/PCS ULL core is measured in three configurations, which differ in the clock used to drive the user logic connected to the MAC.

No FIFO configuration

  • DUAL_CLOCK_FIFO_RX_ENABLE must be set to false

  • DUAL_CLOCK_FIFO_TX_ENABLE must be set to false

_images/enyx_macpcs_40g_nofifo.svg

Figure 3 Enyx 40G MAC/PCS ULL architecture with no Dual clock FIFOs : ‘No FIFO’

Rx FIFO configuration

  • DUAL_CLOCK_FIFO_RX_ENABLE must be set to true

  • DUAL_CLOCK_FIFO_TX_ENABLE must be set to false

_images/enyx_macpcs_40g_notx.svg

Figure 4 Enyx 40G MAC/PCS ULL architecture with only the Rx Dual clock FIFO : ‘Rx FIFO’

Note

Tx FIFO and Rx FIFO configurations and results are similar. Tx FIFO configuration is therefore not tested.

Tx & Rx FIFO configuration

  • DUAL_CLOCK_FIFO_RX_ENABLE must be set to true

  • DUAL_CLOCK_FIFO_TX_ENABLE must be set to true

_images/enyx_macpcs_40g.svg

Figure 5 Enyx 40G MAC/PCS ULL architecture with both Tx and Rx Dual clock FIFOs : ‘Tx & Rx FIFO’
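The relation between the two FIFO generics and the scenario labels used in this report can be summarized with a small Python helper. The function name is hypothetical and the snippet is only a reading aid, not part of the core's delivery.

```python
def scenario_name(rx_fifo_enable, tx_fifo_enable):
    """Map DUAL_CLOCK_FIFO_RX_ENABLE / DUAL_CLOCK_FIFO_TX_ENABLE
    to the scenario label used in this report."""
    if rx_fifo_enable and tx_fifo_enable:
        return "Tx & Rx FIFO"
    if rx_fifo_enable:
        return "Rx FIFO"
    if tx_fifo_enable:
        # Described as similar to 'Rx FIFO' and therefore not tested.
        return "Tx FIFO (not tested)"
    return "No FIFO"


for rx in (False, True):
    for tx in (False, True):
        print(f"RX_ENABLE={rx!s:5} TX_ENABLE={tx!s:5} -> {scenario_name(rx, tx)}")
```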

FPGA compiler constraints

The constraints below are provided to the FPGA compiler tools for the resource and operating frequency estimates.

Virtex UltraScale+ (-2 speed grade) targets

################ FPGA ####################
set_property part xcvu9p-flgb2104-2-e [current_project]

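# Parse the Vivado version string; only the major version is used below.
regexp {Vivado v(\d+)\.(\d).*SW Build (\d+).*IP Build (\d+)} [version] matched major minor sw_build ip_build
# Apply the synthesis fanout limit only for Vivado releases before 2020.
if {$major < 2020} {set_property STEPS.SYNTH_DESIGN.ARGS.FANOUT_LIMIT 400 [get_runs synth_*]}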

set_property strategy Flow_PerfOptimized_high [get_runs synth_1]
set_property STEPS.SYNTH_DESIGN.ARGS.ASSERT true [get_runs synth_1]

#set_property STEPS.SYNTH_DESIGN.ARGS.FSM_EXTRACTION one_hot [get_runs synth_*]
#set_property STEPS.SYNTH_DESIGN.ARGS.KEEP_EQUIVALENT_REGISTERS true [get_runs synth_*]
#set_property STEPS.SYNTH_DESIGN.ARGS.RESOURCE_SHARING off [get_runs synth_*]
#set_property STEPS.SYNTH_DESIGN.ARGS.NO_LC true [get_runs synth_*]
#set_property STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE 5 [get_runs synth_*]

Virtex UltraScale+ (-3 speed grade) targets

################ FPGA ####################
set_property part xcvu9p-flgb2104-3-e [current_project]

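# Parse the Vivado version string; only the major version is used below.
regexp {Vivado v(\d+)\.(\d).*SW Build (\d+).*IP Build (\d+)} [version] matched major minor sw_build ip_build
# Apply the synthesis fanout limit only for Vivado releases before 2020.
if {$major < 2020} {set_property STEPS.SYNTH_DESIGN.ARGS.FANOUT_LIMIT 400 [get_runs synth_*]}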

set_property STEPS.SYNTH_DESIGN.ARGS.FSM_EXTRACTION one_hot [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.KEEP_EQUIVALENT_REGISTERS true [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.RESOURCE_SHARING off [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.NO_LC true [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE 5 [get_runs synth_*]

Enyx 40G MAC/PCS ULL core configuration

Below are the generic parameters used for the Enyx 40G MAC/PCS ULL core under test:

  • The PMA is configured with a data width of 128 bits and runs at 322.265625 MHz.

  • Fixed parameters throughout all scenarios and FPGA targets :

    • PCS_RX_LATENCY_MODE = 1 // One register for timing purpose

    • PCS_TX_LATENCY_MODE = 1 // One register for timing purpose

    • MAC_RX_LATENCY_MODE = 0 // Asynchronous

    • MAC_TX_LATENCY_MODE = 1 // One register for timing purpose

    • DUAL_CLOCK_FIFO_RX_DEPTH = 16

    • DUAL_CLOCK_FIFO_TX_DEPTH = 16

    • PCS_RX_DESKEW_BIT_COUNT = 256

    • MAC_RX_DATA_WIDTH = 128

    • MAC_RX_CRC32_LATENCY = 1

    • MAC_RX_CRC32_DISABLE = 0

    • MAC_RX_CRC32_REMOVE = 1

    • MAC_RX_MIN_PKT_LENGTH = 64

    • MAC_RX_ERROR_DELAYED_ENABLE = 0

    • MAC_TX_DATA_WIDTH = 128

    • MAC_TX_CRC32_DISABLE = 0

    • MAC_TX_CRC32_CORRUPTED_ON_ERROR = 0

    • MAC_TX_CRC32_ADD = 1

    • MAC_TX_PADDING_SIZE = 64

    • MAC_TX_PAUSE_LENGTH = 0

    • MAC_TX_IFG_COUNT = 12

  • Parameters specific to each scenario :

    • Dual clock FIFOs are activated or not :

      • DUAL_CLOCK_FIFO_RX_ENABLE = 0 or 1

      • DUAL_CLOCK_FIFO_TX_ENABLE = 0 or 1

    • User logic clock frequency is set to either 322.265625 MHz (PMA clock) or 350 MHz :

      • MAC_RX_CLK_FREQ_KHZ = 322 266 or 350 000

      • MAC_TX_CLK_FREQ_KHZ = 322 266 or 350 000
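The kHz values listed for MAC_RX_CLK_FREQ_KHZ and MAC_TX_CLK_FREQ_KHZ follow from the two clock frequencies by rounding to the nearest kilohertz. The Python sketch below reproduces them and also gives the corresponding clock periods. The helper names are illustrative, and the rounding rule is inferred from the listed values rather than stated by the report.

```python
PMA_CLOCK_MHZ = 322.265625   # PMA clock
USER_CLOCK_MHZ = 350.0       # dedicated user clock


def mhz_to_khz(freq_mhz):
    """Frequency in kHz, rounded to the nearest integer (assumed rule)."""
    return round(freq_mhz * 1000)


def period_ns(freq_mhz):
    """Clock period in nanoseconds."""
    return 1000.0 / freq_mhz


for label, freq in (("PMA clock", PMA_CLOCK_MHZ), ("user clock", USER_CLOCK_MHZ)):
    print(f"{label}: {mhz_to_khz(freq)} kHz, period {period_ns(freq):.3f} ns")
```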