Enyx 64b66b MAC/PCS ULL Performance Report

Version

1.0.0

This document was automatically generated and includes measurements resulting from behavioral simulations of the Enyx 64b66b MAC/PCS ULL core.

Latency Testing

The latency of the Enyx 64b66b MAC/PCS ULL core is measured according to the following methodology:

  • A packet generator, hosted in the FPGA user logic, generates payloads of a size ranging from 1 to 64 bytes with random data content.

  • The generated payloads are forwarded to the Media Access Control (MAC) mac_tx interface where they are time-stamped at Start of Packet (SoP).

  • The resulting Ethernet packets (Ethernet header + generated payload) are sent through the FPGA vendor pma_tx simulation model, looped back, and received through the FPGA vendor pma_rx simulation model.

  • The looped-back Ethernet packets are forwarded to the MAC, which outputs two events through the mac_rx interface. Both events are time-stamped by the user logic:

    • The MAC raises the Start of Frame (SoF) signal as soon as it receives the Ethernet preamble.

    • The MAC forwards the looped-back payload to the user logic via the mac_rx interface.

  • These two receipt timestamps and the transmission timestamp are then used to calculate the following two latency metrics for each payload size:

    • A SoP-to-SoF latency

    • A SoP-to-SoP latency
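Both metrics reduce to timestamp differences. The following is a minimal sketch, assuming the three timestamps are available in nanoseconds; the names `LatencySample`, `t_tx_sop`, `t_rx_sof` and `t_rx_sop` are hypothetical and are not part of the core's interface, and the sample values are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySample:
    """Timestamps (ns) captured by the user logic for one looped-back packet."""
    t_tx_sop: float  # SoP observed on the mac_tx interface
    t_rx_sof: float  # SoF raised on mac_rx when the preamble is received
    t_rx_sop: float  # SoP of the payload forwarded on mac_rx

    def sop_to_sof(self) -> float:
        return self.t_rx_sof - self.t_tx_sop

    def sop_to_sop(self) -> float:
        return self.t_rx_sop - self.t_tx_sop


def min_max(values):
    """Return the (minimum, maximum) pair reported in the latency tables."""
    values = list(values)
    return min(values), max(values)


# Illustrative timestamps only, not measured data.
samples = [
    LatencySample(t_tx_sop=0.0, t_rx_sof=22.5, t_rx_sop=28.5),
    LatencySample(t_tx_sop=100.0, t_rx_sof=125.0, t_rx_sop=131.5),
]
sof_range = min_max(s.sop_to_sof() for s in samples)
sop_range = min_max(s.sop_to_sop() for s in samples)
print("SoP-to-SoF min/max:", sof_range)
print("SoP-to-SoP min/max:", sop_range)
```

Taking the minimum and maximum over all payload sizes gives one row of a latency table.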

_images/MACPCS_perfs.svg

Figure 1 Enyx 64b66b MAC/PCS ULL Latency Testing Scenario

Virtex UltraScale+ targets

Table 1 Xilinx UltraScale+ GTY Transceiver latency

Item        Absolute Minimum for 16 bits at 644.53125 MHz   Absolute Minimum for 32 bits at 322.265625 MHz
----------  ----------------------------------------------  -----------------------------------------------
RX Path     [77 + 0.7 * (10 - 2)] UI = 8.01 ns              [133 + 0.7 * (10 - 2)] UI = 13.44 ns
TX Path     [75 + 0.3 * (10 - 2)] UI = 7.51 ns              [141 + 0.3 * (10 - 2)] UI = 13.91 ns
Total RTT   160 UI = 15.515 ns                              282 UI = 27.345 ns

Note

The GTY transceiver RX and TX latencies used in the above calculations are detailed by Xilinx in this document

The GTY transceivers are used in one of the following two configurations:

  • 16 bits Internal Data Width running at 644.53125 MHz, 10.3125 Gbps.

  • 32 bits Internal Data Width running at 322.265625 MHz, 10.3125 Gbps.

The total latency is calculated by adding the following sections of the document:

  • TX “Total – absolute minimum for a given internal data width” for 64b + Note 2) 1.

  • RX “Total – absolute minimum” for 64b + Note 3) 1)

  • 1 UI = 1/10.3125 ns
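The conversion from unit intervals to nanoseconds used in Table 1 can be reproduced with a few lines. This is a minimal sketch: the base values and coefficients are copied from the bracketed expressions in Table 1, while the function names are hypothetical.

```python
LINE_RATE_GBPS = 10.3125         # serial line rate, so 1 UI = 1 / 10.3125 ns
UI_NS = 1.0 / LINE_RATE_GBPS


def path_latency_ui(base_ui: float, coeff: float) -> float:
    """Evaluate a Table 1 expression of the form base + coeff * (10 - 2)."""
    return base_ui + coeff * (10 - 2)


def ui_to_ns(ui: float) -> float:
    """Convert a latency in unit intervals to nanoseconds."""
    return ui * UI_NS


# (internal data width, RX base in UI, TX base in UI), as listed in Table 1
for width, rx_base, tx_base in ((16, 77, 75), (32, 133, 141)):
    rx_ui = path_latency_ui(rx_base, 0.7)
    tx_ui = path_latency_ui(tx_base, 0.3)
    total_ui = rx_ui + tx_ui
    print(f"{width}b: RX {ui_to_ns(rx_ui):.2f} ns, TX {ui_to_ns(tx_ui):.2f} ns, "
          f"RTT {total_ui:.0f} UI = {ui_to_ns(total_ui):.3f} ns")
```

Both data widths correspond to the same line rate: 16 bits at 644.53125 MHz and 32 bits at 322.265625 MHz each give 10.3125 Gbps.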

The following measurements include the latency of the Xilinx Virtex UltraScale+ GTY transceivers as described above.

Table 2 Enyx 64b66b MAC/PCS ULL RTT Latency for Virtex UltraScale+ target - SoP-to-SoF

Configuration                       Minimum (ns)  Maximum (ns)
----------------------------------  ------------  ------------
No FIFO - 16b PMA and 16b MAC       22.23         28.45
No FIFO - 32b PMA and 32b MAC       29.22         35.43
Rx FIFO - 32b PMA and 32b MAC       37.24         43.44
Tx & Rx FIFO - 32b PMA and 32b MAC  45.71         54.29

Table 3 Enyx 64b66b MAC/PCS ULL RTT Latency for Virtex UltraScale+ target - SoP-to-SoP

Configuration                       Minimum (ns)  Maximum (ns)
----------------------------------  ------------  ------------
No FIFO - 16b PMA and 16b MAC       28.44         34.66
No FIFO - 16b PMA and 32b MAC       31.56         37.76
No FIFO - 32b PMA and 32b MAC       35.42         41.64
Rx FIFO - 16b PMA and 16b MAC       34.13         40.34
Rx FIFO - 16b PMA and 32b MAC       40.34         46.54
Rx FIFO - 32b PMA and 32b MAC       46.54         52.75
Tx & Rx FIFO - 32b PMA and 32b MAC  57.14         62.86

_images/MACPCS_10G_vusp_Latency.svg

Figure 2 Enyx 64b66b MAC/PCS ULL RTT Latency diagram for Virtex UltraScale+ targets - SoP-to-SoP
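One way to read Table 3 is as the latency cost of each buffering option relative to the No FIFO row with the same data widths. The sketch below does only that subtraction. The values are copied from Table 3 and stay in the table's unit; the dictionary layout and the `added_latency` helper are hypothetical.

```python
# Minimum and maximum SoP-to-SoP values copied from Table 3,
# keyed by (buffering option, PMA width in bits, MAC width in bits).
SOP_TO_SOP = {
    ("No FIFO", 16, 16): (28.44, 34.66),
    ("No FIFO", 16, 32): (31.56, 37.76),
    ("No FIFO", 32, 32): (35.42, 41.64),
    ("Rx FIFO", 16, 16): (34.13, 40.34),
    ("Rx FIFO", 16, 32): (40.34, 46.54),
    ("Rx FIFO", 32, 32): (46.54, 52.75),
    ("Tx & Rx FIFO", 32, 32): (57.14, 62.86),
}


def added_latency(option: str, pma_width: int, mac_width: int):
    """Difference against the No FIFO row that has the same data widths."""
    base_min, base_max = SOP_TO_SOP[("No FIFO", pma_width, mac_width)]
    cfg_min, cfg_max = SOP_TO_SOP[(option, pma_width, mac_width)]
    return round(cfg_min - base_min, 2), round(cfg_max - base_max, 2)


for option, pma_width, mac_width in SOP_TO_SOP:
    if option != "No FIFO":
        delta_min, delta_max = added_latency(option, pma_width, mac_width)
        print(f"{option} - {pma_width}b PMA and {mac_width}b MAC: "
              f"+{delta_min:.2f} (min), +{delta_max:.2f} (max)")
```

These differences are plain arithmetic on the reported figures, not separately measured values.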

Resources & Frequencies

The table below shows the Enyx 64b66b MAC/PCS ULL resource usage along with the timing margins. It does not include the logic used by the transceivers (PMA).

  • In the No FIFO and Rx FIFO configurations, there are 3 possible scenarios:

    • 16b PMA - 16b MAC:

      The user clock is set to the respective PMA clocks so everything is running at 644.53125 MHz.
    • 16b PMA - 32b MAC:

      The PMA is running at 16 bits at 644.53125 MHz, whereas the MAC/PCS is running at 32 bits at 322.265625 MHz.
      As the MAC/PCS clock is the PMA clock divided by two and both belong to the same clock domain, the Synchronous Dual Clock Bridge component is used.
    • 32b PMA - 32b MAC:

      The user clock is set to the respective PMA clocks so everything is running at 322.265625 MHz.

    Note

    For the Rx FIFO configuration, in the Tx side, the User Logic is driven with the pma_tx_clk (mac_tx_clk = pma_tx_clk).
  • In the Tx & Rx FIFO scenario, the user clock is set to 350 MHz, so the MAC user Tx and Rx interfaces are connected to the Dual Clock FIFOs.

    Table 4 Resources and timing summary.

    Family      Device  Speed Grade  Scenario      PMA Data width  MAC Data width  Slack (ns)  Logic  %  Registers  %  Memory  %
    ----------  ------  -----------  ------------  --------------  --------------  ----------  -----  -  ---------  -  ------  -
    Virtex US+  VU9P    2            No FIFO       32b             32b             0.322       1641   1  1363       1  0       0
    Virtex US+  VU9P    2            Rx FIFO       32b             32b             0.253       1738   1  1525       1  0       0
    Virtex US+  VU9P    2            Tx & Rx FIFO  32b             32b             0.201       1808   1  1631       1  0       0
    Virtex US+  VU9P    3            No FIFO       32b             32b             0.365       1632   1  1363       1  0       0
    Virtex US+  VU9P    3            No FIFO       16b             16b             0.052       1050   1  1182       1  0       0
    Virtex US+  VU9P    3            No FIFO       16b             32b             0.004       1202   1  1477       1  0       0
    Virtex US+  VU9P    3            Rx FIFO       16b             32b             0.02        1231   1  1543       1  0       0
    Virtex US+  VU9P    3            Rx FIFO       32b             32b             0.225       1733   1  1526       1  0       0
    Virtex US+  VU9P    3            Tx & Rx FIFO  32b             32b             0.266       1798   1  1632       1  0       0

Note

  • Logic is expressed in LUTs (Look-Up Tables) for Xilinx.

  • Memory is the block memory usage, expressed in 36 Kb blocks for Xilinx.

  • The resource utilization and slack metrics are averages over ten synthesis and place-and-route runs.

The constraints applied to the FPGA toolchain are provided in this section: FPGA compiler constraints.

Testing scenarios description

The performance of the Enyx 64b66b MAC/PCS ULL component is measured in 7 different configurations, depending on which clock drives the user logic connected to the MAC.
Each configuration is selected through the generic parameters applied at synthesis, which are listed in each section.
The generic parameters common to all performance reports are listed below:
DUAL_CLOCK_FIFO_RX_DEPTH = 16
DUAL_CLOCK_FIFO_TX_DEPTH = 16
MAC_RX_CRC32_LATENCY = 1
MAC_RX_CRC32_DISABLE = 0
MAC_RX_CRC32_REMOVE = 0
MAC_RX_MIN_PKT_LENGTH = 64
MAC_RX_ERROR_DELAYED_ENABLE = 0
MAC_TX_CRC32_DISABLE = 0
MAC_TX_CRC32_CORRUPTED_ON_ERROR = 0
MAC_TX_CRC32_ADD = 1
MAC_TX_PADDING_SIZE = 64
MAC_TX_PAUSE_LENGTH = 0
MAC_TX_IFG_COUNT = 12

No FIFO configuration

No FIFO - 16b PMA - 16b MAC

The PMA and MAC/PCS are configured with 16 bits at 644.53125 MHz.
As the data width and clock frequency are the same on both sides, neither the Synchronous Dual Clock Bridge nor the Dual Clock FIFOs are enabled. Hence:
PCS_RX_LATENCY_MODE = 1
PCS_TX_LATENCY_MODE = 1
MAC_RX_LATENCY_MODE = 0
MAC_TX_LATENCY_MODE = 0
PMA_DATA_WIDTH = 16
MAC_TX_DATA_WIDTH = 16
MAC_RX_CLK_FREQ_KHZ = 644531
MAC_TX_CLK_FREQ_KHZ = 644531
DUAL_CLOCK_FIFO_RX_ENABLE = 0
DUAL_CLOCK_FIFO_TX_ENABLE = 0
SYNC_DUAL_CLOCK_RX_ENABLE = 0
SYNC_DUAL_CLOCK_TX_ENABLE = 0
_images/enyx_macpcs_10g_nofifo_16b_644.svg

Figure 3 Enyx 64b66b MAC/PCS ULL architecture with no Synchronous Dual Clock Bridge and no Dual clock FIFOs

No FIFO - 16b PMA - 32b MAC

Here, the PMA is configured with 16 bits at 644.53125 MHz whereas the MAC/PCS is configured with 32 bits at 322.265625 MHz.
As the clock frequency is divided by two in the MAC/PCS component, the Synchronous Dual Clock Bridge component is used. Hence:
PCS_RX_LATENCY_MODE = 1
PCS_TX_LATENCY_MODE = 1
MAC_RX_LATENCY_MODE = 0
MAC_TX_LATENCY_MODE = 0
PMA_DATA_WIDTH = 16
MAC_TX_DATA_WIDTH = 32
MAC_RX_CLK_FREQ_KHZ = 322265
MAC_TX_CLK_FREQ_KHZ = 322265
DUAL_CLOCK_FIFO_RX_ENABLE = 0
DUAL_CLOCK_FIFO_TX_ENABLE = 0
SYNC_DUAL_CLOCK_RX_ENABLE = 1
SYNC_DUAL_CLOCK_TX_ENABLE = 1
_images/enyx_macpcs_10g_sdc_16b_644_to_32b_322.svg

Figure 4 Enyx 64b66b MAC/PCS ULL architecture with Synchronous Dual Clock Bridge and no Dual clock FIFOs

Note

Using the Synchronous Dual Clock Bridge requires that both PMA clock and MAC/PCS clock are synchronous and have the appropriate frequency ratio.
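The requirement in this note can be expressed as a quick sanity check. The sketch below assumes, as an interpretation that the source does not spell out, that an "appropriate frequency ratio" means both sides carry the same bit rate and the faster clock is an integer multiple of the slower one. `sync_bridge_compatible` is a hypothetical helper, and it cannot verify that the two clocks are actually synchronous.

```python
from fractions import Fraction


def sync_bridge_compatible(pma_width: int, pma_clk_hz: int,
                           mac_width: int, mac_clk_hz: int) -> bool:
    """Check the assumed ratio rule between the PMA and MAC/PCS clocks."""
    # Both sides must move the same number of bits per second.
    same_bit_rate = pma_width * pma_clk_hz == mac_width * mac_clk_hz
    # The faster clock must be an integer multiple of the slower one.
    fast, slow = max(pma_clk_hz, mac_clk_hz), min(pma_clk_hz, mac_clk_hz)
    integer_ratio = Fraction(fast, slow).denominator == 1
    return same_bit_rate and integer_ratio


# 16b PMA at 644.53125 MHz feeding a 32b MAC/PCS at 322.265625 MHz
print(sync_bridge_compatible(16, 644_531_250, 32, 322_265_625))
# A 350 MHz user clock does not fit the rule and needs the Dual Clock FIFOs
print(sync_bridge_compatible(16, 644_531_250, 32, 350_000_000))
```

The check uses exact frequencies in Hz, because the truncated kHz values of the `MAC_RX_CLK_FREQ_KHZ` and `MAC_TX_CLK_FREQ_KHZ` generics (644531 and 322265) are not in an exact ratio of two.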

No FIFO - 32b PMA - 32b MAC

The PMA and MAC/PCS are now configured with 32 bits at 322.265625 MHz.
PCS_RX_LATENCY_MODE = 0
PCS_TX_LATENCY_MODE = 0
MAC_RX_LATENCY_MODE = 0
MAC_TX_LATENCY_MODE = 0
PMA_DATA_WIDTH = 32
MAC_TX_DATA_WIDTH = 32
MAC_RX_CLK_FREQ_KHZ = 322265
MAC_TX_CLK_FREQ_KHZ = 322265
DUAL_CLOCK_FIFO_RX_ENABLE = 0
DUAL_CLOCK_FIFO_TX_ENABLE = 0
SYNC_DUAL_CLOCK_RX_ENABLE = 0
SYNC_DUAL_CLOCK_TX_ENABLE = 0
_images/enyx_macpcs_10g_nofifo.svg

Figure 5 Enyx 64b66b MAC/PCS ULL architecture with no Synchronous Dual Clock Bridge and no Dual clock FIFOs : ‘No FIFO’

Rx FIFO configuration

Rx FIFO - 16b PMA - 16b MAC

In this scenario, both the PMA and the MAC/PCS are configured with 16 bits at 644.53125 MHz.
Moreover, in the Tx side, the User Logic is driven with the pma_tx_clk (mac_tx_clk = pma_tx_clk). Hence:
PCS_RX_LATENCY_MODE = 1
PCS_TX_LATENCY_MODE = 1
MAC_RX_LATENCY_MODE = 0
MAC_TX_LATENCY_MODE = 0
PMA_DATA_WIDTH = 16
MAC_TX_DATA_WIDTH = 16
MAC_RX_CLK_FREQ_KHZ = 644531
MAC_TX_CLK_FREQ_KHZ = 644531
DUAL_CLOCK_FIFO_RX_ENABLE = 1
DUAL_CLOCK_FIFO_TX_ENABLE = 0
SYNC_DUAL_CLOCK_RX_ENABLE = 0
SYNC_DUAL_CLOCK_TX_ENABLE = 0
_images/enyx_macpcs_10g_notx_16b_644.svg

Figure 6 Enyx 64b66b MAC/PCS ULL architecture with only the Rx Dual clock FIFO : ‘Rx FIFO’

Note

Tx FIFO and Rx FIFO configurations and results are similar. Tx FIFO configuration is therefore not tested.

Rx FIFO - 16b PMA - 32b MAC

PCS_RX_LATENCY_MODE = 1
PCS_TX_LATENCY_MODE = 1
MAC_RX_LATENCY_MODE = 0
MAC_TX_LATENCY_MODE = 0
PMA_DATA_WIDTH = 16
MAC_TX_DATA_WIDTH = 32
MAC_RX_CLK_FREQ_KHZ = 322265
MAC_TX_CLK_FREQ_KHZ = 322265
DUAL_CLOCK_FIFO_RX_ENABLE = 1
DUAL_CLOCK_FIFO_TX_ENABLE = 0
SYNC_DUAL_CLOCK_RX_ENABLE = 0
SYNC_DUAL_CLOCK_TX_ENABLE = 1
_images/enyx_macpcs_10g_notx_16b_644_to_32b_322.svg

Figure 7 Enyx 64b66b MAC/PCS ULL architecture with only the Rx Dual clock FIFO and the Synchronous Dual Clock Tx Bridge : ‘Rx FIFO’

Rx FIFO - 32b PMA - 32b MAC

PCS_RX_LATENCY_MODE = 0
PCS_TX_LATENCY_MODE = 0
MAC_RX_LATENCY_MODE = 0
MAC_TX_LATENCY_MODE = 0
PMA_DATA_WIDTH = 32
MAC_TX_DATA_WIDTH = 32
MAC_RX_CLK_FREQ_KHZ = 322265
MAC_TX_CLK_FREQ_KHZ = 322265
DUAL_CLOCK_FIFO_RX_ENABLE = 1
DUAL_CLOCK_FIFO_TX_ENABLE = 0
SYNC_DUAL_CLOCK_RX_ENABLE = 0
SYNC_DUAL_CLOCK_TX_ENABLE = 0
_images/enyx_macpcs_10g_notx.svg

Figure 8 Enyx 64b66b MAC/PCS ULL architecture with only the Rx Dual clock FIFO : ‘Rx FIFO’

Tx & Rx FIFO configuration

PCS_RX_LATENCY_MODE = 0
PCS_TX_LATENCY_MODE = 0
MAC_RX_LATENCY_MODE = 0
MAC_TX_LATENCY_MODE = 0
PMA_DATA_WIDTH = 32
MAC_TX_DATA_WIDTH = 32
MAC_RX_CLK_FREQ_KHZ = 322265
MAC_TX_CLK_FREQ_KHZ = 322265
DUAL_CLOCK_FIFO_RX_ENABLE = 1
DUAL_CLOCK_FIFO_TX_ENABLE = 1
SYNC_DUAL_CLOCK_RX_ENABLE = x
SYNC_DUAL_CLOCK_TX_ENABLE = x
_images/enyx_macpcs_10g.svg

Figure 9 Enyx 64b66b MAC/PCS ULL architecture with both Tx and Rx Dual clock FIFOs : ‘Tx & Rx FIFO’

Note

Using both the Tx and Rx FIFOs makes it possible to change the data width and the clock frequency in the User Logic, as long as the 10G throughput is sustained.
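A rough way to check a candidate user-side configuration against this condition is to compare its raw bandwidth with the 10 Gbps data rate. This is a minimal sketch, assuming that "10G throughput" refers to the 10.3125 Gbps line rate after 64b/66b encoding overhead is removed; `user_side_sustains_10g` is a hypothetical helper that ignores any per-packet overhead on the user interface.

```python
LINE_RATE_BPS = 10_312_500_000            # 10.3125 Gbps serial line rate
MAC_RATE_BPS = LINE_RATE_BPS * 64 // 66   # data rate once 64b/66b overhead is removed


def user_side_sustains_10g(data_width_bits: int, clk_hz: int) -> bool:
    """True if the raw user-interface bandwidth covers the MAC data rate."""
    return data_width_bits * clk_hz >= MAC_RATE_BPS


# 32-bit user interface at the 350 MHz user clock used in this report
print(user_side_sustains_10g(32, 350_000_000))
# A 16-bit interface at the same clock would fall short
print(user_side_sustains_10g(16, 350_000_000))
```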

FPGA compiler constraints

Below are the constraints provided to the FPGA compiler tools for the resource and operating frequency estimates.

Virtex UltraScale+ (-2 speed grade) targets

################ FPGA ####################
set_property part xcvu9p-flgb2104-2-e [current_project]

set_property strategy Flow_PerfOptimized_high [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.NO_SRLEXTRACT true [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.ASSERT true [get_runs synth_*]

set_msg_config -new_severity {INFO} -id {Constraints 18-514} -string {{CRITICAL WARNING: [Constraints 18-514] set_max_delay: Path segmentation by forcing} {inst_startupe3_flash_prim}}
set_msg_config -new_severity {INFO} -id {Constraints 18-515} -string {{CRITICAL WARNING: [Constraints 18-515] set_max_delay: Path segmentation by forcing} {inst_startupe3_flash_prim}}

Virtex UltraScale+ (-3 speed grade) targets

################ FPGA ####################
set_property part xcvu9p-flgb2104-3-e [current_project]

set_property strategy Flow_PerfOptimized_high [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.NO_SRLEXTRACT true [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.ASSERT true [get_runs synth_*]

set_msg_config -new_severity {INFO} -id {Constraints 18-514} -string {{CRITICAL WARNING: [Constraints 18-514] set_max_delay: Path segmentation by forcing} {inst_startupe3_flash_prim}}
set_msg_config -new_severity {INFO} -id {Constraints 18-515} -string {{CRITICAL WARNING: [Constraints 18-515] set_max_delay: Path segmentation by forcing} {inst_startupe3_flash_prim}}