1. Context

This document provides performance measures of the Enyx MAC PCS 10G core in multiple configurations.

The measures are automatically imported from simulations of the Enyx MAC PCS 10G core.

Important: Hardware measurements give the same results for Xilinx targets. Altera simulations, however, do not model the PMA realistically and therefore currently give incorrect PMA latency estimates.

The vendor PMA latencies are provided as is; the total Round Trip Time (RTT) latencies therefore depend on these minimum and maximum PMA latencies.

The document is split up as follows: Resources & Working frequencies (Section 2), Latency Testing (Section 3), and Constraints applied to FPGA compiler tools (Section 4).

2. Resources & Working frequencies

The table below shows MAC/PCS 10G implementation results on our supported FPGA families. The compilations include only the MAC and PCS logic, and therefore give a maximum reachable frequency for each configuration. The results cover the MAC PCS 10G resource usage and the timing margins, averaged over 10 different runs on either Vivado 2018.3 or Quartus 16.0 Standard. The parameters applied to the tools are given in Section 4, Constraints applied to FPGA compiler tools for Resources and Working frequencies estimations.

The results are obtained with the following MAC PCS 10G generic parameters, which are the recommended values for typical scenarios:

  • Stratix V targets :

    • PCS_RX_LATENCY_MODE = 1

    • PCS_TX_LATENCY_MODE = 2

    • MAC_RX_LATENCY_MODE = 0

    • MAC_TX_LATENCY_MODE = 0

  • Arria 10 targets :

    • PCS_RX_LATENCY_MODE = 1

    • PCS_TX_LATENCY_MODE = 2

    • MAC_RX_LATENCY_MODE = 0

    • MAC_TX_LATENCY_MODE = 0

  • Virtex UltraScale+ targets :

    • PCS_RX_LATENCY_MODE = 0

    • PCS_TX_LATENCY_MODE = 0

    • MAC_RX_LATENCY_MODE = 0

    • MAC_TX_LATENCY_MODE = 0

The pma_tx_clk and pma_rx_clk clocks are running at 322.265625 MHz.
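The 322.265625 MHz figure is consistent with the 10GBASE-R serial line rate (10 Gb/s of payload with 64b/66b encoding, i.e. 10.3125 Gb/s) divided by the 32-bit PMA data width stated in Section 3. The short Python sketch below reproduces that arithmetic; the function and constant names are illustrative and are not part of the core.

```python
from fractions import Fraction

# 10GBASE-R line rate: 10 Gb/s of payload plus 64b/66b encoding overhead.
LINE_RATE_BPS = Fraction(10_000_000_000) * Fraction(66, 64)

def pma_clock_mhz(pma_width_bits: int) -> Fraction:
    """Parallel clock frequency (MHz) for a given PMA data width."""
    return LINE_RATE_BPS / pma_width_bits / 1_000_000

print(float(pma_clock_mhz(32)))  # 322.265625
```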

Table 2.1 Resources and timing summary.

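| Family | Device | Speed Grade | Tx clk | Rx clk | Data width | Slack (ns) | Logic | Logic % | Registers | Registers % | Memory | Memory % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arria 10 | GX 1150 | 1 | 250 | 250 | 64b | 0.533 | 1484 | 1 | 2170 | 1 | 0 | 0 |
| Arria 10 | GX 1150 | 1 | pma_tx_clk | pma_rx_clk | 32b | 0.454 | 1323 | 1 | 1423 | 1 | 0 | 0 |
| Arria 10 | GX 1150 | 1 | pma_tx_clk | pma_tx_clk | 32b | 0.482 | 1376 | 1 | 1509 | 1 | 0 | 0 |
| Stratix V | GX A7 | 2 | 250 | 250 | 64b | 0.334 | 1425 | 1 | 2035 | 1 | 0 | 0 |
| Stratix V | GX A7 | 2 | pma_tx_clk | pma_rx_clk | 32b | 0.431 | 1218 | 1 | 1492 | 1 | 0 | 0 |
| Stratix V | GX A7 | 2 | pma_tx_clk | pma_tx_clk | 32b | 0.348 | 1272 | 1 | 1534 | 1 | 0 | 0 |
| Virtex US+ | VU9P | 2 | 350 | 350 | 32b | 0.166 | 1753 | 1 | 1284 | 1 | 0 | 0 |
| Virtex US+ | VU9P | 2 | pma_tx_clk | pma_rx_clk | 32b | 0.277 | 1600 | 1 | 1074 | 1 | 0 | 0 |
| Virtex US+ | VU9P | 2 | pma_tx_clk | pma_tx_clk | 32b | 0.204 | 1672 | 1 | 1179 | 1 | 0 | 0 |
| Virtex US+ | VU9P | 3 | 350 | 350 | 32b | 0.274 | 1741 | 1 | 1283 | 1 | 0 | 0 |
| Virtex US+ | VU9P | 3 | pma_tx_clk | pma_rx_clk | 32b | 0.37 | 1585 | 1 | 1073 | 1 | 0 | 0 |
| Virtex US+ | VU9P | 3 | pma_tx_clk | pma_tx_clk | 32b | 0.269 | 1656 | 1 | 1181 | 1 | 0 | 0 |

Numeric values in the Tx clk and Rx clk columns are user clock frequencies in MHz.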

  • Logic

    • in K ALM for Intel

    • in K LUT for Xilinx

  • Registers in K

  • Memory as block memory usage

    • M20k for Intel

    • 36k for Xilinx
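One common way to read the slack column is to convert it into an estimated maximum frequency, f_max ≈ 1 / (T − slack), where T is the constrained clock period. This is a generic timing approximation, not a figure stated by this document, and the helper name below is hypothetical.

```python
def estimated_fmax_mhz(clock_mhz: float, slack_ns: float) -> float:
    """Estimate the maximum frequency from a constrained clock and its setup slack."""
    period_ns = 1000.0 / clock_mhz          # constrained period in ns
    return 1000.0 / (period_ns - slack_ns)  # shortest period that would still meet timing

# Clock and slack values taken from Table 2.1.
arria10_64b = estimated_fmax_mhz(250.0, 0.533)       # Arria 10, 250 MHz user clock
vusp_pma = estimated_fmax_mhz(322.265625, 0.277)     # Virtex US+ (-2), pma clocks
print(round(arria10_64b, 1), round(vusp_pma, 1))
```

Under this approximation a positive slack always yields an estimate above the constrained frequency; it says nothing about hold timing or about logic added around the core.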

3. Latency Testing

In order to measure the latency of the Enyx MAC PCS 10G IP Core, the following test is performed:

  • Random packets are sent one by one from a pattern generator core to the Enyx MAC PCS 10G IP Core, with sizes from 1 to 64 Bytes in steps of 1 Byte.

  • Packets are immediately timestamped when they enter the MAC PCS 10G IP Core, at the Start of Packet (SoP).

  • Packets are then sent to the vendor PHY, looped back, and returned to the MAC PCS 10G IP Core.

  • As soon as the packets are received on the Enyx MAC PCS 10G IP Core Rx output interface, a second timestamp is taken, at the Start of Packet (SoP).

  • The two timestamps are compared, giving the MAC PCS 10G RTT latency for each packet size.

_images/MACPCS_perfs.svg

Figure 3.1 Enyx MAC PCS 10G Latency Testing Scenario

Latencies are all measured from Start-of-Packet (SoP) to Start-of-Packet (SoP).
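The procedure above amounts to subtracting two SoP timestamps per packet size. A minimal Python sketch of that bookkeeping follows; the `Sample` record, its field names and the nanosecond timestamp unit are assumptions made for illustration and do not describe the actual test bench.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    size_bytes: int    # MAC payload size of the test packet
    tx_sop_ns: float   # timestamp taken at SoP when the packet enters the core
    rx_sop_ns: float   # timestamp taken at SoP on the Rx output interface

def rtt_by_size(samples: list[Sample]) -> dict[int, float]:
    """Return the SoP-to-SoP round-trip latency (ns) for each packet size."""
    return {s.size_bytes: s.rx_sop_ns - s.tx_sop_ns for s in samples}

# Made-up timestamps, only to show the calculation.
samples = [Sample(1, 100.0, 150.0), Sample(64, 500.0, 548.0)]
print(rtt_by_size(samples))  # {1: 50.0, 64: 48.0}
```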

All latency tests are performed with the following parameters :

  • The PMA is configured with a data width of 32 bits, running at 322.265625 MHz.

  • Fixed parameters throughout all scenarios and FPGA targets :

    • MAC_TX_PADDING_SIZE = 64

    • DUAL_CLOCK_FIFO_RX_DEPTH = 16

    • DUAL_CLOCK_FIFO_TX_DEPTH = 16

    • MAC_RX_CRC32_DISABLE = false

    • MAC_RX_CRC32_REMOVE = true

    • MAC_TX_CRC32_DISABLE = false

    • MAC_TX_CRC32_CORRUPTED_ON_ERROR = false

    • MAC_TX_CRC32_ADD = true

    • MAC_RX_MIN_PKT_LENGTH = 64

  • Parameters specific to each FPGA target (values are provided in each specific section):

    • Stratix V targets :

      • PCS_RX_LATENCY_MODE = 1

      • PCS_TX_LATENCY_MODE = 2

      • MAC_RX_LATENCY_MODE = 0

      • MAC_TX_LATENCY_MODE = 0

    • Arria 10 targets :

      • PCS_RX_LATENCY_MODE = 1

      • PCS_TX_LATENCY_MODE = 2

      • MAC_RX_LATENCY_MODE = 0

      • MAC_TX_LATENCY_MODE = 0

    • Virtex UltraScale+ targets :

      • PCS_RX_LATENCY_MODE = 0

      • PCS_TX_LATENCY_MODE = 0

      • MAC_RX_LATENCY_MODE = 0

      • MAC_TX_LATENCY_MODE = 0

  • Parameters specific to each scenario throughout all FPGA targets :

    • User MAC Tx and Rx data widths are set to either 64 bits or 32 bits depending on the scenario:

      • MAC_RX_DATA_WIDTH = 64 or 32

      • MAC_TX_DATA_WIDTH = 64 or 32

    • User clock frequency is set to either 250, 322 or 350 MHz :

      • MAC_RX_CLK_FREQ_KHZ = 250000, 322266 or 350000

      • MAC_TX_CLK_FREQ_KHZ = 250000, 322266 or 350000

    • If both Dual Clock FIFOs are enabled, which is the default behavior of our latency scenarios, then :

      • DUAL_CLOCK_FIFO_RX_ENABLE = true

      • DUAL_CLOCK_FIFO_TX_ENABLE = true

      _images/enyx_macpcs_10g.svg

      Figure 3.2 Enyx MAC PCS 10G architecture with both Dual Clock FIFOs enabled

    • If only the Rx Dual Clock FIFO is enabled, indicated in the vector names by “No Tx FIFO”, then :

      • DUAL_CLOCK_FIFO_RX_ENABLE = true

      • DUAL_CLOCK_FIFO_TX_ENABLE = false

      _images/enyx_macpcs_10g_notx.svg

      Figure 3.3 Enyx MAC PCS 10G architecture with no Dual clock FIFO Tx

    • If both Dual Clock FIFOs are disabled, indicated in the vector names by “No DC FIFO”, then :

      • DUAL_CLOCK_FIFO_RX_ENABLE = false

      • DUAL_CLOCK_FIFO_TX_ENABLE = false

      _images/enyx_macpcs_10g_nofifo.svg

      Figure 3.4 Enyx MAC PCS 10G architecture with no Dual clock FIFOs
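For reference, the three FIFO scenarios can be written as sets of generic values. The Python mapping below is only a compact restatement of the list above; the dictionary layout and the `scenario_generics` helper are illustrative and are not part of any Enyx tool, and the "Default" key is a label chosen here for the scenario with both FIFOs enabled.

```python
# FIFO generics per scenario, keyed by the label used in the vector names.
FIFO_SCENARIOS = {
    "Default":    {"DUAL_CLOCK_FIFO_RX_ENABLE": True,  "DUAL_CLOCK_FIFO_TX_ENABLE": True},
    "No Tx FIFO": {"DUAL_CLOCK_FIFO_RX_ENABLE": True,  "DUAL_CLOCK_FIFO_TX_ENABLE": False},
    "No DC FIFO": {"DUAL_CLOCK_FIFO_RX_ENABLE": False, "DUAL_CLOCK_FIFO_TX_ENABLE": False},
}

def scenario_generics(label: str, data_width: int, clk_khz: int) -> dict:
    """Merge the FIFO settings with the per-scenario width and clock generics."""
    generics = dict(FIFO_SCENARIOS[label])  # copy, so the table above stays untouched
    generics.update(
        MAC_RX_DATA_WIDTH=data_width,
        MAC_TX_DATA_WIDTH=data_width,
        MAC_RX_CLK_FREQ_KHZ=clk_khz,
        MAC_TX_CLK_FREQ_KHZ=clk_khz,
    )
    return generics

print(scenario_generics("No Tx FIFO", 32, 322266))
```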

3.1. Arria 10 GX targets

_images/MACPCS_10G_a10_Latency.svg

Figure 3.5 Enyx MAC PCS 10G RTT Latency diagram for Arria 10 GX target

Table 3.1 Enyx MAC PCS 10G RTT Latency table for Arria 10 GX target

| MAC Payload Size (Bytes) | 1 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|
| 250 MHz - DATA_WIDTH = 64b (ns) | 72 | 72 | 72 | 72 | 68 |
| 322 MHz - No Tx FIFO - DATA_WIDTH = 32b (ns) | 55 | 52 | 55 | 55 | 52 |
| 322 MHz - No DC FIFO - DATA_WIDTH = 32b (ns) | 43 | 40 | 43 | 40 | 40 |

3.2. Stratix V GX targets

_images/MACPCS_10G_s5_Latency.svg

Figure 3.6 Enyx MAC PCS 10G RTT Latency diagram for Stratix V GX targets

Table 3.2 Enyx MAC PCS 10G RTT Latency table for Stratix V GX target

| MAC Payload Size (Bytes) | 1 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|
| 250 MHz - DATA_WIDTH = 64b (ns) | 52 | 52 | 52 | 52 | 56 |
| 322 MHz - No Tx FIFO - DATA_WIDTH = 32b (ns) | 40 | 37 | 37 | 37 | 37 |
| 322 MHz - No DC FIFO - DATA_WIDTH = 32b (ns) | 24 | 24 | 24 | 24 | 24 |

3.3. Virtex UltraScale+ targets

Virtex UltraScale+ tests have been run with 20G PMAs.

_images/MACPCS_10G_vusp_Latency.svg

Figure 3.7 Enyx MAC PCS 10G RTT Latency diagram for Virtex UltraScale+ targets

Table 3.3 Enyx MAC PCS 10G RTT Latency table for Virtex UltraScale+ target

| MAC Payload Size (Bytes) | 1 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|
| 350 MHz - DATA_WIDTH = 32b (ns) | 57 | 57 | 57 | 54 | 57 |
| 322 MHz - No Tx FIFO - DATA_WIDTH = 32b (ns) | 43 | 43 | 46 | 46 | 43 |
| 322 MHz - No DC FIFO - DATA_WIDTH = 32b (ns) | 31 | 31 | 34 | 34 | 31 |
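The "No Tx FIFO" and "No DC FIFO" rows share the same clock and data width, so their difference gives a rough indication of what the Rx Dual Clock FIFO adds to the RTT. The Python sketch below computes that difference from Tables 3.1 to 3.3; attributing the whole gap to the Rx FIFO is an interpretation, not a statement made by the measurements themselves.

```python
SIZES = [1, 8, 16, 32, 64]  # MAC payload sizes (Bytes)

# RTT latencies in ns, copied from Tables 3.1, 3.2 and 3.3.
RTT_NS = {
    "Arria 10 GX": {
        "No Tx FIFO": [55, 52, 55, 55, 52],
        "No DC FIFO": [43, 40, 43, 40, 40],
    },
    "Stratix V GX": {
        "No Tx FIFO": [40, 37, 37, 37, 37],
        "No DC FIFO": [24, 24, 24, 24, 24],
    },
    "Virtex UltraScale+": {
        "No Tx FIFO": [43, 43, 46, 46, 43],
        "No DC FIFO": [31, 31, 34, 34, 31],
    },
}

def rx_fifo_delta(target: str) -> dict[int, int]:
    """Per-size RTT difference (ns) between the two 322 MHz scenarios."""
    rows = RTT_NS[target]
    return {
        size: with_rx - without
        for size, with_rx, without in zip(SIZES, rows["No Tx FIFO"], rows["No DC FIFO"])
    }

for target in RTT_NS:
    print(target, rx_fifo_delta(target))
```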

4. Constraints applied to FPGA compiler tools for Resources and Working frequencies estimations

The constraints below are the ones passed to each vendor's FPGA compiler tool to obtain the Resources and Working frequencies estimations.

4.1. Stratix V targets

regexp {[\.0-9]+} $quartus(version) quartus_version
regexp {Full|Standard|Pro} $quartus(version) quartus_edition
set quartus_version_major [lindex [regexp -all -inline {[0-9]+} $quartus_version] 0]
set quartus_version_minor [lindex [regexp -all -inline {[0-9]+} $quartus_version] 1]


set_global_assignment -name FLOW_ENABLE_IO_ASSIGNMENT_ANALYSIS ON
set_global_assignment -name OPTIMIZATION_TECHNIQUE SPEED
set_global_assignment -name SYNTH_TIMING_DRIVEN_SYNTHESIS ON
set_global_assignment -name OPTIMIZE_HOLD_TIMING "ALL PATHS"
set_global_assignment -name FITTER_EFFORT "STANDARD FIT"
set_global_assignment -name ALLOW_POWER_UP_DONT_CARE OFF
set_global_assignment -name SYNTH_PROTECT_SDC_CONSTRAINT ON

if {$quartus_version_major >= 15} {
    set_global_assignment -name OPTIMIZATION_MODE "HIGH PERFORMANCE EFFORT"
    set_global_assignment -name PROGRAMMABLE_POWER_TECHNOLOGY_SETTING "FORCE ALL USED TILES TO HIGH SPEED"
    set_global_assignment -name PERIPHERY_TO_CORE_PLACEMENT_AND_ROUTING_OPTIMIZATION AUTO
}
if {($quartus_version_major == 16 && $quartus_version_minor == 0) || ($quartus_version_major < 16)} {
    set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_RETIMING ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC_FOR_AREA ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_MAP_LOGIC_TO_MEMORY_FOR_AREA ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_EFFORT EXTRA
}

4.2. Arria 10 targets

regexp {[\.0-9]+} $quartus(version) quartus_version
regexp {Full|Standard|Pro} $quartus(version) quartus_edition
set quartus_version_major [lindex [regexp -all -inline {[0-9]+} $quartus_version] 0]
set quartus_version_minor [lindex [regexp -all -inline {[0-9]+} $quartus_version] 1]

set_global_assignment -name FLOW_ENABLE_IO_ASSIGNMENT_ANALYSIS ON
set_global_assignment -name OPTIMIZATION_TECHNIQUE SPEED
set_global_assignment -name SYNTH_TIMING_DRIVEN_SYNTHESIS ON
set_global_assignment -name OPTIMIZE_HOLD_TIMING "ALL PATHS"
set_global_assignment -name FITTER_EFFORT "STANDARD FIT"
set_global_assignment -name ALLOW_POWER_UP_DONT_CARE ON
set_global_assignment -name SYNTH_PROTECT_SDC_CONSTRAINT ON
set_global_assignment -name PROGRAMMABLE_POWER_TECHNOLOGY_SETTING "FORCE ALL USED TILES TO HIGH SPEED"
set_global_assignment -name PERIPHERY_TO_CORE_PLACEMENT_AND_ROUTING_OPTIMIZATION AUTO
set_global_assignment -name AUTO_GLOBAL_REGISTER_CONTROLS OFF
set_global_assignment -name OPTIMIZE_POWER_DURING_SYNTHESIS OFF
set_global_assignment -name OPTIMIZE_POWER_DURING_FITTING OFF
set_global_assignment -name ALLOW_REGISTER_MERGING ON
set_global_assignment -name ALLOW_REGISTER_RETIMING ON
set_global_assignment -name ALM_REGISTER_PACKING_EFFORT LOW
set_global_assignment -name ROUTER_TIMING_OPTIMIZATION_LEVEL MAXIMUM
set_global_assignment -name ECO_OPTIMIZE_TIMING ON
set_global_assignment -name AUTO_DELAY_CHAINS ON
set_global_assignment -name AUTO_GLOBAL_CLOCK ON

if {$quartus_version_major >= 19} {
    set_global_assignment -name OPTIMIZATION_MODE "HIGH PERFORMANCE EFFORT WITH MAXIMUM PLACEMENT EFFORT"
    set_global_assignment -name GLOBAL_PLACEMENT_EFFORT "MAXIMUM EFFORT"
} elseif {$quartus_version_major >= 16} {
    set_global_assignment -name OPTIMIZATION_MODE "AGGRESSIVE PERFORMANCE"
    set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_DUPLICATION ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_REGISTER_RETIMING ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_ASYNCHRONOUS_SIGNAL_PIPELINING ON
    set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC_FOR_AREA OFF
    set_global_assignment -name PHYSICAL_SYNTHESIS_MAP_LOGIC_TO_MEMORY_FOR_AREA OFF
    set_global_assignment -name PHYSICAL_SYNTHESIS_EFFORT EXTRA
}

4.3. Virtex UltraScale+ (-2 speed grade) targets

################ FPGA ####################
set_property part xcvu9p-flgb2104-2-e [current_project]

set_property STEPS.SYNTH_DESIGN.ARGS.FANOUT_LIMIT 400 [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.FSM_EXTRACTION one_hot [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.KEEP_EQUIVALENT_REGISTERS true [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.RESOURCE_SHARING off [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.NO_LC true [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE 5 [get_runs synth_*]

set_property strategy Performance_BalanceSLLs [get_runs impl_*]

4.4. Virtex UltraScale+ (-3 speed grade) targets

################ FPGA ####################
set_property part xcvu9p-flgb2104-3-e [current_project]

set_property STEPS.SYNTH_DESIGN.ARGS.FANOUT_LIMIT 400 [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.FSM_EXTRACTION one_hot [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.KEEP_EQUIVALENT_REGISTERS true [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.RESOURCE_SHARING off [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.NO_LC true [get_runs synth_*]
set_property STEPS.SYNTH_DESIGN.ARGS.SHREG_MIN_SIZE 5 [get_runs synth_*]