nxLink Share

Latencies

Setup Configuration

Schematics

_images/latency_setup.png

Figure 1 nxLink Share Latency measure setup

This test configuration comprises the following elements:
  • A host Arista 7130 device running the nxLink Share application (software and FPGA firmware) under measurement.
  • An FPGA host running a custom application able to generate frames of any desired size (essentially raw byte injection on the wire) and to insert errors on the trunk port of the device under test. The platform is an Arista 7130, also used as a layer 1 switch to connect all elements of the setup together.
  • An Arista 7150S used to timestamp traffic for latency measurements.
  • A capture server responsible for capturing the timestamped input and output streams forwarded by the timestamping device.

On the Arista 7130, nxLink Share is configured to process incoming user traffic with or without optimizations (Header Compression, Padding Removal) and to send it over a trunk port (L1 or L2), depending on the test case.

Trunk traffic then crosses a WAN emulator, which can be used to insert errors on the line. Errors are set to 1 corrupted bit per frame, at any frame rate.

Traffic then crosses back from trunk to user.

Frames injected into nxLink Share are also tapped to an Arista 7150S with its timestamping feature enabled (input stream).

Frames output on the egress of nxLink Share are also sent to the timestamping device (output stream).

Both input and output streams are then captured. The device latency can therefore be obtained by computing the difference between output timestamps and input timestamps.
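
As an illustration, the per-frame latency can be derived offline by pairing each input timestamp with the corresponding output timestamp. A minimal sketch in Python, assuming hypothetical timestamp lists already extracted from the two captures:

```python
# Minimal sketch: per-frame and average latency from two timestamp lists.
# Assumes input_ts[i] and output_ts[i] belong to the same injected frame,
# with timestamps in nanoseconds (hypothetical pre-extracted data).

def compute_latencies(input_ts, output_ts):
    if len(input_ts) != len(output_ts):
        raise ValueError("captures must contain the same number of frames")
    return [out - inp for inp, out in zip(input_ts, output_ts)]

input_ts = [1_000, 2_000, 3_000]    # ingress timestamps (ns)
output_ts = [1_622, 2_624, 3_621]   # egress timestamps (ns)

latencies = compute_latencies(input_ts, output_ts)
print(f"per-frame: {latencies} ns, average: {sum(latencies) / len(latencies):.1f} ns")
```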

Devices list

Device under test (DUT) characteristics
Hardware Platform | Package name & version | OS
Arista 7130LB | nxlinkplatform-share-vu9pl_1g (3.0.1-335.release) | MOS 0.27.1_core
Arista 7130LB | nxlinkplatform-share-vu9pl_10g (3.0.1-339.release) | MOS 0.27.1_core
Arista 7130L | nxlinkplatform-share-vu7p_10g (3.0.1-584.release) | MOS 0.27.1_core
Traffic generator and layer 1 device characteristics
Hardware Platform | Firmware name & version | OS
Arista 7130EB | top-v19_1-m48e_emu_xcvu9p-mem_eth-nxlink_setup_tools (105196) | MOS 0.23.0_core
Capture server characteristics
Hardware Platform | Network card | OS
Intel Xeon E5/Core i7 | Solarflare SFC9120 10G adapter | Debian 9.9 (Stretch)
Timestamping device model and version
Model | OS | Port speed | Timestamping resolution | Timestamping trigger
Arista 7150S | EOS 4.14.6M | 10G | 2.857 nanoseconds | First byte of FCS

Measurement Scenario

Each measurement is the average latency over 100 injections of the same frame.

There are 3 possible trunk configurations that strongly impact latency, mainly because of the serialization delay at the trunk speed (a worked example follows the list below):
  • 1G Layer 1 with specific encoding (64b66b instead of the standard 1G 8b10b), mainly compatible with millimeter-wave radios
  • 10G Layer 2 (compatible with radios providing a 10G Layer 2, mainly with Aviat ULLM radios)
  • 10G Layer 1 (compatible with L1 radios providing a 10G interface, for example SAF radios)
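
To illustrate the serialization component, here is a rough back-of-the-envelope sketch (plain bit-time only, ignoring encoding and protocol overhead, and not an exact model of the device):

```python
# Rough serialization delay: time to clock a frame out at the trunk bit rate.
# Encoding (e.g. 64b66b) and radio framing overhead are deliberately ignored.

def serialization_delay_ns(frame_bytes: int, rate_gbps: float) -> float:
    return frame_bytes * 8 / rate_gbps  # bits / (Gbit/s) gives nanoseconds

for rate_gbps in (1, 5, 10):
    for frame_bytes in (64, 1024):
        delay = serialization_delay_ns(frame_bytes, rate_gbps)
        print(f"{frame_bytes}B at {rate_gbps} Gbps: {delay:.1f} ns")
# e.g. 64B at 1 Gbps -> 512.0 ns, while 64B at 10 Gbps -> 51.2 ns
```
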
Each of these trunk configurations is tested with two compression profiles:
  • no compression: no specific optimization is performed on incoming user frames
  • header compression & padding removal (HCPR): the MAC addresses of the user frame are compressed, and padding is removed for frames whose EtherType/size field carries the payload size with a value less than or equal to 46
Each of these compression profiles is tested with 6 configurations:
  • no WAN emulation: the trunk interface is connected directly, without going through any device. This is used for reference latencies
  • no FEC: the trunk interface goes through the WAN emulator, but it is configured to generate no corruption and Forward Error Correction is disabled. This is used as a baseline for comparison with the other setups involving FEC correction
  • FEC CRC, no corruption: FEC is configured to use CRC Forward Error Correction, but no corruption is generated on the WAN emulator. This shows the latency of the FEC CRC module when there is no corruption on the radio
  • FEC CRC, corruption: FEC is configured to use CRC Forward Error Correction and corruption is generated on the WAN emulator. This shows the latency of the FEC CRC module when there is corruption on the radio
  • FEC Reed-Solomon, no corruption: FEC is configured to use Reed-Solomon Forward Error Correction, but no corruption is generated on the WAN emulator. This shows the latency of the FEC RS module when there is no corruption on the radio
  • FEC Reed-Solomon, corruption: FEC is configured to use Reed-Solomon Forward Error Correction and corruption is generated on the WAN emulator. This shows the latency of the FEC RS module when there is corruption on the radio
Each test is performed with 6 different frame sizes:
  • a 64-byte frame with a 1-byte payload (EtherType/size value of 1)
  • a 64-byte frame with a 46-byte payload (EtherType/size value of 46)
  • a 128-byte frame (110-byte payload)
  • a 256-byte frame (238-byte payload)
  • a 512-byte frame (494-byte payload)
  • a 1024-byte frame (1006-byte payload)

These frame sizes are chosen so that the results are comparable with RFC 2544 testing reports.
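
For reference, the payload figures above follow from standard Ethernet framing: 14 bytes of header plus 4 bytes of FCS (18 bytes of overhead), with short frames padded up to the 64-byte minimum. A small sketch of that arithmetic (an illustration, not device code):

```python
# Ethernet framing arithmetic assumed above: 14B header + payload + 4B FCS,
# padded to the 64B minimum frame size when the payload is shorter than 46B.

HEADER_BYTES, FCS_BYTES, MIN_FRAME_BYTES = 14, 4, 64

def frame_size(payload_bytes: int) -> int:
    return max(HEADER_BYTES + payload_bytes + FCS_BYTES, MIN_FRAME_BYTES)

def padding(payload_bytes: int) -> int:
    return max(0, MIN_FRAME_BYTES - (HEADER_BYTES + payload_bytes + FCS_BYTES))

for payload in (1, 46, 110, 238, 494, 1006):
    print(f"{payload}B payload -> {frame_size(payload)}B frame, {padding(payload)}B padding")
# A 1B payload yields a 64B frame with 45B of padding, which HCPR can remove.
```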

User ports are always 10G (not tunable). Trunk bandwidth is not line rate: the 1G trunk is set to 1 Gbps and the 10G trunk is set to 5 Gbps.

Product latencies

Latency measures

The latencies below include PMA, PCS and application latencies, which by definition is one-way wire-to-wire. The user port is 10G.

1G L1 trunk with 64B66B encoding

User frame (payload) | No compression (No WAN emulator) | Header compression & padding removal (No WAN emulator)
nxLink Share Enterprise 1G, Arista 7130L/LB (Xilinx Virtex UltraScale+)
64B (1B, padded) | 1062 ns | 672 ns
64B (46B) | 1063 ns | 966 ns
128B (110B) | 1635 ns | 1559 ns
256B (238B) | 2692 ns | 2617 ns
512B (494B) | 4805 ns | 4727 ns
1024B (1006B) | 9030 ns | 8953 ns

10G L2 trunk with padding removal

User frame (payload) | No compression (No WAN emulator) | Header compression & padding removal (No WAN emulator)
nxLink Share Enterprise 10G, Arista 7130L/LB (Xilinx Virtex UltraScale+)
64B (1B, padded) | 622 ns | 570 ns
64B (46B) | 624 ns | 609 ns
128B (110B) | 794 ns | 786 ns
256B (238B) | 1039 ns | 1032 ns
512B (494B) | 1530 ns | 1523 ns
1024B (1006B) | 2515 ns | 2506 ns
nxLink Share Legacy M48/M32 (Arria 10)
64B (1B, padded) | 810 ns | 768 ns
64B (46B) | 810 ns | 795 ns
128B (110B) | 993 ns | 997 ns
256B (238B) | 1241 ns | 1244 ns
512B (494B) | 1738 ns | 1741 ns
1024B (1006B) | 2729 ns | 2733 ns

10G trunk in layer 1 compatibility

User frame (payload) | No compression (No WAN emulator) | Header compression & padding removal (No WAN emulator)
nxLink Share Enterprise 10G, Arista 7130L/LB (Xilinx Virtex UltraScale+)
64B (1B, padded) | 611 ns | 555 ns
64B (46B) | 614 ns | 597 ns
128B (110B) | 776 ns | 772 ns
256B (238B) | 1018 ns | 1011 ns
512B (494B) | 1497 ns | 1490 ns
1024B (1006B) | 2457 ns | 2450 ns
nxLink Share Legacy M48/M32 (Arria 10)
64B (1B, padded) | 799 ns | 746 ns
64B (46B) | 799 ns | 787 ns
128B (110B) | 979 ns | 985 ns
256B (238B) | 1219 ns | 1225 ns
512B (494B) | 1699 ns | 1704 ns
1024B (1006B) | 2659 ns | 2664 ns

Forward Error Correction (latency difference is the same for all trunk types)

User frame (payload) | No Forward Error Correction | FEC CRC, no corruption | FEC CRC, corruption | FEC Reed-Solomon, no corruption | FEC Reed-Solomon, corruption
nxLink Share Enterprise 10G, Arista 7130L/LB (Xilinx Virtex UltraScale+)
64B (1B, padded) | + 0 ns | + 65 ns | + 63 ns | + 60 ns | + 130 ns
64B (46B) | + 0 ns | + 62 ns | + 59 ns | + 66 ns | + 120 ns
128B (110B) | + 0 ns | + 59 ns | + 55 ns | + 59 ns | + 115 ns
256B (238B) | + 0 ns | + 59 ns | + 64 ns | + 69 ns | + 130 ns
512B (494B) | + 0 ns | + 58 ns | + 55 ns | + 79 ns | + 132 ns
1024B (1006B) | + 0 ns | + 60 ns | + 54 ns | + 99 ns | + 157 ns
nxLink Share Legacy M48/M32 (Arria 10)
64B (1B, padded) | + 0 ns | + 75 ns | + 73 ns | N/A | N/A
64B (46B) | + 0 ns | + 75 ns | + 71 ns | N/A | N/A
128B (110B) | + 0 ns | + 74 ns | + 71 ns | N/A | N/A
256B (238B) | + 0 ns | + 74 ns | + 71 ns | N/A | N/A
512B (494B) | + 0 ns | + 74 ns | + 72 ns | N/A | N/A
1024B (1006B) | + 0 ns | + 75 ns | + 72 ns | N/A | N/A

Congestion impacts on latency

The nxLink Share system is optimized for low latency. It ensures that ingress landside-to-trunk and egress trunk-to-landside transmissions are done as fast as possible.

_images/congestions.dia.png

Figure 2 System overview

However, some congestion situations on the ingress side, described below, can increase the overall latency of the system.

User frames queueing

Per-user queues

When a user sends frames using more bandwidth than it has been allocated, the frames are queued. This queue is specific to each user, so that a user respecting its bandwidth will not be delayed at this point by another user sending too much traffic.

Queue size

The queue size is configurable for each user. Increasing the queue size allows the system to absorb user bursts that exceed the allocated bandwidth. Latency increases for the last frames of a burst, which are sent only once the first frames have been drained from the queue. Setting a queue size of 0 means that every frame exceeding the allocated user bandwidth is dropped.
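
A minimal sketch of this per-user queueing behaviour, assuming a hypothetical fixed-size FIFO (illustrative only, not the actual device implementation):

```python
from collections import deque

# Hypothetical per-user ingress queue: frames exceeding the allocated bandwidth
# wait in a FIFO of at most queue_size frames; with queue_size == 0 they are dropped.

class UserQueue:
    def __init__(self, queue_size: int):
        self.queue_size = queue_size
        self.frames = deque()
        self.dropped = 0

    def enqueue(self, frame: bytes) -> None:
        if len(self.frames) < self.queue_size:
            self.frames.append(frame)   # frame waits here, adding latency
        else:
            self.dropped += 1           # queue full (or size 0): frame dropped

    def dequeue(self):
        return self.frames.popleft() if self.frames else None

q = UserQueue(queue_size=0)
q.enqueue(b"burst frame exceeding the allocated bandwidth")
print(q.dropped)  # 1: with a queue size of 0, the excess frame is dropped
```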

Prioritization

The User packet priority mechanism can be used when a user wants to bypass its own queue. However, the high priority frames of one user will not bypass frames from another user.

Monitoring

User queues can be monitored through the software interface, which shows the actual fill level of each queue (enyxShareUserPriorityRXBufferUsage, enyxShareUserPriorityRXBufferMaxUsage and enyxShareUserPriorityRxBufferTotalSize).

Scheduler congestion

The first frame of the queue for each user is presented to the scheduler module.

Interleaving user fragments
_images/scheduler_no_congestion.dia.png

Figure 3 Case 1: Scheduler with no congestion

This diagram presents a case where several users send frames, but not at the same time. Fragments are sent on the trunk without one user impacting the others.

When several users have fragments ready to be sent on the trunk, the scheduler selects which one is transmitted first and delays the others.

_images/scheduler_congestion.dia.png

Figure 4 Case 2: Scheduler with inter-users congestion

The worst-case scenario here is when all users send a frame at the same time. One user will be served last, delayed by the fragments of all other users.

In this scenario, the different user fragments are sent without gap, and interleaved to ensure that one user is not prioritized over another. The trunk is used at maximum capacity.

Single user burst mode

When only one user has a frame ready to be sent on the trunk, the resulting fragments are sent in burst mode, without gaps on the trunk.

_images/scheduler_burst.dia.png

Figure 5 Case 3: Scheduler in burst mode

In this case, there is no gap between the fragments sent on the trunk. The trunk is also used at maximum capacity, but there is no inter-user latency impact.

Monitoring

The system can use the maximum channel capacity without one user delaying the others (case 3). It can also send user frames interleaved without them delaying each other (case 1). The congestion situation occurs when user fragments are both interleaved and sent back-to-back without gaps (case 2).

This can be monitored on the device using the interleaving congestion ratio: the number of fragments sent interleaved (enyxShareChannelTxCongestionRatioUsage) divided by the total number of fragments sent without gaps (enyxShareChannelTxCongestionRatioTotal).
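
As an example, the ratio can be computed directly from the two counters (the values below are made up for illustration, not real device readings):

```python
# Interleaving congestion ratio, computed from the two counters described above
# (counter values here are hypothetical examples).

congestion_usage = 1_200    # enyxShareChannelTxCongestionRatioUsage
congestion_total = 48_000   # enyxShareChannelTxCongestionRatioTotal

ratio = congestion_usage / congestion_total if congestion_total else 0.0
print(f"interleaving congestion ratio: {ratio:.2%}")  # 2.50%
```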

Overprovisioning channel

In the nxLink Share system, the channel bandwidth can be distributed between users. If the sum of user bandwidths does not exceed the channel bandwidth, all users respecting their inter-frame delay (IFD) will be able to send frames on the trunk.

The configuration where the sum of user bandwidths exceeds the channel bandwidth is referred to as “overprovisioning”. In this scenario, there can be events where a user is not able to send a frame while respecting the IFD: the channel is not available in time, as it was busy sending other user frames.
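
A small sketch of that provisioning check, using hypothetical per-user bandwidth allocations in Mbps:

```python
# Hypothetical provisioning check: the channel is overprovisioned when the
# sum of allocated user bandwidths exceeds the channel bandwidth.

channel_bandwidth_mbps = 1_000
user_bandwidths_mbps = {"user_a": 400, "user_b": 400, "user_c": 300}

allocated = sum(user_bandwidths_mbps.values())
overprovisioned = allocated > channel_bandwidth_mbps
print(f"{allocated} Mbps allocated on a {channel_bandwidth_mbps} Mbps channel, "
      f"overprovisioned: {overprovisioned}")  # 1100 Mbps -> True
```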

The system counts events where user frames were delayed by an overprovisioned channel (enyxShareUserExtendedCongestionCount).

Trunk speed limitation

The scheduler module decides when fragments can be sent based on the allocated bandwidth on channels.

If more bandwidth is allocated on the channel than what can be physically provided by the trunk interface, there can be scenarios where the scheduler triggers a new fragment transmission while the last one is still being transmitted by the physical interface.

This scenario can be avoided by using the channel busy behaviour system configuration, to either drop frames that can’t be transmitted, or slow down the scheduler to ensure that frames are transmitted correctly, possibly delaying user traffic.

Latency monitoring

The latency can be monitored in the nxLink Share interface.

Device latency

_images/device_latency.dia.png

Figure 6 Ingress and Egress latency

The system can determine the device latency, corresponding to the sum of the ingress and egress latencies for user frames (trunk latency is excluded from the calculation).

Polling is performed periodically (every few seconds) and reports, for each user, the maximum latency of its frames (enyxShareUserLatency).

Trunk Round Trip Time

This measure indicates the latency of the trunk link. It corresponds to a round trip on the trunk only.

Measure

The system automatically sends RTT monitoring packets to measure the total round-trip latency of the trunk link (a minimal computation sketch follows the list below).

  • These packets are timestamped and sent from device A, bounced by the remote device B back to device A, and timestamped again.
  • The RTT latency packet period is configurable, and is set to 1000 ms by default.
  • RTT latency measurement results are available in the web interface.
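
As an illustration, the RTT follows directly from the two timestamps taken on device A. A sketch with hypothetical values (the actual packet format and timestamp units are not described here):

```python
# RTT from the two timestamps taken on device A, in nanoseconds
# (hypothetical values; the bounce time in device B is part of the measure).

t_sent_ns = 1_000_000        # timestamp written when the packet leaves device A
t_returned_ns = 1_052_400    # timestamp taken when the packet comes back to device A

rtt_ns = t_returned_ns - t_sent_ns
one_way_estimate_ns = rtt_ns / 2   # rough estimate, assuming a symmetric link
print(f"RTT: {rtt_ns} ns, one-way estimate: {one_way_estimate_ns:.0f} ns")
```
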
RTT frame size

The RTT latency packet needs a payload of at least 8 bytes to store the timestamp data.

To generate RTT measures that better reflect the actual usage of the trunk link, the size of the RTT measure frame can be adjusted by adding padding.

The resulting size of the frame output by nxLink Share depends on the radio layer mode.

_images/rtt_latency_frames_generated.dia.png

Figure 7 RTT latency frames generated by nxLink Share

However, even for radios in Layer 2 mode with padding removal, only the 8 bytes of timestamp data and the padding are sent on the link thanks to radio optimizations (MAC address removal and padding removal), plus any additional radio overhead.

_images/rtt_latency_frames_accounted.dia.png

Figure 8 RTT latency frames actually sent on the radio link

More details about the radio transmission are given in Radio transmission.

The size of the RTT frame payload (highlighted in red in the diagram) can be modified in the channel configuration (RTT monitoring key frame size).

For example, this value could be set to the system fragment size.

Warning

When using the RTT monitoring feature, measurement frames are injected on the trunk and may compete with user traffic.

The channel configuration screen in the web interface, which allows setting the monitoring period and frame size, displays the bandwidth that RTT measures will generate on the trunk.
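
That bandwidth can be estimated from the frame size and the monitoring period. A rough sketch (ignoring per-frame overhead such as preamble and inter-frame gap):

```python
# Rough bandwidth consumed on the trunk by RTT monitoring:
# one frame of frame_bytes every period_ms milliseconds
# (preamble, inter-frame gap and radio overhead are ignored here).

def rtt_monitoring_bandwidth_bps(frame_bytes: int, period_ms: int) -> float:
    return frame_bytes * 8 * 1_000 / period_ms

print(rtt_monitoring_bandwidth_bps(64, 1_000))   # 512.0 bps (64B frame, default 1000 ms period)
print(rtt_monitoring_bandwidth_bps(256, 100))    # 20480.0 bps (larger frame, faster period)
```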

For more information about the solution, please visit our documentation portal or contact our support team.


Copyright © Enyx SA 2011-2021

Proprietary Notice

Words and logos marked with ® or ™ are registered trademarks or trademarks owned by Enyx SA. Other brands and names mentioned herein may be the trademarks of their respective owners. Neither the whole nor any part of the information contained in, or the product described in, this document may be adapted or reproduced in any material form except with the prior written permission of the copyright holder. The product described in this document is subject to continuous developments and improvements. All particulars of the product and its use contained in this document are given by Enyx in good faith. This document is provided “as is” with no warranties whatsoever, including any warranty of merchantability, non infringement, fitness for any particular purpose, or any warranty otherwise arising out of any proposal, specification, or sample. This document is intended only to assist the reader in the use of the product. Enyx shall not be liable for any loss or damage arising from the use of any information in this document, or any error or omission in such information, or any incorrect use of the product. Nor shall Enyx be liable for infringement of proprietary rights relating to use of information in this document. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted herein.