QED nxAccess Latency Tests

Overview

This latency report details the performance of the Enyx nxAccess QED solution on April 20th, 2020, a very active trading day due to the COVID-19 global health crisis.


To demonstrate the capabilities of the nxAccess solution, this report details the latency profile of the solution when triggering up to 8 distinct orders per market data message. Additionally, this report highlights the performance improvement that nxAccess offers when using two distinct 10Gb/s connections to send orders to the market.

Setup Configuration

Schematics

_images/nxaccess_diagram.png

This testing configuration uses two different servers:

  • A host server:
    • Hosting an Enyx nxAccess FPGA board running a tick-to-trade algorithm
    • Running a software test application responsible for configuring and monitoring the FPGA during the duration of the test
  • A replay and capture server:
    • Replaying market data captures at defined speeds
    • Running a TCP server to which the nxAccess execution engine connects, simulating the exchange connection
    • Capturing the timestamped raw market data and TCP execution traffic forwarded by the Arista 7130 timestamping device

The algorithm running on the nxAccess FPGA board is configured to analyze the trade messages published on the raw market data feed. Depending on the size of these trades, the algorithm can then trigger 1 to 8 distinct orders. Each of these orders will then be sent to the replay and capture server in a distinct TCP segment on a distinct TCP session.


For this test, the algorithm's 8 triggers were configured with trigger threshold increments of 10, as shown in the table below:


Trigger ID Trigger Threshold TCP Session ID Order Payload Size TCP Segment Size
0 ≥ 0 0 100 bytes 170 bytes
1 ≥ 10 1 100 bytes 170 bytes
2 ≥ 20 2 100 bytes 170 bytes
3 ≥ 30 3 100 bytes 170 bytes
4 ≥ 40 4 100 bytes 170 bytes
5 ≥ 50 5 100 bytes 170 bytes
6 ≥ 60 6 100 bytes 170 bytes
7 ≥ 70 7 100 bytes 170 bytes

For example, with this configuration, when the algorithm receives a trade message with a size of 25, it will trigger the following orders:

  • Trigger 0 → Order of 100 bytes on TCP session 0
  • Trigger 1 → Order of 100 bytes on TCP session 1
  • Trigger 2 → Order of 100 bytes on TCP session 2

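The trigger evaluation described above can be summarized with a short sketch. The Python below is an illustration only: the production logic runs entirely in FPGA gateware, and the send_order callback, function names, and data layout used here are hypothetical.

    # Illustration only: the production trigger logic runs in FPGA gateware.
    # Thresholds and session IDs mirror the trigger table above; the
    # send_order callback and all names below are hypothetical.
    TRIGGERS = [
        # (trigger_id, threshold, tcp_session_id)
        (0, 0, 0), (1, 10, 1), (2, 20, 2), (3, 30, 3),
        (4, 40, 4), (5, 50, 5), (6, 60, 6), (7, 70, 7),
    ]
    ORDER_PAYLOAD_SIZE = 100  # bytes

    def fire_triggers(trade_size, send_order):
        """Fire every trigger whose threshold is met by the observed trade size."""
        for trigger_id, threshold, session_id in TRIGGERS:
            if trade_size >= threshold:
                # One order per matching trigger, each on its own TCP session.
                send_order(session_id, payload_size=ORDER_PAYLOAD_SIZE)

    # A trade of size 25 fires triggers 0, 1 and 2: one 100-byte order each
    # on TCP sessions 0, 1 and 2, matching the example above.
    fire_triggers(25, lambda session_id, payload_size: print("order on session", session_id))
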
Using a layer 1 switch, both the raw market data feed and the TCP segments are forwarded to a timestamping device (Arista 7130) for precise timestamping, then on to the capture server for storage and post-analysis.


The timestamped data is then analyzed to produce the latency measurements and to extract the actual rate at which the replay server was able to replay the raw market data onto the network.
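As a minimal sketch of that post-analysis, assuming each capture record carries the hardware timestamp in nanoseconds and a key linking an order to the market data packet that triggered it (both field names are assumptions, not the actual capture schema), the wire-to-wire latency per order could be computed as follows:

    def tick_to_trade_latencies(market_data_packets, order_packets):
        """Return wire-to-wire latencies in microseconds, one per matched order."""
        # Index the triggering market data packets by their pairing key.
        trades_by_key = {p["key"]: p["timestamp_ns"] for p in market_data_packets}
        latencies_us = []
        for order in order_packets:
            trade_ts = trades_by_key.get(order["key"])
            if trade_ts is not None:
                latencies_us.append((order["timestamp_ns"] - trade_ts) / 1000.0)
        return latencies_us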

Host Server Characteristics

CPU 1 model & version Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
CPU 1 PCIe devices N/A
CPU 2 model & version Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
CPU 2 PCIe devices Enyx FPB2

Replay & Capture Server Characteristics

CPU 1 model & version Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
CPU 1 PCIe devices 2× Solarflare SFC9220 10G Adapter
CPU 2 model & version Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
CPU 2 PCIe devices N/A

Layer 1 Device Model & Version

Model Metamako MetaMux 48 with K-Series Plus
Version 0.17.4
Port Speed 1Gb/10Gb
Port to port advertised latency 5 ns

Layer 2 Device Model & Version

Model Arista 7150s
Version EOS-4.14.6M
Port Speed 1Gb/10Gb
Timestamping resolution 2.857 nanoseconds
Timestamping trigger First Byte of the FCS

Enyx Solution Characteristics

FPGA card version Enyx FPB2
Firmware version QED50 2.8.1
Software version libenyxmd 5.1.2
Driver version hfp 2.4.2
Thread Binding CPU ID 5 (Numa 0)
Card NUMA node 1

Capture Characteristics

Capture QED_2020-04-20_20minutes
Packet Count 3 301 177
Channel Count 0
Capture Duration 0:19:59
Beginning Date Mon, 20 Apr 2020 15:13:00
End date Mon, 20 Apr 2020 15:32:59
Timestamping resolution Microseconds
Packet Rate Peak (s) 51 160 pkt/s
Packet Rate Peak (100 ms) 51 160 pkt/s
Packet Rate Peak (10 ms) 153 100 pkt/s
Packet Rate Peak (1 ms) 225 000 pkt/s
Bit Rate Peak (s) 33.04 Mbps
Bit Rate Peak (100 ms) 93.98 Mbps
Bit Rate Peak (10 ms) 288.71 Mbps
Bit Rate Peak (1 ms) 423.30 Mbps
_images/QED_2020-04-20_20minutes_global_Packet_Rate.png _images/QED_2020-04-20_20minutes_global_Bit_Rate.png
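The peak rates reported at the different resolutions above correspond to the busiest time bin of the capture, rescaled to one second. A simplified sketch of that computation is shown below; it is not the tooling used to produce this report, and the packet timestamp list is an assumed input.

    from collections import Counter

    def peak_packet_rate(timestamps_s, bin_width_s):
        """Peak packet rate in pkt/s over bins of bin_width_s seconds."""
        # Count packets per time bin, keep the busiest bin, rescale to 1 s.
        buckets = Counter(int(ts / bin_width_s) for ts in timestamps_s)
        return max(buckets.values()) / bin_width_s

    # Example: the "(1 ms)" peak is the busiest 1 ms bin expressed in pkt/s.
    # peak_packet_rate(capture_timestamps_s, 0.001)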

Test Overview

This report will display the latency profile of the nxAccess QED solution under the following conditions:

  • Replay speed: 1X
  • Scenarios:
    • 8 TCP sessions on a single TCP stack connected to one 10Gb/s port
    • 8 TCP sessions on 2 TCP stacks connected to two 10Gb/s ports

For the test scenario using 2 TCP stacks, trigger IDs 0, 2, 4 and 6 publish orders on TCP stack 0, while trigger IDs 1, 3, 5 and 7 publish on TCP stack 1. This second scenario illustrates how much the output bandwidth of the solution affects its latency profile.
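A minimal sketch of that assignment, for illustration only (the real mapping is part of the FPGA configuration, not software):

    def tcp_stack_for_trigger(trigger_id):
        # Even trigger IDs publish on TCP stack 0, odd trigger IDs on TCP
        # stack 1, each stack driving its own 10Gb/s port.
        return trigger_id % 2

    assert [tcp_stack_for_trigger(i) for i in range(8)] == [0, 1, 0, 1, 0, 1, 0, 1]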

Scenario #1: 8 TCP sessions on 1x10G

Replay #1: 1X Market Rate

Conditions

Instrument Count 60
Subscribed Channel Count 21
Instruments List 60 most active instruments
Lane Arbitration A only
Book Builder Configuration Delta updates
Feed Handler Book Depth 5
Requested Replay Rate 1

Observed Replay Rates

_images/Latency_nxAccess-QED-to-iLink3_Replay-1x_8-sessions_1-TCP-port_session-breakdown.setup2_global_Packet_Rate.png
Type Rate Replay Ratio
Average Packet Rate (1s resolution) 1 724 pkt/s 0.63
Average Packet Rate (100ms resolution) 2 164 pkt/s 0.63
Average Packet Rate (10ms resolution) 4 075 pkt/s 0.61
Average Packet Rate (1ms resolution) 11 346 pkt/s 0.88
Peak Packet Rate (1s resolution) 11 778 pkt/s 0.65
Peak Packet Rate (100ms resolution) 33 960 pkt/s 0.66
Peak Packet Rate (10ms resolution) 92 500 pkt/s 0.60
Peak Packet Rate (1ms resolution) 186 000 pkt/s 0.83
_images/Latency_nxAccess-QED-to-iLink3_Replay-1x_8-sessions_1-TCP-port_session-breakdown.setup2_global_Bit_Rate.png
Type Rate Replay Ratio
Average Bit Rate (1s resolution) 2.84 Mbps 0.62
Average Bit Rate (100ms resolution) 3.64 Mbps 0.62
Average Bit Rate (10ms resolution) 7.10 Mbps 0.59
Average Bit Rate (1ms resolution) 20.23 Mbps 0.86
Peak Bit Rate (1s resolution) 20.45 Mbps 0.62
Peak Bit Rate (100ms resolution) 57.76 Mbps 0.61
Peak Bit Rate (10ms resolution) 166.48 Mbps 0.58
Peak Bit Rate (1ms resolution) 356.51 Mbps 0.84

Results

Sample Distribution

Trigger ID Min 25% 50% Mean 90% 99% 99.9% 99.99% 99.999% Max Sample Count
0 0.43 0.43 0.44 0.49 0.60 0.75 1.29 2.24 2.83 2.83 81 439
1 0.57 0.58 0.74 0.70 0.75 1.66 2.66 2.98 2.98 2.98 3 557
2 0.72 0.73 0.89 0.87 0.90 1.94 2.99 3.13 3.13 3.13 1 619
3 0.87 0.88 1.04 1.03 1.04 2.09 3.27 3.27 3.27 3.27 915
4 1.02 1.03 1.19 1.17 1.19 2.08 3.11 3.11 3.11 3.11 624
5 1.16 1.17 1.33 1.31 1.34 2.36 3.25 3.25 3.25 3.25 461
6 1.31 1.32 1.48 1.46 1.48 2.50 3.40 3.40 3.40 3.40 386
7 1.46 1.47 1.63 1.60 1.63 2.65 3.55 3.55 3.55 3.55 336
General Statistics  
Input Raw Packet Processed Count 2 063 200
Input FIFO maximal usage 0.76 %
Input Raw Market Data Packet Drop 0
Output Total Order Count 89 337
Output Order Count on TCP Stack 1 89 337
Output Order Count on TCP Stack 2 0
Latency Distribution Over Time (50th percentile)

NOTE: This graph shows the evolution of the 50th percentile latency over the duration of the test for each trigger ID.
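The per-trigger statistics in the sample distribution table, as well as the 50th percentile tracked in this graph, are standard percentile computations over each trigger ID's latency samples. A sketch with NumPy follows; the variable names are assumptions and this is not the report's own tooling.

    import numpy as np

    def distribution_row(samples):
        """Reproduce one row of the sample distribution table from raw samples."""
        s = np.asarray(samples, dtype=float)
        p25, p50, p90, p99, p999, p9999, p99999 = np.percentile(
            s, [25, 50, 90, 99, 99.9, 99.99, 99.999])
        return {
            "Min": s.min(), "25%": p25, "50%": p50, "Mean": s.mean(),
            "90%": p90, "99%": p99, "99.9%": p999, "99.99%": p9999,
            "99.999%": p99999, "Max": s.max(), "Sample Count": s.size,
        }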

Scenario #2: 8 TCP sessions on 2x10G

Replay #1: 1X Market Rate

Conditions

Instrument Count 60
Subscribed Channel Count 21
Instruments List 60 most active instruments
Lane Arbitration A only
Book Builder Configuration Delta updates
Feed Handler Book Depth 5
Requested Replay Rate 1

Observed Replay Rates

_images/Latency_nxAccess-QED-to-iLink3_Replay-1x_8-sessions_2-TCP-port_session-breakdown.setup2_global_Packet_Rate.png
Type Rate Replay Ratio
Average Packet Rate (1s resolution) 1 724 pkt/s 0.63
Average Packet Rate (100ms resolution) 2 165 pkt/s 0.63
Average Packet Rate (10ms resolution) 4 083 pkt/s 0.61
Average Packet Rate (1ms resolution) 11 350 pkt/s 0.88
Peak Packet Rate (1s resolution) 11 779 pkt/s 0.65
Peak Packet Rate (100ms resolution) 33 970 pkt/s 0.66
Peak Packet Rate (10ms resolution) 92 500 pkt/s 0.60
Peak Packet Rate (1ms resolution) 187 000 pkt/s 0.83
_images/Latency_nxAccess-QED-to-iLink3_Replay-1x_8-sessions_2-TCP-port_session-breakdown.setup2_global_Bit_Rate.png
Type Rate Replay Ratio
Average Bit Rate (1s resolution) 2.84 Mbps 0.62
Average Bit Rate (100ms resolution) 3.64 Mbps 0.62
Average Bit Rate (10ms resolution) 7.12 Mbps 0.60
Average Bit Rate (1ms resolution) 20.24 Mbps 0.86
Peak Bit Rate (1s resolution) 20.46 Mbps 0.62
Peak Bit Rate (100ms resolution) 57.76 Mbps 0.61
Peak Bit Rate (10ms resolution) 166.53 Mbps 0.58
Peak Bit Rate (1ms resolution) 354.56 Mbps 0.84

Results

Sample Distribution

Trigger ID Min 25% 50% Mean 90% 99% 99.9% 99.99% 99.999% Max Sample Count
0 0.43 0.43 0.44 0.49 0.60 0.75 1.20 1.59 2.32 2.32 81 439
1 0.51 0.51 0.60 0.61 0.68 1.15 1.56 1.77 1.77 1.77 3 557
2 0.57 0.58 0.74 0.71 0.75 1.35 1.94 1.96 1.96 1.96 1 619
3 0.65 0.66 0.82 0.79 0.83 1.31 1.91 1.91 1.91 1.91 915
4 0.72 0.73 0.89 0.86 0.90 1.33 1.79 1.79 1.79 1.79 624
5 0.80 0.81 0.97 0.94 0.98 1.41 1.86 1.86 1.86 1.86 461
6 0.87 0.88 1.04 1.00 1.04 1.49 1.94 1.94 1.94 1.94 386
7 0.95 0.96 1.12 1.08 1.12 1.56 2.01 2.01 2.01 2.01 336

NOTE: Further analysis of this test shows that the maximum latency recorded for trigger ID 0 is due to a trigger occurring immediately after a burst of triggers on all sessions. As a result, the order on session 0 had to be buffered until all previously triggered orders were sent on the wire. As defined by the algorithm's configuration and confirmed by the "Sample Count" column, trigger ID 0 is far more active than triggers 1 to 7, accounting for more than 91% of the activity.


General Statistics  
Input Raw Packet Processed Count 2 063 200
Input FIFO maximal usage 0.76 %
Input Raw Market Data Packet Drop 0
Output Total Order Count 89 337
Output Order Count on TCP Stack 1 84 068
Output Order Count on TCP Stack 2 5 269
Latency Distribution Over Time (50th percentile)

NOTE: This graph shows the evolution of the 50th percentile latency over the duration of the test for each trigger ID.