API Benchmark
Introduction
The OncoKB Development team has conducted API performance tests to identify optimization opportunities and collect key metrics for evaluating overall API capabilities. You’ll find key performance indicators such as response times, throughput, and resource utilization across different endpoints.
Annotate Mutation by HGVS
10/24/2024
This test measures the performance of the annotate/mutations/byHGVSg endpoint, assuming the variants have already been annotated and cached by Genome Nexus.
Datasets
We have chosen the following studies for benchmarking:
Whole Exome Sequencing (WES) Dataset: UCSC Xena: Simple Somatic Mutation (SNVs and indels) - Consensus Coding Mutations
441,309 variants across 2,756 samples
Whole Genome Sequencing (WGS) Dataset: UCSC Xena: Simple Somatic Mutation (SNVs and indels) - Whole Genome Mutations (Non-US Specimens)
23,159,591 variants across 1,950 samples
Services Setup
These tests were conducted on a replica of the production setup. All configurations can be found here.
Test Setup
We used Locust.io to write the performance tests.
As a prerequisite, all variants from the WES and WGS datasets were annotated (and cached in Genome Nexus) before benchmarking the OncoKB HGVSg endpoint.
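The tests send variants to the endpoint in POST batches of 100. The sketch below shows one way to split a dataset into such batches and serialize each batch as a JSON body. It assumes each query object carries `hgvsg` and `referenceGenome` fields and that the data is on GRCh37; both are assumptions to check against the OncoKB API documentation. The helper names and variant strings are made up for illustration.

```python
import json
from typing import Iterator, List

BATCH_SIZE = 100  # variants per POST request, matching the tests below


def chunk(variants: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` variants."""
    for start in range(0, len(variants), size):
        yield variants[start:start + size]


def build_hgvsg_payload(batch: List[str], reference_genome: str = "GRCh37") -> str:
    """Serialize one batch as a JSON array of query objects.

    Assumed body shape: [{"hgvsg": ..., "referenceGenome": ...}, ...]
    """
    queries = [{"hgvsg": v, "referenceGenome": reference_genome} for v in batch]
    return json.dumps(queries)


# Made-up variants standing in for a real study.
variants = [f"7:g.{140453136 + i}A>T" for i in range(250)]
batches = list(chunk(variants))
print(len(batches), [len(b) for b in batches])  # 3 [100, 100, 50]
```

At 100 variants per request, the WES dataset works out to 4,414 requests and the WGS dataset to 231,596.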
Performance Benchmark Results
Test 1: How long does it take to annotate each study using a single thread?
Redis caching was disabled for this test. Requests were sent one after another until the entire dataset was annotated.
A single thread sent POST requests, each containing 100 variants.
WES Dataset:
441,309 variants: 835 seconds or 14 minutes (528 variants/second)
WGS Dataset:
23,159,591 variants: 17,508 seconds or 4 hours 52 minutes (1,322 variants/second)
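The throughput figures follow from dividing the variant count by the elapsed time. The sketch below reproduces the Test 1 numbers; it floors the rate to whole variants per second, which appears to match how the figures above were rounded (441,309 / 835 is about 528.5).

```python
def throughput(variants: int, seconds: int) -> int:
    """Whole variants annotated per second (floored)."""
    return variants // seconds


def format_duration(seconds: int) -> str:
    """Render a duration as 'Xh Ym', rounding to the nearest minute."""
    hours, minutes = divmod(round(seconds / 60), 60)
    return f"{hours}h {minutes}m"


# Test 1 measurements (single thread, 100 variants per request).
for study, variants, seconds in [("WES", 441_309, 835), ("WGS", 23_159_591, 17_508)]:
    print(f"{study}: {throughput(variants, seconds):,} variants/s in {format_duration(seconds)}")
# WES: 528 variants/s in 0h 14m
# WGS: 1,322 variants/s in 4h 52m
```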
Test 2: Do we gain a performance boost using 5 threads instead of 1?
Redis caching was disabled for this test.
Up to 5 threads ran concurrently, each sending POST requests containing 100 variants.
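Test 2 keeps at most five requests in flight at a time. The tests themselves were written with Locust; the sketch below instead uses Python's standard `concurrent.futures.ThreadPoolExecutor` to illustrate the same bounded-concurrency idea. `annotate_batch` is a hypothetical stand-in that sleeps instead of calling the API, so the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def annotate_batch(batch: List[str]) -> int:
    """Hypothetical stand-in for one POST to annotate/mutations/byHGVSg.

    Sleeps to simulate server latency and returns the number of variants sent.
    """
    time.sleep(0.01)
    return len(batch)


def annotate_all(variants: List[str], workers: int, batch_size: int = 100) -> int:
    """Annotate every batch with at most `workers` requests in flight."""
    batches = [variants[i:i + batch_size] for i in range(0, len(variants), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(annotate_batch, batches))


variants = [f"17:g.{7577120 + i}C>T" for i in range(1_000)]  # made-up variants
for workers in (1, 5):
    start = time.perf_counter()
    annotated = annotate_all(variants, workers)
    print(f"{workers} thread(s): {annotated} variants in {time.perf_counter() - start:.2f}s")
```

Because the simulated latency overlaps across workers, five workers finish the ten batches in roughly a fifth of the single-worker time; real gains depend on how the server handles concurrent load.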
WES Dataset:
441,309 variants: 151 seconds or about 2.5 minutes (2,922 variants/second)
WGS Dataset:
23,159,591 variants: 3,482 seconds or 58 minutes (6,652 variants/second)
Increasing the number of threads to five raised throughput roughly fivefold: 5.5x for WES (2,922 vs. 528 variants/second) and 5.0x for WGS (6,652 vs. 1,322 variants/second).
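The speedup is the ratio of the two throughputs, and dividing it by the thread count gives the scaling efficiency, where 100% would mean perfectly linear scaling. The sketch below applies this to the figures reported in Tests 1 and 2.

```python
# Throughput in variants/second, as reported in Tests 1 and 2.
ONE_THREAD = {"WES": 528, "WGS": 1_322}
FIVE_THREADS = {"WES": 2_922, "WGS": 6_652}
THREADS = 5


def speedup(study: str) -> float:
    """How many times faster the 5-thread run annotated variants."""
    return FIVE_THREADS[study] / ONE_THREAD[study]


for study in ONE_THREAD:
    s = speedup(study)
    print(f"{study}: {s:.2f}x speedup, {s / THREADS:.0%} of linear scaling")
# WES: 5.53x speedup, 111% of linear scaling
# WGS: 5.03x speedup, 101% of linear scaling
```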
Genome Nexus VEP Benchmark
When annotating mutations by genomic change or HGVSg, OncoKB uses Genome Nexus to convert these formats to HGVSp for annotation. Genome Nexus performs this conversion with Ensembl's Variant Effect Predictor (VEP) and recommends two main configuration options for accessing it:
1. Genome Nexus VEP (Local, Recommended)
2. Ensembl REST API (Public)
Both provide a REST API wrapper around the VEP command-line interface, and either can be used depending on the user's performance, security, and convenience needs. Below are the response times of each tool for POST requests of varying sizes sent to the vep/human/hgvs endpoint:
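For reference, Ensembl's REST documentation describes the POST body for `vep/:species/hgvs` as a JSON object with an `hgvs_notations` array. The sketch below builds such a body and splits larger inputs into chunks that respect the 300-variant limit noted in the Ensembl table. Whether Genome Nexus VEP accepts exactly the same body is an assumption here, and the helper names are hypothetical.

```python
import json
from typing import List

MAX_POST_SIZE = 300  # Ensembl REST limit per request, as noted in the table


def split_for_limit(notations: List[str], limit: int = MAX_POST_SIZE) -> List[List[str]]:
    """Split notations into consecutive chunks no larger than `limit`."""
    return [notations[i:i + limit] for i in range(0, len(notations), limit)]


def vep_hgvs_body(notations: List[str], limit: int = MAX_POST_SIZE) -> str:
    """Build a JSON body for POST vep/human/hgvs, rejecting oversized batches."""
    if len(notations) > limit:
        raise ValueError(f"{len(notations)} notations exceed the limit of {limit}")
    return json.dumps({"hgvs_notations": notations})


# A 1,000-variant batch, as in the largest test, needs four requests under the limit.
batch = [f"9:g.{22125504 + i}G>C" for i in range(1_000)]
bodies = [vep_hgvs_body(chunk) for chunk in split_for_limit(batch)]
print(len(bodies))  # 4
```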
Genome Nexus VEP
Variants per request   Run 1    Run 2    Run 3    Run 4    Run 5
1                      772ms    755ms    748ms    708ms    723ms
5                      1.25s    1.28s    1.14s    1.21s    1.29s
10                     1.90s    1.73s    1.90s    1.93s    1.80s
50                     4.21s    4.40s    4.36s    4.05s    4.05s
100                    6.02s    6.71s    6.33s    6.46s    6.62s
1000                   62.82s   62.10s   62.16s   62.10s   62.52s
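Averaging the five runs and dividing by the request size shows how the fixed per-request overhead is amortized as batches grow. The sketch below does this for the Genome Nexus VEP measurements, converting the 1-variant row from milliseconds to seconds.

```python
from statistics import mean

# Genome Nexus VEP response times in seconds, five runs per request size.
GN_VEP = {
    1: [0.772, 0.755, 0.748, 0.708, 0.723],
    5: [1.25, 1.28, 1.14, 1.21, 1.29],
    10: [1.90, 1.73, 1.90, 1.93, 1.80],
    50: [4.21, 4.40, 4.36, 4.05, 4.05],
    100: [6.02, 6.71, 6.33, 6.46, 6.62],
    1000: [62.82, 62.10, 62.16, 62.10, 62.52],
}


def per_variant_ms(size: int) -> float:
    """Mean response time divided by request size, in milliseconds."""
    return 1000 * mean(GN_VEP[size]) / size


for size, runs in GN_VEP.items():
    print(f"{size:>5} variants: mean {mean(runs):6.2f}s, {per_variant_ms(size):6.1f} ms/variant")
```

On these numbers, the mean cost per variant falls from about 741 ms for a single-variant request to about 62 ms at 1,000 variants.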
Ensembl REST API
Variants per request   Run 1    Run 2    Run 3    Run 4    Run 5
1                      1.79s    1.27s    1.79s    1.31s    1.82s
5                      4.54s    3.40s    3.81s    3.82s    3.62s
10                     8.01s    6.95s    7.08s    7.60s    6.75s
50                     39.52s   37.79s   41.06s   39.93s   38.82s
100                    83.28s   87.60s   87.78s   68.82s   72.78s
1000 (limit is 300)    NA       NA       NA       NA       NA
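To compare the two tools directly, the sketch below divides the mean Ensembl REST response time by the mean Genome Nexus VEP response time at each request size. The 1,000-variant row is left out because that request size exceeds Ensembl's 300-variant limit, so there is no Ensembl measurement to compare against.

```python
from statistics import mean

# Response times in seconds, five runs per request size (1,000 omitted: Ensembl NA).
GN_VEP = {
    1: [0.772, 0.755, 0.748, 0.708, 0.723],
    5: [1.25, 1.28, 1.14, 1.21, 1.29],
    10: [1.90, 1.73, 1.90, 1.93, 1.80],
    50: [4.21, 4.40, 4.36, 4.05, 4.05],
    100: [6.02, 6.71, 6.33, 6.46, 6.62],
}
ENSEMBL = {
    1: [1.79, 1.27, 1.79, 1.31, 1.82],
    5: [4.54, 3.40, 3.81, 3.82, 3.62],
    10: [8.01, 6.95, 7.08, 7.60, 6.75],
    50: [39.52, 37.79, 41.06, 39.93, 38.82],
    100: [83.28, 87.60, 87.78, 68.82, 72.78],
}


def slowdown(size: int) -> float:
    """How many times longer Ensembl REST took than Genome Nexus VEP, on average."""
    return mean(ENSEMBL[size]) / mean(GN_VEP[size])


for size in GN_VEP:
    print(f"{size:>3} variants: Ensembl REST took {slowdown(size):.1f}x as long")
```

On these measurements the gap widens with request size, from about 2.2x for a single variant to about 12.5x at 100 variants.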