Variant Annotators

Introduction

Annotates variants in MAF files with OncoKB™ annotations. Supports both Python 2 and Python 3.

Run the commands below in a terminal to see usage details.

python MafAnnotator.py -h
python FusionAnnotator.py -h
python CnaAnnotator.py -h
python ClinicalDataAnnotator.py -h
python OncoKBPlots.py -h

We recommend processing VCF files by vcf2maf with MSK override isoforms before using the MafAnnotator here.

Please see the OncoKB™ Annotator GitHub repository for the source code and more details.

OncoKB™ API Token

MafAnnotator.py, FusionAnnotator.py and CnaAnnotator.py access OncoKB™ data through its web API, so each of them needs an API token. Please visit the OncoKB™ Data Access Page for information on how to register an account and obtain an OncoKB™ API token. Once you have the token, which is listed on the OncoKB™ Account Settings Page, pass it with the -b option in the following format.

python ${FILE_NAME.py} -i ${INPUT_FILE} -o ${OUTPUT_FILE} -b ${ONCOKB_API_TOKEN}
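The command format above can also be assembled from Python, which keeps the token out of shell history. The following is a minimal sketch, not part of the annotator itself: the helper name build_annotator_command, the ONCOKB_API_TOKEN environment variable and the file names are assumptions made for illustration.

```python
import os


def build_annotator_command(script, input_file, output_file, token=None):
    """Return the argument list for one annotator run.

    The token falls back to the ONCOKB_API_TOKEN environment variable.
    That variable name is an assumption of this sketch; the annotator
    scripts themselves only take the token through -b.
    """
    token = token or os.environ.get("ONCOKB_API_TOKEN")
    if not token:
        raise ValueError("No OncoKB API token supplied")
    return ["python", script, "-i", input_file, "-o", output_file, "-b", token]


# Hypothetical file names and token, shown only to illustrate the call.
cmd = build_annotator_command(
    "MafAnnotator.py", "input.maf", "output.maf", token="example-token"
)
print(" ".join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) to launch the annotator.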

Python Examples

MAF Annotator

Running python MafAnnotator.py -h in a terminal prints all command parameters, as shown below. -i <input MAF file>, -o <output MAF file> and -b oncokb_api_bear_token are required.

MafAnnotator.py -i <input MAF file> -o <output MAF file> [-p previous results] [-c <input clinical file>] [-s sample list filter] [-t <default tumor type>] [-u oncokb-base-url] [-b oncokb_api_bear_token] [-a]
Essential MAF columns (case insensitive):
    HUGO_SYMBOL: Hugo gene symbol
    VARIANT_CLASSIFICATION: Translational effect of variant allele
    TUMOR_SAMPLE_BARCODE: sample ID
    AMINO_ACID_CHANGE: amino acid change
    PROTEIN_START: protein start
    PROTEIN_END: protein end
    PROTEIN_POSITION: can be used instead of PROTEIN_START and PROTEIN_END (in the output of vcf2maf)
Essential clinical columns:
    SAMPLE_ID: sample ID
    ONCOTREE_CODE: tumor type code from oncotree (oncotree.mskcc.org)
Cancer type will be assigned based on the following priority:
    1) ONCOTREE_CODE in clinical data file
    2) ONCOTREE_CODE exist in MAF
    3) default tumor type (-t)
Default OncoKB™ base url is http://oncokb.org.
Use -a to annotate mutational hotspots
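Before sending a large MAF file to the annotator, it can help to confirm that the essential columns listed above are present. The sketch below is an illustration only and is not how MafAnnotator.py itself validates input. It assumes a tab-separated header line with any leading comment lines already skipped, and the function name missing_maf_columns is hypothetical.

```python
# Columns the usage text lists as essential (matched case-insensitively).
REQUIRED = {
    "HUGO_SYMBOL",
    "VARIANT_CLASSIFICATION",
    "TUMOR_SAMPLE_BARCODE",
    "AMINO_ACID_CHANGE",
}
# PROTEIN_POSITION may stand in for the PROTEIN_START / PROTEIN_END pair.
POSITION_PAIR = {"PROTEIN_START", "PROTEIN_END"}
POSITION_SINGLE = "PROTEIN_POSITION"


def missing_maf_columns(header_line):
    """Return the sorted essential columns absent from a MAF header line."""
    names = header_line.rstrip("\r\n").split("\t")
    present = {name.strip().upper() for name in names}
    missing = set(REQUIRED - present)
    if POSITION_SINGLE not in present:
        missing |= POSITION_PAIR - present
    return sorted(missing)


header = (
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode"
    "\tAmino_Acid_Change\tProtein_position\n"
)
print(missing_maf_columns(header))
```

An empty list means every essential column was found.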

Fusion Annotator

Running python FusionAnnotator.py -h in a terminal prints all command parameters, as shown below. The required parameters are the same as for the MAF Annotator.

FusionAnnotator.py -i <input Fusion file> -o <output Fusion file> [-p previous results] [-c <input clinical file>] [-s sample list filter] [-t <default tumor type>] [-u oncokb-base-url] [-b oncokb_api_bear_token]
  Essential Fusion columns (case insensitive):
    HUGO_SYMBOL: Hugo gene symbol
    VARIANT_CLASSIFICATION: Translational effect of variant allele
    TUMOR_SAMPLE_BARCODE: sample ID
    FUSION: amino acid change, e.g. "TMPRSS2-ERG fusion"
  Essential clinical columns:
    SAMPLE_ID: sample ID
    ONCOTREE_CODE: tumor type code from oncotree (oncotree.mskcc.org)
  Cancer type will be assigned based on the following priority:
     1) ONCOTREE_CODE in clinical data file
     2) ONCOTREE_CODE exist in Fusion
     3) default tumor type (-t)
  Default OncoKB™ base url is http://oncokb.org
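The cancer type priority described in the usage text can be pictured as a simple fallback chain. The sketch below is an illustration of that order and is not taken from the annotator source. The function name resolve_tumor_type, the dictionary shapes and the sample IDs are assumptions.

```python
def resolve_tumor_type(sample_id, clinical_codes, record, default_tumor_type=None):
    """Pick a tumor type for one record using the documented priority.

    clinical_codes maps SAMPLE_ID to ONCOTREE_CODE (from the clinical file);
    record is one row of the alteration file as a dict.
    """
    # 1) ONCOTREE_CODE in the clinical data file
    code = clinical_codes.get(sample_id)
    if code:
        return code
    # 2) ONCOTREE_CODE present in the alteration file itself
    code = record.get("ONCOTREE_CODE")
    if code:
        return code
    # 3) default tumor type given with -t
    return default_tumor_type


clinical = {"SAMPLE_1": "LUAD"}
row = {"TUMOR_SAMPLE_BARCODE": "SAMPLE_2", "ONCOTREE_CODE": "BRCA"}
print(resolve_tumor_type("SAMPLE_2", clinical, row, default_tumor_type="MEL"))
```

Here SAMPLE_2 has no entry in the clinical data, so the code carried in the row takes effect before the -t default is considered.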

CNA Annotator

Running python CnaAnnotator.py -h in a terminal prints all command parameters, as shown below. The required parameters are the same as for the MAF Annotator.

CnaAnnotator.py -i <input CNA file> -o <output CNA file> [-p previous results] [-c <input clinical file>] [-s sample list filter] [-t <default tumor type>] [-u oncokb-base-url] [-b oncokb_api_bear_token]
  Input CNA file should follow the GISTIC output (https://cbioportal.readthedocs.io/en/latest/File-Formats.html#discrete-copy-number-data)
  Essential clinical columns:
    SAMPLE_ID: sample ID
  Cancer type will be assigned based on the following priority:
     1) ONCOTREE_CODE in clinical data file
     2) ONCOTREE_CODE exist in MAF
     3) default tumor type (-t)
  Default OncoKB™ base url is http://oncokb.org
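Because the clinical file is matched on SAMPLE_ID, it can be useful to list the sample IDs a CNA file contains. The sketch below assumes the cBioPortal discrete copy-number layout linked above: a tab-separated header whose gene columns are Hugo_Symbol and, optionally, Entrez_Gene_Id, with every remaining column naming one sample. That layout, the function name cna_sample_ids and the sample IDs are assumptions of this example.

```python
# Header columns that identify the gene rather than a sample (assumed layout).
GENE_COLUMNS = {"HUGO_SYMBOL", "ENTREZ_GENE_ID"}


def cna_sample_ids(header_line):
    """Return the sample IDs named in a CNA header line, in file order."""
    columns = header_line.rstrip("\r\n").split("\t")
    return [c for c in columns if c.strip().upper() not in GENE_COLUMNS]


print(cna_sample_ids("Hugo_Symbol\tEntrez_Gene_Id\tSAMPLE_1\tSAMPLE_2\n"))
```

Comparing this list with the SAMPLE_ID column of the clinical file shows which samples would fall back to the -t default.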

Clinical Data Annotator

Running python ClinicalDataAnnotator.py -h in a terminal prints all command parameters, as shown below. -i <input clinical file>, -o <output clinical file> and -a <annotated alteration files, separate by ,> are required.

ClinicalDataAnnotator.py -i <input clinical file> -o <output clinical file> -a <annotated alteration files, separate by ,> [-s sample list filter]
  Essential clinical columns:
    SAMPLE_ID: sample ID
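The -a option takes the annotated alteration files as one comma-separated value. A minimal sketch of assembling that argument follows. The helper name build_clinical_command and all file names are hypothetical.

```python
def build_clinical_command(clinical_in, clinical_out, annotated_files):
    """Return the argument list for a ClinicalDataAnnotator.py run."""
    if not annotated_files:
        raise ValueError("At least one annotated alteration file is needed")
    return [
        "python", "ClinicalDataAnnotator.py",
        "-i", clinical_in,
        "-o", clinical_out,
        # -a expects the files joined by commas, with no spaces
        "-a", ",".join(annotated_files),
    ]


# Hypothetical outputs of the MAF, Fusion and CNA annotators.
annotated = ["maf.oncokb.txt", "fusion.oncokb.txt", "cna.oncokb.txt"]
print(" ".join(build_clinical_command("clinical.txt", "clinical.oncokb.txt", annotated)))
```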

OncoKB Plots

Running python OncoKBPlots.py -h in a terminal prints all command parameters, as shown below. -i <annotated clinical file> and -o <output PDF file> are required.

OncoKBPlots.py -i <annotated clinical file> -o <output PDF file> [-c <categorization column, e.g. CANCER_TYPE>] [-s sample list filter] [-n threshold of # samples in a category] [-l comma separated levels to include]
  Essential clinical columns:
    SAMPLE_ID: sample ID
    HIGHEST_LEVEL: Highest OncoKB levels
  Supported levels (-l): 
    LEVEL_1,LEVEL_2,LEVEL_3A,LEVEL_3B,LEVEL_4,ONCOGENIC,VUS
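A misspelled level name is easier to catch before the plot is generated. The sketch below checks a selection against the supported levels listed above and builds the comma-separated value for -l. The function name levels_argument is hypothetical, and the check is an illustration, not the script's own validation.

```python
# Levels listed in the OncoKBPlots.py usage text.
SUPPORTED_LEVELS = (
    "LEVEL_1", "LEVEL_2", "LEVEL_3A", "LEVEL_3B", "LEVEL_4", "ONCOGENIC", "VUS",
)


def levels_argument(levels):
    """Return the comma-separated -l value, rejecting unsupported names."""
    unknown = [level for level in levels if level not in SUPPORTED_LEVELS]
    if unknown:
        raise ValueError("Unsupported levels: " + ", ".join(unknown))
    return ",".join(levels)


print(levels_argument(["LEVEL_1", "LEVEL_2", "ONCOGENIC"]))
```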
