# Variant Annotators

## Introduction

These scripts annotate variants in MAF files with OncoKB™ annotations. Both Python 2 and Python 3 are supported.

* **MafAnnotator.py**: A Mutation Annotation Format (MAF) file is a tab-delimited text file that lists mutations. MafAnnotator annotates the mutations in a MAF file according to the [OncoKB™ Levels of Evidence](https://www.oncokb.org/levels) rules.
* **FusionAnnotator.py**: Annotates fusions according to the [OncoKB™ Levels of Evidence](https://www.oncokb.org/levels) rules.
* **CnaAnnotator.py**: Annotates copy number alterations according to the [OncoKB™ Levels of Evidence](https://www.oncokb.org/levels) rules.
* **ClinicalDataAnnotator.py**: Annotates clinical data according to the [OncoKB™ Levels of Evidence](https://www.oncokb.org/levels) rules.
* **OncoKBPlots.py**: Draws the OncoKB™ actionability graph.

Run the commands below to see usage details in the terminal.

```bash
python MafAnnotator.py -h
python FusionAnnotator.py -h
python CnaAnnotator.py -h
python ClinicalDataAnnotator.py -h
python OncoKBPlots.py -h
```

{% hint style="info" %}
We recommend processing VCF files with [vcf2maf](https://github.com/mskcc/vcf2maf/) and the [MSK override isoforms](https://github.com/mskcc/vcf2maf/blob/master/data/isoform_overrides_at_mskcc) before using the **MafAnnotator** here.
{% endhint %}

See the [OncoKB™ Annotator GitHub Repository](https://github.com/oncokb/oncokb-annotator) for the source code and more details.

## OncoKB™ API Token

`MafAnnotator.py`, `FusionAnnotator.py`, and `CnaAnnotator.py` access OncoKB™ data through its web API, which requires a token. Visit the [OncoKB™ Data Access Page](https://www.oncokb.org/dataAccess) for information on how to register an account and get an OncoKB™ API token.\
Once you have an account, the token is listed on the [OncoKB™ Account Settings Page](https://www.oncokb.org/account/settings). Pass it to the script with `-b` in the following format.

```bash
python ${FILE_NAME.py} -i ${INPUT_FILE} -o ${OUTPUT_FILE} -b ${ONCOKB_API_TOKEN}
```
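As a minimal sketch of the format above, the snippet below assembles the same command in Python and reads the token from an environment variable instead of hard-coding it. The variable name `ONCOKB_API_TOKEN`, the helper `build_annotator_command`, and the file names are assumptions made for illustration; they are not part of the annotator.

```python
import os


def build_annotator_command(script, input_file, output_file, token):
    """Return the argument list for one annotator run (-i, -o and -b)."""
    return ["python", script, "-i", input_file, "-o", output_file, "-b", token]


# ONCOKB_API_TOKEN is a hypothetical environment variable name.
token = os.environ.get("ONCOKB_API_TOKEN", "YOUR_TOKEN_HERE")

# The file names below are placeholders.
cmd = build_annotator_command("MafAnnotator.py", "input.maf", "output.maf", token)
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would launch the annotator. That step is left out so the sketch runs without the scripts or a valid token.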

## Python Examples

### MAF Annotator

Running `python MafAnnotator.py -h` in the terminal prints all command parameters, as shown below. `-i <input MAF file>`, `-o <output MAF file>`, and `-b oncokb_api_bear_token` are required.

```bash
MafAnnotator.py -i <input MAF file> -o <output MAF file> [-p previous results] [-c <input clinical file>] [-s sample list filter] [-t <default tumor type>] [-u oncokb-base-url] [-b oncokb_api_bear_token] [-a]
Essential MAF columns (case insensitive):
    HUGO_SYMBOL: Hugo gene symbol
    VARIANT_CLASSIFICATION: Translational effect of variant allele
    TUMOR_SAMPLE_BARCODE: sample ID
    AMINO_ACID_CHANGE: amino acid change
    PROTEIN_START: protein start
    PROTEIN_END: protein end
    PROTEIN_POSITION: can be used instead of PROTEIN_START and PROTEIN_END (in the output of vcf2maf)
Essential clinical columns:
    SAMPLE_ID: sample ID
    ONCOTREE_CODE: tumor type code from oncotree (oncotree.mskcc.org)
Cancer type will be assigned based on the following priority:
    1) ONCOTREE_CODE in clinical data file
    2) ONCOTREE_CODE exist in MAF
    3) default tumor type (-t)
Default OncoKB™ base url is http://oncokb.org.
Use -a to annotate mutational hotspots
```
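The cancer type priority listed in the help text can be read as a simple fallback chain. The sketch below illustrates that order only; it is not the annotator's actual code. The function `resolve_tumor_type`, the dictionary shapes, and the sample values are assumptions made for illustration.

```python
def resolve_tumor_type(sample_id, clinical_codes, maf_row, default_tumor_type=None):
    """Pick a tumor type using the documented priority order.

    clinical_codes: dict mapping SAMPLE_ID to ONCOTREE_CODE (from the clinical file)
    maf_row: dict of column name to value for one MAF record
    default_tumor_type: the value that would be given with -t, if any
    """
    # 1) ONCOTREE_CODE in the clinical data file
    clinical_code = clinical_codes.get(sample_id)
    if clinical_code:
        return clinical_code

    # 2) ONCOTREE_CODE in the MAF record (column names are case insensitive)
    normalized_row = {key.upper(): value for key, value in maf_row.items()}
    maf_code = normalized_row.get("ONCOTREE_CODE")
    if maf_code:
        return maf_code

    # 3) default tumor type (-t)
    return default_tumor_type


# Illustrative values only.
clinical = {"SAMPLE_1": "LUAD"}
print(resolve_tumor_type("SAMPLE_1", clinical, {"Oncotree_Code": "MEL"}, "BRCA"))
print(resolve_tumor_type("SAMPLE_2", clinical, {"Oncotree_Code": "MEL"}, "BRCA"))
print(resolve_tumor_type("SAMPLE_3", clinical, {}, "BRCA"))
```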

### Fusion Annotator

Running `python FusionAnnotator.py -h` in the terminal prints all command parameters, as shown below. The required parameters are the same as for the MAF Annotator.

```bash
FusionAnnotator.py -i <input Fusion file> -o <output Fusion file> [-p previous results] [-c <input clinical file>] [-s sample list filter] [-t <default tumor type>] [-u oncokb-base-url] [-b oncokb_api_bear_token]
  Essential Fusion columns (case insensitive):
    HUGO_SYMBOL: Hugo gene symbol
    VARIANT_CLASSIFICATION: Translational effect of variant allele
    TUMOR_SAMPLE_BARCODE: sample ID
    FUSION: amino acid change, e.g. "TMPRSS2-ERG fusion"
  Essential clinical columns:
    SAMPLE_ID: sample ID
    ONCOTREE_CODE: tumor type code from oncotree (oncotree.mskcc.org)
  Cancer type will be assigned based on the following priority:
     1) ONCOTREE_CODE in clinical data file
     2) ONCOTREE_CODE exist in Fusion
     3) default tumor type (-t)
  Default OncoKB™ base url is http://oncokb.org
```
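Before running the annotator, it can help to confirm that the input file has the essential columns. The sketch below checks a header line against the Fusion columns listed in the help text, ignoring case. It assumes the Fusion file is tab-delimited like a MAF file; the helper `missing_columns` is hypothetical and not part of the annotator.

```python
ESSENTIAL_FUSION_COLUMNS = (
    "HUGO_SYMBOL",
    "VARIANT_CLASSIFICATION",
    "TUMOR_SAMPLE_BARCODE",
    "FUSION",
)


def missing_columns(header_line, required=ESSENTIAL_FUSION_COLUMNS):
    """Return the required columns absent from a tab-delimited header line."""
    # Column names are matched case-insensitively, as the help text states.
    present = {name.strip().upper() for name in header_line.rstrip("\r\n").split("\t")}
    return [column for column in required if column not in present]


# Illustrative header with one essential column left out.
header = "Hugo_Symbol\tTumor_Sample_Barcode\tFusion\n"
print(missing_columns(header))
```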

### CNA Annotator

Running `python CnaAnnotator.py -h` in the terminal prints all command parameters, as shown below. The required parameters are the same as for the MAF Annotator.

```bash
CnaAnnotator.py -i <input CNA file> -o <output CNA file> [-p previous results] [-c <input clinical file>] [-s sample list filter] [-t <default tumor type>] [-u oncokb-base-url] [-b oncokb_api_bear_token]
  Input CNA file should follow the GISTIC output (https://cbioportal.readthedocs.io/en/latest/File-Formats.html#discrete-copy-number-data)
  Essential clinical columns:
    SAMPLE_ID: sample ID
  Cancer type will be assigned based on the following priority:
     1) ONCOTREE_CODE in clinical data file
     2) ONCOTREE_CODE exist in MAF
     3) default tumor type (-t)
  Default OncoKB™ base url is http://oncokb.org
```

### Clinical Data Annotator

Running `python ClinicalDataAnnotator.py -h` in the terminal prints all command parameters, as shown below. `-i <input clinical file>`, `-o <output clinical file>`, and `-a <annotated alteration files, separate by ,>` are required.

```bash
ClinicalDataAnnotator.py -i <input clinical file> -o <output clinical file> -a <annotated alteration files, separate by ,> [-s sample list filter]
  Essential clinical columns:
    SAMPLE_ID: sample ID
```
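The `-a` parameter takes several annotated files joined by commas. The sketch below builds that argument from a list of paths. It assumes the annotated alteration files are the outputs of the MAF, Fusion, and CNA annotators; the helper name and file names are hypothetical.

```python
def build_clinical_annotator_args(clinical_in, clinical_out, annotated_files):
    """Return the argument list for ClinicalDataAnnotator.py."""
    if not annotated_files:
        raise ValueError("at least one annotated alteration file is required for -a")
    return [
        "python", "ClinicalDataAnnotator.py",
        "-i", clinical_in,
        "-o", clinical_out,
        # -a expects one comma-separated value, without spaces
        "-a", ",".join(annotated_files),
    ]


# Placeholder file names.
args = build_clinical_annotator_args(
    "clinical.txt",
    "clinical.oncokb.txt",
    ["maf.oncokb.txt", "fusions.oncokb.txt", "cna.oncokb.txt"],
)
print(" ".join(args))
```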

### OncoKB Plots

Running `python OncoKBPlots.py -h` in the terminal prints all command parameters, as shown below. `-i <annotated clinical file>` and `-o <output PDF file>` are required.

```bash
OncoKBPlots.py -i <annotated clinical file> -o <output PDF file> [-c <categorization column, e.g. CANCER_TYPE>] [-s sample list filter] [-n threshold of # samples in a category] [-l comma separated levels to include]
  Essential clinical columns:
    SAMPLE_ID: sample ID
    HIGHEST_LEVEL: Highest OncoKB levels
  Supported levels (-l): 
    LEVEL_1,LEVEL_2,LEVEL_3A,LEVEL_3B,LEVEL_4,ONCOGENIC,VUS
```
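A value for `-l` can be checked against the supported levels before plotting. The sketch below shows one way to do that, using the level names from the help text; the helper `parse_levels` is hypothetical, and how `OncoKBPlots.py` itself handles unsupported values is not covered here.

```python
SUPPORTED_LEVELS = (
    "LEVEL_1", "LEVEL_2", "LEVEL_3A", "LEVEL_3B", "LEVEL_4", "ONCOGENIC", "VUS",
)


def parse_levels(value):
    """Split a comma-separated -l value and reject unsupported levels."""
    levels = [item.strip() for item in value.split(",") if item.strip()]
    unsupported = [level for level in levels if level not in SUPPORTED_LEVELS]
    if unsupported:
        raise ValueError("unsupported levels: " + ", ".join(unsupported))
    return levels


print(",".join(parse_levels("LEVEL_1,LEVEL_2,LEVEL_3A")))
```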
