A short tutorial on using kggseq to annotate and prioritize exome sequencing variants

Miaoxin Li (mxli@hku.hk)

 

Reference: http://statgenpro.psychiatry.hku.hk/limx/kggseq/doc/UserManual.html

Input data:

1.       A Variant Call Format (VCF) file (a simulated data set):

examples/rare.disease.hg19.vcf

2.       A linkage pedigree file:

examples/rare.disease.ped.txt
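
As a rough illustration of the second input: a linkage-format pedigree file normally has six whitespace-separated columns per subject (family ID, individual ID, father ID, mother ID, sex, affection status). The sketch below writes a made-up trio to a temporary file and counts malformed rows with awk. The IDs, the temporary file name and the check_ped helper are hypothetical and are not taken from examples/rare.disease.ped.txt; see the user manual for the exact layout kggseq expects.

```shell
#!/bin/sh
# Hypothetical trio in linkage pedigree layout (assumed six columns):
# family  individual  father  mother  sex(1=male,2=female)  phenotype(1=unaffected,2=affected)
ped_demo="${TMPDIR:-/tmp}/trio.demo.ped.txt"
cat > "$ped_demo" <<'EOF'
FAM1 FATHER 0 0 1 1
FAM1 MOTHER 0 0 2 1
FAM1 CHILD FATHER MOTHER 1 2
EOF

# Print the number of rows that have fewer than six columns.
check_ped() {
    awk 'NF < 6 { bad++ } END { print bad + 0 }' "$1"
}

bad_rows=$(check_ped "$ped_demo")
echo "rows with fewer than six columns: $bad_rows"
```

The same helper can be pointed at the real file, for example check_ped examples/rare.disease.ped.txt.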

 

Purpose: Identify de novo sequence mutations that may cause Crohn's disease.


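Run the commands below one at a time and compare the results. Each command builds on the previous step by adding options, and all of them write to the same output prefix (test1).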

1.       Filter directly by de novo events and quality control (QC)

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 7
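
The QC thresholds in the step 1 command are easier to adjust when each group of options is kept in its own shell variable. The sketch below reuses only the options and values from the command above. The variable names and the DRY_RUN switch are illustrative and not part of kggseq, and the grouping into variant-level and genotype-level options is inferred from the --seq- and --gty- prefixes. By default the script prints the assembled command instead of running it.

```shell
#!/bin/sh
# Input and output options copied from step 1.
INPUT="--vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt"
OUTPUT="--out test1 --excel"
# QC thresholds from step 1, grouped by flag prefix (assumed: --seq-* per variant, --gty-* per genotype).
SEQ_QC="--seq-qual 50 --seq-mq 20 --seq-sb -10"
GTY_QC="--gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75"
FILTER="--genotype-filter 7"

STEP1="java -jar kggseq.jar $INPUT $OUTPUT $SEQ_QC $GTY_QC $FILTER"

# DRY_RUN is a hypothetical switch: 1 (default) prints the command, 0 executes it.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$STEP1"
else
    # Left unquoted on purpose so the shell splits the string into separate arguments.
    $STEP1
fi
```

Setting DRY_RUN=0 runs the command, provided kggseq.jar is in the working directory.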

2.       Filter out the de novo mutations at which affected and unaffected subjects share the same heterozygous genotype or the same alternative homozygous genotype

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 4,7 --ignore-homo

3.       Annotate sequence variants using RefGene

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 4,7 --ignore-homo --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6

4.       Filter out common sequence variants

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 4,7 --ignore-homo --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter dbsnp138,ESP5400 --rare-allele-freq 0.05

 

5.       Prioritize sequence variants by disease-causing prediction

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 4,7 --ignore-homo --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter dbsnp138,ESP5400 --rare-allele-freq 0.05 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant

 

6.       Prioritize sequence variants by other genomic and OMIM annotations

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 4,7 --ignore-homo --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter dbsnp138,ESP5400 --rare-allele-freq 0.05 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --genome-annot --omim-annot

 

7.       Prioritize sequence variants by candidate genes with protein interaction information

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 4,7 --ignore-homo --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter dbsnp138,ESP5400 --rare-allele-freq 0.05 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --genome-annot --omim-annot --candi-list ATG16L1,IL23R,IRGM --ppi-annot string --ppi-depth 1

 

8.       Prioritize sequence variants by candidate genes with pathway information

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 4,7 --ignore-homo --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter dbsnp138,ESP5400 --rare-allele-freq 0.05 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --genome-annot --omim-annot --candi-list ATG16L1,IL23R,IRGM --ppi-annot string --ppi-depth 1 --pathway-annot cura

 

9.       Prioritize sequence variants by PubMed literature mining

java -jar kggseq.jar  --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt  --out test1 --excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75 --genotype-filter 4,7 --ignore-homo --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter dbsnp138,ESP5400 --rare-allele-freq 0.05 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --genome-annot --omim-annot --candi-list ATG16L1,IL23R,IRGM --ppi-annot string --ppi-depth 1 --pathway-annot cura --pubmed-mining Crohn\'s+disease,inflammatory+bowel+disease
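
Because each command above is the previous one plus a few options, all nine can be generated from a shared base. The sketch below assumes that pattern. Unlike the commands above, it gives each step its own output prefix (step1 to step9, a hypothetical naming choice) so that results from different steps can be compared side by side. It only prints the commands and does not run kggseq.

```shell
#!/bin/sh
# Parts shared by every step, copied from the commands above (the --out prefix is added per step).
IN="java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt"
QC="--excel --seq-qual 50 --seq-mq 20 --seq-sb -10 --gty-qual 20 --gty-dp 8 --gty-sec-pl 20 --gty-af-ref 0.05 --gty-af-het 0.25 --gty-af-alt 0.75"

# Step 1 uses a different genotype filter from all later steps.
STEP=1
printf '%s\n' "$IN --out step$STEP $QC --genotype-filter 7"

# Step 2 switches to the filter that every later step keeps.
STEP=2
OPTS="--genotype-filter 4,7 --ignore-homo"
printf '%s\n' "$IN --out step$STEP $QC $OPTS"

# Steps 3 to 9: each line of the here-document holds the options one step adds.
while IFS= read -r ADDED; do
    STEP=$((STEP + 1))
    OPTS="$OPTS $ADDED"
    printf '%s\n' "$IN --out step$STEP $QC $OPTS"
done <<'EOF'
--db-gene refgene --gene-feature-in 0,1,2,3,4,5,6
--db-filter dbsnp138,ESP5400 --rare-allele-freq 0.05
--db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant
--genome-annot --omim-annot
--candi-list ATG16L1,IL23R,IRGM --ppi-annot string --ppi-depth 1
--pathway-annot cura
--pubmed-mining Crohn\'s+disease,inflammatory+bowel+disease
EOF
```

The quoted here-document keeps the backslash in Crohn\'s, so the printed lines can be pasted into a shell as they are.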