A short tutorial on KGGSeq for annotating and prioritizing exome sequencing variants

Miaoxin Li (mxli@hku.hk)

 

Reference: http://statgenpro.psychiatry.hku.hk/limx/kggseq/doc/UserManual.html

Input data:

1.       A Variant Call Format (VCF) file (a simulated data set):

examples/rare.disease.hg19.vcf

2.       A linkage pedigree file:

 examples/rare.disease.ped.txt
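
Before running anything, it can help to confirm that the files used by the commands in this tutorial are where the commands expect them. The following is a minimal shell sketch, not part of kggseq: the helper name check_inputs is hypothetical, and it assumes you run it from the directory that holds kggseq.jar and the examples folder, as the commands in this tutorial do.

```shell
#!/bin/sh
# Files the tutorial commands expect, relative to the current directory.
REQUIRED="kggseq.jar examples/rare.disease.hg19.vcf examples/rare.disease.ped.txt"

# Hypothetical helper: report each missing file on stderr and
# return non-zero if at least one file is absent.
check_inputs() {
  missing=0
  for f in $REQUIRED; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_inputs; then
  echo "all inputs found"
else
  echo "run this from the directory that contains kggseq.jar"
fi
```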

 

Purpose: Identify candidate sequence variants that may cause a recessive form of arthrogryposis


Run the commands step by step and inspect the output of each. Every step repeats the previous command and appends new options to it.

1.       Filter by genetic feature and inheritance model (recessive)

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6

With quality control (QC) thresholds imposed:
java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8

2.       Annotate sequence variants with RefGene

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 

3.       Filter out common sequence variants

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01

 

4.       Filter out neutral sequence variants by disease-causing prediction

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant

5.       Filter out sequence variants in super-duplicate regions, which are often error-prone

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --superdup-filter

6.       Filter out genes that have too many (say, 4 or more) rare and pathogenic variants

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --superdup-filter --gene-var-filter 4

7.       Prioritize sequence variants by other genomic and OMIM annotations

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --superdup-filter --gene-var-filter 4 --genome-annot --omim-annot

 

8.       Prioritize sequence variants by candidate genes with protein interaction information

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --superdup-filter --gene-var-filter 4 --genome-annot --omim-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1

 

9.       Prioritize sequence variants by candidate genes with pathway information

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --superdup-filter --gene-var-filter 4 --genome-annot --omim-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1 --pathway-annot cura

 

10.   Prioritize sequence variants by PubMed literature mining

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant --superdup-filter --gene-var-filter 4 --genome-annot --omim-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1 --pathway-annot cura --pubmed-mining Arthrogryposis,Arthrogryposis+multiplex+congenita
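
Because each step reuses the previous command and appends a few options, the ten commands can be generated from a single list instead of being retyped. The following is a minimal shell sketch, not part of kggseq: the helper names step_flags and cmd_for_step are hypothetical, every option is copied from the steps above (step 1 uses the variant with QC thresholds), and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Options shared by every step of the tutorial.
BASE="java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel"

# Hypothetical helper: print the options that step N introduces.
step_flags() {
  case "$1" in
    1) echo "--genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8" ;;
    2) echo "--db-gene refgene --gene-feature-in 0,1,2,3,4,5,6" ;;
    3) echo "--db-filter-hard dbsnp138nf --db-filter 1kg201305,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01" ;;
    4) echo "--db-score dbnsfp --mendel-causing-predict all --filter-nondisease-variant" ;;
    5) echo "--superdup-filter" ;;
    6) echo "--gene-var-filter 4" ;;
    7) echo "--genome-annot --omim-annot" ;;
    8) echo "--candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1" ;;
    9) echo "--pathway-annot cura" ;;
    10) echo "--pubmed-mining Arthrogryposis,Arthrogryposis+multiplex+congenita" ;;
  esac
}

# Hypothetical helper: print the cumulative command for step N,
# that is, the shared options followed by the options of steps 1..N.
cmd_for_step() {
  cmd="$BASE"
  i=1
  while [ "$i" -le "$1" ]; do
    cmd="$cmd $(step_flags "$i")"
    i=$((i + 1))
  done
  echo "$cmd"
}

# Print all ten commands, each preceded by a comment naming its step.
n=1
while [ "$n" -le 10 ]; do
  echo "# step $n"
  cmd_for_step "$n"
  n=$((n + 1))
done
```

To run a single step instead of printing it, the output can be piped to a shell, for example `cmd_for_step 4 | sh`. Plain strings are enough here only because none of the option values contain spaces.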

 

 

 

Other output formats

1.       Output PLINK binary files

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --o-plink-bed

 

2.       Output ANNOVAR input files

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --o-annovar

 

3.       Output VCF files

java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --o-vcf
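
The three export commands differ only in their final option, so they can be produced in a loop. The following is a minimal shell sketch under stated assumptions: the helper name export_cmd is hypothetical, and the per-format output prefixes (test1_plink-bed and so on) are an illustrative choice that keeps the three runs from sharing the prefix test1; they are not kggseq requirements. The script prints the commands and does not run them.

```shell
#!/bin/sh
# Inputs and QC thresholds shared by the three export commands.
COMMON="java -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8"

# Hypothetical helper: $1 is the format suffix of the --o-<format> option.
# The output prefix test1_<format> is an assumption made for illustration.
export_cmd() {
  echo "$COMMON --out test1_$1 --o-$1"
}

# Formats taken from the commands in this section.
for fmt in plink-bed annovar vcf; do
  export_cmd "$fmt"
done
```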