A short tutorial on KGGSeq for annotation and prioritization of exome sequence variants (for KGGSeq V1.0+)

Miaoxin Li (mxli@hku.hk)

 

Reference: http://grass.cgs.hku.hk/limx/kggseq/doc10/UserManual.html

Input data:

1. A Variant Call Format (VCF) file (a fabricated data set for educational purposes):

examples/rare.disease.hg19.vcf

2. A linkage pedigree file:

examples/rare.disease.ped.txt
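
Before running KGGSeq, it can help to confirm that both input files are readable and to see how many records they hold. The following is a minimal sketch using only standard Unix tools; the function name summarize_inputs is hypothetical and is not part of KGGSeq.

```shell
#!/bin/sh
# summarize_inputs VCF PED: print record counts for both input files.
# Hypothetical helper for illustration; not part of KGGSeq.
summarize_inputs() {
    vcf=$1
    ped=$2
    for f in "$vcf" "$ped"; do
        if [ ! -r "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
    # Count VCF lines that do not start with '#' (header lines start with '#').
    n_var=$(grep -vc '^#' "$vcf")
    # Count non-empty lines in the pedigree file.
    n_ped=$(grep -c '[^[:space:]]' "$ped")
    echo "variants=$n_var ped_lines=$n_ped"
}

# Usage with the tutorial's files (run from the KGGSeq directory):
# summarize_inputs examples/rare.disease.hg19.vcf examples/rare.disease.ped.txt
```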

 

Purpose: Identify candidate sequence variants that may cause arthrogryposis


Run the commands below step by step and inspect the result of each. Each command repeats the options of the previous step and adds new ones.

1. Filter by genetic features and inheritance model (recessive)

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6

With quality control (QC) filters applied:
java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8
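
The commands in this tutorial repeat the same input and QC options many times. One way to cut down on retyping is to keep the shared options in shell variables. The sketch below does that with options copied verbatim from the command above; the helper name kggseq_cmd is hypothetical, and it only prints the command so that nothing runs by accident.

```shell
#!/bin/sh
# Options copied verbatim from the step 1 command with QC.
INPUT="--vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt"
QC="--seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8"

# kggseq_cmd [EXTRA OPTIONS...]: print the full command line for one run.
# Hypothetical helper for illustration; not part of KGGSeq.
kggseq_cmd() {
    cmd="java -Xmx6g -jar kggseq.jar $INPUT --out test1 --excel --genotype-filter 1,2,6 $QC"
    if [ $# -gt 0 ]; then
        # Append any extra options given by the caller.
        cmd="$cmd $*"
    fi
    echo "$cmd"
}

# Print the step 1 command with QC:
kggseq_cmd
```

To execute a printed command rather than just display it, pipe the output to the shell, for example: kggseq_cmd | sh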

2. Filter out common and neutral sequence variants

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01

 

3.       Annotate sequence variants by gene models:

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6

4. Filter out sequence variants in super-duplicate regions, which are often error-prone

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --superdup-filter

5. Filter out genes that have too many (say, 4 or more) rare and pathogenic variants

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --superdup-filter --gene-var-filter 4

6. Filter out neutral sequence variants by disease-causing prediction

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --superdup-filter --gene-var-filter 4 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant

 

7. Annotate sequence variants with alternative splicing, structural variation, OMIM annotation, mouse phenotype, zebrafish phenotype and developmental disorder database information

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter --gene-var-filter 4 --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot

8. Annotate sequence variants by candidate genes, with protein interaction and pathway information

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter --gene-var-filter 4 --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1 --pathway-annot cura

 

9. Predict the pathogenicity of genes carrying candidate sequence variants by functional prediction and phenotype mining

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter --gene-var-filter 4 --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1 --pathway-annot cura --patho-gene-predict --phenotype-term Arthrogryposis,Arthrogryposis+multiplex+congenital --phenolyzer-prediction

 

10. Annotate sequence variants by PubMed mining (time-consuming)

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 --db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01 --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant --superdup-filter --gene-var-filter 4 --scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1 --pathway-annot cura --patho-gene-predict --phenotype-term Arthrogryposis,Arthrogryposis+multiplex+congenital --phenolyzer-prediction --pubmed-mining
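
Every command above writes to the same prefix, test1, so each run presumably replaces the results of the previous one. To compare how the candidate list shrinks from step to step, one option is to give each step its own --out prefix. The sketch below prints the ten cumulative commands with prefixes step1 to step10. All options are copied from the steps above; the helper name print_steps and the prefixes are hypothetical, and the sketch assumes that option order does not matter (the tutorial's own commands vary in order).

```shell
#!/bin/sh
# Options of step 1 with QC, minus --out (copied from the tutorial).
BASE="--vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --excel --genotype-filter 1,2,6 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8"

# Options added by steps 2 to 10, one step per line, in tutorial order.
STEP_OPTS="--db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01
--db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6
--superdup-filter
--gene-var-filter 4
--db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant
--scsnv-annot --dgv-cnv-annot --omim-annot --mouse-pheno --zebrafish-pheno --ddd-annot
--candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 --ppi-annot string --ppi-depth 1 --pathway-annot cura
--patho-gene-predict --phenotype-term Arthrogryposis,Arthrogryposis+multiplex+congenital --phenolyzer-prediction
--pubmed-mining"

# print_steps: print one cumulative command per step, each with its own
# --out prefix. Hypothetical helper; it only prints and runs nothing.
print_steps() {
    n=1
    acc=""
    echo "java -Xmx6g -jar kggseq.jar $BASE --out step$n"
    printf '%s\n' "$STEP_OPTS" | while IFS= read -r opts; do
        n=$((n + 1))
        # Accumulate this step's options on top of all earlier ones.
        acc="$acc $opts"
        echo "java -Xmx6g -jar kggseq.jar $BASE$acc --out step$n"
    done
}

print_steps
```

Running a single step is then a matter of selecting its line, for example: print_steps | sed -n 4p | sh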

Other output formats

1. Output KGGSeq binary files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --o-ked

2. Output PLINK binary files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --o-plink-bed

3. Output ANNOVAR input files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --o-annovar

4. Output VCF files

java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 --o-vcf
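
The four export commands differ only in their final flag. As a small sketch under that observation, the loop below prints all four from one shared option string. The flags and options are copied from the commands above; the helper name export_cmds is hypothetical.

```shell
#!/bin/sh
# Options shared by all four export commands (copied from the tutorial).
SHARED="--vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8"

# export_cmds: print one command per output-format flag.
# Hypothetical helper; it only prints and runs nothing.
export_cmds() {
    for flag in --o-ked --o-plink-bed --o-annovar --o-vcf; do
        echo "java -Xmx6g -jar kggseq.jar $SHARED $flag"
    done
}

export_cmds
```

To run the exports one after another instead of printing them: export_cmds | sh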