Reference:
http://grass.cgs.hku.hk/limx/kggseq/doc10/UserManual.html
Input data:
1. A Variant Call Format (VCF) file (a fictitious data set for educational purposes):
   examples/rare.disease.hg19.vcf
2. A linkage pedigree file:
   examples/rare.disease.ped.txt
Purpose: Identify candidate sequence variants that may cause arthrogryposis.
Run the commands step by step to see the effect of each one.
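Before the first step, it can help to confirm that both input files are where the commands expect them. The following is a minimal sketch, assuming a POSIX shell; the helper name `check_inputs` is hypothetical and not part of KGGSeq.

```shell
#!/bin/sh
# Hypothetical helper (not part of KGGSeq): report every path that is
# missing or empty. Returns 0 only if all paths are non-empty files.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Check the two example inputs used throughout this walkthrough.
if check_inputs examples/rare.disease.hg19.vcf examples/rare.disease.ped.txt; then
    echo "inputs found"
else
    echo "inputs not found; check the working directory" >&2
fi
```

The paths are relative, so the check only passes when it is run from the directory that contains `examples/`.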
1. Filter by genetic feature and inheritance model (recessive)
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6
With quality control (QC) filters applied:
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8
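Every later step reuses this command and appends options, so the shared parts can be kept in shell variables. The following is a minimal sketch, assuming a POSIX shell; the variable names `BASE`, `QC`, and `CMD` are illustrative, and the command is only printed, not executed.

```shell
#!/bin/sh
# Options common to every step (copied from the first command of step 1).
BASE="java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --out test1 --excel --genotype-filter 1,2,6"

# Quality-control thresholds added in the second command of step 1.
QC="--seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8"

# No argument contains spaces, so plain string concatenation is safe here.
CMD="$BASE $QC"

# Dry run: print the assembled command instead of running it.
echo "$CMD"
```

To run the command rather than print it, the `echo` line can be replaced with the unquoted expansion `$CMD`, which relies on word splitting and therefore only works while no argument contains whitespace.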
2. Filter out common and neutral sequence variants
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01
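The idea behind a frequency cutoff such as `--rare-allele-freq 0.01` can be illustrated on a toy table. This is a conceptual sketch only, not KGGSeq's implementation: the rows are invented, and treating a frequency equal to the cutoff as common is an assumption.

```shell
#!/bin/sh
# Toy table: variant ID and the highest allele frequency reported by any
# reference database ("." means the variant is absent from all of them).
# These rows are invented for illustration.
variants='var1 0.25
var2 0.004
var3 .
var4 0.01'

cutoff=0.01

# Keep a variant if it is absent from the databases or rarer than the cutoff.
# Assumption: a frequency equal to the cutoff counts as common.
rare=$(printf '%s\n' "$variants" |
    awk -v c="$cutoff" '$2 == "." || $2 + 0 < c + 0 { print $1 }')

echo "$rare"
```

With this toy input only `var2` and `var3` survive: the first is below the cutoff and the second is not in any database.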
3. Annotate sequence variants by gene models
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01 \
  --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6
4. Filter out sequence variants in super-duplicate regions, which are often error-prone
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01 \
  --superdup-filter
5. Filter out genes that have too many (say, 4 or more) rare and pathogenic variants
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01 \
  --superdup-filter --gene-var-filter 4
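The per-gene rule described in this step (drop genes with 4 or more remaining variants) can be illustrated on a toy list. This is a conceptual sketch only, not KGGSeq's implementation; the gene and variant names are invented.

```shell
#!/bin/sh
# Toy list of (gene, variant) pairs left after the earlier filters.
# Gene and variant names are invented for illustration.
pairs='GENE_A v1
GENE_A v2
GENE_A v3
GENE_A v4
GENE_B v5
GENE_C v6
GENE_C v7'

max=4

# Count variants per gene, then keep genes with fewer than $max variants.
kept=$(printf '%s\n' "$pairs" |
    awk -v m="$max" '{ n[$1]++ } END { for (g in n) if (n[g] < m) print g }' |
    sort)

echo "$kept"
```

Here `GENE_A` carries four variants and is dropped, while `GENE_B` and `GENE_C` are kept.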
6. Filter out neutral sequence variants by disease-causing prediction
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01 \
  --superdup-filter --gene-var-filter 4 \
  --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant
7. Annotate sequence variants by alternative splicing, structural variation, OMIM annotation, mouse phenotype, zebrafish phenotype, and developmental disorder database
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01 \
  --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant \
  --superdup-filter --gene-var-filter 4 \
  --scsnv-annot --dgv-cnv-annot --omim-annot \
  --mouse-pheno --zebrafish-pheno --ddd-annot
8. Annotate sequence variants by candidate genes with protein interaction and pathway information
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01 \
  --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant \
  --superdup-filter --gene-var-filter 4 \
  --scsnv-annot --dgv-cnv-annot --omim-annot \
  --mouse-pheno --zebrafish-pheno --ddd-annot \
  --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 \
  --ppi-annot string --ppi-depth 1 --pathway-annot cura
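When the candidate genes are kept one per line (for example in a text file), the comma-separated value for `--candi-list` can be assembled with standard tools. This is a minimal sketch, assuming a POSIX shell with `paste`; the variable names are illustrative.

```shell
#!/bin/sh
# One gene symbol per line; these are the five candidates used in this step.
genes='ECEL1
MYBPC1
TNNI2
TNNT3
TPM2'

# Join the lines with commas to form the value for --candi-list.
candi=$(printf '%s\n' "$genes" | paste -sd, -)

echo "--candi-list $candi"
```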
9. Predict pathogenicity of genes of candidate sequence variants by functional prediction and phenotype mining
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01 \
  --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant \
  --superdup-filter --gene-var-filter 4 \
  --scsnv-annot --dgv-cnv-annot --omim-annot \
  --mouse-pheno --zebrafish-pheno --ddd-annot \
  --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 \
  --ppi-annot string --ppi-depth 1 --pathway-annot cura \
  --patho-gene-predict \
  --phenotype-term Arthrogryposis,Arthrogryposis+multiplex+congenital \
  --phenolyzer-prediction
10. Annotate sequence variants by PubMed mining (time-consuming)
java -Xmx6g -jar kggseq.jar \
  --vcf-file examples/rare.disease.hg19.vcf \
  --ped-file examples/rare.disease.ped.txt \
  --out test1 --excel \
  --genotype-filter 1,2,6 \
  --seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8 \
  --db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6 \
  --db-filter-hard dbsnp138nf \
  --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA \
  --rare-allele-freq 0.01 \
  --db-score dbnsfp --mendel-causing-predict best --filter-nondisease-variant \
  --superdup-filter --gene-var-filter 4 \
  --scsnv-annot --dgv-cnv-annot --omim-annot \
  --mouse-pheno --zebrafish-pheno --ddd-annot \
  --candi-list ECEL1,MYBPC1,TNNI2,TNNT3,TPM2 \
  --ppi-annot string --ppi-depth 1 --pathway-annot cura \
  --patho-gene-predict \
  --phenotype-term Arthrogryposis,Arthrogryposis+multiplex+congenital \
  --phenolyzer-prediction --pubmed-mining
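All ten commands write to the same `--out` prefix, `test1`, so a later run presumably replaces the results of an earlier one. To compare steps side by side, each step could be given its own prefix. The following is a minimal sketch covering steps 1 to 4 only, assuming a POSIX shell; the prefix pattern `test_stepN` and the variable names are hypothetical, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Fixed part of every command, without --out so each step can set its own.
BASE="java -Xmx6g -jar kggseq.jar --vcf-file examples/rare.disease.hg19.vcf --ped-file examples/rare.disease.ped.txt --excel --genotype-filter 1,2,6"

# Options are cumulative: each step appends to what came before.
EXTRA=""
step=0
for opts in \
    "--seq-qual 50 --seq-mq 20 --seq-fs 60 --gty-qual 20 --gty-dp 8" \
    "--db-filter-hard dbsnp138nf --db-filter exac,ehr,1kg201204,dbsnp138,dbsnp141,ESP6500AA,ESP6500EA --rare-allele-freq 0.01" \
    "--db-gene refgene,gencode --gene-feature-in 0,1,2,3,4,5,6" \
    "--superdup-filter"
do
    step=$((step + 1))
    EXTRA="$EXTRA $opts"
    # Hypothetical prefix test_stepN keeps each step's results separate.
    echo "$BASE --out test_step$step$EXTRA"
done
```

The remaining steps can be added as further quoted strings in the `for` list, in the order they appear above.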