Using 1000 Genomes phase 3 variants for BQSR



Posted 4.3 years ago by prasundutta87

Hi,

I have a general question about the truth (known-sites) sets used for the BQSR step of the GATK workflow. I am aware that many variant datasets (SNPs and indels) from phase 1 of the 1000 Genomes Project are currently used for this, but the consortium has since released phase 3 variants as well. Their biallelic SNVs and indels are available here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz

Would it be okay to use this instead of the phase 1 datasets (SNPs and indels) available here? https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0;tab=objects?pli=1&prefix=&forceOnObjectsSortingFiltering=false

I would like to know what the community thinks about this.
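For context, GATK4's BaseRecalibrator takes these resources through its `--known-sites` argument, which can be repeated once per VCF, so switching from phase 1 to phase 3 comes down to changing which VCFs are passed. The Python sketch below only assembles the command line; it does not run it. The BAM, reference, and output file names are hypothetical placeholders. It also assumes the VCF is indexed and that its contig names match the reference.

```python
import shlex
from typing import List


def base_recalibrator_cmd(bam: str, reference: str,
                          known_sites: List[str], out_table: str) -> List[str]:
    """Build (but do not execute) a GATK4 BaseRecalibrator argument list."""
    if not known_sites:
        raise ValueError("BaseRecalibrator needs at least one --known-sites VCF")
    cmd = ["gatk", "BaseRecalibrator", "-I", bam, "-R", reference]
    for vcf in known_sites:
        # --known-sites is repeated once per resource VCF
        cmd += ["--known-sites", vcf]
    cmd += ["-O", out_table]
    return cmd


if __name__ == "__main__":
    cmd = base_recalibrator_cmd(
        bam="sample.bam",            # hypothetical input BAM
        reference="GRCh38.fasta",    # hypothetical reference FASTA
        known_sites=[
            "ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz",
        ],
        out_table="recal_data.table",  # hypothetical output name
    )
    # Print a shell-quoted command for inspection
    print(shlex.join(cmd))
```

To compare the two resources, you could run the same command once with the phase 1 VCFs and once with the phase 3 VCF.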

There is also this dataset: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz. However, it contains multiallelic variants, structural variants, etc., so I won't be using it.
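If the v5b file were used anyway, its multiallelic and structural-variant records could be filtered out first. The Python sketch below does this and keeps all header lines. It also keeps every record that has a single ALT allele made only of A/C/G/T/N bases. Everything else is dropped, including symbolic alleles such as `<DEL>`, breakends, `*`, and missing ALTs. The input and output paths are hypothetical command-line arguments.

```python
import gzip
import sys
from typing import Iterable, Iterator

VALID_BASES = set("ACGTN")


def is_biallelic_snv_or_indel(ref: str, alt: str) -> bool:
    """True if the record has exactly one ALT allele made of plain bases."""
    if not alt or "," in alt:
        return False  # missing ALT or multiallelic site (comma-separated ALTs)
    # Anything outside A/C/G/T/N is rejected here: symbolic SV alleles such as
    # <DEL>, breakend notation with '[' or ']', '.' and '*'.
    return bool(ref) and set(ref.upper()) <= VALID_BASES and set(alt.upper()) <= VALID_BASES


def filter_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only biallelic SNV/indel records."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed records
        if is_biallelic_snv_or_indel(fields[3], fields[4]):
            yield line


def main(in_path: str, out_path: str) -> None:
    # gzip.open can also read bgzip-compressed VCFs (BGZF is multi-member gzip)
    with gzip.open(in_path, "rt") as fin, open(out_path, "w") as fout:
        fout.writelines(filter_vcf_lines(fin))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

The output is an uncompressed VCF. Before GATK can use it as a known-sites resource, it would still need to be compressed with bgzip and indexed with tabix, or indexed with GATK's IndexFeatureFile. In practice, `bcftools view -m2 -M2 -v snps,indels` does roughly the same filtering.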

Regards, Prasun

Tags: SNP, next-gen, snp, GATK
