Structural Variants Standard Dataset
11.8 years ago
PoGibas

I am testing structural variant (SV) callers for deletions, duplications, inversions and insertions, and I need a standard dataset to validate my calls.

Where can I find a benchmark BAM file and the corresponding VCF file for a specific individual?

1000 Genomes provides calls for all SV types only in the pilot release, but I cannot work out which BAM file (WGS or WES) I should use to call SVs, or which VCF file I should use to check my calls.
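Once a truth VCF is chosen, one common way to check SV calls against it is reciprocal overlap: a call and a truth event match if the overlap covers at least some fraction of each. The sketch below assumes the deletions have already been parsed into `(chrom, start, end)` tuples with 0-based, half-open coordinates; the 50% threshold is a frequently used convention rather than anything 1000 Genomes prescribes, and `reciprocal_overlap` and `match_calls` are hypothetical helper names.

```python
from typing import List, Tuple

# Assumption: each SV is a (chrom, start, end) tuple, 0-based and half-open.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if no overlap)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def match_calls(calls: List[Interval], truth: List[Interval],
                min_ro: float = 0.5) -> Tuple[int, int, int]:
    """Count (true positives, false positives, false negatives).

    A call is a true positive if it reaches min_ro reciprocal overlap with
    any truth interval; a truth interval is missed if no call reaches it.
    """
    tp = sum(1 for c in calls
             if any(reciprocal_overlap(c, t) >= min_ro for t in truth))
    recovered = sum(1 for t in truth
                    if any(reciprocal_overlap(c, t) >= min_ro for c in calls))
    return tp, len(calls) - tp, len(truth) - recovered


if __name__ == "__main__":
    truth = [("20", 1000, 2000), ("20", 5000, 5600)]
    calls = [("20", 1100, 2050), ("20", 9000, 9500)]
    print(match_calls(calls, truth))  # (1, 1, 1)
```

Here the first call overlaps the first truth deletion by 900 bp, which is 90% of the truth event and about 95% of the call, so it counts as a match; the second call and the second truth event are unmatched.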

Any help, ideally with an example, would be appreciated.

Tags: 1000genomes, copynumber
11.8 years ago

This is the folder containing the variants called in 1000 Genomes: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/

The README in that folder describes which files were used to call these variants and briefly summarizes the filters applied.

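Note that in these folders, structural variants and single-nucleotide variants are merged into the same files. To extract the structural variants, you can use the --keep-only-indels option of a recent vcftools version ( http://vcftools.sourceforge.net/options.html ). Be aware that vcftools counts any variant that changes the length of the REF allele as an indel, so small indels are kept as well.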
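If you only want the large events, a plain-Python alternative is to keep the records that look structural. This is a sketch under an assumption: it treats a record as an SV when its INFO column has an `SVTYPE` key or one of its ALT alleles is symbolic (such as `<DEL>`). That is a common VCF convention, but check the README for this release to confirm how its SVs are encoded. `is_structural`, `structural_records` and `filter_vcf` are hypothetical helper names, not part of vcftools.

```python
import gzip
from typing import Iterable, Iterator


def is_structural(record: str) -> bool:
    """Heuristic (assumption): a VCF data line is an SV if INFO has an
    SVTYPE key or any ALT allele is symbolic, e.g. <DEL> or <DUP>."""
    fields = record.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
    symbolic = any(a.startswith("<") and a.endswith(">") for a in alts)
    return "SVTYPE" in info_keys or symbolic


def structural_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield all header lines unchanged, then only the SV data lines."""
    for line in lines:
        if line.startswith("#") or is_structural(line):
            yield line


def filter_vcf(path: str, out_path: str) -> None:
    """Write the SV-only subset of a (possibly gzipped) VCF to out_path."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(structural_records(src))
```

Unlike the indel filter, this keeps SV records regardless of how their REF and ALT lengths compare, and it drops small indels that carry no `SVTYPE` annotation.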


And should the BAM files be downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/ ? If so, why do the BAM file sizes differ so much (for example, the low-coverage Illumina BAM files range from 9 to 24 GB)?

11.8 years ago
Dan

You could try simulating SVs and then simulating reads from the modified genome. The list is far from complete, but some NGS simulators are listed on SEQwiki: http://seqanswers.com/wiki/Special:BrowseData/Bioinformatics_application?Bioinformatics_method=Simulation
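As a toy illustration of that idea, the sketch below plants one deletion into a reference sequence and draws error-free paired-end reads from the modified genome, so the planted coordinates become your truth set. It is not one of the simulators listed on SEQwiki. The function names, read length, insert size, fixed fragment length and the absence of sequencing errors are all assumptions chosen for brevity.

```python
import random
from typing import List, Tuple


def plant_deletion(ref: str, start: int, length: int) -> str:
    """Return ref with bases [start, start + length) removed (0-based)."""
    return ref[:start] + ref[start + length:]


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def simulate_pairs(genome: str, n_pairs: int, read_len: int = 100,
                   insert_size: int = 400, seed: int = 1) -> List[Tuple[str, str]]:
    """Draw error-free read pairs from fixed-length fragments (assumption):
    read 1 is the fragment's left end on the forward strand, read 2 is the
    reverse complement of its right end."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        pos = rng.randrange(0, len(genome) - insert_size + 1)
        fragment = genome[pos:pos + insert_size]
        pairs.append((fragment[:read_len], revcomp(fragment[-read_len:])))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    reference = "".join(rng.choice("ACGT") for _ in range(10_000))
    donor = plant_deletion(reference, 4_000, 500)  # truth: 500 bp deletion at 4000
    pairs = simulate_pairs(donor, 1_000)
    print(len(donor), len(pairs))  # 9500 1000
```

Pairs whose fragment spans the deletion breakpoint should align back to the original reference with an apparent insert size about 500 bp larger than `insert_size`, which is the discordant-pair signal many deletion callers look for. A realistic benchmark would also need sequencing errors, base qualities and FASTQ output, which dedicated simulators provide.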
