Variant Control Data
11.6 years ago
richardc.gsc ▴ 160

Hi all,

We have a few in-house genomic samples in which we have validated a couple of hundred variants. To test the new tools that are constantly being published, we could really use a control data set with a large number (thousands?) of validated variants. Ideally, the variants would be validated in a publicly available cell line, so that we could sequence the sample on our own machines and use the control data to calibrate any new tools we put into the pipeline.

Is anyone aware of a good sample with validated variants that we could use to calibrate machines and bioinformatics tools?

I realize we could use simulated data for this, but right now I'm only interested in real data that we could generate with our own sequencers.

thanks!

variant-calling • 1.9k views
11.6 years ago
DG 7.3k

People often use NA12878 from the 1000 Genomes Project to validate their SNP-calling algorithms; it is probably the most extensively sequenced genome on the planet. It has been sequenced with multiple technologies, and many of its variant calls have been validated by large-scale Sanger sequencing. Daniel MacArthur, for example, used it and validated a large number of LOF (loss-of-function) variants. You might want to start there, as it is a commonly used and publicly available sample.
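Once you have a validated call set for a sample like this, calibrating a new tool mostly comes down to comparing its calls with the validated ones. Here is a minimal sketch of that comparison in Python, assuming both call sets are plain-text VCF lines and that an exact match on (chromosome, position, ref, alt) is good enough. The function names `parse_vcf_lines` and `compare_calls` and the example records are made up for illustration; real comparisons usually also need indel normalization and should be restricted to the regions where the validated set is defined.

```python
def parse_vcf_lines(lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    variants = set()
    for line in lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # A multi-allelic record lists several ALT alleles separated by commas.
        for alt in alts.split(","):
            if alt != ".":
                variants.add((chrom, pos, ref, alt))
    return variants


def compare_calls(called, truth):
    """Count exact matches between a call set and a validated set."""
    tp = len(called & truth)   # called and validated
    fp = len(called - truth)   # called but not in the validated set
    fn = len(truth - called)   # validated but missed by the caller
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Made-up records, for illustration only.
truth_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\t.\t.",
    "1\t200\t.\tC\tT\t.\t.\t.",
    "2\t300\t.\tG\tA,C\t.\t.\t.",
]
called_lines = [
    "1\t100\t.\tA\tG\t.\t.\t.",
    "2\t300\t.\tG\tA\t.\t.\t.",
    "3\t400\t.\tT\tC\t.\t.\t.",
]

result = compare_calls(parse_vcf_lines(called_lines),
                       parse_vcf_lines(truth_lines))
print(result)
```

Note that a call outside the validated set is counted as a false positive here, which is only fair if the validated set is close to complete for the regions you compare.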

11.6 years ago
Jordan ★ 1.3k

I think you should take a look at the TCGA data. Go to the Data Matrix and select a cancer type and the platforms you need.

For example, if you need only somatic variants, select Somatic Mutations under Data Type, set Availability to Available, and choose Tumor Matched or Normal Matched, depending on your needs. Then select the files you want to download. I think the Somatic Mutations files are in .maf format.
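If the download is indeed a .maf file, pulling the variants out of it could look roughly like the sketch below. It assumes a tab-delimited file with the usual MAF column names (`Chromosome`, `Start_Position`, `Reference_Allele`, `Tumor_Seq_Allele2`, `Variant_Type`) and optional `#` comment lines at the top; check the header of your own file, since column sets differ between MAF versions. The function name `read_maf_snvs` and the records are made up for illustration.

```python
import csv
import io


def read_maf_snvs(handle):
    """Return (chrom, pos, ref, alt) tuples for the SNP rows of a MAF file."""
    # Drop comment lines (e.g. a "#version" line) before the column header.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    variants = []
    for rec in reader:
        # Keep single-nucleotide variants only; skip insertions, deletions, etc.
        if rec.get("Variant_Type") != "SNP":
            continue
        variants.append((rec["Chromosome"],
                         int(rec["Start_Position"]),
                         rec["Reference_Allele"],
                         rec["Tumor_Seq_Allele2"]))
    return variants


# Made-up records, for illustration only.
example_maf = (
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\t"
    "Tumor_Seq_Allele2\tVariant_Type\n"
    "GENE_A\t1\t1000\tC\tT\tSNP\n"
    "GENE_B\t2\t2000\tG\tA\tSNP\n"
    "GENE_C\t3\t3000\tT\t-\tDEL\n"
)

snvs = read_maf_snvs(io.StringIO(example_maf))
print(snvs)
```

For a real file, you would pass an open file handle instead of the `io.StringIO` object.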

Hope this helps.
