Dear all,
I'm trying to find raw targeted sequencing data (BAM or FASTQ, plus VCF/gVCF) for any cancer type. I also want the publications associated with these data, because I need a description of how the allele frequencies of the variants found in the VCF were confirmed (e.g., by digital PCR). However, the databases I know of don't provide biological validation of the NGS data they store.
Hope you can help me.
I am not sure if these raw data are available, because of privacy policies.
These data can be under controlled access (e.g., TCGA).
Yes, indeed. Do you have access? If not, you can download raw FASTQ data from cancer studies at the SRA, process them, and produce your own BAMs and VCFs.
I need annotated VCF files (or BAMs with a description of the variant calling, if it was done) as an in silico control for a bioinformatics pipeline, and I also need wet-lab confirmation of the observed variant allele frequencies. I guess my question is similar to this post.
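For the in silico side, here is a minimal sketch of how an observed variant allele frequency could be read from a single VCF data line. It assumes the variant caller writes an AD (allelic depths) FORMAT field with the reference depth first and no missing values; not every caller does, so check the header of your own VCF. The function name `vaf_from_vcf_line` and the record below are made up for illustration and do not come from any real dataset.

```python
def vaf_from_vcf_line(line, sample_index=0):
    """Return one VAF per ALT allele for one sample of a VCF data line.

    Assumption: the FORMAT column contains an AD key whose value is
    'ref_depth,alt1_depth[,alt2_depth...]' with integer entries.
    """
    fields = line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")                  # FORMAT column
    sample_values = fields[9 + sample_index].split(":")  # chosen sample column
    sample = dict(zip(format_keys, sample_values))

    depths = [int(d) for d in sample["AD"].split(",")]
    total = sum(depths)
    if total == 0:
        # No informative reads: report 0.0 instead of dividing by zero.
        return [0.0] * (len(depths) - 1)
    return [alt_depth / total for alt_depth in depths[1:]]


# Made-up record, for illustration only.
record = "chr1\t12345\t.\tT\tG\t.\tPASS\t.\tGT:AD:DP\t0/1:180,20:200"
print(vaf_from_vcf_line(record))
```

A value computed this way is what would then be compared against the frequency measured in the wet lab (e.g., by digital PCR) for the same variant.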
For NGS data, you may struggle to find a normal sample for which the variants have been confirmed in the wet lab. As you can imagine, validating all variants would be a costly and time-consuming task. GIAB (Genome in a Bottle) has samples for which variants have been confirmed in parallel by multiple variant calling methods, but these have not been confirmed in the wet lab either.
If you search the online repositories (mainly the SRA, the Sequence Read Archive), you may find what you need.
Which part of the other post (by Cyriac) does not fully match what you need? Or does that post fully address your question?
Is biological validation a costly and time-consuming task for variants from targeted sequencing? Can validation be performed only for a subset of variants of interest (e.g., hot spots)?
Cyriac pointed to NCI's GDC Legacy Archive for validated BAM files; however, that is a bioinformatic validation. I found another post by Cyriac, but I can't find the files related to the second point under the "How TCGA MAFs are made" header.