Benchmarking RNASeq Variant Calling Pipeline (Short Reads)
7 months ago
Esraa ▴ 10

Hello, I am currently working on optimizing a variant calling pipeline for short-read RNA-Seq data, and I have been searching for gold-standard benchmarking datasets that come with curated VCF results, but I could not find any.

I know the GIAB project provides Google-Illumina short-read RNA-Seq datasets, but there is no curated VCF for that data that I can compare my final results with. If anyone has an idea of what I can do, it would be really helpful.

Thank you all in advance.

rna-seq vcf variant-calling • 668 views

As long as it is the same GIAB sample, you could compare your SNPs with the SNPs available in the whole-genome truth set.
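A minimal sketch of that comparison in Python, assuming the variants from both VCFs have already been parsed into (chrom, pos, ref, alt) records. The names `Variant`, `in_regions` and `compare` are hypothetical, and exact matching is a simplification: hap.py does haplotype-aware matching, which this sketch does not reproduce. The `regions` argument shows restricting both sets to the same intervals before counting, since RNA-Seq reads only cover transcribed regions while the GIAB truth set spans the whole genome.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in the VCF POS column
    ref: str
    alt: str


# Hypothetical region format: chrom -> list of 0-based, half-open (start, end)
# intervals, as in a BED file (e.g. truth-set confident regions intersected
# with positions that have RNA-Seq read coverage).
Regions = Dict[str, List[Tuple[int, int]]]


def in_regions(v: Variant, regions: Regions) -> bool:
    # A 1-based position p lies in the BED interval [start, end) when start < p <= end.
    return any(start < v.pos <= end for start, end in regions.get(v.chrom, []))


def compare(truth: Iterable[Variant], calls: Iterable[Variant],
            regions: Regions) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) by exact (chrom, pos, ref, alt) match inside `regions`."""
    truth_in: Set[Variant] = {v for v in truth if in_regions(v, regions)}
    calls_in: Set[Variant] = {v for v in calls if in_regions(v, regions)}
    tp = len(truth_in & calls_in)  # called and in the truth set
    fp = len(calls_in - truth_in)  # called but not in the truth set
    fn = len(truth_in - calls_in)  # in the truth set but not called
    return tp, fp, fn


if __name__ == "__main__":
    # Toy records for illustration only.
    truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr1", 5000, "G", "A")]
    calls = [Variant("chr1", 100, "A", "G"), Variant("chr1", 300, "T", "C")]
    print(compare(truth, calls, {"chr1": [(0, 1000)]}))   # (1, 1, 1)
    print(compare(truth, calls, {"chr1": [(0, 10000)]}))  # (1, 1, 2)
```

With the narrower region the uncovered truth site at position 5000 is excluded, so it no longer counts as a false negative.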

Thank you so much for answering! I actually found some studies doing it the way you mentioned.

I ran the GATK Best Practices pipeline on the RNA-Seq reads and compared the calls to the high-confidence variants using hap.py, but the results do not make sense: the F1 score was about 0.04, which suggests I am doing something wrong in my analysis.

I tried every troubleshooting step I could think of, such as checking my references and tool parameters, but I could not find the cause of the problem. Do you have any idea what I could be doing wrong?
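For reference, the F1 score reported by benchmarking tools such as hap.py is the harmonic mean of precision, TP/(TP+FP), and recall, TP/(TP+FN). The sketch below uses a hypothetical function name and made-up counts (not results from this thread) to show how many truth sites that the query cannot call, for example positions without RNA-Seq coverage, pull recall and therefore F1 down even when precision is high.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); each is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts: 90 correct calls, 10 false calls, a truth set of
    # 1,000 sites of which only 100 are assumed to be covered by reads.
    print(precision_recall_f1(90, 10, 910))  # whole truth set: F1 about 0.16
    print(precision_recall_f1(90, 10, 10))   # covered sites only: F1 about 0.9
```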

7 months ago
lagartija ▴ 160

I don't know, but another way of doing it would be to combine datasets from different strains that you know are clonal. Then you know where the true variants should be, based on the alignments of their genomes.

Thank you! I will try searching for this more and see if it would fit my analysis purposes.
