Question

Benchmarking RNASeq Variant Calling Pipeline (Short Reads)

0

6 months ago

Esraa ▴ 10

Hello, I am currently working on optimizing a variant calling pipeline for short-read RNA-Seq data. I have been searching for a gold-standard benchmarking dataset that comes with VCF results, but could not find one.

I know the GIAB project provides Google-Illumina short-read RNA-Seq datasets, but there is no curated VCF for that data that I can compare my final results with. If anyone has an idea of what I can do, it would be really helpful.

Thank you all in advance.

rna-seq vcf variant-calling • 618 views

2

As long as it is the same GIAB sample, you could compare your SNPs with the SNPs available for the whole-genome set.

6 months ago by GenoMax 147k
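
A minimal sketch of the comparison suggested above, assuming both call sets come from the same sample and use the same reference and contig names. The function names and the toy records are made up for illustration. Matching on exact `(chrom, pos, ref, alt)` keys is a simplification: dedicated comparison tools also reconcile different representations of the same variant, which this does not.

```python
def parse_vcf_keys(vcf_text):
    """Return a set of (chrom, pos, ref, alt) keys from VCF-formatted text."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # one key per alternate allele
            keys.add((chrom, pos, ref, alt))
    return keys


def compare(calls, truth):
    """Count true positives, false positives and false negatives."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    return tp, fp, fn


# Toy records, invented purely for illustration.
truth_vcf = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t300\t.\tG\tA",
])
calls_vcf = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t250\t.\tT\tC",
])

tp, fp, fn = compare(parse_vcf_keys(calls_vcf), parse_vcf_keys(truth_vcf))
print(tp, fp, fn)
```

If the contig names differ between the two files (for example `chr1` versus `1`), no keys would match in a comparison like this, so that is worth checking first.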

0

Thank you so much for answering! I actually found some studies doing it the way you mentioned.

I ran the GATK best practices pipeline on the RNA-Seq reads and compared the calls to the high-confidence variants using hap.py, but the results do not make sense: the F1 scores are about 0.04, which suggests I am doing something wrong in my analysis.

I tried every troubleshooting step I could think of, such as checking my references and tool parameters, but could not find the cause of the problem. Do you have any idea what I could be doing wrong?

6 months ago by Esraa ▴ 10
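
For reference, a small sketch of how F1 is derived from the comparison counts, and of how it can change when both call sets are restricted to the regions the RNA-Seq data covers. This is an illustration under assumptions, not a diagnosis of the 0.04 result: the region list, the variant keys and the function names are all hypothetical. The idea is that a whole-genome truth set contains many variants in regions with no RNA-Seq coverage, and each of those counts as a false negative unless the evaluation is limited to covered regions.

```python
def f1_metrics(tp, fp, fn):
    """Return (precision, recall, F1) from comparison counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def in_regions(key, regions):
    """True if a 1-based variant position falls in any BED-style region.

    Regions are (chrom, start, end) with 0-based, half-open coordinates.
    """
    chrom, pos = key[0], key[1]
    return any(c == chrom and start < pos <= end for c, start, end in regions)


def evaluate(calls, truth, regions=None):
    """Score calls against truth, optionally restricted to given regions."""
    if regions is not None:
        calls = {k for k in calls if in_regions(k, regions)}
        truth = {k for k in truth if in_regions(k, regions)}
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    return f1_metrics(tp, fp, fn)


# Hypothetical data: 10 truth variants, only 2 of them in a covered region.
truth = {("chr1", p, "A", "G") for p in range(100, 1100, 100)}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "A", "G")}
covered = [("chr1", 0, 250)]  # stand-in for regions with RNA-Seq coverage

p_all, r_all, f1_all = evaluate(calls, truth)
p_cov, r_cov, f1_cov = evaluate(calls, truth, regions=covered)
print(f"genome-wide: precision={p_all:.2f} recall={r_all:.2f} F1={f1_all:.2f}")
print(f"covered only: precision={p_cov:.2f} recall={r_cov:.2f} F1={f1_cov:.2f}")
```

In this toy case every call is correct, yet the genome-wide F1 is low because recall is computed over variants the RNA-Seq data could never have seen. Looking at precision and recall separately, rather than F1 alone, can help tell this situation apart from a genuine mismatch such as a different reference build.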

score 0 · Answer 1 · 2024-05-13

0

6 months ago

lagartija ▴ 160

I don't know, but another approach would be to combine datasets from different strains that you know are clonal. You would then know where the true variants should be, based on the alignments of their genomes.

0

Thank you! I will try searching for this more and see if it would fit my analysis purposes.

6 months ago by Esraa ▴ 10