RNA-Seq mutation calling without germline data
3 months ago

Hi all,

I have some rRNA-depleted transcriptome sequencing data from leukaemia samples. Is it possible to call mutations from RNA-Seq without germline sequencing data?

I know there are pipelines for WGS without germline data, even though that's not ideal. Also, in recent years more and more people have started to call CNVs/SNPs/mutations from RNA-Seq data, which is still not a recommended practice from what I've gathered. But is it still worth trying? Are there any known pipelines out there that could do this?

I appreciate your input.

rna-seq mutation • 1.2k views
3 months ago
DGTool ▴ 290

Although I'm not too deep into RNA-seq variant calling, I've seen statements similar to yours: RNA-seq based variant calling is still not as well investigated as DNA-seq based variant calling. I've seen some articles on GATK's forums that discussed (somatic) variant calling from RNA-seq (I think it was tumor-only, i.e. no germline normal), but they mentioned that it hadn't been thoroughly checked how many adjustments need to be made or how accurate the results are. This was from a couple of years ago, so things might have changed since then, but I do not know. I would say it could be possible to adapt GATK's RNA-seq variant calling best practices: https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels. The HaplotypeCaller step could be replaced with Mutect2 for somatic instead of germline variant calling.
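To make the suggestion concrete, here is a rough sketch of how the steps could be chained, with Mutect2 in tumor-only mode in place of HaplotypeCaller. It only builds the command lines and does not run anything. The function name, file names, and thread count are made up for illustration, base quality recalibration is left out for brevity, and none of this has been validated on RNA-seq, so please treat it as a starting point rather than a recommended pipeline.

```python
import shlex


def tumor_only_rnaseq_commands(sample, fastq1, fastq2, star_index,
                               reference_fasta, germline_resource=None,
                               threads=8):
    """Build (but do not run) command lines for a tumor-only RNA-seq sketch.

    All file names are hypothetical placeholders derived from `sample`.
    """
    prefix = f"{sample}."
    # STAR names its sorted BAM after the output prefix.
    star_bam = f"{prefix}Aligned.sortedByCoord.out.bam"
    dedup_bam = f"{prefix}dedup.bam"
    split_bam = f"{prefix}split.bam"
    raw_vcf = f"{prefix}mutect2.unfiltered.vcf.gz"
    filtered_vcf = f"{prefix}mutect2.filtered.vcf.gz"

    star = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", star_index,
        "--readFilesIn", fastq1, fastq2,
        "--readFilesCommand", "zcat",
        "--twopassMode", "Basic",  # STAR's built-in two-pass mode
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # GATK tools expect read groups in the BAM header.
        "--outSAMattrRGline", f"ID:{sample}", f"SM:{sample}", "PL:ILLUMINA",
        "--outFileNamePrefix", prefix,
    ]
    mark_duplicates = ["gatk", "MarkDuplicates", "-I", star_bam,
                       "-O", dedup_bam, "-M", f"{prefix}dup_metrics.txt"]
    # Splits reads spanning introns so the caller does not see them as gaps.
    split_reads = ["gatk", "SplitNCigarReads", "-R", reference_fasta,
                   "-I", dedup_bam, "-O", split_bam]
    # Tumor-only: a single -I input and no matched normal.
    mutect2 = ["gatk", "Mutect2", "-R", reference_fasta,
               "-I", split_bam, "-O", raw_vcf]
    if germline_resource is not None:
        # e.g. a population VCF with allele frequencies (assumed to exist)
        mutect2 += ["--germline-resource", germline_resource]
    filter_calls = ["gatk", "FilterMutectCalls", "-R", reference_fasta,
                    "-V", raw_vcf, "-O", filtered_vcf]
    return [star, mark_duplicates, split_reads, mutect2, filter_calls]


if __name__ == "__main__":
    for cmd in tumor_only_rnaseq_commands(
            "AML01", "AML01_R1.fastq.gz", "AML01_R2.fastq.gz",
            "star_index", "genome.fa", germline_resource="population.vcf.gz"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them, or to paste them into whatever job scheduler you use, before committing compute time.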

There is also a Nextflow pipeline from nf-core that implements these steps: https://nf-co.re/rnavar/1.0.0/. Although it states that the most recent "official" update was 2 years ago, there is still work being done on it in the dev branch on GitHub. It would make a good reference in case you want to check how certain steps might need to be run and what they require.


Thanks DGTool, I think I will try Mutect2 after two-pass STAR alignment. I posted on Slack the other day about the nf-core pipeline, and I think it's probably a similar pipeline with HaplotypeCaller, so it won't be very useful for bulk sequencing of leukaemia/tumour. It's sort of a backup/last resort in case we can't get WGS.

3 months ago

You can try it (and it is doable using GATK). Note that given the cancer context, there's a high risk of missing key pathogenic or driver mutations. One issue is that the drop-out of nonsense and frameshift mutations due to nonsense-mediated decay (NMD) means it would be easy to miss clear pathogenic/driver mutations even with relatively high depth. Just keep that in mind for your investigation and see what comes out. Either way, you should find some subset of the real missense and splicing variants and the NMD-escaping frameshift/stop-gain variants.


Thanks a lot benformatics, I'll keep the restrictions in mind. On this topic, may I ask if you have experience with two-pass STAR alignment? I have been reading about this online, and people keep saying there should be a filtering step after the first pass to keep novel junctions that look real. I have seen people doing it without this filtering, and STAR even has an option to do two-pass alignment automatically. But is this filtering necessary for SNP calling in GATK?


I've never done the filtering myself, but I was usually searching for novel splicing events, so more noise was better than false negatives. I have no idea how custom filtering would be done; I know there is default filtering, which I would sometimes change to err on the side of fewer false negatives than false positives (FN<FP). I would probably stick with the defaults unless you have some strong reason not to. Also, you are (probably) going to have a lot of small RNA noise given you didn't do poly-A selection, and in my experience that can induce "fake" alternative splicing events. Anyway, none of this should be particularly important for variant calling: there is going to be noise anyway, so you're just going to have to set some conservative thresholds.
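For reference, the manual filtering that people describe for two-pass alignment usually operates on the first-pass SJ.out.tab file before it is passed to the second pass with --sjdbFileChrStartEnd. The sketch below is one possible version of that idea, not an established recipe: the function name, the read threshold, and the excluded contig names are all assumptions picked for illustration.

```python
def filter_junctions(lines, min_unique_reads=3, keep_noncanonical=False,
                     excluded_chroms=("chrM", "MT")):
    """Filter STAR SJ.out.tab lines from a first-pass alignment.

    SJ.out.tab columns (tab-separated):
      1 chromosome, 2 intron start, 3 intron end, 4 strand,
      5 intron motif (0 = non-canonical), 6 annotated (1 = yes),
      7 uniquely mapped reads, 8 multi-mapped reads, 9 max overhang.
    Thresholds here are illustrative assumptions, not recommended values.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        chrom = fields[0]
        motif = int(fields[4])
        annotated = int(fields[5]) == 1
        unique_reads = int(fields[6])

        if chrom in excluded_chroms:
            continue
        if annotated:
            # Annotated junctions are kept regardless of read support.
            kept.append(line)
            continue
        if motif == 0 and not keep_noncanonical:
            continue
        if unique_reads < min_unique_reads:
            continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    example = [
        "chr1\t100\t200\t1\t1\t1\t0\t0\t10",   # annotated, kept
        "chr1\t300\t400\t1\t1\t0\t5\t0\t20",   # novel, well supported, kept
        "chr1\t500\t600\t1\t1\t0\t1\t4\t20",   # novel, weak support, dropped
        "chr1\t700\t800\t0\t0\t0\t9\t0\t20",   # non-canonical, dropped
        "chrM\t10\t90\t1\t1\t0\t50\t0\t30",    # excluded contig, dropped
    ]
    for junction in filter_junctions(example):
        print(junction)
```

If you use STAR's automatic --twopassMode Basic instead, this step is skipped and STAR's own default junction filters apply.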

3 months ago
DBScan ▴ 450

In addition, you could have a look at the DeepVariant RNA-seq case study here https://github.com/google/deepvariant/blob/r1.6.1/docs/deepvariant-rnaseq-case-study.md. The tutorial is pretty well written.


Thanks DBScan. I see there is a DeepSomatic Docker image, probably more suitable for my case? It's quite interesting, I will give it a shot.

3 months ago
Shred ★ 1.6k

You could try this pipeline by our group, skipping the last neoepitope prediction step: https://github.com/ctglab/ENEO. If you spot any issue, feel free to report it on GitHub.

Mutect2 is not very well suited to RNA-seq data, as the statistical model behind it depends strongly on the detected allele frequency, which is strongly biased in RNA-seq. Our pipeline works with Strelka2 using the --rna setup, which proved to be more sensitive in our testing.
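To illustrate why the allele frequency seen in RNA-seq can mislead a model tuned for DNA, here is a toy calculation. It assumes the observed RNA allele fraction is simply the DNA allele fraction weighted by the relative expression of each allele. That is a deliberate simplification (it ignores mapping bias, RNA editing, and sampling noise), and the function name and numbers are invented for the example.

```python
def expected_rna_vaf(dna_allele_fraction, alt_expression=1.0,
                     ref_expression=1.0):
    """Toy model: expected alt allele fraction among RNA-seq reads.

    dna_allele_fraction: fraction of DNA copies carrying the alt allele.
    alt_expression / ref_expression: relative transcript output per copy
    of each allele (hypothetical values; 1.0 means equal expression).
    """
    if not 0.0 <= dna_allele_fraction <= 1.0:
        raise ValueError("dna_allele_fraction must be between 0 and 1")
    if alt_expression < 0 or ref_expression < 0:
        raise ValueError("expression values must be non-negative")
    alt = dna_allele_fraction * alt_expression
    ref = (1.0 - dna_allele_fraction) * ref_expression
    total = alt + ref
    if total == 0:
        raise ValueError("locus is not expressed under these settings")
    return alt / total


if __name__ == "__main__":
    # Heterozygous variant, both alleles expressed equally.
    print(round(expected_rna_vaf(0.5), 3))
    # Same variant, but the alt transcript is mostly degraded (e.g. by NMD).
    print(round(expected_rna_vaf(0.5, alt_expression=0.2), 3))
    # Subclonal variant on an allele that is over-expressed.
    print(round(expected_rna_vaf(0.1, alt_expression=3.0), 3))
```

Under this toy model, the same heterozygous variant can look clonal, subclonal, or nearly absent depending only on expression, which is the kind of bias a frequency-driven caller struggles with.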


I have heard similar things about Mutect2 not being ideal for RNA-seq variant calling. The one question I do have is that when I was looking at Strelka2, the issue was that it didn't support tumor-only variant calling, or at least the developers didn't have that planned (this might have been some time ago). Has that changed recently?


No, Strelka2 is not actively supported. Strelka2 is way more sensitive than other methods, calling nearly all the alterations with enough evidence: what you'll obtain is a mixture of germline and somatic alterations that needs to be filtered. In our method we employed a statistical model to remove germline variants, obtaining good performance. There are also other methods out there, mostly based on machine learning algorithms, that work on features derived from Mutect2 variant calls.

In my experience, if you care more about specificity, go for Mutect2, but you'll likely need to tune a lot of hyperparameters. If instead you care more about sensitivity, go for Strelka2. In both cases, removing all the germline variants is not easy, and you need to keep that in mind.
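As a very rough picture of what germline filtering can look like, here is a naive heuristic based on population allele frequency and read support. To be clear, this is not the statistical model used in our pipeline and not one of the machine learning methods mentioned above; the class, field names, and thresholds are all hypothetical, and rare or private germline variants will still slip through.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int
    alt_reads: int
    # Allele frequency from a population database annotation, or None
    # when the variant is absent from it (hypothetical annotation field).
    population_af: Optional[float] = None


def split_likely_germline(variants, max_population_af=0.001,
                          min_depth=10, min_alt_reads=3):
    """Naive triage of tumor-only calls into three lists.

    Returns (candidates, likely_germline, low_evidence).
    Thresholds are illustrative assumptions only.
    """
    candidates, likely_germline, low_evidence = [], [], []
    for v in variants:
        if v.depth < min_depth or v.alt_reads < min_alt_reads:
            low_evidence.append(v)
        elif v.population_af is not None and v.population_af > max_population_af:
            # Common in the population, so most likely inherited.
            likely_germline.append(v)
        else:
            candidates.append(v)
    return candidates, likely_germline, low_evidence


if __name__ == "__main__":
    calls = [
        Variant("chr2", 1000, "C", "T", depth=80, alt_reads=30),
        Variant("chr4", 2000, "G", "A", depth=60, alt_reads=28,
                population_af=0.12),
        Variant("chr9", 3000, "A", "G", depth=6, alt_reads=2),
    ]
    kept, germline, weak = split_likely_germline(calls)
    print(len(kept), len(germline), len(weak))
```

Keeping the three groups separate, instead of discarding variants outright, makes it easier to revisit the thresholds once you see how many known leukaemia hotspots end up in each group.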


Thanks Shred for sharing your pipeline. Can I ask if it runs each sample individually? Our HPC has limited space for each user so I have been running things in batches. Also we use SGE instead of SLURM. Hope that's not a big issue? And can I set my own genome/annotation/other things?

Sorry I have a lot of stupid questions. I am not really good at this.


Sorry for the delay. These are (legit, not stupid!) questions related to usage, so it would be better not to flood the answer section of Biostars. I guess the only problem would be with the last point, but we could find a workaround if needed.

Send an email with all the questions to the corresponding author of the attached preprint, citing this discussion: I'd be happy to help you in the configuration and execution.
