Question

variant calling using pseudoaligners

5.3 years ago

jmgoldstein7 • 0

Hello,

I have recently been using a pseudoaligner (Salmon) to analyze RNA-seq data for differential gene expression; it has a lot of documentation and works very well. I was wondering whether anybody has done any work using pseudoaligners such as Salmon for variant calling and/or fusion detection?

I know variant calling on RNA-seq data has limitations (for example, it would not uncover intronic variants), and variant calling and fusion pipelines built on more traditional splice-aware aligners such as STAR already exist. But if I want to do both differential gene expression and variant calling on the same dataset, it seems pointless to use a pseudoaligner if I would have to run something like STAR anyway for variant detection. Thanks.

Tags: RNA-Seq, next-gen

Written 5.3 years ago by jmgoldstein7; updated 5.2 years ago by igor.

Hopefully, someone will let me know if this is wrong. As I understand it, pseudoaligners don't have the stringency of checking the alignment of every base of a read against a reference. Instead, they check which transcripts a read is compatible with, and they may not align the read to the reference completely if no further information can be gained. There are some YouTube videos that do a good job of explaining the algorithm at a high level. So I don't know whether you can get good SNP data from these methods.
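To make the "compatible transcripts, not base-by-base alignment" point concrete, here is a toy sketch of kallisto-style k-mer compatibility. Everything here is an assumption for illustration: the transcripts are made up, k = 5 instead of the usual default of 31, and k-mers missing from the index are simply skipped. Salmon's default mode adds alignment scoring on top, which this ignores.

```python
# Toy k-mer pseudoalignment (simplified kallisto-style idea, NOT the real implementation).
# Assumptions: made-up transcripts, tiny k, exact k-mer lookup, unknown k-mers skipped.
from collections import defaultdict

K = 5

def build_index(transcripts, k=K):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return dict(index)

def pseudoalign(read, index, k=K):
    """Return the transcripts compatible with every indexed k-mer in the read.

    Only set intersection is used; no base-by-base alignment is computed,
    so mismatch positions (candidate SNPs) are never recorded anywhere.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # k-mer not in the index (e.g. it overlaps a variant): ignored
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()

if __name__ == "__main__":
    transcripts = {"tx_A": "AAACCCGGGTTT", "tx_B": "AAACCCGGGAAA"}
    idx = build_index(transcripts)
    print(pseudoalign("AAACCCGGGTTT", idx))          # {'tx_A'}
    print(pseudoalign("ATACCCGGGTTT", idx))          # {'tx_A'}: mismatch at position 1 is silently lost
    print(sorted(pseudoalign("AAACCCGGG", idx)))     # ['tx_A', 'tx_B']
```

The read with a mismatch gets the same answer as the exact read, which is roughly why SNP information does not fall out of pseudoalignment for free.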

Apparently, there are some methods for fusion detection with pseudoaligners that work with reads compatible with multiple transcripts. There is a preprint that describes this: https://www.biorxiv.org/content/10.1101/166322v1 This group is well known for developing pseudoalignment methods.
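As a rough illustration of the idea (not the preprint's actual algorithm or filters), here is a sketch that flags a read pair whose mates are each compatible with some transcripts, but where no single transcript explains both mates and the mates land in different genes. It assumes the per-mate compatibility sets were already computed by a pseudoaligner; the transcript and gene names are hypothetical.

```python
# Toy sketch: discordant read pairs as fusion candidates.
# Assumption: mate1_tx / mate2_tx are per-mate transcript compatibility sets from
# some pseudoaligner; tx2gene and all IDs below are made up for illustration.

def genes_of(transcripts, tx2gene):
    """Collapse a set of transcript IDs to the set of genes they belong to."""
    return {tx2gene[t] for t in transcripts}

def is_fusion_candidate(mate1_tx, mate2_tx, tx2gene):
    """True if each mate pseudoaligns, no transcript explains both mates,
    and the two mates map to non-overlapping gene sets."""
    if not mate1_tx or not mate2_tx:
        return False  # at least one mate did not pseudoalign at all
    if mate1_tx & mate2_tx:
        return False  # an ordinary transcript explains the whole pair
    return genes_of(mate1_tx, tx2gene).isdisjoint(genes_of(mate2_tx, tx2gene))

if __name__ == "__main__":
    tx2gene = {"geneA-t1": "geneA", "geneA-t2": "geneA", "geneB-t1": "geneB"}
    print(is_fusion_candidate({"geneA-t1", "geneA-t2"}, {"geneB-t1"}, tx2gene))  # True
    print(is_fusion_candidate({"geneA-t1"}, {"geneA-t2"}, tx2gene))              # False (same gene)
```

Requiring different genes (not just different transcripts) avoids flagging pairs that merely span an unannotated isoform of one gene.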

I am still learning this myself, so do your own research.

Reply 5.3 years ago by curious

Thanks for the info!

Reply 5.2 years ago by jmgoldstein7

score 0 · Answer 1 · 2019-09-15

I was wondering whether anybody has done any work with using pseudoaligners such as Salmon to do variant calling and/or fusion detection?

You can do fusion detection with pseudoaligners. For example, kallisto quant has a --fusion option that additionally looks for reads that do not pseudoalign because they potentially come from fusion genes. There is a tutorial available.
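One kind of evidence such a mode can look for is a single read that does not pseudoalign as a whole, but whose front half and back half are each consistent with transcripts from different genes (a split read). The sketch below shows only that idea; it is not kallisto's implementation. It assumes you already have, for each k-mer of the read in order, the set of transcripts it hits; all names are hypothetical.

```python
# Toy sketch of split-read fusion evidence (NOT kallisto's actual code).
# Assumption: kmer_hits is the ordered list of transcript sets (Python sets)
# hit by each k-mer of one read; tx2gene and IDs are made up.

def genes_of(transcripts, tx2gene):
    """Collapse a set of transcript IDs to the set of genes they belong to."""
    return {tx2gene[t] for t in transcripts}

def find_split(kmer_hits, tx2gene):
    """Return (cut, left_genes, right_genes) for the first cut where the k-mers on
    each side share a transcript but the sides map to different genes.
    Return None if the read pseudoaligns normally or no such cut exists."""
    if len(kmer_hits) < 2:
        return None
    if set.intersection(*kmer_hits):
        return None  # one transcript explains the whole read: not fusion evidence
    for cut in range(1, len(kmer_hits)):
        left = set.intersection(*kmer_hits[:cut])
        right = set.intersection(*kmer_hits[cut:])
        if left and right:
            left_genes = genes_of(left, tx2gene)
            right_genes = genes_of(right, tx2gene)
            if left_genes.isdisjoint(right_genes):
                return cut, left_genes, right_genes
    return None

if __name__ == "__main__":
    tx2gene = {"a1": "geneA", "b1": "geneB", "b2": "geneB"}
    hits = [{"a1"}, {"a1"}, {"b1", "b2"}, {"b1"}]
    print(find_split(hits, tx2gene))  # (2, {'geneA'}, {'geneB'})
```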

But if I were to want to do differential gene expression and variant calling on the same dataset it seems pointless to use a pseudoaligner if I would have to run something like STAR anyway for variant detection.

Even if you are using the pseudoaligner only for quantification, it may provide a more accurate result. From Soneson et al.:

when testing for changes in overall gene expression (DGE), traditional gene counting approaches may lead to an inflated false discovery rate compared to methods aggregating transcript-level TPM values
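"Aggregating transcript-level TPM values" just means summing the TPMs of all transcripts that belong to a gene. Below is a minimal sketch of that, plus a rescaling of the gene TPMs to count-like values that sum to the sample's library size, which is my understanding of the "scaledTPM" option in the tximport R package. The transcript IDs, gene map, and library size are all made up.

```python
# Minimal sketch: transcript TPM -> gene TPM -> count-like values.
# Assumptions: tx_tpm and tx2gene are hypothetical; library_size is the total
# number of reads assigned in this sample.
from collections import defaultdict

def gene_tpm(tx_tpm, tx2gene):
    """Sum transcript-level TPM values into gene-level TPM."""
    out = defaultdict(float)
    for tx, tpm in tx_tpm.items():
        out[tx2gene[tx]] += tpm
    return dict(out)

def scaled_tpm_counts(gene_tpms, library_size):
    """Rescale gene TPMs so they sum to the library size (scaledTPM-style)."""
    total = sum(gene_tpms.values())
    return {g: v * library_size / total for g, v in gene_tpms.items()}

if __name__ == "__main__":
    tx_tpm = {"t1": 250000.0, "t2": 250000.0, "t3": 500000.0}
    tx2gene = {"t1": "G1", "t2": "G1", "t3": "G2"}
    g = gene_tpm(tx_tpm, tx2gene)
    print(g)                                  # {'G1': 500000.0, 'G2': 500000.0}
    print(scaled_tpm_counts(g, 2_000_000))    # {'G1': 1000000.0, 'G2': 1000000.0}
```

The count-like values can then go into a count-based DGE tool, so the gene-level test benefits from the transcript-level quantification instead of raw gene counting.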