Question

Determine type of mutation

0

2.0 years ago

kirillkirilenko ▴ 40

I mapped RNA-seq reads against the genome with hisat2. I now want to identify the missense, nonsense, and silent mutations. I also have a ".gff" file that contains CDS, genes, etc. I could do this manually, for example by loading the genome, the alignment (.bam), and the annotation (.gff) into the IGV browser and examining each site. But how can I automate it when I have thousands of mutations? As a result I want something like this table:

gene	position	type
A1	1020	missense
A1	1040	silent
B2	2000	silent

hisat2 alignment annotation • 1.3k views

1

Have you looked at the literature at all? The table you show is the result of a fairly extensive pipeline built from several different tools, as described for instance in "Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data" (Adetunji et al., 2019), in the references listed therein, and in the articles related to it. There are many ways to solve this problem and none of them is simple, so you should look over the references or tutorials first and sketch out a plan.

24 months ago by seidel 11k

0

I did not think it would be too hard. I have already performed variant calling and I have a .vcf file. I thought there would be a way to parse all these files, i.e. annotation (.gff), variants (.vcf), alignment (.bam), and genome (.fasta), to get the result I want.
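To make the core of that parsing idea concrete, here is a minimal Python sketch of the last step only: classifying a single-base substitution once the affected codon is known. It assumes the reference codon has already been extracted from the CDS in coding orientation (reverse-complemented for minus-strand genes) and that the variant is a SNV; indels, splice-site variants, and multi-transcript genes are not handled, which is a large part of why dedicated annotation tools exist. The function name `classify_snv` and its arguments are hypothetical, chosen for illustration.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(ref_codon, codon_offset, alt_base):
    """Classify a single-base change within one codon.

    ref_codon    -- reference codon in coding orientation, e.g. "GAA"
    codon_offset -- 0, 1 or 2: position of the changed base in the codon
    alt_base     -- the alternative base, in coding orientation
    """
    ref_codon = ref_codon.upper()
    alt_codon = (
        ref_codon[:codon_offset] + alt_base.upper() + ref_codon[codon_offset + 1:]
    )
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        # A stop codon changed into an amino acid; not one of the three
        # classes asked about, so it is reported separately.
        return "stop_lost"
    return "missense"


if __name__ == "__main__":
    # GAA (Glu) with each position mutated in turn.
    for offset, alt in [(2, "G"), (0, "T"), (1, "T")]:
        print("GAA", offset, alt, classify_snv("GAA", offset, alt))
```

The harder part in practice is everything before this function: mapping each VCF position onto the correct CDS, phase, and strand from the .gff.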

24 months ago by kirillkirilenko ▴ 40

score 2 · Answer 1 · 2022-11-27

2

24 months ago

Matthias Zepper 5.0k

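You can run a tool like SnpEff or the Ensembl Variant Effect Predictor (VEP). Both take your .vcf together with the gene annotation and report the predicted consequence of each variant.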

I trust you have already performed steps like base quality score recalibration and variant filtration to ensure that your calls are accurate. Variant calling from RNA-seq data is a rather dicey and challenging subject, and much harder than calling from WGS data.
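As a rough sketch of how the table from the question could be produced with the SnpEff route: SnpEff writes its predictions into an `ANN` entry in the INFO column of the output VCF, with `|`-separated subfields (allele, annotation, impact, gene name, ...). Assuming an annotated file produced by something like `java -jar snpEff.jar <your_genome_db> calls.vcf > annotated.vcf` (the database name and file names are placeholders, and a non-model organism needs a custom database built from the .gff and .fasta first), the Python below pulls out gene, position, and type. The function name `iter_effects`, the label mapping, and the choice to report the VCF `POS` as "position" are assumptions made for illustration.

```python
import csv
import sys

# Sequence Ontology terms used by SnpEff, mapped to the labels in the question.
LABELS = {
    "missense_variant": "missense",
    "stop_gained": "nonsense",
    "synonymous_variant": "silent",
}


def iter_effects(vcf_lines):
    """Yield (gene, position, type) tuples from SnpEff-annotated VCF lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        pos, info = fields[1], fields[7]
        ann = next(
            (item[4:] for item in info.split(";") if item.startswith("ANN=")),
            None,
        )
        if ann is None:
            continue  # variant without a SnpEff annotation
        seen = set()  # report each (gene, type) once per variant
        for entry in ann.split(","):  # one entry per allele/transcript
            parts = entry.split("|")
            if len(parts) < 4:
                continue
            gene = parts[3]
            for term in parts[1].split("&"):  # combined effects use '&'
                label = LABELS.get(term)
                if label and (gene, label) not in seen:
                    seen.add((gene, label))
                    yield gene, int(pos), label


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py annotated.vcf > table.tsv
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "position", "type"])
    with open(sys.argv[1]) as handle:
        writer.writerows(iter_effects(handle))
```

A gene with several transcripts can receive different predictions for the same variant, so one position may appear on more than one row; deciding which transcript to trust is left to the user here.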