Transcriptome Variant Calling Mapping to Introns
28 days ago
vireen105 • 0

I am currently working on variant calling from mouse RNA-seq (transcriptome) data. I followed the GATK pipeline for RNA-seq variant calling and found that some variants map to intronic regions. Is there a way to pre-process my data so that this does not happen, or is it acceptable to simply filter out the intron-mapping variants?

mouse introns transcriptome alignment variants • 423 views

You should check for alternative splicing and intron retention. Also check the coverage of the regions in question. Sequence that is intronic in one isoform isn't necessarily intronic in another. Your best option is to visually inspect these alleged intronic regions in a genome browser. Load all gene models, not only the longest isoforms. I have seen that people tend to define "intronic" wrongly: as gene regions that are not part of an exon in the longest isoform, when it should be regions that are not part of an exon in ANY isoform, at least if you are using short reads. If the variants don't look real, I would check your alignment filtering parameters, e.g. remove multi-mapping reads, set a mapping quality filter, etc.
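
To make the "not in an exon of ANY isoform" definition concrete, here is a minimal sketch. It assumes you have already parsed the exon coordinates of each isoform from your annotation (e.g. a GTF) into 0-based, half-open intervals; the function names and the transcript IDs in the usage example are hypothetical, not part of any tool.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def is_intronic_in_all_isoforms(pos, isoform_exons):
    """Return True only if pos lies inside the gene span but outside
    every exon of every annotated isoform.

    isoform_exons: dict mapping a transcript ID to a list of
    (start, end) exon intervals, 0-based and half-open (an assumption
    of this sketch; GTF itself is 1-based and closed).
    """
    all_exons = [iv for exons in isoform_exons.values() for iv in exons]
    if not all_exons:
        return False
    exon_union = merge_intervals(all_exons)
    gene_start, gene_end = exon_union[0][0], exon_union[-1][1]
    if not (gene_start <= pos < gene_end):
        return False  # outside the gene, so intergenic rather than intronic
    return not any(start <= pos < end for start, end in exon_union)


# Hypothetical gene with two isoforms; tx_alt has an extra exon.
isoforms = {
    "tx_long": [(100, 200), (400, 500)],
    "tx_alt": [(100, 200), (250, 300), (400, 500)],
}
print(is_intronic_in_all_isoforms(270, isoforms))  # exonic in tx_alt
print(is_intronic_in_all_isoforms(350, isoforms))  # intronic in both
```

A variant at position 270 would look intronic if you only considered `tx_long`, which is exactly the mistake described above. Note that this only covers annotated isoforms, so it is a first pass before looking at the reads.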


Looking at it in IGV, there seem to be variants mapped to introns regardless of alternative splicing. My current variant filters are DP > 10 and QD > 2. What should I change in my read filtering, mapping, or variant calling parameters?

[IGV screenshot]


I think you need to load the BAM file and look at individual reads. It looks like a contiguous region of <100 bp may be affected. What type of variants are these? I would possibly add a mapping quality (MQ) filter that drops reads below a threshold somewhere in the 30-50 range, and remove multi-mappers, both before variant calling. Also, you should load the repeat element track for this genome. You may have picked up an active retrotransposon somewhere in the genome. Anyway, you might end up ignoring these variants regardless of what you do.
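
As a rough illustration of that read-level filter, here is a sketch that works on plain SAM text lines. It assumes your aligner writes the optional `NH` tag (number of reported alignments), and the threshold of 30 is just a placeholder. The function name is hypothetical. Keep in mind that MAPQ scales differ between aligners (STAR, for example, gives uniquely mapped reads MAPQ 255), so check what your aligner emits before picking a cutoff.

```python
def keep_alignment(sam_line, min_mapq=30):
    """Return True if a SAM alignment line passes a simple filter:
    mapped, primary, MAPQ >= min_mapq, and not a multi-mapper.

    Assumes a tab-separated SAM record (not a header line) and that
    multi-mappers are marked with an NH:i:<n> tag where n > 1.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])

    if flag & 0x4:      # read is unmapped
        return False
    if flag & 0x100:    # secondary alignment
        return False
    if mapq < min_mapq:
        return False

    # Optional tags start at column 12; NH is the number of reported hits.
    for tag in fields[11:]:
        if tag.startswith("NH:i:") and int(tag[5:]) > 1:
            return False
    return True


# Hypothetical records: one unique mapper, one read mapping to two loci.
unique = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1"
multi = "r2\t0\tchr1\t100\t3\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:2"
print(keep_alignment(unique), keep_alignment(multi))
```

In practice you would do this on the BAM itself with your usual tooling; the point here is only which fields the filter looks at.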


Yes, it's also important to consider that introns are significantly more repetitive than coding sequence, so there will be both more multi-mappers and more misalignments in intronic regions. It might be worth getting hold of the mappability track from UCSC as well.


A couple of quick points:

  1. Not all splice isoforms will be in an annotation.
  2. Intron retention can happen irrespective of directed "alternative splicing".
  3. RNA-seq will capture some proportion of the unspliced pre-mRNA. This is particularly true of ribo-depleted rather than polyA-selected samples (maybe 50% of reads might be intronic in that case), but it is also true, to a lesser extent, of polyA-selected samples.