RNAseq variant calling: STAR align promotes mismatched overhanging mappings onto introns instead of calling splicing junctions
0
0
Entering edit mode
4.1 years ago
Francesco ▴ 20

Hi, I'm doing variant calling from my RNAseq data on a cell line (Paired End 2X100b, Illumina) I'm following GATK workflows and everything is working almost fine since I can find all the expected mutations already characterized in my cell line. The problem is that I see some additional mutations that are clearly the result of an alignment error (e.g. NM_001364837:exon8:c.727+1G>A, rs781065280) I'm using the latest release of STAR aligner 2.7.6a to perform the alignment with the following options

STAR --runThreadN 8 --genomeDir $GENOME_DIR \
  --readFilesIn ...   --readFilesCommand gunzip -c \
  --outSAMattrRGline ... \
  --outSAMtype BAM SortedByCoordinate \
  --twopassMode Basic \
  --outSAMmapqUnique 60 \
  --outFilterType BySJout --alignIntronMin 20 \
  --alignIntronMax 1000000 --alignMatesGapMax 1000000 \
  --alignSJoverhangMin 8 --alignSJDBoverhangMin 3 --scoreGenomicLengthLog2scale 0 \
  --outFileNamePrefix $OUTPUT_DIR/...

The problem resides in STAR mapping some reads with a mismatching overhang of 4 bp at beginning of the intron instead of picking a perfect match at the beginning of the next exon (see picture attached). enter image description here

As you can see, --alignSJDBoverhangMin is set to 3 and the length of the overhang is 4bp. Also the splice junction should be canonical so a spliced mapping should carry no penalty. Anybody has an explanation for that?

RNA-Seq STAR Splicing junction • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 2005 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6