Question

STAR alignment results in large chunks of NNNNNs

6.2 years ago

Mark ★ 1.6k

Hi Everyone,

I'm using STAR to align reads to a eukaryotic reference genome. I generated the index with a GTF file and then mapped the reads. I'm inspecting the alignment in Tablet, and very large chunks of missing data seem to be introduced. This is short-read data: 100 bp paired-end, stranded Illumina sequencing.

Here is an example:

SRR1106690.72357245

From: 23,097 U23,097 to 25,155 U25,155

Length: 2,059 U2,059 (1969 mismatches)

Cigar: 74M1969N16M

Read direction is FORWARD

SRR1106690.72357245 GTGGGTGTTGGTGAGGGCAGGTAATGCCAGGTATGAACCGGCACCTGACA GGGCTGGTGTAGTCACTGTCACCCNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTTCCAT GCACTTCTC

Can anyone shed light on why STAR is producing alignments like these? The majority of aligned reads look like this. Inspecting the alignment further, it's as if STAR is treating the paired-end data as single-end, combining the paired reads into one 200 bp read and then introducing Ns to fill the gap. Here are the commands I used to index and map:

 STAR --runThreadN 8 --runMode genomeGenerate --genomeDir X --genomeFastaFiles Y.fna --sjdbGTFfile Z.gtf --sjdbOverhang 199
 STAR --runThreadN 12 --genomeDir X --readFilesIn Y_1.fastq.gz Y.fastq.gz --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --sjdbGTFfile A.gtf

Thanks

RNA-Seq alignment sequencing • 1.9k views

Updated 6.1 years ago by Biostar 20 • written 6.2 years ago by Mark ★ 1.6k

score 4 · Accepted Answer · 2018-08-30

6.2 years ago

Devon Ryan 104k

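The CIGAR string (74M1969N16M) indicates that the Ns aren't actually in the read. The alignment is simply spliced: the 1969N operation is a skipped region of the reference (an intron) between the 74 and 16 aligned bases.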
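As a rough illustration of how to read that CIGAR string, here is a minimal Python sketch. The function name `summarize_cigar` is made up for this example and is not part of STAR, Tablet, or any library; the sets of operations that consume read or reference bases follow the SAM specification.

```python
import re

# CIGAR operations that consume read (query) bases or reference bases,
# as defined in the SAM specification.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def summarize_cigar(cigar):
    """Return (read_bases, reference_span, skipped_bases) for a CIGAR string.

    Hypothetical helper for illustration only.
    """
    read_bases = ref_span = skipped = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in CONSUMES_QUERY:
            read_bases += n
        if op in CONSUMES_REF:
            ref_span += n
        if op == "N":
            # N = region skipped on the reference (an intron in RNA-seq)
            skipped += n
    return read_bases, ref_span, skipped


read_bases, ref_span, skipped = summarize_cigar("74M1969N16M")
print(read_bases, ref_span, skipped)  # 90 2059 1969
```

For the read in the question this gives 90 aligned read bases spread over 2,059 reference positions, of which 1,969 are skipped. That matches the "Length: 2,059" and the 1,969 Ns reported by Tablet, so the Ns are padding for the intron rather than bases in the read.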

As Devon explains, it's a spliced alignment. You can open your BAM file in IGV and look at this read, and you will see that it's spliced.

6.2 years ago by Nicolas Rosewick 11k

Hi Devon and Nicolas,

Thank you, it's very strange of Tablet to show splicing that way. Thanks for the help, guys. I should have looked at the CIGAR.

Thanks again

6.2 years ago by Mark ★ 1.6k