STAR outputs a soft-clipped read instead of an insertion
5.7 years ago

Hi, all,

I used STAR to align reads to the zebrafish genome. I know STAR can output soft-clipped reads, but when I BLAT this read against the genome, I find that it actually contains only a two-base insertion and one mismatch.

The read is:

@NB501962:91:HVTHLBGX5:4:11603:19437:19319
CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGACCCTCC
+
AAAAAEEEEEEEEEEEEEEEAEEEEEEEE/EEEAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAE/EEEEE

The STAR command I used is:

STAR --runThreadN 48 --genomeDir ./annotation --outFilterMismatchNoverLmax 0.05 --readFilesIn Read.fastq --outFileNamePrefix test_ --outSAMtype BAM SortedByCoordinate --outSAMattributes Standard

The result is:

NB501962:91:HVTHLBGX5:4:11603:19437:19319   0   1   6002    255 54M21S  *   0   0   CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGACCCTCC AAAAAEEEEEEEEEEEEEEEAEEEEEEEE/EEEAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAE/EEEEE NH:i:1  HI:i:1  AS:i:53 nM:i:0

But when I check the genome browser and extract the sequence, the mapping should look like this (underscores mark the insertion, lowercase marks the mismatch):

  read: CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGAcCCTCC
genome: CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAA__CTGAGTTTGTTGAaCCTCC

I would expect the whole read to be mapped, with an insertion. Instead, STAR reports the 3' end of the read, from the insertion onward, as soft-clipped. If I use EndToEnd mode, this read cannot be mapped to the genome.
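To make the expectation concrete, here is a minimal Python sketch that turns the two alignment rows above into a CIGAR string. The function names are made up for illustration, and it assumes `_` marks a gap and that case only flags the mismatch.

```python
from itertools import groupby

# Alignment rows copied from the post; "_" in the genome row marks inserted read bases.
READ_ROW = "CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGAcCCTCC"
GENOME_ROW = "CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAA__CTGAGTTTGTTGAaCCTCC"


def cigar_from_alignment(read_row: str, genome_row: str, gap: str = "_") -> str:
    """Build a CIGAR string (M/I/D only) from two equal-length alignment rows."""
    if len(read_row) != len(genome_row):
        raise ValueError("alignment rows must have equal length")
    ops = []
    for r, g in zip(read_row, genome_row):
        if r == gap and g == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if g == gap:
            ops.append("I")  # base present in the read only
        elif r == gap:
            ops.append("D")  # base present in the genome only
        else:
            ops.append("M")  # aligned column, match or mismatch
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


def count_mismatches(read_row: str, genome_row: str, gap: str = "_") -> int:
    """Count aligned columns whose bases differ, ignoring case and gap columns."""
    return sum(
        1
        for r, g in zip(read_row, genome_row)
        if gap not in (r, g) and r.upper() != g.upper()
    )


print(cigar_from_alignment(READ_ROW, GENOME_ROW))  # 54M2I19M
print(count_mismatches(READ_ROW, GENOME_ROW))      # 1
```

So the alignment I expected is 54M2I19M with one mismatch, whereas STAR reports 54M21S: the clip starts exactly at the inserted bases.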

Can someone help me out with this? Or are there any parameters in STAR I can set to map this read properly?

Thank you in advance!


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!


Thank you for the reminder. I was looking for an option to improve the formatting, and you did it for me. Thanks!

5.2 years ago
pacome.pr ▴ 130

Hi Jason, you can try the "--alignEndsType EndToEnd" option, which disables soft-clipping of reads, if you really need the read aligned across the insertion. I don't know why this read is soft-clipped, though, since the mismatch limit is not exceeded.
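As a rough check on the scoring side, here is a small Python sketch comparing the clipped and the gapped alignment. It assumes +1 per match, -1 per mismatch, and an insertion cost of one open penalty plus a per-base penalty of -2 each, which is what I believe the --scoreInsOpen and --scoreInsBase defaults are (please check the manual for your version). It ignores any other score terms STAR applies, so the numbers will not match the AS tag exactly; `alignment_score` is an illustrative helper, not a STAR function.

```python
def alignment_score(matches, mismatches, insertion_lengths=(), ins_open=-2, ins_base=-2):
    """Simplified local score: matches minus mismatches plus insertion penalties.

    The penalty values are assumed defaults, not read from STAR itself.
    """
    score = matches - mismatches
    for n in insertion_lengths:
        score += ins_open + ins_base * n  # open once, then pay per inserted base
    return score


# Reported alignment: 54M21S, no mismatches in the aligned part.
clipped = alignment_score(matches=54, mismatches=0)

# Alignment from the genome browser: 54M2I19M with one mismatch,
# i.e. 72 matching bases, 1 mismatch and one 2-base insertion.
gapped = alignment_score(matches=72, mismatches=1, insertion_lengths=[2])

print(clipped, gapped)
```

Under these assumptions the gapped alignment would score higher than the clipped one, so the score alone does not obviously explain the soft-clipping.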


You might check the default of --outFilterMismatchNmax in your version. When using the length-dependent mismatch filter, you should set it to 999, according to the "ENCODE" settings.


Hi Michael, as far as I know, --outFilterMismatchNmax is set to 10 by default. Besides, in my command line I used --outFilterMismatchNoverLmax 0.05, and my read length is 75, so the maximum number of mismatches should be 75*0.05=3.75. In that read there is one mismatch and two inserted bases. Do you know how STAR treats indels? Even if we assume the inserted bases are counted as "mismatches", the total is still below the maximum, isn't it?
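The arithmetic above can be written as a small Python sketch. It assumes the two filters are simply "count at most Nmax" and "count at most ratio times length", and it uses the read length as the denominator as I did in my estimate; the helper name is hypothetical and STAR's actual implementation may differ.

```python
def passes_mismatch_filters(n_mismatches, length, n_max=10, n_over_l_max=0.05):
    """Return True if both assumed mismatch filters accept the alignment."""
    return n_mismatches <= n_max and n_mismatches <= n_over_l_max * length


READ_LENGTH = 75
ceiling = 0.05 * READ_LENGTH  # ratio-based limit from --outFilterMismatchNoverLmax
print(f"ratio-based ceiling: {ceiling:.2f}")

# One real mismatch only.
print(passes_mismatch_filters(1, READ_LENGTH))

# Worst case: the two inserted bases are also counted as mismatches.
print(passes_mismatch_filters(1 + 2, READ_LENGTH))
```

Both cases pass, so under these assumptions neither mismatch filter should reject the full-length alignment.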


Hi, Pacome.pr,

Thank you for the suggestion, but as I mentioned in my post, I have already tried EndToEnd mode, and this read cannot be mapped to the genome.
