Hi, all,
I used STAR to align reads to the zebrafish genome. I know STAR can soft-clip reads. But when I BLAT this read against the genome, I find that it actually differs from the genome by only two inserted bases and one mismatch.
The read is:
@NB501962:91:HVTHLBGX5:4:11603:19437:19319
CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGACCCTCC
+
AAAAAEEEEEEEEEEEEEEEAEEEEEEEE/EEEAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAE/EEEEE
The STAR command I used is:
STAR --runThreadN 48 --genomeDir ./annotation --outFilterMismatchNoverLmax 0.05 --readFilesIn Read.fastq --outFileNamePrefix test_ --outSAMtype BAM SortedByCoordinate --outSAMattributes Standard
The result is:
NB501962:91:HVTHLBGX5:4:11603:19437:19319 0 1 6002 255 54M21S * 0 0 CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGACCCTCC AAAAAEEEEEEEEEEEEEEEAEEEEEEEE/EEEAEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAE/EEEEE NH:i:1 HI:i:1 AS:i:53 nM:i:0
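To make the soft clip concrete, here is a small Python sketch that splits the read according to the CIGAR string in the record above. It is not part of STAR; the helper name `split_by_cigar` is made up for illustration, and it only assumes the standard SAM meaning of the CIGAR operations.

```python
import re

# Read sequence and CIGAR string copied from the SAM record above.
read = "CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGACCCTCC"
cigar = "54M21S"

def split_by_cigar(seq, cigar):
    """Return (operation, read_segment) pairs for a CIGAR string (hypothetical helper)."""
    segments = []
    pos = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "MIS=X":
            # These operations consume bases of the read.
            segments.append((op, seq[pos:pos + length]))
            pos += length
        else:
            # D, N, H and P do not consume bases of the stored read sequence.
            segments.append((op, ""))
    return segments

segments = split_by_cigar(read, cigar)
for op, seg in segments:
    print(op, len(seg), seg)
```

The `M` segment is the 54 bases STAR aligned, and the `S` segment is the 21 bases at the 3' end that were soft-clipped.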
But when I check the genome browser and extract the sequence, the alignment should look like this (underscores mark the insertion, lowercase marks the mismatch):
read:CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGAcCCTCC
geno:CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAA__CTGAGTTTGTTGAaCCTCC
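For reference, the two rows above can be turned into the CIGAR string this alignment would have. This is only a sketch under the assumption that the manual alignment is correct. The function name `alignment_to_cigar` is made up for illustration, and it says nothing about how STAR scores or chooses alignments.

```python
from itertools import groupby

# Aligned rows copied from the post; "_" in the genome row marks bases
# that are present in the read but absent from the genome (an insertion).
read_row = "CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAATTCTGAGTTTGTTGAcCCTCC"
genome_row = "CTCTGATTCAGTCTACTCAGACTTGATTGACCCATCACAGAGCTCAAGTTTAAA__CTGAGTTTGTTGAaCCTCC"

def alignment_to_cigar(read_row, genome_row, gap="_"):
    """Convert two equal-length aligned rows into (CIGAR, mismatch count)."""
    if len(read_row) != len(genome_row):
        raise ValueError("aligned rows must have the same length")
    ops = []
    mismatches = 0
    for r, g in zip(read_row, genome_row):
        if g == gap:
            ops.append("I")  # base only in the read: insertion
        elif r == gap:
            ops.append("D")  # base only in the genome: deletion
        else:
            ops.append("M")  # aligned column, match or mismatch
            if r.upper() != g.upper():
                mismatches += 1
    # Collapse runs of the same operation into <length><op> pairs.
    cigar = "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
    return cigar, mismatches

expected_cigar, n_mismatch = alignment_to_cigar(read_row, genome_row)
print(expected_cigar, n_mismatch)
```

For these rows the result is `54M2I19M` with one mismatch, so the 21 soft-clipped bases correspond to the 2 inserted bases plus the 19 bases after them.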
I would expect the whole read to be mapped, with the insertion. Instead, STAR reports the 3' end sequence after the insertion as soft-clipped rather than aligned. If I use the EndToEnd mode, this read cannot be mapped to the genome at all.
Can someone help me out with this? Or are there any parameters in STAR I can set to map this read properly?
Thank you in advance!
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Thank you!
Thank you for the reminder. I was looking for an option to format the post better, and you helped me with it. Thanks!