4.2 years ago
MatthewP
I have a FASTQ file from an ATAC-seq experiment, and many reads start with a single "N" base. What could be the reason?
Example reads:
@A00838:273:HCV7KDSXY:4:1101:1506:1000 1:N:0:AGGCAGAA+TATCCTCT
NATCCAGAAAAAAAAAAAATCATGACCAAGCTTACCGTCCCCACTTAAAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF
@A00838:273:HCV7KDSXY:4:1101:1687:1000 1:N:0:AGGCAGAA+TATCCTCT
NCATAGATCACATTAAGTACAAATATAAACAGTATTATTTCTTTACAATTGGATGTGTTGGAGACTTACTGATGT
+
#FFFFFF,FFFFFFFFF:FFFFFFFFFF:FFFFFFFFF:FFFF:FFF:FFFFF:FFFF,F,FFFFFFFFFFFFF:
Counts of such reads:
$ zcat ATAC_R1.fq.gz | grep -e "^N" | wc -l
15150
$ zcat ATAC_R2.fq.gz | grep -e "^N" | wc -l
28
15,150 and 28 reads out of several million is "many"? To be honest, I would not even care about such a tiny percentage. Just proceed with the analysis. The aligner will typically soft-clip that one base or count it as a single mismatch.
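If you want the actual percentage rather than a raw line count, something like the sketch below should do it. It assumes standard 4-line FASTQ records (no wrapped sequences) and only looks at the sequence line of each record, so a quality line that happens to start with "N" is not counted. The function name `count_leading_n` is just made up for this example.

```shell
# Report how many reads start with N, out of the total, as a percentage.
# Assumes gzipped FASTQ with exactly 4 lines per record.
# Usage: count_leading_n ATAC_R1.fq.gz
count_leading_n() {
    gzip -dc "$1" | awk '
        NR % 4 == 2 {                 # line 2 of each record is the sequence
            total++
            if ($0 ~ /^N/) n++
        }
        END {
            pct = (total > 0) ? 100 * n / total : 0
            printf "%d of %d reads (%.4f%%)\n", n, total, pct
        }'
}
```

Run it once per file (R1 and R2) and compare the fractions; if they are far below 1%, it is probably not worth any special handling.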