I have an amplicon library, which I am trying to align with bowtie2 but I am having issues.
My reads have been trimmed to remove the 5' and 3' adapters, which ideally should leave 20bp reads. However, some trimmed reads are 22-23bp. The index I am aligning to consists of 20bp gRNA barcode sequences, each matched to a gRNA barcode name.
When I run bowtie2 with end-to-end, it works quite well and I get ~85-90+% alignment.
bowtie2 --norc -N 1 -x library/crspri_grna-index -U inputfile.fastq.gz -S outputfile.sam > logfile.sam.log 2> errorfile.sam.err
Results:
3474015 reads; of these:
3474015 (100.00%) were unpaired; of these:
401777 (11.57%) aligned 0 times
2943712 (84.74%) aligned exactly 1 time
128526 (3.70%) aligned >1 times
88.43% overall alignment rate
However, some reads are not trimmed to exactly 20bp, and this is biasing my results. I therefore need to run bowtie2 in local alignment mode.
However, very oddly, when I run bowtie2 with the --local or --very-sensitive-local parameter, I get 0 aligned reads.
bowtie2 --norc --local -N 1 -x library/crspri_grna-index -U inputfile.fastq.gz -S outputfile.sam > logfile.sam.log 2> errorfile.sam.err
Results:
3474015 reads; of these:
3474015 (100.00%) were unpaired; of these:
3474015 (100.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.00% overall alignment rate
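One possible explanation for the 0% rate, offered as a sketch rather than a confirmed diagnosis: assuming bowtie2's documented local-mode defaults (match bonus `--ma 2` and minimum score `--score-min G,20,8`, i.e. 20 + 8 * ln(read length)), a read that can overlap at most 20 reference bases can never reach the minimum score. The function names below are hypothetical, and the 20bp reference length is taken from the description of the index above.

```python
import math

# Assumed bowtie2 defaults in --local mode (check them against your version's manual).
MATCH_BONUS = 2            # --ma 2
SCORE_MIN_CONST = 20.0     # --score-min G,20,8  ->  20 + 8 * ln(read_len)
SCORE_MIN_COEFF = 8.0


def min_score_local(read_len):
    """Minimum alignment score a read of this length needs to be reported."""
    return SCORE_MIN_CONST + SCORE_MIN_COEFF * math.log(read_len)


def best_possible_score(read_len, ref_len=20):
    """Best case: every overlapping base matches and nothing is penalised."""
    return MATCH_BONUS * min(read_len, ref_len)


def can_align(read_len, ref_len=20):
    """True if a perfect match could clear the minimum-score threshold."""
    return best_possible_score(read_len, ref_len) >= min_score_local(read_len)


for length in (19, 20, 22, 23):
    print(length, round(min_score_local(length), 2),
          best_possible_score(length), can_align(length))
```

If this is indeed the cause, lowering the threshold with `--score-min` would be the setting to experiment with; the right value depends on how many mismatches you are willing to tolerate.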
Can someone please help me understand how to fix this to get the local alignment to work? Apologies if I am not posting properly, this is my first post.
If it's useful, here is an example of my fastq file. Most of my reads are 20bp, but occasionally a read of 22-23bp or longer comes up. These are being ignored by the end-to-end alignment.
AE/EEEEEEEEAA/EEEEEE
@SRR12815306.732 732/1
GTACAGGTTCTAACCCGTT
AEEEEEEEEEEE/EEEEEE
@SRR12815306.733 733/1
GCTCGAGCTGGAGTTCGACC
/AEEA/A6E/EEEEE/E/EE
@SRR12815306.734 734/1
ACACTCATCTCATTTATTCTTGT
AEEEEEEEEEEEEEEEEE6/AEE
@SRR12815306.735 735/1
GATGTTTAAATGCTTTTTCG
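To quantify how many reads deviate from 20bp before choosing an alignment strategy, a read-length tally can help. This is a minimal sketch that assumes a standard four-line-per-record FASTQ file; the function names are hypothetical and `inputfile.fastq.gz` is simply the file name used in the commands above.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_lines):
    """Count sequence lengths; the sequence is the 2nd line of each 4-line record."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            counts[len(line.strip())] += 1
    return counts


def read_length_counts_from_file(path):
    """Same tally, reading a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_counts(handle)


if __name__ == "__main__":
    import os
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "inputfile.fastq.gz"
    if os.path.exists(path):
        for length, n in sorted(read_length_counts_from_file(path).items()):
            print(length, n)
```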
Please try using bowtie v.1.x. It does ungapped alignments, which is what you want with CRISPRseq. Ideally you may want to use MaGeCK or 2FAST2Q (https://github.com/afombravo/2FAST2Q), which are meant for CRISPR analysis.
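To illustrate the general idea of ungapped matching for reads that carry a few extra bases, here is a minimal sketch. It is not how bowtie, MaGeCK, or 2FAST2Q are implemented; it only does an exact lookup of every 20bp window of a read against a guide table. The function name, the `unassigned` label, and the guide name `g1` are hypothetical.

```python
from collections import Counter


def count_guides(reads, guides, guide_len=20):
    """Assign each read to the first guide found as an exact, ungapped window."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - guide_len + 1):
            name = guides.get(read[start:start + guide_len])
            if name is not None:
                counts[name] += 1
                break
        else:
            # No window matched (or the read is shorter than a guide).
            counts["unassigned"] += 1
    return counts


# Hypothetical guide table: sequence -> guide name.
guides = {"GCTCGAGCTGGAGTTCGACC": "g1"}
reads = [
    "GCTCGAGCTGGAGTTCGACC",     # exact 20bp read
    "AAGCTCGAGCTGGAGTTCGACCT",  # 23bp read with extra flanking bases
    "GTACAGGTTCTAACCCGTT",      # 19bp read, too short to contain a guide
]
print(count_guides(reads, guides))
```

Exact matching tolerates no mismatches, so it is stricter than the `-N 1` setting in the commands above; dedicated tools handle mismatches and normalisation for you.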