Hi,
I am using bowtie2 to align reads to a reference genome. Instead of using real data, I simulated paired-end reads (100bp) with an insert length of 400bp between mate1 and mate2. Does anyone have any idea how to set the bowtie2 parameters in this case? See the alignment summary from one of my runs below. Although the overall alignment rate is 100%, what is wrong with '1670714 (97.50%) aligned concordantly 0 times'?
Bowtie2 command:
bowtie2 -p 6 -x ./GCF_000157355.2_ASM15735v2_genomic.fna -1 input.read1.fastq -2 input.read2.fastq -S res.sam
1713638 reads; of these:
  1713638 (100.00%) were paired; of these:
    1670714 (97.50%) aligned concordantly 0 times
    39863 (2.33%) aligned concordantly exactly 1 time
    3061 (0.18%) aligned concordantly >1 times
    ----
    1670714 pairs aligned concordantly 0 times; of these:
      1638899 (98.10%) aligned discordantly 1 time
    ----
    31815 pairs aligned 0 times concordantly or discordantly; of these:
      63630 mates make up the pairs; of these:
        0 (0.00%) aligned 0 times
        14763 (23.20%) aligned exactly 1 time
        48867 (76.80%) aligned >1 times
100.00% overall alignment rate
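To see how a 100% overall alignment rate can coexist with 97.50% of pairs failing to align concordantly, it may help to tally the counts from the summary above. This is only a rough sketch: the variable names are made up for illustration, and the overall-rate formula (aligned mates divided by total mates) is an assumption about how bowtie2 arrives at the last line, not something taken from its source.

```python
# Counts copied from the bowtie2 summary above (variable names are illustrative).
total_pairs = 1713638
concordant_once = 39863
concordant_multi = 3061
discordant_once = 1638899
neither = 31815          # pairs aligned neither concordantly nor discordantly
mates_once = 14763       # mates from those pairs that aligned exactly 1 time
mates_multi = 48867      # mates from those pairs that aligned >1 times

concordant = concordant_once + concordant_multi

# The three pair categories should account for every pair.
assert concordant + discordant_once + neither == total_pairs

# Assumed definition: every mate that aligned in any way counts as aligned.
aligned_mates = 2 * (concordant + discordant_once) + mates_once + mates_multi
overall_rate = 100 * aligned_mates / (2 * total_pairs)

print(f"concordant pairs: {100 * concordant / total_pairs:.2f}%")
print(f"discordant pairs: {100 * discordant_once / total_pairs:.2f}%")
print(f"overall alignment rate: {overall_rate:.2f}%")
```

Under that reading, nearly every read found a place on the genome, but most pairs did so in a way that violated the paired-end constraints (for example, mates too far apart), which is why they are reported as discordant rather than concordant.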
Thanks for the reply. Do you have any suggestions for me in this case? Will this cause a problem? I used DWGSIM to simulate the reads with an insert length of 400bp, and the standard deviation of the distance between pairs is 50bp, which is the default value. Because the reads are simulated, I expected them to match perfectly or nearly perfectly, with a high proportion aligned concordantly exactly 1 time.
Unless you've given a custom -X parameter to bowtie2, fragments longer than 500bp are going to be considered discordant.
Thanks a lot. I added two parameters (-I 500 -X 700). I simulated 100bp paired-end reads with a 400bp insert length, so my fragments should be 600bp (400bp + 2 × 100bp), is that correct? The standard deviation is 50bp, so 500bp to 700bp already covers the mean ± 2 standard deviations. Now I get '1618473 (94.45%) aligned concordantly exactly 1 time'. It seems that the problem has been solved?
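The arithmetic behind that -I/-X choice can be written out as a small sketch. It assumes the 400bp value is the inner distance between the mates (so the read lengths are added on top), which is the poster's reading and should be checked against the simulator's documentation. The function name `fragment_window` is hypothetical.

```python
def fragment_window(read_len, inner_dist, sd, k=2):
    """Return (mean fragment length, lower bound, upper bound).

    Assumes inner_dist excludes the reads, so the full fragment is
    inner_dist + 2 * read_len. Bounds are mean +/- k standard deviations.
    """
    mean_fragment = inner_dist + 2 * read_len
    low = max(0, int(mean_fragment - k * sd))
    high = int(mean_fragment + k * sd)
    return mean_fragment, low, high


mean_fragment, min_ins, max_ins = fragment_window(read_len=100, inner_dist=400, sd=50)
print(f"mean fragment: {mean_fragment} bp; bowtie2 -I {min_ins} -X {max_ins}")
```

If the fragment lengths are roughly normally distributed, a window of ± 2 standard deviations covers only about 95% of pairs, which would be consistent with the 94.45% reported above. A wider window (larger `k`, or simply a generous -X) would be needed to capture the tails.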
I just use -X 1000 so it works on practically all input libraries. Aren't you getting fewer concordantly aligned reads now? Check the documentation for your simulator (and run Picard CollectInsertSizeMetrics) to see what fragment lengths you've actually simulated.
The community can't agree on whether to include read lengths in the length fields (the SAM spec even redefined its TLEN field, but tools didn't update to use the 'correct' definition), so you need to check the definition used by every single tool you use, because they might differ.
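As a quick cross-check alongside Picard, the template lengths the aligner actually reported can be summarized straight from the SAM file. This is a minimal sketch, not a replacement for CollectInsertSizeMetrics: the function names are hypothetical, it reads TLEN (SAM column 9) as written by the aligner, and, as noted above, what TLEN means depends on the tool that produced it.

```python
import statistics


def template_lengths(sam_lines):
    """Collect positive TLEN values (SAM column 9), one per mapped pair."""
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        is_paired = bool(flag & 0x1)
        any_unmapped = bool(flag & 0x4) or bool(flag & 0x8)
        not_primary = bool(flag & 0x900)  # secondary or supplementary
        same_reference = fields[6] == "="
        # TLEN is positive for one mate and negative for the other,
        # so keeping only positive values counts each pair once.
        if is_paired and not any_unmapped and not not_primary and same_reference and tlen > 0:
            lengths.append(tlen)
    return lengths


def summarize(lengths):
    """Return count, mean, median and sample standard deviation."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "sd": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
    }


# Example use on the res.sam file produced by the command above:
# with open("res.sam") as handle:
#     print(summarize(template_lengths(handle)))
```

If the mean comes out near 600 rather than 400, that would suggest the simulator's distance setting did not include the reads; either way, the observed spread is a reasonable guide for choosing -I and -X.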
Thank you so much. The result is reasonable now. I changed the parameters to -I 200 and -X 1000, and now the result is 100% concordant (98.61% exactly 1 time and 1.39% more than 1 time). I will accept your solution for this post. Thanks again!!!