Hi,
I wanted to test something about Bowtie2, so I simulated NGS results by computationally cutting my reference sequence into fragments and writing paired-end reads from them to files.
I know the reads occur in the reference sequence (that's how I created them), but Bowtie2 fails to align some of them in paired-end mode, even though I allowed mixed mode and set the maximum fragment length very large (-X 40000, larger than the reference sequence length).
When I aligned the unaligned reads in unpaired mode, Bowtie2 could align some of them, but still not all. Why would Bowtie2 fail to align reads that appear within the reference sequence? (There is no YF tag in the resulting SAM file, i.e. the reads were not filtered out.) And why does unpaired mode align more reads than mixed mode?
My commands:
Paired-end mode:
bowtie2 -f -a --local -X 40000 -p 32 -x <reference sequence name> -1 <path to first-read-in-pair file> -2 <path to second-read-in-pair file> -S <path to output file>
Unpaired mode:
bowtie2 -f -a --local -X 40000 -p 32 -x <reference sequence name> -U <path to first-read-in-pair file>,<path to second-read-in-pair file> -S <path to output file>
Examples of unaligned reads from the SAM file:
ref-1548...1847.1 4 * 0 0 * * 0 0 CCGACCACAGGCACTACTGCCATGACTACCACTCAGCCCTGGAACGACACTTTTGGCCCTACTTCGACCGAATTGACCACGGTCACGGGTACCAAGGGTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
ref-1557...1856.1 4 * 0 0 * * 0 0 GACACTACTGCCATGACTACCACTCAACCCTGGAACGACACTTTTAACCCTACTTCAACCGAATTGACCACAGTCACAGGTACCAAAGGTTTGCCAACTGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
ref-1130...1429.1 4 * 0 0 * * 0 0 TGTCATCAGAACTCCAACGACTGCGAGCACCATCATAACTACCACTGAGCCATGGAACGGCACTTTTACGTCTACTTCTACTGAAGTGACCACAGTCACTG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
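A minimal Python sketch for pulling the unaligned records (FLAG bit 0x4 set) out of a SAM file, e.g. to re-align them in unpaired mode. It assumes standard tab-separated SAM, and the function name is hypothetical. Per the Bowtie2 manual, YT:Z:UU marks a read that was not part of a pair, so the records above presumably come from the unpaired run.

```python
# Sketch: select unmapped records from SAM output (assumes tab-separated SAM).
from typing import Iterator, Optional, TextIO, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def unaligned_records(handle: TextIO) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (read name, sequence, YT tag value) for every unmapped record."""
    for line in handle:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        if int(fields[1]) & UNMAPPED:
            # Optional fields (column 12 onward) have the form TAG:TYPE:VALUE
            tags = {f[:2]: f[5:] for f in fields[11:] if len(f) > 5}
            yield fields[0], fields[9], tags.get("YT")


if __name__ == "__main__":
    import io
    demo = (
        "@HD\tVN:1.0\n"
        "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU\n"
        "r2\t0\tchr\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\tYT:Z:UU\n"
    )
    for record in unaligned_records(io.StringIO(demo)):
        print(record)  # prints ('r1', 'ACGT', 'UU')
```

Testing the bit rather than comparing FLAG to 4 also catches paired records such as FLAG 77, where the read and its mate are both unmapped.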
Thanks, A
Can you share how you simulated your reads?
The code is long, but basically:
I split my sequence into 300-base fragments. The first read is the first 101 bases of the fragment.
To create the second read, I took the complement of the last 101 bases of the fragment and then reversed the order, i.e. the reverse complement (for example, if my fragment ends with TTCG, the second read will start with CGAA).
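The simulation described above can be sketched in Python roughly as follows. This is not the poster's code: the function names, 0-based coordinates, and the `step` between fragment starts are assumptions (read names such as 1548 and 1557 suggest overlapping fragments were used).

```python
# Sketch of the read simulation described above; not the poster's actual code.
# Assumptions: 0-based coordinates, 300-base fragments, 101-base reads, and a
# hypothetical `step` between fragment starts.
from typing import Iterator, Tuple

FRAGMENT_LEN = 300
READ_LEN = 101
# Only A/C/G/T are complemented; IUPAC codes such as N or R pass through as-is
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order (TTCG -> CGAA)."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pairs(ref: str, step: int = FRAGMENT_LEN) -> Iterator[Tuple[int, str, str]]:
    """Yield (fragment start, mate 1, mate 2) for every full-length fragment."""
    for start in range(0, len(ref) - FRAGMENT_LEN + 1, step):
        fragment = ref[start:start + FRAGMENT_LEN]
        mate1 = fragment[:READ_LEN]                       # first 101 bases
        mate2 = reverse_complement(fragment[-READ_LEN:])  # rev-comp of last 101
        yield start, mate1, mate2


if __name__ == "__main__":
    print(reverse_complement("TTCG"))  # prints CGAA
```

Writing each mate 1 and mate 2 to its own FASTA file gives input suitable for `bowtie2 -f -1 ... -2 ...`.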
Could you please share your reference sequence and unmapped reads? One idea is that there might be repetitive or low-complexity regions, and reads from such regions map to too many positions.
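One quick way to screen for the repetitive or low-complexity idea, sketched in Python under the assumption that the reference is available as a plain string: count exact hits of a read on both strands and compute its per-base Shannon entropy. The function names are hypothetical, and this is a rough screen, not how Bowtie2 itself handles repeats.

```python
# Sketch: two quick checks for repetitive / low-complexity reads.
# Exact matching only; near-identical repeats are not detected.
from collections import Counter
from math import log2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_occurrences(ref: str, pattern: str) -> int:
    """Count exact, possibly overlapping, occurrences of pattern in ref."""
    count, pos = 0, ref.find(pattern)
    while pos != -1:
        count += 1
        pos = ref.find(pattern, pos + 1)
    return count


def hits_both_strands(ref: str, read: str) -> int:
    """Exact hits of the read on the forward and reverse strands."""
    rc = read.translate(_COMPLEMENT)[::-1]
    hits = count_occurrences(ref, read)
    if rc != read:  # a reverse-palindromic read would otherwise count twice
        hits += count_occurrences(ref, rc)
    return hits


def shannon_entropy(seq: str) -> float:
    """Bits per base: 0 for a homopolymer, 2 for equal A/C/G/T usage."""
    counts = Counter(seq.upper())
    total = len(seq)
    return sum(-(n / total) * log2(n / total) for n in counts.values())
```

A read with many hits or an entropy far below 2 would support the repeat hypothesis; a single hit with normal entropy points elsewhere.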
Hi,
My reference sequence is this:
These are my reads (after I replaced the ambiguous characters with non-ambiguous ones). The numbers in each read name are the coordinates of its fragment in the reference sequence.
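Since the read names encode fragment coordinates, each read can be compared directly with the reference bases it was cut from. A Python sketch, assuming (not confirmed in the thread) that names look like `ref-<start>...<end>.<mate>`, that start is 0-based, that end is inclusive (1548 to 1847 spans exactly 300 bases), and that the trailing number is the mate; all names are hypothetical.

```python
# Sketch: Hamming distance between a simulated read and its reference window.
# Assumptions: "ref-<start>...<end>.<mate>" names, 0-based start, inclusive end,
# trailing number = mate (1 or 2).
import re
from typing import Tuple

NAME_RE = re.compile(r"-(\d+)\.\.\.(\d+)\.(\d+)$")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_read_name(name: str) -> Tuple[int, int, int]:
    """Return (start, end, mate) encoded in a read name."""
    match = NAME_RE.search(name)
    if match is None:
        raise ValueError(f"unexpected read name: {name!r}")
    start, end, mate = (int(g) for g in match.groups())
    return start, end, mate


def mismatches_vs_reference(ref: str, name: str, read: str) -> int:
    """Count positions where the read differs from the reference it came from."""
    start, end, mate = parse_read_name(name)
    fragment = ref[start:end + 1].upper()
    if mate == 1:
        expected = fragment[:len(read)]
    else:  # mate 2 is the reverse complement of the fragment's tail
        expected = fragment[-len(read):].translate(_COMPLEMENT)[::-1]
    if len(expected) != len(read):
        raise ValueError("reference window is shorter than the read")
    # Ambiguity codes (N, R, Y, ...) in the reference count as mismatches here
    return sum(a != b for a, b in zip(read.upper(), expected))
```

Nonzero counts for reads that "should" match exactly would show where the replaced ambiguous characters sit.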
Thanks in advance!
Interestingly, your non-mapping reads correspond to reference regions containing several N's. Have a look at the Bowtie2 --n-ceil ("func for max # non-A/C/G/Ts permitted in aln", default L,0,0.15) and --np ("penalty for non-A/C/G/Ts in read/ref", default 1) options. As you said, you replaced the ambiguous characters in your reads with non-ambiguous ones, so those positions no longer match the reference exactly. This may be the origin of your problem.
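To see how close a read's region comes to the default ceiling, here is a Python sketch that evaluates the `L,0,0.15` function (f(x) = 0 + 0.15 * x, with x the read length, per the Bowtie2 manual) and counts non-A/C/G/T reference characters under a read. The function names and 0-based start coordinates are assumptions, and exactly how Bowtie2 rounds or applies the ceiling is not modelled.

```python
# Sketch: default --n-ceil value vs. ambiguous reference positions under a read.
# Assumptions: 0-based start; Bowtie2's rounding of the ceiling is not modelled.
def linear_function(x: float, const: float = 0.0, coef: float = 0.15) -> float:
    """Bowtie2 'L,a,b' function option: f(x) = a + b * x."""
    return const + coef * x


def ambiguous_in_window(ref: str, start: int, length: int) -> int:
    """Number of non-A/C/G/T characters in ref[start:start + length]."""
    return sum(1 for base in ref[start:start + length].upper() if base not in "ACGT")


if __name__ == "__main__":
    read_len = 101
    print(f"default n-ceil for a {read_len}-base read: {linear_function(read_len):.2f}")
    # prints: default n-ceil for a 101-base read: 15.15
    toy_ref = "ACGTNNRYACGT"  # hypothetical reference with ambiguity codes
    print(ambiguous_in_window(toy_ref, 0, len(toy_ref)))  # prints 4
```

Running `ambiguous_in_window` over the windows of the unaligned reads versus the aligned ones would show whether the N-rich regions line up with the failures.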