Best Way To Get Truly Unique Reads In Bowtie/Sam?
12.1 years ago

Hi,

I'm trying to keep only the paired-end reads that align truly uniquely in bowtie2. Setting -k 1 doesn't help here, because it just reports the first alignment found for each read, whereas I want to discard every read that aligns more than once.

It looks like SAM's NH:i:X tag is meant for this, where X is the number of times the read aligns. However, bowtie2 does not seem to set that tag, and I can't find a setting that convinces it to do so.

My current "solution" is to iterate through the SAM/BAM file and discard all read IDs that are listed more than twice (once for each mate of the pair). That's a bit slow, though, because I have to go through the file twice and my BAM files are on the order of several hundred gigabytes.
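The ID-counting approach above can be done in one pass if all records sharing a read name sit next to each other, which is assumed here (for example, unsorted aligner output or a name-sorted file). The following is only a minimal sketch under that assumption; the function name `keep_unique_pairs` and the `max_records` parameter are made up for illustration and are not part of bowtie2 or samtools.

```python
from typing import Iterable, Iterator


def keep_unique_pairs(sam_lines: Iterable[str], max_records: int = 2) -> Iterator[str]:
    """Yield header lines plus the records of read names seen at most
    `max_records` times.

    Assumption: records with the same read name (QNAME) are adjacent,
    so only one group of records is held in memory at a time.
    """
    group = []       # records collected for the current read name
    current = None   # read name of the group being collected
    for line in sam_lines:
        if line.startswith("@"):
            # SAM header lines are passed through unchanged.
            yield line
            continue
        name = line.split("\t", 1)[0]
        if name != current:
            # A new read name starts: emit the finished group if it is small enough.
            if len(group) <= max_records:
                yield from group
            group, current = [], name
        group.append(line)
    # Emit the last group, which no following record has flushed.
    if len(group) <= max_records:
        yield from group
```

The generator accepts any iterable of SAM text lines, so it could be fed a text stream such as `sys.stdin`. On a coordinate-sorted file the adjacency assumption does not hold and the result would be wrong.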

Is there a better solution?

Thanks!

bowtie sam • 16k views
12.1 years ago
Fidel ★ 2.0k

By default, Bowtie2 always maps multi-reads, which is in line with the recommendation from the authors (see http://www.nature.com/nrg/journal/v13/n1/full/nrg3117.html). The command-line options only change how much effort bowtie2 puts into searching for the best match, or how many alignments per read it reports.

As stated elsewhere (see Bowtie2, -M Alignment/Reporting Mode), to get rid of multi-reads you have to look for the XS:i tag. This tag is only set if the read is a multi-read, and it contains the alignment score of the second-best alignment.
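As a rough illustration of that XS-based filter, the sketch below drops a whole pair as soon as either mate carries an XS:i tag. It assumes plain-text SAM input in which both mates of a pair are adjacent; the helper names `has_xs` and `drop_multireads` are hypothetical. Note that this treats a read as a multi-read even when the second-best alignment scores much worse than the reported one.

```python
from typing import Iterable, Iterator


def has_xs(record: str) -> bool:
    """Return True if the optional fields (column 12 onward) include XS:i."""
    fields = record.rstrip("\n").split("\t")
    return any(field.startswith("XS:i:") for field in fields[11:])


def drop_multireads(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus every pair in which no mate has an XS:i tag.

    Assumption: records with the same read name (QNAME) are adjacent.
    """
    group = []       # records collected for the current read name
    current = None   # read name of the group being collected
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        name = line.split("\t", 1)[0]
        if name != current:
            # Emit the finished pair only if neither mate is a multi-read.
            if not any(has_xs(rec) for rec in group):
                yield from group
            group, current = [], name
        group.append(line)
    # Handle the final pair in the input.
    if not any(has_xs(rec) for rec in group):
        yield from group
```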


Hi, I now ran into a problem:

I have these metrics in bowtie2 after a run with -X 500 -I 0 --no-discordant --no-unal --no-mixed

40949 reads; of these:
  40949 (100.00%) were paired; of these:
    16772 (40.96%) aligned concordantly 0 times
    11759 (28.72%) aligned concordantly exactly 1 time
    12418 (30.33%) aligned concordantly >1 times

I got 24177 paired alignments in the SAM file, which equals the sum of the unique and non-unique concordant pairs above (11759 + 12418).

When I check the SAM file using less or grep, the XS tag is not present! The metrics say that about 30% of the pairs aligned more than once, so why is there no XS tag? Does "more than once" mean that the other alignments are worse? Even then, why is the tag missing, given that these secondary alignments should have scores?


Did you filter by mapping quality? If I remember correctly, multi-reads get a mapping quality of 1.


Thanks for the reply!

I tried filtering with samtools view -q 2, but the numbers don't match.

I checked the manuals, and it seems that a mapping quality of 1 for reads that align more than once applies only to bowtie1. The closest statement in the bowtie2 manual is this: "A mapping quality of 10 or less indicates that there is at least a 1 in 10 chance that the read truly originated elsewhere." So if I discard alignments with a mapping quality below 10, I should have a reasonably good indication of "uniqueness".
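A pair-aware version of that mapping-quality filter might look like the sketch below. It keeps a pair only if every mate has MAPQ (column 5 of the SAM record) at or above a cutoff, which mirrors how `samtools view -q 10` treats individual records. The cutoff of 10, the adjacency of mates in the input, and the names `mapq` and `filter_pairs_by_mapq` are all assumptions made for illustration.

```python
from typing import Iterable, Iterator


def mapq(record: str) -> int:
    """Return the mapping quality, stored in column 5 of a SAM record."""
    return int(record.split("\t")[4])


def filter_pairs_by_mapq(sam_lines: Iterable[str], min_mapq: int = 10) -> Iterator[str]:
    """Yield header lines plus every pair whose mates all have MAPQ >= min_mapq.

    Assumption: records with the same read name (QNAME) are adjacent.
    """
    group = []       # records collected for the current read name
    current = None   # read name of the group being collected
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        name = line.split("\t", 1)[0]
        if name != current:
            # Emit the finished pair only if all of its mates pass the cutoff.
            if all(mapq(rec) >= min_mapq for rec in group):
                yield from group
            group, current = [], name
        group.append(line)
    # Handle the final pair in the input.
    if all(mapq(rec) >= min_mapq for rec in group):
        yield from group
```

Filtering per pair rather than per record avoids leaving orphaned mates behind when only one read of a pair falls below the cutoff.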
