get only reads that has multiple alignment on bowtie
2
1
Entering edit mode
10.2 years ago
dinarussia ▴ 10

Hello there,

Is there a way to get as a bowtie output ONLY the reads that aligns in more then one region ? It would be the negative of -m flag.

Thank you for your time. I do aprreciate it.

RNA-Seq bowtie • 4.9k views
ADD COMMENT
2
Entering edit mode
10.2 years ago
Manvendra Singh ★ 2.2k

After bowtie run, you can fetch non uniquely mapped reads from SAM file. For example, if NH:1:X is the flag for uniquely mapped reads in SAM file (bowtie output), just do

grep -v 'NH:1:X' <your SAM file>

HTH

ADD COMMENT
0
Entering edit mode

Thank you for your time @Manu.

I could not figure out the right way/flag to fetch ONLY non uniquely mapped reads from a SAM file.

I cannot find this 'NH:1' flag on my files.

Any new insights ?

ADD REPLY
0
Entering edit mode

I think that if it is bowtie 1, this flag is not set and you need to process the file your self with a script, maybe using python and pysam library.

ADD REPLY
2
Entering edit mode
5.9 years ago

Without the NH tag you could try something like that (!!!NOT TESTED!!!)

1.

samtools view -F 4 -f 64 file.bam | cut -f 1 | sort | uniq -c | awk '$1>1 {print $2}' > mate1_readsWithMultipleHits.ids
samtools view -F 4 -f 128 file.bam | cut -f 1 | sort | uniq -c | awk '$1>1 {print $2}' > mate2_readsWithMultipleHits.ids
  • Get all IDs of mate1
  • Sort-unique these IDs and count their occurence
  • If you find them more than once, write their fragment ID to a mate1 file
  • Redo this for mate2

2.

samtools view -f 64 file.bam | fgrep -w -f mate1_readsWithMultipleHits.ids > mapping_positions_of_mutiple_mapped_reads.sam
samtools view -f 128 file.bam | fgrep -w -f mate2_readsWithMultipleHits.ids >> mapping_positions_of_mutiple_mapped_reads.sam
  • Extract the mapping positions of reads that were mapped to several positions for mate1 and mate2
ADD COMMENT
1
Entering edit mode

This is what I want, extracting multiple mapped reads from bowtie alignment. I have tested for SE reads. Also you can extract unique mapped reads.

From Xianjun Dong's post, alternative, we can use NH:i:x tag, or the FLAG tag for this purpose.

awk '{if (and($2,0x100)) print}' # unique mapped

fgrep -w NH:i:1 # unique mapped

fgrep -vw NH:i:1 # multi mapped

But above solutions are not working for bowtie output.

However, this solution is not working for default bowtie param: -k 1, the multiple mapped reads reported only once.

fgrep is so powerful for this purpose. Thanks so much.

This is the log of bowtie with -m 1, tells 1124533 unique reads, 208288 multiple reads.

$ cat align.m1.bowtie.log
# reads with at least one reported alignment: 1124533 (48.35%)
# reads that failed to align: 993099 (42.70%)
# reads with alignments suppressed due to -m: 208288 (8.96%)

$ samtools view -F 4 -c align.m1.sam
1124533 

If I use -k 5 instead: (report up to <5> good alignments per read), 1332821 (1124533 + 208288) mapped reads were reported.

$ cat align.k5.bowtie.log
# reads processed: 2325920
# reads with at least one reported alignment: 1332821 (57.30%)
# reads that failed to align: 993099 (42.70%)
Reported 1569333 alignments

$ samtools view -F 4 -c align.k5.sam
1569333

Test, extract multiple read ids:

$ samtools view -F 4 align.k5.sam | cut -f 1 | sort | uniq -c | awk '$1>1 {print $2}' | wc
208288
ADD REPLY

Login before adding your answer.

Traffic: 2627 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6