Question

Mapping miRNA to the Human Genome with Bowtie2

1

Entering edit mode

8.1 years ago

gkuffel22 ▴ 100

Hopefully someone can help. I have read conflicting info about aligners for miRNA data but I have finally decided on Bowtie2. I have 2 questions:

I am aligning the miRNA reads to the human genome and then using HTSeq-count with the gff file from miRBase to count the miRNAs aligned in the BAM file, is this a reasonable approach?
Does anyone know if HTSeq will accurately count miRNAs that map to multiple locations in the genome?

Thanks everyone!

alignment miRNA Bowtie2 Python HtSeq-count • 6.6k views

ADD COMMENT • link updated 8.0 years ago by apa@stowers ▴ 610 • written 8.1 years ago by gkuffel22 ▴ 100

0

Entering edit mode

You should use bowtie v.1 instead since miRNA are going to be short and you want to keep your alignments ungapped.

ADD REPLY • link 8.1 years ago by GenoMax 148k

0

Entering edit mode

I`m doing almost the same! Does It works for you? I use Bowtie2 and I get a 72% of overall alignment rate, I think that is fine. I use;

bowtie2 -p 8 -x $index --np 0 --n-ceil L,0,0.02 --rdg 0,6 --rfg 0,6 --mp 6,2 --score-min L,0,-0.18 -q $reads.fastq -S $result.sam

ADD REPLY • link 8.0 years ago by Buffo ★ 2.4k

0

Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER should only be used to provide new answers for the original question.

ADD REPLY • link 8.0 years ago by GenoMax 148k

score 1 · Answer 1 · 2017-01-11

Bowtie2 should work as well as Bowtie1, since bowtie2 should not return a gapped alignments if a contiguous one exists. Generally speaking. However, some people prefer Bowtie1 since you can guarantee "--best --strata", which Bowtie2 does not do.

There are better aligners for miRNA data, though, which also include analysis of the alignment context to give a more accurate picture of expression: see MirDeep2 and ShortStack (GitHub link).

HTSeq-count will quantitate multireads if you force it to. It depends on two things: one, mapping quality of multireads (which is usually very low or zero), and two, any SAM tags indicating a multiread (which is only the NH tag, according to the documentation).

HOWEVER: only Tophat2 uses the NH tag -- Bowtie1/2 do not -- Bowtie2 uses the XS tag; not sure about Bowtie1.

But in any case it is only a SAM tag, which can be stripped. For instance, to blind htseq-count to multireads from a Tophat bam file, set "-a 0" to stop filtering on mapping quality, and strip the NH tag in-line:

samtools view in.bam | perl -pe 's/\s+NH:i:\d+//' | htseq-count -s no -a 0 -m intersection-nonempty - genes.gtf > counts.txt

If htseq-count ever uses other tags, then just strip those too.

score 0 · Answer 2 · 2016-11-22

0

Entering edit mode

8.1 years ago

WouterDeCoster 47k

With regard to your second question, HTSeq-count will be default ignore/discard multimapping reads, see for example the FAQ for an explanation about that.

ADD COMMENT • link 8.1 years ago by WouterDeCoster 47k