Mapping miRNA to the Human Genome with Bowtie2
2
1
Entering edit mode
8.1 years ago
gkuffel22 ▴ 100

Hopefully someone can help. I have read conflicting info about aligners for miRNA data but I have finally decided on Bowtie2. I have 2 questions:

  1. I am aligning the miRNA reads to the human genome and then using HTSeq-count with the gff file from miRBase to count the miRNAs aligned in the BAM file, is this a reasonable approach?

  2. Does anyone know if HTSeq will accurately count miRNAs that map to multiple locations in the genome?

Thanks everyone!

alignment miRNA Bowtie2 Python HtSeq-count • 6.6k views
ADD COMMENT
0
Entering edit mode

You should use bowtie v.1 instead since miRNA are going to be short and you want to keep your alignments ungapped.

ADD REPLY
0
Entering edit mode

I`m doing almost the same! Does It works for you? I use Bowtie2 and I get a 72% of overall alignment rate, I think that is fine. I use;

bowtie2 -p 8 -x $index --np 0 --n-ceil L,0,0.02 --rdg 0,6 --rfg 0,6 --mp 6,2 --score-min L,0,-0.18 -q $reads.fastq -S $result.sam
ADD REPLY
0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER should only be used to provide new answers for the original question.

ADD REPLY
1
Entering edit mode
8.0 years ago
apa@stowers ▴ 610

Bowtie2 should work as well as Bowtie1, since bowtie2 should not return a gapped alignments if a contiguous one exists. Generally speaking. However, some people prefer Bowtie1 since you can guarantee "--best --strata", which Bowtie2 does not do.

There are better aligners for miRNA data, though, which also include analysis of the alignment context to give a more accurate picture of expression: see MirDeep2 and ShortStack (GitHub link).

HTSeq-count will quantitate multireads if you force it to. It depends on two things: one, mapping quality of multireads (which is usually very low or zero), and two, any SAM tags indicating a multiread (which is only the NH tag, according to the documentation).

HOWEVER: only Tophat2 uses the NH tag -- Bowtie1/2 do not -- Bowtie2 uses the XS tag; not sure about Bowtie1.

But in any case it is only a SAM tag, which can be stripped. For instance, to blind htseq-count to multireads from a Tophat bam file, set "-a 0" to stop filtering on mapping quality, and strip the NH tag in-line:

samtools view in.bam | perl -pe 's/\s+NH:i:\d+//' | htseq-count -s no -a 0 -m intersection-nonempty - genes.gtf > counts.txt

If htseq-count ever uses other tags, then just strip those too.

ADD COMMENT
0
Entering edit mode
8.1 years ago

With regard to your second question, HTSeq-count will be default ignore/discard multimapping reads, see for example the FAQ for an explanation about that.

ADD COMMENT

Login before adding your answer.

Traffic: 1929 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6