Small RNA sequence, how to calculate read counts? What's the criterion? Is there software to do this work?
2
2
Entering edit mode
9.9 years ago
xuling2015 ▴ 20

After alignment small RNA sequence to the reference genome using bowtie. I got SAM file, then how to calculate read counts? What's the criterion? Is there software to do this work?

sequence RNA-Seq • 9.0k views
ADD COMMENT
1
Entering edit mode

Are you interested in small RNAs in general or more specifically miRNAs? There are a lot of tools specifically meant to handle miRNAs, so if you're happy restricting yourself slightly to just those then just go with one of those tools (e.g., miRDeep2).

ADD REPLY
0
Entering edit mode

If you use Bowtie2, assure that you set the '-a' parameter. Otherwise you will only get one random hit reported, even if there are multiple mapping loci. small RNAs are known to occur in multiple copies (miRNAs, tRNAs, etc.).

Some more information about the problem: Attention: Bowtie2 And Multiple Hits

ADD REPLY
1
Entering edit mode
9.8 years ago

To directly answer your question, HTseq count, featureCounts from the subread package, and GenomicRanges summarizeOverlaps(). For example usage of the latter, see:

http://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesHOWTOs.pdf

In your case, if you have a set of regions defining the miRNA locations on the genome, you can create a GRanges or GRangesList (see the rtracklayer package for how to import a BED file to a GRanges object) and then use summarizeOverlaps() as described. Once you have a table of counts, the typical RNA-seq tools like DESeq2, edgeR, and limma voom can be applied.

ADD COMMENT
0
Entering edit mode
9.8 years ago
Whoknows ▴ 960

You can use HTSeq for this,

And you can also do this on Fastq files with miRanalyzer2, groupReads from this link: http://bioinfo5.ugr.es/miRanalyzer/standalone.html#toc-Section-1

ADD COMMENT
0
Entering edit mode

Regarding HTSeq, one has to be quite careful with this when looking at small RNAs. Small RNAs can often exist with multiple copies in the genome and htseq (well, htseq-count) will default to ignoring these if they can't be uniquely aligned against.

ADD REPLY
0
Entering edit mode

When converting read counts of small RNA into reads per million (RPM), how to define "Total_number_mapped_reads".

For example, in one of the mapped small RNA library

2601833 reads; of these: 2601833 (100.00%) were unpaired; of these:

352919 (13.56%) aligned 0 times

362282 (13.92%) aligned exactly 1 time

1886632 (72.51%) aligned >1 times
  

86.44% overall alignment rate

Total_number_mapped_reads= 362282 + 1886632 = 2248914

But the fact that some of the "1886632" reads are mapped multiple times, the actual Total_number_mapped_reads(location) will be higher.

To compute reads per million, which "Total_number_mapped_reads" should be used. The one which is got after multimapping or the actual number of mapped reads.

thanks !!

ADD REPLY
0
Entering edit mode

Please ask a new question.

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 1947 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6