Entering edit mode
10.9 years ago
liu4gre
▴
210
I have a BED file (GSM1097888) like following:
chr1 17146 17163 t0062322_2 255 -
And now I am trying to calculate the RPM for miRNA based on this (Illumina HiSeq 2000 for smallRNA) data. I used miRNA annotation (hsa.gff3) from miRBase, with conversing to bed file firstly. Then I applied bedmap tool (bedmap --echo --echo-map --count hsa.bed seq.bed > result.bed) to map and count the reads for each miRNA. Finally I only get expression for 133 miRNAs, compared to more than 4000 in hsa.bed. As well, the total mapped read number is much less than the original one (14,000 vs 200,000). Would you please tell me if my analysis is correct?
Have you sorted the hsa.bed and seq.bed with the sort-bed command before doing the bedmap?
No, I didn't. Thanks for the info. After sort-bed and bedmap again, I got 246 miRNAs and the total mapped read number is ~25000. Is this normal?
What do you mean by is this normal? Do you wonder whether its normal that only 10% of the reads map to miRNAs? That differs from sample to sample I guess. Or do you want someone to do the same counting on the files and check if it is correct? then you should provide links to the files. By the way, when you want to count the amount of reads in a particular set of genes after alignment it is better to use a dedicated tool for that such as htseq count. It deals with read alignment ambiguity, paired end, stranded. Or is it dna-seq? Still you should think about these things
Did you only have 200,000 reads mapped to the genome? that's weird, normally you get million reads, or are these reads collapese already and you are not counting total number of reads, instead, you are counting total number of different sequences? Did you removed the 3' adapter? Which tool did you use to map? Maybe it would help the exact steps you did to get here.
cheers