Question

Summarizing miRNA-Seq Data Based on UCSC hg19 Alignment Results

3


10.8 years ago

gundalav ▴ 380

I have single-end miRNA-seq data that I mapped to the whole UCSC hg19 genome. Given the SAM output of this mapping, is there a way to summarize the alignments over several categories of genomic features?

Unaligned
Mature miRNA
precursor miRNA
piRNA
lincRNA
human Ribosomal RNA
snoRNA
human 5S rDNA
snRNA

Namely, for each of the above categories, how many of my reads (or what percentage) fall into it?

I know that CLC-BIO or Illumina's in-box software may already do this, but I'm looking for a noncommercial and tweakable way to do it.

mirna alignment genome expression • 4.0k views

ADD COMMENT • link updated 10.8 years ago by Biostar 20 • written 10.8 years ago by gundalav ▴ 380

score 4 · Answer 1 · 2014-01-29

4


10.8 years ago

Martombo ★ 3.1k

You can use BioMart (http://www.ensembl.org/biomart/martview) or the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) to get the annotations you need for the different classes of transcripts you want to study (specify the feature type with the filter option). Then you can use htseq-count (http://www-huber.embl.de/users/anders/HTSeq/doc/count.html) to count the number of reads in your SAM files that map to those annotations. You may need to convert the table you downloaded to GFF format (the UCSC Table Browser can output GTF directly). All the different genomic features can be merged into one file; in that case, you can deal with overlapping features as described on the htseq-count page.
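Once htseq-count has run, its per-feature counts still have to be rolled up into classes. Here is a minimal Python sketch of that step. It assumes the usual two-column, tab-separated htseq-count output and that the special counters are prefixed with `__` (as in recent HTSeq releases). The function name, the `feature_class` mapping and the example counts are all made up for illustration.

```python
from collections import Counter


def summarize_htseq(count_text, feature_class):
    """Sum htseq-count rows per feature class.

    count_text: contents of an htseq-count output file (feature_id<TAB>count).
    feature_class: dict mapping feature_id -> class name (hypothetical; build
    it from whatever annotation table you downloaded).
    Returns {class_or_counter_name: (count, percent_of_all_reads)}.
    """
    totals = Counter()
    for line in count_text.splitlines():
        if not line.strip():
            continue
        feature_id, count = line.split("\t")
        count = int(count)
        if feature_id.startswith("__"):
            # Special counters such as __no_feature or __not_aligned
            totals[feature_id] += count
        else:
            totals[feature_class.get(feature_id, "unclassified")] += count
    grand_total = sum(totals.values())
    return {
        name: (n, 100.0 * n / grand_total if grand_total else 0.0)
        for name, n in totals.items()
    }


# Made-up example counts, only to show the shape of the input and output
example = "MIR21\t50\nSNORD3A\t30\n__no_feature\t15\n__not_aligned\t5\n"
classes = {"MIR21": "miRNA", "SNORD3A": "snoRNA"}
summary = summarize_htseq(example, classes)
for name, (n, pct) in summary.items():
    print(f"{name}\t{n}\t{pct:.1f}%")
```

The `__not_aligned` row gives the "Unaligned" category from the question for free, provided the unaligned reads are present in the SAM file.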


0


A correction on this: if you're interested in transcripts that have several identical copies in the genome (as I realized snRNAs do, for example), you cannot use the default options of HTSeq, which discard multi-mapped reads. You should lower the value of the -a option (the minimum alignment quality), and also be aware that HTSeq will still not count reads whose NH field indicates multiple mapping. Even better, and easier, you could use RSEM.
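To get a feel for how many reads are affected, you can check the NH tag yourself. A minimal sketch, assuming your aligner writes an `NH:i:` optional field (not all aligners do); the function name and the two example records are invented for illustration.

```python
def is_multimapped(sam_line):
    """Return True if a SAM alignment record carries an NH:i tag above 1."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional tags start at column 12
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:]) > 1
    # No NH tag present: multi-mapping cannot be decided from this field
    return False


# Invented records: one unique hit, one read reported at four locations
unique_hit = "r1\t0\tchr1\t101\t30\t22M\t*\t0\t0\t*\t*\tNH:i:1"
multi_hit = "r2\t0\tchr6\t501\t0\t22M\t*\t0\t0\t*\t*\tNH:i:4"
print(is_multimapped(unique_hit), is_multimapped(multi_hit))
```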

ADD REPLY • link 8.5 years ago by Martombo ★ 3.1k

score 3 · Answer 2 · 2014-01-29

3


10.8 years ago

brentp 24k

I would use BEDTools. If you have a BED file for each of items 2-9 in your list, you can use, e.g.:

bedtools coverage -abam your.bam -b snoRNA.bed

and it'd be pretty simple to write a script to do that for each feature type and write a summary output.
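To show the shape such a summary script could take, here is a rough plain-Python stand-in that does the overlap test itself instead of calling BEDTools (so it is much slower on real data and is not what `bedtools coverage` computes internally). It assumes one BED file's worth of lines per feature type. It counts alignment records rather than unique reads, and a read overlapping two feature types is counted in both. All names and example records are made up.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_span(pos, cigar):
    """Return the 0-based, half-open reference span of an alignment."""
    # Only M, D, N, = and X operations consume reference bases
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
    return pos - 1, pos - 1 + ref_len


def load_bed(lines):
    """Group BED intervals (0-based, half-open) by chromosome."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return by_chrom


def summarize(sam_lines, beds):
    """Count alignments overlapping each feature type.

    beds maps a feature-type name to an iterable of BED lines.
    Returns {name: (count, percent_of_all_records)}.
    """
    features = {name: load_bed(lines) for name, lines in beds.items()}
    counts = {name: 0 for name in features}
    counts["Unaligned"] = 0
    total = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        f = line.rstrip("\n").split("\t")
        total += 1
        if int(f[1]) & 4:  # FLAG bit 0x4 marks an unmapped read
            counts["Unaligned"] += 1
            continue
        start, end = read_span(int(f[3]), f[5])
        for name, by_chrom in features.items():
            # Linear scan; fine for a sketch, slow for large annotations
            if any(s < end and start < e for s, e in by_chrom.get(f[2], [])):
                counts[name] += 1
    return {n: (c, 100.0 * c / total if total else 0.0) for n, c in counts.items()}


# Made-up records, only to exercise the function
sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t101\t30\t22M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t501\t30\t22M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t16\tchr2\t11\t30\t20M\t*\t0\t0\t*\t*",
]
beds = {"snoRNA": ["chr1\t90\t130"], "snRNA": ["chr2\t0\t15", "chr1\t495\t505"]}
result = summarize(sam, beds)
for name, (n, pct) in result.items():
    print(f"{name}\t{n}\t{pct:.1f}%")
```

In practice you would pass open file handles (the SAM file and one BED file per feature type) instead of the in-memory lists used here.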
