Summarizing Mirna-Seq Data Based On Ucsc Hg19 Alignment Results
2
3
Entering edit mode
10.8 years ago
gundalav ▴ 380

I have miRNA-seq data (Single end) which I map to the whole UCSC hg19 genome. Now given the SAM output of this mapping, is there a way I can summarize the alignment over several genomic features:

  1. Unaligned
  2. Mature miRNA
  3. precursor miRNA
  4. piRNA
  5. lincRNA
  6. human Ribosomal RNA
  7. snoRNA
  8. human5S rDNA
  9. snRNA

Namely for each of the above features how many of my reads (or percentage) are aligned?

I know CLC-BIO or Illumina inbox software possibly already have that. But I'm looking for noncommercial and tweakable way to do it.

mirna alignment genome expression • 4.0k views
ADD COMMENT
4
Entering edit mode
10.8 years ago
Martombo ★ 3.1k

you can use biomart http://www.ensembl.org/biomart/martview or ucsc tables http://genome.ucsc.edu/cgi-bin/hgTables to get the annotations you need for the different classes of transcripts you want to study (specify the feature type with the filter option). then you can use htseq-count http://www-huber.embl.de/users/anders/HTSeq/doc/count.html to count the number of reads in your sam files that map to the annotations. you may need to convert the table you downloaded in the gff format (ucsc tables can output the gtf format directly). all the different genomic feature can be merged in the same file. in that case you can deal with overlapping features as described on the htseq-count page.

ADD COMMENT
0
Entering edit mode

a correction on this: if you're interested in transcripts that have different identical copies on the genome (like I realized snRNAs have, for example), you cannot use the default options of HTSeq which discard multi-mapped reads. You should lower the value of the -a option and be also aware that HTSeq would still not count reads with the NH field indicating a multiple mapping. Even better and more easily, you could use RSEM.

ADD REPLY
3
Entering edit mode
10.8 years ago
brentp 24k

I would use BEDTools. If you have a BED file for each of the items 2-9, you can use, e.g.

bedtools coverage -abam your.bam -b snoRNA.bed

and it'd be pretty simple to write a script to do that for each feature type and write a summary output.

ADD COMMENT

Login before adding your answer.

Traffic: 1820 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6