I have been tinkering with a new tool for BEDTools that attempts to do this. It is currently alpha, so it lives in the development repo, which I have pushed to the public repository if you'd like to play with it. The tool is called "tagBam" and works as follows: you supply a BAM file, plus a list of annotation files and associated labels. For each alignment, it searches for overlaps with each annotation file. Each time there is an overlap, the label you supplied for that file is appended to a custom "YB" tag that is added to the BAM alignment. For example:
$ tagBam -i aln.bam -files exons.bed introns.bed cpg.bed utrs.bed \
-tags exonic intronic cpg utr \
> aln.tagged.bam
For alignments that have overlaps, you should see new BAM tags such as "YB:Z:exonic" or "YB:Z:cpg;utr".
I should emphasize that this is experimental, but I am hoping to make it available in the next release.
As always, comments and suggestions are welcome.
Another option is to write a custom script using existing interfaces such as pysam, the BioPerl BAM interface, Picard, the samtools C API, the bamtools C++ API, etc. These solutions are nice because you will inevitably run up against a nuanced annotation rule that a one-stop-shop tool can't address, and a script lets you encode that rule yourself. In particular, if you are a Python person, the pybedtools suite or the HTSeq suite are good options.
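To give a feel for what such a script has to do, here is a minimal sketch of the overlap-labelling logic in plain Python. It is not how tagBam or any of the libraries above is implemented; the function names (`load_annotations`, `overlap_labels`, `tag_value`) and the toy coordinates are made up for illustration, intervals are assumed to be 0-based half-open as in BED, and all BAM reading and writing is left out so it runs with only the standard library. Labels come out in order of annotation start position, which may differ from the order a real tool reports.

```python
from collections import defaultdict


def load_annotations(records):
    """Group (chrom, start, end, label) records by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, label in records:
        by_chrom[chrom].append((start, end, label))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def overlap_labels(by_chrom, chrom, start, end):
    """Return the distinct labels of annotations overlapping [start, end)."""
    labels = []
    for a_start, a_end, label in by_chrom.get(chrom, []):
        if a_start >= end:
            break  # intervals are sorted by start, so nothing later can overlap
        if a_end > start and label not in labels:
            labels.append(label)
    return labels


def tag_value(labels):
    """Join labels with ';' as in the YB examples above; None if no overlap."""
    return ";".join(labels) if labels else None


# Hypothetical toy annotations standing in for exons.bed, introns.bed, etc.
annotations = load_annotations([
    ("chr1", 100, 200, "exonic"),
    ("chr1", 200, 500, "intronic"),
    ("chr1", 150, 300, "cpg"),
    ("chr1", 450, 600, "utr"),
])

# Hypothetical alignments as (name, chrom, start, end).
alignments = [
    ("read1", "chr1", 120, 140),
    ("read2", "chr1", 180, 220),
    ("read3", "chr2", 10, 50),
    ("read4", "chr1", 460, 480),
]

for name, chrom, start, end in alignments:
    value = tag_value(overlap_labels(annotations, chrom, start, end))
    print(name, "YB:Z:" + value if value else "(no tag)")
```

The linear scan per alignment is fine for a toy example; for real annotation sets you would want an interval tree or a sorted sweep, and this is exactly the place where your own nuanced rules (strand, minimum overlap, label priority) would go.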
I would add to the list: coding, 5' UTR, 3' UTR, synonymous & non-synonymous
I will also accept an expansion of the list of stuff everyone wants to know.
Uh, is this from RNASeq data?
Anything: DNA-Seq, RNA-Seq, ChIP-Seq, exome, whatever.
https://github.com/databio/GenomicDistributions might be worth adding here