14.7 years ago
Jeremy Leipzig
I could use some help filling in the gaps on getting basic hit information from Bioconductor's readAligned, part of the ShortRead package.
If you have solutions, additions, or improvements to this mini-cookbook, feel free to respond or edit.
Here is a prebuilt index of E. coli (courtesy of CBCB@UMD) and a few sample reads to get you started:
bowtie e_coli -a --solexa-quals reads.fq > output.bowtie
In R:
library("ShortRead")
alignedReads <- readAligned("./", pattern="output.bowtie", type="Bowtie")
#how many reads did I attempt to align
#please fill me in on this one
#how many reads aligned (one or more times)
length(unique(id(alignedReads)))
#how many hits were there?
length(alignedReads)
#how many reads produced multiple hits
length(unique(id(alignedReads[srduplicated(id(alignedReads))])))
#how many reads produced multiple hits at the best stratum?
#please fill me in on this one
#how many reads aligned uniquely (with exactly one hit)
length(unique(id(alignedReads)))-length(unique(id(alignedReads[srduplicated(id(alignedReads))])))
#how many reads aligned uniquely at the best stratum (the other hits were not as good)
#please fill me in on this one
#how many unique positions were hit? what if I ignore strand?
#please fill me in on this one
#how many converging hits were there (two query sequences aligned to the same genomic position)
#please fill me in on this one
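As a starting point for the position questions above, here is a minimal sketch in base R. It runs on a made-up `hits` data frame rather than on real output; the column names and values are hypothetical. With real data, I'd assume the columns could be filled from the ShortRead accessors `as.character(id(alignedReads))`, `chromosome(alignedReads)`, `position(alignedReads)` and `strand(alignedReads)`, but I have not checked this against Bowtie output. It treats a "position" as a chromosome/start pair (optionally with strand) and a "converging hit" as a position hit by two or more distinct reads, which is only one possible reading of the question.

```r
# Hypothetical toy hits table (one row per reported alignment).
# With real data these columns might come from id(), chromosome(),
# position() and strand() on the AlignedRead object (an assumption).
hits <- data.frame(
  read   = c("r1", "r2", "r2", "r3", "r4", "r5"),
  chrom  = c("chr", "chr", "chr", "chr", "chr", "chr"),
  pos    = c(100L, 100L, 250L, 250L, 400L, 400L),
  strand = c("+", "+", "-", "+", "-", "-"),
  stringsAsFactors = FALSE
)

# Hits per read, tabulated once instead of repeated unique()/duplicated() calls
hitsPerRead   <- table(hits$read)
nAlignedReads <- length(hitsPerRead)     # reads with one or more hits
nMultiReads   <- sum(hitsPerRead > 1)    # reads with multiple hits
nUniqueReads  <- sum(hitsPerRead == 1)   # reads with exactly one hit

# Unique positions hit, with and without strand
strandedKey    <- paste(hits$chrom, hits$pos, hits$strand, sep = ":")
unstrandedKey  <- paste(hits$chrom, hits$pos, sep = ":")
nPosStranded   <- length(unique(strandedKey))
nPosUnstranded <- length(unique(unstrandedKey))

# Converging hits: stranded positions hit by two or more distinct reads
readsPerPos <- tapply(hits$read, strandedKey, function(x) length(unique(x)))
nConverging <- sum(readsPerPos > 1)
```

Swapping `strandedKey` for `unstrandedKey` in the last step would count convergence regardless of strand. The number of reads I attempted to align cannot come from the alignment output alone; it would have to be counted from reads.fq itself.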