I've aligned PacBio subreads (fasta) to a reference using Heng Li's BWA (the long read aligner - BWA-SW, using the command 'bwa bwasw'), and I'm trying to read these in using
> r = readAligned("test.bam", type="BAM")
Now, I know there's a potential problem here, in that BWA-SW will produce multiple lines with the same read id when it finds disconnected alignment blocks within reads (which should be the case for my data - transcript reads mapped to the genome). However, the resulting readAligned object:
> r class: AlignedRead length: 59664 reads; width: 126..12721 cycles chromosome: NA NA ... NA NA position: NA NA ... NA NA strand: NA NA ... NA NA alignQuality: NumericQuality alignData varLabels: flag
has 59,664 reads, which is more than the number of unique read ids in the file, but fewer than the number of lines in the file. Furthermore, a test file containing 1 single alignment read in as so:
> r class: AlignedRead length: 0 reads; width: cycles chromosome: position: strand: alignQuality: NumericQuality alignData varLabels: flag
Can anybody suggest where I might have gone wrong?
session info:
> sessionInfo() R version 2.14.1 (2011-12-22) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.6.9 AnnotationDbi_1.16.19 Biobase_2.14.0 [4] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 [7] Rsamtools_1.6.3 lattice_0.20-6 Biostrings_2.22.0 [10] GenomicRanges_1.6.7 IRanges_1.12.6 loaded via a namespace (and not attached): [1] BSgenome_1.22.0 DBI_0.2-5 RCurl_1.91-1 RSQLite_0.11.1 [5] XML_3.9-4 biomaRt_2.10.0 bitops_1.0-4.1 grid_2.14.1 [9] hwriter_1.3 rtracklayer_1.14.4 tools_2.14.1 zlibbioc_1.0.1
Rsamtools gives more flexible access to the BAM files, so you'll be able to read the data in, but probably need to do your own further manipulation.