Question

Has Anyone Successfully Used The Bioconductor Shortread Package To Read In Pacbio Read Alignments, In Sam/Bam Format?

2

Entering edit mode

12.6 years ago

Joe Fass ▴ 180

I've aligned PacBio subreads (fasta) to a reference using Heng Li's BWA (the long read aligner - BWA-SW, using the command 'bwa bwasw'), and I'm trying to read these in using

> r = readAligned("test.bam", type="BAM")

Now, I know there's a potential problem here, in that BWA-SW will produce multiple lines with the same read id when it finds disconnected alignment blocks within reads (which should be the case for my data - transcript reads mapped to the genome). However, the resulting readAligned object:

> r
class: AlignedRead
length: 59664 reads; width: 126..12721 cycles
chromosome: NA NA ... NA NA
position: NA NA ... NA NA
strand: NA NA ... NA NA
alignQuality: NumericQuality
alignData varLabels: flag

has 59,664 reads, which is more than the number of unique read ids in the file, but fewer than the number of lines in the file. Furthermore, a test file containing 1 single alignment read in as so:

> r
class: AlignedRead
length: 0 reads; width:  cycles
chromosome:
position:
strand:
alignQuality: NumericQuality
alignData varLabels: flag

Can anybody suggest where I might have gone wrong?

session info:

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicFeatures_1.6.9 AnnotationDbi_1.16.19 Biobase_2.14.0       
 [4] ShortRead_1.12.4      latticeExtra_0.6-19   RColorBrewer_1.0-5   
 [7] Rsamtools_1.6.3       lattice_0.20-6        Biostrings_2.22.0    
[10] GenomicRanges_1.6.7   IRanges_1.12.6       

loaded via a namespace (and not attached):
 [1] BSgenome_1.22.0    DBI_0.2-5          RCurl_1.91-1       RSQLite_0.11.1    
 [5] XML_3.9-4          biomaRt_2.10.0     bitops_1.0-4.1     grid_2.14.1       
 [9] hwriter_1.3        rtracklayer_1.14.4 tools_2.14.1       zlibbioc_1.0.1

r bioconductor short bam • 4.4k views

ADD COMMENT • link 12.6 years ago by Joe Fass ▴ 180

score 3 · Answer 1 · 2012-04-03

3

Entering edit mode

12.6 years ago

Joe Fass ▴ 180

Ahem. From the help for readAligned():

[?]

So, no PacBio BAM files in ShortRead. Leaving this up as a testament to not thoroughly reading the documentation.

ADD COMMENT • link 12.6 years ago by Joe Fass ▴ 180

0

Entering edit mode

Rsamtools gives more flexible access to the BAM files, so you'll be able to read the data in, but probably need to do your own further manipulation.

ADD REPLY • link 12.6 years ago by Martin Morgan ★ 1.6k