Hi.
I am trying to use groHMM to analyze GRO-seq data. I followed the tutorial on Bioconductor and got everything to work. When I ran the same steps on my own data, however, it threw errors. So I looked at the raw BAM files from the tutorial, and they look like this:
Records from the tutorial's BAM file. BAM is a binary format, so what follows are the decoded (SAM text) records, not the raw file contents.
n 16 chr7 12312 255 1M * 0 0 * *
n 16 chr7 12317 255 1M * 0 0 * *
n 16 chr7 15791 255 1M * 0 0 * *
n 16 chr7 15791 255 1M * 0 0 * *
n 16 chr7 15791 255 1M * 0 0 * *
n 16 chr7 15791 255 1M * 0 0 * *
n 16 chr7 15793 255 1M * 0 0 * *
n 16 chr7 15793 255 1M * 0 0 * *
n 16 chr7 30670 255 1M * 0 0 * *
n 16 chr7 30670 255 1M * 0 0 * *
n 16 chr7 31059 255 1M * 0 0 * *
n 16 chr7 31065 255 1M * 0 0 * *
n 16 chr7 31069 255 1M * 0 0 * *
n 16 chr7 31069 255 1M * 0 0 * *
n 16 chr7 41620 255 1M * 0 0 * *
n 16 chr7 143075 255 1M * 0 0 * *
n 16 chr7 143082 255 1M * 0 0 * *
n 16 chr7 143114 255 1M * 0 0 * *
n 16 chr7 143863 255 1M * 0 0 * *
n 16 chr7 143863 255 1M * 0 0 * *
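For reference, each record above is a SAM line made of the 11 mandatory columns. The small Python sketch below labels those columns so it is easier to see what the tutorial records actually contain. parse_sam_line is my own hypothetical helper, not part of groHMM, and it falls back to whitespace splitting because the pasted records lost their tabs.

```python
# The 11 mandatory SAM columns, in order, as defined by the SAM format specification.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one SAM text record into named mandatory fields plus optional tags."""
    # Real SAM is tab-delimited; pasted records often lose their tabs,
    # so fall back to splitting on any whitespace in that case.
    cols = line.rstrip("\n").split("\t") if "\t" in line else line.split()
    record = dict(zip(SAM_FIELDS, cols[:11]))
    record["FLAG"] = int(record["FLAG"])
    record["POS"] = int(record["POS"])
    record["MAPQ"] = int(record["MAPQ"])
    # Everything after column 11 is an optional TAG:TYPE:VALUE field.
    record["TAGS"] = cols[11:]
    # FLAG bit 0x10 means the read aligned to the reverse strand.
    record["strand"] = "-" if record["FLAG"] & 0x10 else "+"
    # FLAG bit 0x4 means the read is unmapped.
    record["mapped"] = not (record["FLAG"] & 0x4)
    return record


tutorial = parse_sam_line("n 16 chr7 12312 255 1M * 0 0 * *")
print(tutorial["QNAME"], tutorial["strand"], tutorial["CIGAR"], tutorial["SEQ"])
```

Read this way, every tutorial record has the placeholder name "n", no sequence or qualities ("*"), a 1-base alignment ("1M") and MAPQ 255 (which the SAM spec defines as "not available"), so each read seems to have been reduced to a single position plus a strand.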
Compare that with a more typical BAM file that I am familiar with:
SRR5364096.1 4 * 0 0 * * 0 0 NAAAAACATGGTACAGTGAGTGAATATACCCCCATCCCCAAAAAAAAAAAN #0<FFFFFFFFFFIIFFIFIFF<BFFIFFIFIIIIIIIIIIIIIIIIIFFF
SRR5364096.2 0 chr17 66032340 25 51M * 0 0 TTAGGCCCCGGGGGTGGCTCTGCCACCAAGTCGTAGGCGAGCGTAATAAAN BBBFFFFFFFFFFI7BFFIFIIFIFIFFFFFFFBFFFFFFBFFBBBBFBFF XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:0C46G2A0
SRR5364096.3 16 chr8 131353360 37 51M * 0 0 GATGCATTACAGCAGAAGGTAAAGTCGACACAAAACTTCCAACTGACTGCT FIFFIIFBFIFIIFIIIIIIIIFFFFFFIFIFFFFIIIFFFFFFFFFFBBB XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:51
So I suspect this is where the problem lies. What is the difference between these two kinds of BAM file, and how can I make my BAM file look like the one from the tutorial?
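One way to produce records of the same minimal shape is sketched below in Python. This is only a guess at how the tutorial file might have been made: the tutorial does not say whether it kept the 5' end, the 3' end or the leftmost position of each read, so keeping the 5' end here is an assumption, and to_minimal_record and reference_length are hypothetical helpers, not groHMM functions. Unmapped reads (FLAG bit 0x4, like SRR5364096.1) are dropped.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING_OPS = set("MDN=X")


def reference_length(cigar):
    """Number of reference bases spanned by an alignment with this CIGAR string."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in REF_CONSUMING_OPS)


def to_minimal_record(line):
    """Hypothetical: collapse a mapped SAM record to a 1-bp, tutorial-style record.

    Returns a tab-separated SAM line, or None for unmapped reads.
    """
    cols = line.rstrip("\n").split("\t") if "\t" in line else line.split()
    _qname, flag, rname, pos, _mapq, cigar = cols[:6]
    flag, pos = int(flag), int(pos)
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read: nothing to keep
    reverse = bool(flag & 0x10)
    # Assumption: keep the read's 5' end, which for a reverse-strand read
    # is the rightmost reference base covered by the alignment.
    five_prime = pos + reference_length(cigar) - 1 if reverse else pos
    # Mimic the tutorial records: name "n", only the strand bit of FLAG,
    # MAPQ 255, a 1M CIGAR, no mate information, no SEQ/QUAL.
    return "\t".join(["n", str(flag & 0x10), rname, str(five_prime),
                      "255", "1M", "*", "0", "0", "*", "*"])


print(to_minimal_record("SRR5364096.2 0 chr17 66032340 25 51M * 0 0 TTAG BBBF"))
```

The output is SAM text only; turning it back into a BAM file would still need a SAM header and a conversion step (for example with samtools), which is not shown here.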
Thanks a million,
Simon