I have some older SOLiD data and aligned the reads in colorspace to the reference genome my_ref using MosaikAligner. The output of the aligner is a BAM file that, once sorted and indexed (both via samtools) and then converted to SAM, contains colorspace information (the CS and CQ tags).
user$ samtools view -h -o aln.sam aln.sorted.bam
user$ head aln.sam
@HD VN:1.0 SO:coordinate
@SQ SN:my_ref LN:152765 M5:d7d7fd29f9460f1026bf65295053b8d9
@RG ID:ZVHAC3A4NMS SM:unknown PL:solid
@PG ID:MosaikAligner VN:2.2.26 CL:/home/user/git/MOSAIK/bin/MosaikAligner -in my_data.dat -out aln -ia my_ref.dat.cs -ibs my_ref.dat
718_406_864 113 my_ref 380 8 1S38M1S = 548201 NAAAAAAGGAGCAATAGCTTCCCTCTTGTTTTATCAAGAN !>>99;==<667783322<88==4++55:))>>A;;>AB! RG:Z:ZVHAC3A4NMS NM:i:0 MD:Z:38 ZA:Z:<@;8;;;1;;><&;7;;;1;34M1S;34> CS:Z:T0220123300011022200202323301322020000032 CQ:Z:!@BA>;AA>?):;5>+4B=>8<?233=8776<A=>;9?>=3
800_1371_875 113 my_ref 380 8 1S38M1S = 520173 NAAAAAAGGAGCAATAGCTTCCCTCTTGTTTTATCAAGAN !==7114455226:**22;::==337:88!!==>999@@! RG:Z:ZVHAC3A4NMS NM:i:0 MD:Z:38 ZA:Z:<@;8;;;1;;><&;7;;;1;34M1S;34> CS:Z:T0220123300011022200202323301322020000032 CQ:Z:!?@@99>@==!98:;73?=>:;<22*<:62:5?4>17A==;
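As a reading aid for the two records above, the FLAG value 113 in the second column can be broken down into its individual bits. The following is only a minimal sketch: the bit meanings are those of the SAM specification, while the short names in the table and the helper decode_flag are hypothetical and chosen for illustration.

```python
# SAM FLAG bits as defined in the SAM specification.
# The short names are illustrative labels, not official identifiers.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# FLAG taken from the two records shown above.
print(decode_flag(113))
```

For 113 this lists the paired, reverse, mate_reverse and read1 bits; the proper_pair bit (0x2) is not set, which may be worth ruling out alongside the colorspace hypothesis.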
Now I would like to call the consensus of the aligned reads via samtools mpileup, but find that samtools does not report any of the reads: the output contains only the header and no records.
user$ samtools mpileup -uf my_ref aln.sorted.bam
BCF##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed",IDX=0>
##samtoolsVersion=1.3.1+htslib-1.3.1
# multiple_lines_here_starting_with_##INFO=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT unknown
<mpileup> Set max per-file depth to 8000
user$
I presume that this empty output results from the colorspace information contained in the BAM file, and that mpileup fails because of it. How would you proceed here?
NOTE 1: The BAM file can be opened and viewed without problems in BamView, which I take to indicate that the BAM file is intact.
EDIT 1: To clarify my post: I am primarily interested in whether the inability of samtools mpileup to report the reads against the very reference genome they were aligned to could be the result of the BAM file containing colorspace information, or do I have to consider alternative reasons?