Hi,
I am using samtools view to get reads covering a specific position. Howerver, these reads do not match the reads with the same id in the raw sequence files.
For example, say I run samtools view <file>.bam <position> I get a read with id x and sequence AAAA... Then, looking for reads with id x in the raw sequence files I get a read with sequence GCTTA...
Is samtools view reliable for a given position? Is it possible that my read ids got mixed up during mapping?
Thanks,
Daniel
There are indeed downsides with regard to reverse complement, but there are multiple benefits you have overlooked. 1) better compression. 2) Easier for programmers. If BAM kept reads in the original strand, every pileup, alignment viewer, variant caller, NM/MD recomputer and quality recalibrator would have to reverse complement half of reads. 3) Making the sequence consistent with CIGAR on the reference strand. Note that for variant calling, you can hardly use CIGAR on the read strand; that way you would need to left align gaps for forward reads and right align gaps for reverse reads, which is hugely confusing. 4) Easier for eyeballing.
There was a long discussion on how to keep paired-end reads. We evaluated alternatives like requiring /[12] in read names, adding a new fixed column, using tags, etc. The consensus is the current SAM way. I have concerns myself, but I see that is the best we could come up with. If you have better ideas, let me know. Also on the fasta/fastq format, it is wrong to say "truncating the names at the first space". Everything after the first space is not part of the read name. You never truncate names at spaces.
Thanks for your reply! So with my Ids truncated at the space, there is no way to tell which read came from my R1 or R2 files? Except for checking the sequence?
No, you can find that out by decoding the bitflag (second field). This is a decimal integer. If you translate it into binary, then if the 8th bit is set, it was read 2; otherwise, it was read 1.
You can do this in Windows 7 (and presumably other versions) with the calculator. Got to view -> programmer. Enter the number (say, 231), then switch from decimal to binary, and it will display the number in binary (11100111). In this case, the 8th bit is set, meaning it came from read 2.