I want to be able to extract, from BAM files, the reads that, for a given position, contain the variant (and conversely but separately, the ones that contain the reference). I've played around with various tools and searched the tubes, without finding any convenient way to do this. It doesn't seem like too odd a task to perform, so maybe my google fu is just failing me here.
Are you looking for all reads containing some variant at, say, chr1:1000 or instead all those that contain a variant in base 20 of their alignment? I assume the former, which is much quicker to do, but figured I'd ask.
The former, if I understand you correctly. I have a specific variant and I want to examine the reads that support that variant and compare to the reads that instead show the reference allele at the same position.
In the off chance that you really do need to have the full alignments in SAM format in separate files, then you can code something up in either python with pysam or C with HTSlib. In both cases you can use the pileup functions to just grab all alignments that overlap a given position (or set of them) and then iterate over the results. The pileup functions will tell you which base of each alignment overlaps a given position, so you can check what genotype it supports and treat it accordingly.
Edit: But really, if you think you want the full alignments in separate files then you should probably rethink what you're doing. The two answers posted before mine are vastly simpler and should suffice for most normal needs.
Are you looking for all reads containing some variant at, say, chr1:1000 or instead all those that contain a variant in base 20 of their alignment? I assume the former, which is much quicker to do, but figured I'd ask.
The former, if I understand you correctly. I have a specific variant and I want to examine the reads that support that variant and compare to the reads that instead show the reference allele at the same position.