Extract reads with variant at given position
5
2
Entering edit mode
10.0 years ago

I want to be able to extract, from BAM files, the reads that, for a given position, contain the variant (and conversely but separately, the ones that contain the reference). I've played around with various tools and searched the tubes, without finding any convenient way to do this. It doesn't seem like too odd a task to perform, so maybe my google fu is just failing me here.

sequencing alignment • 8.8k views
ADD COMMENT
0
Entering edit mode

Are you looking for all reads containing some variant at, say, chr1:1000 or instead all those that contain a variant in base 20 of their alignment? I assume the former, which is much quicker to do, but figured I'd ask.

ADD REPLY
0
Entering edit mode

The former, if I understand you correctly. I have a specific variant and I want to examine the reads that support that variant and compare to the reads that instead show the reference allele at the same position.

ADD REPLY
4
Entering edit mode
10.0 years ago

Another option, for querying any ad hoc range (chrN:X-Y):

$ echo -e "chrN\tX\tY" | bedops --element-of 1 <(bam2bed < reads.bam) - > overlapping_reads.bed

To query any given set of genomic ranges in variants.bed:

$ bedops --element-of 1 <(bam2bed < reads.bam) variants.bed > overlapping_reads.bed

The bam2bed tool will preserve all fields, which may be useful.

If variants are in VCF, then a second bash subshell to call vcf2bed can convert them to BED, e.g.:

$ bedops --element-of 1 <(bam2bed < reads.bam) <(vcf2bed < variants.vcf) > overlapping_reads.bed
ADD COMMENT
2
Entering edit mode
10.0 years ago

I would use my tool sam2tsv to extract the name of the reads having (or not) the mutation.

ADD COMMENT
2
Entering edit mode

Since this can take input via a pipe, it should be added that the simplest method would be to take the BAM file, sort and index it, and then:

samtools view -bu some_file.bam chr1:123 | java -jar dist/sam2tsv.jar -r ref.fa stdin > blah.txt
ADD REPLY
1
Entering edit mode

@Devon: added -bu

ADD REPLY
0
Entering edit mode

Good catch!

ADD REPLY
0
Entering edit mode

Why is it better to use a BAM file as stdin?

ADD REPLY
1
Entering edit mode
4.5 years ago
sacha ★ 2.4k

For a specific variant, you can use bamql

For instance, if you want to extract all reads from a bam file :

  • which support the alternate 'A' in snp : chr17:29827429 G>A

    bamql -f yourfile.bam 'chr(17) & nt_exact(29827429,A)' -o A.bam
    
  • which support the reference 'G' in snp : chr17:29827429 G>A

    bamql -f yourfile.bam 'chr(17) & nt_exact(29827429,G)' -o B.bam
    

I guess you can loop over variant file and do this for each variant. You can also filter with mate information. See documentation Here

enter image description here

ADD COMMENT
0
Entering edit mode

man this! thank you very much! this is specifically what i needed :)

ADD REPLY
0
Entering edit mode

BAMQL is unfortunetely not very HPC friendly. I kinda found this maybe it helps.

https://github.com/broadinstitute/VariantBam

ADD REPLY
0
Entering edit mode
10.0 years ago

In the off chance that you really do need to have the full alignments in SAM format in separate files, then you can code something up in either python with pysam or C with HTSlib. In both cases you can use the pileup functions to just grab all alignments that overlap a given position (or set of them) and then iterate over the results. The pileup functions will tell you which base of each alignment overlaps a given position, so you can check what genotype it supports and treat it accordingly.

Edit: But really, if you think you want the full alignments in separate files then you should probably rethink what you're doing. The two answers posted before mine are vastly simpler and should suffice for most normal needs.

ADD COMMENT
0
Entering edit mode
10.0 years ago

Thanks for the input, guys. I've realized that bam-readcount ultimately is the most appropriate tool for my purposes.

ADD COMMENT

Login before adding your answer.

Traffic: 3467 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6