Sam/Bam: Extract Only Alignment Reads That Contain A Specific Position
3
5
Entering edit mode
12.3 years ago
Matt W ▴ 250

I am looking to extract all of the reads that overlap a particular SNP location. For example, if I have a SNP at 387840, I want any read containing that position to be output to a new file.

I have tried:

samtools view -o filteredresults.sam alignment.sorted.bam chr2:387840

Which outputs filteredresults.sam with 1636 lines. Yet according to the output of:

samtools mpileup -f reference.fa -d 1000 alignment.sorted.bam > alignment.mp

This location should only have 16 reads overlapping.

Is samtools wrong for this purpose? Or are my filtering criteria off? Or am I misinterpreting the mpileup output?

Thanks in advance!

P.S. I know my questions are fairly basic now, but I really appreciate everyone's help!

EDIT: alignment.mp at position 387840 says:

chr2 387840 C 16 G$G$,..,,...,..,^~, %!#BIIC#III+HI#B
snp sam bam • 16k views
ADD COMMENT
4
Entering edit mode
12.3 years ago

I think if you:

samtools view -o filteredresults.sam alignment.sorted.bam chr2:387840

That will output all reads after 387840. Try:

samtools view -o filteredresults.sam alignment.sorted.bam chr2:387840-387840
ADD COMMENT
0
Entering edit mode

That did the trick! Thanks!

ADD REPLY
0
Entering edit mode

You're right, and a careful reading of the manual confirms that this is intended. I definitely wouldn't have guessed that behavior, though! I guess when the OP said the reads were actually around the target coordinates, they were wrong or very unlucky that it was near the end of the chromosome.

There still might be a slight difference in counts due to the filtering I describe in my answer, but this is definitely the main problem for being orders of magnitude off.

ADD REPLY
0
Entering edit mode

Thanks. It seems the filteredresults.sam output does not have header, how could I output a sam file with header?

ADD REPLY
0
Entering edit mode
12.3 years ago

Just checking, but did you literally put in region:pos?

samtools view -o filteredresults.sam alignment.sorted.bam region:387840

rather than:

samtools view -o filteredresults.sam alignment.sorted.bam Chr$:387840
ADD COMMENT
0
Entering edit mode

No, I put in the specific chromosome, I was just keeping it general.

ADD REPLY
1
Entering edit mode

Are the coordinates in the output around 387840?

ADD REPLY
0
Entering edit mode

I checked, and yes, they are. Why does my mpileup file output 16? I'm missing a key piece of info somewhere.

ADD REPLY
0
Entering edit mode
12.3 years ago
matted 7.8k

Mpileup performs more filtering than view (which does none by default). From the documentation, you can see that the default mpileup invocation filters by base quality greater than 13 (-Q) and skips anomalous read pairs (without the -A flag). It's possible the base quality filtering is overly aggressive if you were supposed to use the -6 flag and didn't, or vice versa. The quick way to get unfiltered read coverage is to use samtools depth.

ADD COMMENT

Login before adding your answer.

Traffic: 1301 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6