Need to parse pooled bam file based on sequence
1
0
Entering edit mode
8.9 years ago

I have a bam file (pool.bam) that represents PCR amplicons of many samples, pooled into one sequencing lane. For some regions, the same amplicon was generated and sequenced for more than one sample. There is a 3 base index at the beginning of the PCR primer that is used to tell the samples apart. To look at the specific position of interest in each sample, I'd like to parse out the samples in the bam file, and use mpileup to evaluate the SNV that was amplified. I tried converting to sam file (samtools view -o pool.sam pool.bam) and using grep for the individual primers used for each sample (more pool.com | grep TCACAAGAAACCCTGCT > grep.sam), which appeared to work well. But if I try to convert to bam and index it before using mpileup, I get an error: missing SAM header; parse error at line 1; truncated file.

What is a better way to do this?

sequencing DNAseq • 2.3k views
ADD COMMENT
0
Entering edit mode
8.9 years ago
DG 7.3k

If you do samtools view -H pool.sam > grep.sam followed by your grep but using an append: more pool.com | grep TCACAAGAAACCCTGCT >> grep.sam You should have the complete sam header at the start of your file. That will solve the particular problem you are having.

That said you could potentially get some unexpected results filtered into your data by just using grep like that, and depending on your experimental design may be missing data as well. You are only looking at the primer from one direction (and thats ok, depending on your experiment you may only expect it that way), but you could also get some off-target matches. Although that would also indicate some bad primer choices for your amplicons as well.

ADD COMMENT

Login before adding your answer.

Traffic: 1666 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6