Tools to extract per-locus alignments from bam files aligned using bwa
1
2
Entering edit mode
7.5 years ago

Hello everyone,

I am looking for a quick bioinformatics solution to extract an alignment for each locus in 96 different bam files (separate bam file for each of my 96 samples). Each bam files contains up to 96 different loci that I mapped to using BWA to create the per-sample bam files that I currently have. I am going to use alignments for each locus (that contain all samples that aligned to that locus) in population genetics analyses in the PopGenome package in R.

I was planning on importing each bam file into Geneious under a folder for each sample, obtaining the consensus sequence for each locus within each sample folder, and then copying the consensus sequence for each sample into a locus-specific folder. Once all consensus sequences for a particular locus are copied into a folder, I would align them and then export the alignment as a fasta file for use in R.

My workflow in Geneious works just fine, except it is going to take a very long time. I am new to bioinformatics and am sure that I am missing an easy solution to this problem that would be much faster than using Geneious!

Thank you for any help you can offer.

alignment • 3.0k views
ADD COMMENT
1
Entering edit mode
  • use linux
  • use bash
  • use cmd-line tools like samtools
  • use loops
  • don't use commercial GUIs
ADD REPLY
0
Entering edit mode

I don't think my scripting skills are quite up to snuff to be able to figure out the commands for this just using bash. I have been using samtools quite a bit, though. Do you have any suggestions for a particular command in samtools? I read through the manual several times but I can't find anything that would clearly do this task (but I'm sure it's because I'm missing it)

ADD REPLY
1
Entering edit mode

It looks like the PopGenome package can also take VCF files, have you considered that instead? You could then use a more standard "align to the genome and call variants with GATK/samtools/freeBayes" protocol.

ADD REPLY
1
Entering edit mode
7.5 years ago
Michael 55k

Not totally sure what you want to achieve and why, but samtools can easily filter bam files by position, say your locus is defined as chr3:1000-2000. If you want to do this for all bam files in a directory, this can be done in bash like so:

 LOCI="chr3:1000-2000 chr2:1-10" # etc. samtools supports multiple loci
  for F in *.bam ; do
      samtools view -bh $F $LOCI > $F.filtered.bam
  done

Remark: If you generate a consensus sequence only, you will lose information on heterozygosity.

ADD COMMENT

Login before adding your answer.

Traffic: 2572 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6