Hello everyone,
I am looking for a quick bioinformatics solution to extract an alignment for each locus in 96 different bam files (separate bam file for each of my 96 samples). Each bam files contains up to 96 different loci that I mapped to using BWA to create the per-sample bam files that I currently have. I am going to use alignments for each locus (that contain all samples that aligned to that locus) in population genetics analyses in the PopGenome package in R.
I was planning on importing each bam file into Geneious under a folder for each sample, obtaining the consensus sequence for each locus within each sample folder, and then copying the consensus sequence for each sample into a locus-specific folder. Once all consensus sequences for a particular locus are copied into a folder, I would align them and then export the alignment as a fasta file for use in R.
My workflow in Geneious works just fine, except it is going to take a very long time. I am new to bioinformatics and am sure that I am missing an easy solution to this problem that would be much faster than using Geneious!
Thank you for any help you can offer.
I don't think my scripting skills are quite up to snuff to be able to figure out the commands for this just using bash. I have been using samtools quite a bit, though. Do you have any suggestions for a particular command in samtools? I read through the manual several times but I can't find anything that would clearly do this task (but I'm sure it's because I'm missing it)
It looks like the PopGenome package can also take VCF files, have you considered that instead? You could then use a more standard "align to the genome and call variants with GATK/samtools/freeBayes" protocol.