creating a consensus fasta that is same length as the reference
1
1
Entering edit mode
6.8 years ago
vulpecula ▴ 30

Hi all,

I have been trying to create a same-length consensus fasta by mapping reads to a reference genome for input into angsd as the ancestral sequence. However, the scaffolds of the resulting consensus always seem to be a little short, likely due to indels etc.. I have iterated through several versions of code (bcftools call -c or bcftools call -m, not using any data filters, etc.). Is there a specific path or set of filters I should use to be doing this? I found some former issues documented of this in previous versions of samtools/bcftools that had to do with an older version of the program but that doesn't seem to be the issue here. I also tried to use LAST for mapping, but that didn't seem to sort it out.

Apologies for the juvenile question and thanks for any help.

mapping alignment angsd • 3.0k views
ADD COMMENT
0
Entering edit mode

It is not clear what the desired course should be if there are indels in the genome. For example, if the genome is missing 10 bases from the middle how would you "make" that full size again? Would you want to insert 10 bases?

ADD REPLY
0
Entering edit mode

Thanks for your reply! Yes--I agree this is problematic. I think that for indels it would be best to have N's where they are in the consensus sequence. This is discussed here (http://seqanswers.com/forums/showthread.php?p=129352#post129352 the thread is a bit long, sorry, comment #14-16 is relevant to this issue). There is some reference to making a "dummy sam", but I am not exactly sure what they mean by that.

ADD REPLY
1
Entering edit mode

This is such an unusual problem that there might not be a tool do exactly what you need.

One perhaps the simplest option would be to align your best consensus to the old with bwa mem then run sam2pairwise

https://github.com/mlafave/sam2pairwise

Now you have the two sequences lined up. At that point the "hard" part is done. But you'll still need to write a simple but custom script in say Python to navigate the two in parallel and produce your final consensus.

ADD REPLY
0
Entering edit mode
3.1 years ago
sagitaninta ▴ 20

I came to this post as I have the same need (making fasta of same length with reference chromosome) and found my own solution for them who might need it in the future.

I found that angsd -doFasta can do the magic (http://www.popgen.dk/angsd/index.php/Fasta). It call consensus fasta from your BAM:

angsd -i $BAM -doFasta 1 -doCounts 1 -out ${BAM%.bam}

Depending on the BAM file size it can only take some minutes. The resulting fasta will have chromosome length the same as the reference chromosome because it adds Ns at the end of the chromosome until the length of the reference chromosome available on the BAM header.

Other that angsd -doFasta, my other attempts are htsbox pileup which is more straightforward and not calling indels (I don't know how angsd -doFasta treat indels) but always calls several bases short from the reference, this needs a custom script to fill the end of chromosome so it is the same length with the reference chromosome.

ADD COMMENT

Login before adding your answer.

Traffic: 1919 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6