Converting BAM to FASTA
2.6 years ago
Tastulek • 0

Hello Biostars,

I have BAM files with many contigs and also many gaps, as given in the Figure.

[Figure not preserved]

What I want to do is get a consensus sequence for each of the contigs, remove the gaps, and splice the consensus sequences together to form a single FASTA sequence. I want to use the resulting FASTA sequence for building phylogenetic trees.
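To make the splicing step concrete, here is a minimal sketch in plain Python. It assumes the per-contig consensus sequences have already been written to a FASTA file by some consensus caller (`samtools consensus` is one candidate), and that uncovered or gapped positions are marked with `N`, `-`, or `*`. The function names, the `GAP_CHARS` set, and the output record name are hypothetical choices for illustration, not part of any existing tool.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: these characters mark uncovered or gapped consensus positions.
GAP_CHARS = "Nn-*"


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def strip_gaps(seq: str) -> str:
    """Remove every gap character from a consensus sequence."""
    return seq.translate(str.maketrans("", "", GAP_CHARS))


def splice_consensus(records: Iterable[Tuple[str, str]],
                     name: str = "spliced_consensus",
                     width: int = 60) -> str:
    """Concatenate gap-stripped contigs, in input order, into one FASTA record."""
    joined = "".join(strip_gaps(seq) for _, seq in records)
    body = "\n".join(joined[i:i + width] for i in range(0, len(joined), width))
    return f">{name}\n{body}\n"


if __name__ == "__main__":
    import sys

    # Usage: python splice.py consensus.fa > spliced.fa
    with open(sys.argv[1]) as handle:
        sys.stdout.write(splice_consensus(parse_fasta(handle)))
```

One caveat with this approach: removing gaps independently in each sample shifts the coordinates, so spliced sequences from different samples would still need to be aligned to each other before building a tree.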

If you can share any techniques for doing this, it would be greatly appreciated!

BAM genome assembly human FASTA • 930 views
2.6 years ago

Well, scaffolding eukaryotic genomes and assembling microbial metagenomes are each an art in themselves, so depending on your organism and type of data (long reads, short reads), there is no one-size-fits-all answer to your question. The nf-core pipelines mag or bacass might be a good start.

BBTools includes the powerful Tadpole assembler, which, however, requires a bit more experience in choosing the right parameters. For bacterial metagenomics, also check whether any of the tools developed by the Huttenhower lab suit your application.


Thank you for the information!

But the contigs are human Y-chromosome contigs, not bacterial. I did not mention this in my original post, sorry about that.


Unfortunately, I can't help you on that one then.

Assemblies of the centromeric regions of Y chromosomes are extremely challenging due to long tracts of near-identical tandem repeats. They usually require a mix of different sequencing technologies (short reads, long reads, and chromosome conformation capture data) as well as expert curation. Obtaining a high-quality linear assembly of a Y chromosome is usually a publication on its own.

It seems that there are tools out there that might be somewhat useful, yet this is clearly not a task for which you can throw a tool at a bunch of files and then end up with perfect results using the default parameters. If I were you, I would make sure that you have such high-quality source data available for all organisms that you are interested in and then strike up a collaboration with experts in the field of Y chromosome assembly or revise your approach. Anything else will likely not produce any meaningful results.


Hi Matthias, Thank you very much for the insightful comments!
