Question

Converting .bam file to .fasta file from a genic region

0

7.1 years ago

ricfoz ▴ 100

Hello there, I am trying to retrieve .fasta files from a .bam file of a whole Neandertal chromosome. I started with a .bam file of chr6 and have been able to extract the genic region I am interested in as a .bam file. Now I need to convert that file to .fasta format for further analysis in other pipelines.

I have been looking at other posts and trying the recommended scripts, but none have been useful. I would really appreciate some feedback on how to accomplish this task.

Greetings to all the community.

fasta gene bam • 4.0k views

ADD COMMENT • link updated 17 months ago by Ram 44k • written 7.1 years ago by ricfoz ▴ 100

0

Hello people, I just solved my problem. I left a post describing the solution here: https://www.biostars.org/p/284674/#284701

Cheers.

ADD REPLY • link 7.1 years ago by ricfoz ▴ 100

score 0 · Answer 1 · 2017-11-18

0

7.1 years ago

michael.ante ★ 3.9k

Hi ricfoz,

The TopHat suite includes a tool called bam2fastx, which converts BAM directly to FASTA. RSeQC provides a tool called bam2fq.py; with that one, you need to convert the resulting FASTQ to FASTA afterwards.
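If the FASTQ-to-FASTA step is needed, it can probably be done with plain awk rather than an extra package. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name fastq_to_fasta is hypothetical:

```shell
# Hypothetical helper: convert four-line FASTQ records to FASTA.
# Assumes each record is exactly 4 lines: @header, sequence, +, qualities.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }  # @header becomes >header
         NR % 4 == 2 { print }                    # sequence line is kept as-is
        ' "$@"
}

# Example call (file names are placeholders):
# fastq_to_fasta GeneFile.fastq > GeneFile.fasta
```

This still yields one FASTA entry per read, not a single contiguous sequence.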

Cheers,

Michael

ADD COMMENT • link 7.1 years ago by michael.ante ★ 3.9k

1

Will those scripts result in an assembly or pile up of the reads in the bam, or just return the individual reads?

ADD REPLY • link 7.1 years ago by WouterDeCoster 47k

0

They will result in a multi fasta file with one entry per read. That's what I understood from OP's question. He didn't ask for a consensus output.

ADD REPLY • link 7.1 years ago by michael.ante ★ 3.9k

0

OP's question is indeed ambiguous; let's wait for OP to tell us what's expected.

ADD REPLY • link 7.1 years ago by WouterDeCoster 47k

0

That's a good question. I have been able to get the genic region I need in a human-readable file, but it consists of individual reads, each with its own header, piled up on top of each other. What I am trying to get is a single-header .fasta file that contains the whole continuous sequence of the desired gene.

The commands I used to get this multi-FASTA file of individual reads are:

samtools bam2fq GeneFile.bam > GeneFile.fastq
seqtk seq -a GeneFile.fastq > GeneFile.fasta

ADD REPLY • link 7.1 years ago by ricfoz ▴ 100

1

You just need the reference genome for that location? Then you don't need a bam file.

ADD REPLY • link 7.1 years ago by WouterDeCoster 47k

0

Hello there,

I know that if I need the sequence of a gene I can get plenty of them on the web (GenBank etc.), but I want to retrieve a Neandertal region, which I have only been able to get in .GFF3 format from ancient genome browsers. Still, I need this region in classic .fasta format, with one simple header with a description and contiguous nucleotides from (nuc1-nuc1.x). Since Neandertal regions are easily retrieved in .bam format from the web, I need a bit of help from samtools, as I have seen around.

cheers

ADD REPLY • link 7.1 years ago by ricfoz ▴ 100

0

But to what are these Neanderthal reads aligned? The human genome?

ADD REPLY • link 7.1 years ago by WouterDeCoster 47k

0

Originally, to get the whole-genome .bam, before it was split into chromosomes (the point where I got my original .bam file), the reads were aligned to the hg19 reference genome.

ADD REPLY • link 7.1 years ago by ricfoz ▴ 100

0

That means you need to do a de novo assembly. There are plenty of tools for that.
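Since the reads are already aligned to hg19, a reference-guided consensus may be a possible alternative to assembly for getting a single-header FASTA. A minimal sketch, assuming samtools 1.16 or later (which provides the consensus subcommand) and a sorted, indexed BAM; the function name, file names and coordinates are hypothetical:

```shell
# Hypothetical helper: call a consensus sequence for one region of an indexed BAM.
# Usage: region_consensus <in.bam> <region> <out.fasta>
region_consensus() {
    # -r restricts the call to the given region (requires the BAM index),
    # -f fasta selects FASTA output, -o names the output file.
    samtools consensus -r "$2" -f fasta -o "$3" "$1"
}

# Example call (placeholders; the chromosome name must match the BAM header):
# region_consensus GeneFile.bam chr6:1000000-1050000 GeneFile.consensus.fasta
```

The result follows the hg19 coordinates of the alignment, so how uncovered or low-coverage positions are reported depends on the options used.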

ADD REPLY • link 7.1 years ago by WouterDeCoster 47k

0

Thanks a lot. I am using samtools right now, but after your question it seems I have to install the "TopHat suite" as an extra tool package, am I right? I appreciate both suggestions, but a direct conversion sounds to me like the better option, since going .bam > .fastq > .fasta would a) double the conversion time and b) be more prone to errors, given that the original .bam was already assembled, sorted and indexed.

Still, I would gladly hear any feedback on this line of thinking about your two suggestions.

Cheers !

ADD REPLY • link 7.1 years ago by ricfoz ▴ 100