Converting .bam file to .fasta file from a genic region
7.1 years ago
ricfoz ▴ 100

Hello there, I am trying to retrieve .fasta files from a .bam file of a whole Neandertal chromosome. I started with a .bam file of chr6 and have been able to extract the genic region I am interested in into its own .bam file. Now I need to convert that file to .fasta format for further analysis in other pipelines.

I have been looking at other posts and trying the recommended scripts, but none have been useful. I would really appreciate some feedback on how to accomplish this task.

Greetings to all the community.

fasta gene bam • 4.0k views

Hello people, I just solved my problem. I left a post describing the solution here: https://www.biostars.org/p/284674/#284701

Cheers.

7.1 years ago
michael.ante ★ 3.9k

Hi ricfoz,

The TopHat suite includes a tool called bam2fastx, which converts BAM directly to FASTA. RSeQC provides a tool called bam2fq.py; with that one you need to convert the resulting FASTQ into FASTA yourself.

Cheers,

Michael

Will those scripts produce an assembly or pileup of the reads in the BAM, or just return the individual reads?

They will produce a multi-FASTA file with one entry per read. That's what I understood from OP's question. He didn't ask for a consensus output.
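
A quick way to tell which kind of file you have is to count the header lines: a per-read multi-FASTA has one ">" line per read, while a single consensus sequence has exactly one. A minimal sketch, assuming a plain uncompressed FASTA file (the function name `count_fasta_records` and the three reads are made up for illustration):

```shell
# Count FASTA records: every record starts with a header line beginning with ">"
count_fasta_records() {
    grep -c '^>' "$1"
}

# Made-up multi-FASTA with one entry per read, written to a temporary file
tmp=$(mktemp)
printf '>read1\nACGT\n>read2\nGGCC\n>read3\nTTAA\n' > "$tmp"

# Prints the number of records in the file
count_fasta_records "$tmp"
rm -f "$tmp"
```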

OP's question is indeed ambiguous; let's wait for OP to tell us what's expected.

That's a good question. I have been able to get the genic region I need in a human-readable file, but it consists of individual reads, with headers and everything, piled on top of each other. What I am trying to get is a single-header .fasta file that contains the whole continuous sequence of the gene of interest.

The commands I used to get this multi-FASTA file of single reads are:

samtools bam2fq GeneFile.bam > GeneFile.fastq
seqtk seq GeneFile.fastq > GeneFile.fasta
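
For reference, the FASTQ-to-FASTA step amounts to keeping the header and sequence line of each record and dropping the quality lines. A minimal sketch in plain awk, assuming strict four-line FASTQ records with no wrapped sequences (the function name `fastq_to_fasta` and the two reads are made up for illustration):

```shell
# Convert FASTQ on stdin to FASTA on stdout.
# Assumes each record is exactly 4 lines: @header, sequence, +, qualities.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }  # replace leading "@" with ">"
         NR % 4 == 2 { print }                    # keep the sequence line as-is'
}

# Made-up example with two reads
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | fastq_to_fasta
```

As far as I know, the seqtk README does this conversion with `seqtk seq -a in.fq > out.fa`, so the `-a` flag may be needed in the second command above, and samtools has a `fasta` subcommand (`samtools fasta in.bam > out.fasta`) that skips the intermediate FASTQ. Either way the result is still one entry per read, not a single contiguous sequence.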

You just need the reference genome for that location? Then you don't need a bam file.

Hello there,

I know that if I need the sequence of a gene I can get one from the web, there are tons of them (GenBank etc.), but I want to retrieve a Neandertal region, which I have only been able to get in .GFF3 format from ancient genome browsers. Still, I need this region in classic .fasta format, with one simple header with a description and contiguous nucleotides (nuc1-nuc1.x). Since Neandertal regions are easily retrieved in .bam format from the web, I need a bit of help from samtools, as I have seen suggested elsewhere.

Cheers

But what are these Neanderthal reads aligned to? The human genome?

Originally, to produce the whole-genome .bam, before it was split into chromosomes (the point at which I got my original .bam file), the reads were aligned to the hg19 reference genome.

That means you need to do a de novo assembly. There are plenty of tools for that.

Thanks a lot. I am using samtools right now, but after your reply it seems I have to install the TopHat suite as an extra tool package, am I right? I appreciate both suggestions, but a direct conversion sounds like the better option to me, since going .bam > .fastq > .fasta would a) double the conversion time and b) be more prone to errors, given that the original .bam was already assembled, sorted and indexed.

Still, I would gladly hear any feedback on this line of thinking about your two suggestions.

Cheers !
