Obtaining The Consensus Sequence From A Bam File In Fasta
2
0
Entering edit mode
11.8 years ago
Anima Mundi ★ 2.9k

Hello everybody,

I am dealing with BAM files for the first time, and I had some problems to handle them. My goal is to obtain the whole consensus sequence (between sequencing folds of a genomic range from the same sample) in FASTA format. My first approach has been to find a "ready to go" script/one line command (i. e. I had a look here Convert bam file to fasta file ) to make the conversion, but I failed. This is what I got trying to use Samtools:

[main_samview] fail to open "filename.bam" for reading.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "R_2:164876791-164886791.bam".

I tried to figure out how a BAM file is structured, but having a look with a text editor I saw that the only explicit information are the non-assembled reads, and it would take a while to obtain the consensus starting from there (and, I guess, there would be no point anymore to use a BAM file).

As a second choice, I tried to use some more graphically oriented tools, such Artemis. Until now, I had problems even to make one of this program work (but this are hopefully physiological issues which I can probably solve myself); furthermore, I did not find a way to extract a consensus from one of the samples pointed here http://www.sanger.ac.uk/resources/software/artemis/ngs/

What would you suggest? Is there an easy way to obtain what I need, while I get more confident with the format? I hope the question is clear enough.

bam fasta consensus msa genome • 6.9k views
ADD COMMENT
1
Entering edit mode
11.8 years ago

The post you linked to is about someone doing something else. They aren't trying to make a single consensus file; they are trying to make a multi-fasta of every individual read.

There is no plug-and-play simple script that will take any .bam and make a perfect consensus fasta, because doing that requires making judgement calls about all the reads that don't perfectly match your consensus, of which there will be a lot.

But you can use samtools/bcftools to make an all-points vcf from your .bam, and then the vcf2fq command in vcfutils will try to make a consensus taking into account the called SNPs.

ADD COMMENT
0
Entering edit mode
11.8 years ago

You are using a file in the wrong format.

Make sure that you understand the difference between text formats like SAM and binary formats like BAM. Once you have a correct file the advice that you got can be applied directly.

ADD COMMENT
0
Entering edit mode

So, if I understood well, my data are currently in SAM format. In fact, using Editra to display them shows tab-delimited fields which may correspond to the description fields of each read. Also, using the file command gives me this output: "ASCII text, with very long lines", so I am not dealing with binary files. Still, I don't see any header line and, by the way, trying to convert my files to BAM (samtools view -b -S myfile.sorted.bam) leads to this error: "[samopen] no @SQ lines in the header. [sam_read1] missing header? Abort!".

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Ok, thank you. I think this particular issue is solved; still I have some troubles, I will open a new question in case I am not able to solve them out.

ADD REPLY

Login before adding your answer.

Traffic: 1785 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6