I started with a 9GB file of a human Xth chromosome with a .bam format, and I am trying to get a specific genic region in .fasta classic format (you know, a one header file, starting with ">", with description on header, and a single line, of continuous tandem single nucleotides).
I have been able to retrieve the reads of the genic region I want in bam format. I share the path I have followed:
samtools sort OriginalFile.bam -o OriginalFile.sort.bam
samtools inex OriginalFile.sort.bam
samtools view -b OriginalFile.sort.bam RegName:XX-XX+1 > GeneName.bam
after this I got the GeneName.bam file which contains all the reads on which I am interested in, but I can't run the phylogenetic tool I am trying to pipe the GeneName.bam sequence to ... in order to follow my workpath, I need to transform this GeneName.bam file, into this classic .fasta format I described lines ago.
I tried a couple of pipelines in order to achieve this, but what I have got as a retrieved .fasta file, is a man readable file, but with all the single fastq reads, with headers and everything, stacked on top of each other. I need to get the consensus of this, with one header and only contiguous nucleotides.
I got GeneName.fasta file running:
samtools bam2fq GeneName.bam > GeneName.fasta
or
samtools bam2fq GeneName.bam > GeneName.fastq
samtools seqtk seq -A GeneName.fastq > GeneName.fasta # (i installed bowtie2, boost, tophat2, seqtk, and bcf tools, as they seem to complement each other's works at some times)
and I got the same result on both, analogous file.
With this line of thinking, I thought I may have to run mpileup to the GeneName.bam file, before transforming to fasta, but mpileup gives as a result .bcf format... still, this last assumption is just a form of discussing my problem.
Have anyone out there found the solution for this, can a .bam file converted to classic .fasta format, or can help in any way?
Greetings
See also Converting .bam file to .fasta file from a genic region
Yes, that post is mine... i have posted a couple of times on the matter as i have been working through the problem ... now i am at the point at which i can retrieve the genicregion.fasta file, but it gives only single fastq reads piled up upon each other... i want a .fasta format, which is completely different from .fastq single read.
now that you got fastq file, try converting to fasta file using tools such as: https://bioconda.github.io/recipes/ucsc-fastqtofa/README.html. This would give you each read as single fasta record. You can download the binaries from http://hgdownload.cse.ucsc.edu/admin/exe/ and select as per your platform.
That's not what OP needs, rather a de novo assembly. OP wants a "reference" fasta for that region if I understood correctly, or create a consensus sequence of the bam file.