Hi,
I have multiple bam files (aligned with reference) and bed file of multiple genes (chr, start, end). I tried as follows for one bam file which gave me only reads information in fasta file :
samtools view -b -L genes.bed 1sorted.bam > 1sorted_genes.bam
samtools fasta 1sorted_genes.bam > 1sorted_genes.bam.fasta
I would like to extract the DNA sequence of whole genes (and not only reads) from all my bam files, including the gene name. I know I can add gene name in bed file, but how gene name can be integrated into fasta file. I want to translate DNA sequences to proteins for further downstream work.
Any guidance would be appreciated.
Thanks.
EDIT1: I have paired-end bam files.
samtools fasta will not generate a reference fasta. It turns each read from a one line sam format into two line fasta format. samtools.pl should have pileup2fq, that is closer to what you want.
Thanks, but I don't need reference fasta but consensus fasta sequence of all reads belonging to a gene which I can translate to protein.