translating a bam file to protein sequences
4.5 years ago
Assa Yeroslaviz ★ 1.9k

We would like to translate the genomic sequences in a BAM file into their protein sequences (all six frames).

We would like to do this while keeping the structure of the BAM file (or at least of a SAM file), if possible.

I was wondering if anyone knows of a tool for such a task.

Thanks

Tags: bam, protein, translate, samtools

That is really uncommon, because a BAM file typically holds millions of short reads (nucleotides).

If you are looking for a particular protein, it is better to do an assembly and look for your protein in the contigs.


Yes, I know this is uncommon, and I couldn't find a solution for it yet. But I am still looking for one, if possible. It does not have to be a huge BAM file; it can be one from which we have extracted reads for specific positions based on a BED file. It is mainly meant for identifying SAPs (single amino acid polymorphisms), so we might subset a BAM file to only the positions we need.


we would like to translate the genomic sequences in a bam file

Do you mean the sequence present in each line/record?


Yes. We might not need it for the whole BAM file (see above), but I would like to know if there is an option.


You can convert the reads to FASTA format using reformat.sh from the BBMap suite.

# all reads into a single FASTA file
reformat.sh in=your.bam out=file.fa
# paired reads split into two FASTA files
reformat.sh in=your.bam out1=R1.fa out2=R2.fa

Then use sixpack or transeq from EMBOSS to get the translations in all frames.
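
If you do want to keep the SAM layout, one option is to translate the SEQ column yourself and attach the peptides to each record. The following is only a rough sketch under my own assumptions, not an existing tool: it uses the standard genetic code, reports unknown or ambiguous codons as X, and stores the six translations in made-up optional tags t1 to t6 (lowercase tags are reserved for local use in the SAM spec). The function names are hypothetical too. Keep in mind that SEQ is stored in reference-forward orientation and that the frames are counted from the read start, not from the reference codon frame, so for SAP calling you would still have to work out the correct frame from POS and the CIGAR.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; anything not in the table becomes X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Frames 1-3 on the sequence as given, frames 4-6 on its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate(s[offset:]) for s in (seq, rc) for offset in range(3)]


def annotate_sam_line(line):
    """Append t1..t6 tags (hypothetical, local-use) to one SAM record.

    Header lines (starting with @) and records without a stored
    sequence (SEQ == "*") are passed through unchanged.
    """
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    seq = fields[9]  # SEQ is the 10th mandatory SAM column
    if seq != "*":
        for number, peptide in enumerate(six_frames(seq), start=1):
            if peptide:  # skip frames shorter than one codon
                fields.append(f"t{number}:Z:{peptide}")
    return "\t".join(fields) + "\n"


def annotate_sam(stream_in, stream_out):
    """Copy a SAM text stream, adding translation tags to every record."""
    for line in stream_in:
        stream_out.write(annotate_sam_line(line))
```

Calling annotate_sam(sys.stdin, sys.stdout) from a small script would let it sit in a pipe between `samtools view -h your.bam` and `samtools view -b -o translated.bam -`, so the result is still a valid BAM file with the peptides carried along as extra tags.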


Thanks, I'll try that. I was hoping for a tool that can keep the BAM/SAM structure, but I'll keep looking.
