4.5 years ago
Assa Yeroslaviz
We would like to translate the genomic sequences in a BAM file into their protein sequences (all six frames).
We would like to do this while keeping the structure of the BAM file (or at least of a SAM file), if possible.
Does anyone know of a tool for such a task?
Thanks
That is really uncommon, because a BAM file contains millions of short reads (nucleotides).
If you are looking for a particular protein, it is better to do an assembly and look for your protein in the contigs.
Yes, I know this is uncommon, and I couldn't find any solution for it yet. But I am still looking for one, if possible. It does not have to be a huge BAM file; it can be one where we have extracted reads for specific positions based on a BED file. It is mainly meant for identifying SAPs (single amino acid polymorphisms), so we might subset the BAM file to only the positions we need.
Do you mean the sequence present in each line/record?
Yes. We might not need it for the whole BAM file (see above), but I would like to know whether there is an option.
You can convert the reads into FASTA format using the BBMap suite.
Then use sixpack or transeq from EMBOSS to get the translations in all frames.
Thanks, I'll try that. I was hoping for a tool that can keep the BAM/SAM structure, but I'll keep looking.
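On keeping the SAM structure: one possible approach is to leave each record as it is and append the six translations as optional tags. The following is only a minimal Python sketch of that idea, not an existing tool. It assumes the standard genetic code and SAM text as input (for example from `samtools view -h`). The function names and the tag names `f1`..`f3` and `r1`..`r3` are made up for illustration (the SAM specification reserves tags containing lowercase letters for local use). Note that the frames are relative to the SEQ field as stored in the record, not to the coding frame of the reference, so for SAP work the frame still has to be matched to the gene annotation.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frames(seq):
    """Return the six translations keyed by hypothetical tag names f1-f3, r1-r3."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"f{offset + 1}"] = translate(seq[offset:])
        frames[f"r{offset + 1}"] = translate(rev[offset:])
    return frames


def annotate_sam_line(line):
    """Append the translations as Z-type optional tags; header lines pass through."""
    line = line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    seq = fields[9]  # SEQ is the 10th mandatory SAM column
    if seq == "*":   # no sequence stored for this record
        return line
    for tag, protein in six_frames(seq).items():
        if protein:
            fields.append(f"{tag}:Z:{protein}")
    return "\t".join(fields)
```

Used as a filter, something like `samtools view -h in.bam | python annotate.py | samtools view -b -o out.bam` should give a BAM file again, assuming a small wrapper script (here called `annotate.py`, a hypothetical name) that applies `annotate_sam_line` to every line of standard input and prints the result.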