Is there any tool to convert BAM files that have been produced against Ensembl genome references to UCSC-compatible BAM files? And viceversa?
AFAICS, Ensembl sequence names don't have the 'chr' in front of the sequence name, but UCSC does, so when I try to attach my BAM file to UCSC, I get this error:
Then you need to fix the alignments to point to the same ref name that the header. You need to change columns 3 and 7 (paired ends), but remember not to change the ones with '*' or '='
Do the header SQ have the column M5? if it does I will double check that your reference seq md5 corresponds to the one you want to use in UCSC (just in case)
[edited]
and look at the Bio_X2Y answer for problems with Mitocondrial genome.
I would assume that you are doing this for visualising your reads in UCSC browser or ensembl browser, so you are only looking to chromosomal mappings. Isn't it?. So the first thing I would do is to extract all the chromosomal reads into a file and patch them. That way you avoid all the complex logic that would force you to realign your data with the other reference.
I've tried manually adding the '@HD' line at the top, but UCSC doesn't seem to like it. Any ideas? perl -e 'print "@HDtVN:1.0tGO:nonetSO:unknownn"' > $file.ucsc.sam
seems that UCSC is not recognising the bam created, have you tried to create a sam with no header and then use the ucsc indexed reference to create the bam like samtools -bt ucsc_ref.fai your_sam > your_bam?
This feels like a risky thing to do. As Pablo points out, there are subtle differences between the two references (e.g. names of the unplaced and unlocalised contigs, masked PAR regions, etc.). Depending on your goals, these may or may not be important.
Perhaps the trickiest difference is the mitochondrial sequences - Ensembl's "MT" sequence is based on the revised Cambridge Sequence (genbank NC_012920), while UCSC's "chrM" uses a different sequence (genbank NC_001807). AFAIK, there is no trivial way to convert alignments between the two of these.
If this concerns you, I suggest you create your new BAM by realigning your reads against the UCSC reference rather than retrofitting the Ensembl BAM.
ADD COMMENT
• link
updated 23 months ago by
Ram
44k
•
written 13.4 years ago by
Bio_X2Y
★
4.4k
I don't have any bam in front of me at the time of this writing. you might also have to change the @Header and to check what happens for the unmapped reads.
I've tried manually adding the '@HD' line at the top, but UCSC doesn't seem to like it. Any ideas? perl -e 'print "@HDtVN:1.0tGO:nonetSO:unknownn"' > $file.ucsc.sam
seems that UCSC is not recognising the bam created, have you tried to create a sam with no header and then use the ucsc indexed reference to create the bam like
samtools -bt ucsc_ref.fai your_sam > your_bam
?