Ucsc/Ensembl Bam Converter?
4
6
Entering edit mode
13.5 years ago

Is there any tool to convert BAM files that have been produced against Ensembl genome references to UCSC-compatible BAM files? And viceversa? AFAICS, Ensembl sequence names don't have the 'chr' in front of the sequence name, but UCSC does, so when I try to attach my BAM file to UCSC, I get this error:

Error Unrecognized format line 1 of ftp://somewhere.com/pub/myfile.Mus_musculus.NCBIM37.61.bam: ‹ (note: chrom names are case sensitive)
bam ucsc ensembl reference • 9.1k views
ADD COMMENT
13
Entering edit mode
13.5 years ago

In order to do it right, you need to take care of several things. If you have only chromosomal reads then it is easy:

You need first to reformat the header

samtools view -H file.bam | perl -lpe 's/SN:([0-9]+|[XY]|MT)\b/SN:chr$1/' > file.sam

Then you need to fix the alignments to point to the same ref name that the header. You need to change columns 3 and 7 (paired ends), but remember not to change the ones with '*' or '='

samtools view file.bam | \
  perl -lane '$F[2]=q{chr}.$F[2] unless ($F[2] eq q{*}); \
              if($F[6] ne q{*} && $F[6] ne q{=}){ \
                  $F[6]=q{chr}.$F[6] \
              }; \
              print join("\t", @F)' \
  >> file.sam

For very large files probably you would like to go for sed or perl with regular expressions for doing it faster.

If you have non chromosomal reads then you need more low level QC in order to know that you are doing the things right:

Finally for paranoiacs:

  • Do the header SQ have the column M5? if it does I will double check that your reference seq md5 corresponds to the one you want to use in UCSC (just in case)

[edited]

  • and look at the Bio_X2Y answer for problems with Mitocondrial genome.

I would assume that you are doing this for visualising your reads in UCSC browser or ensembl browser, so you are only looking to chromosomal mappings. Isn't it?. So the first thing I would do is to extract all the chromosomal reads into a file and patch them. That way you avoid all the complex logic that would force you to realign your data with the other reference.

ADD COMMENT
0
Entering edit mode

I've tried manually adding the '@HD' line at the top, but UCSC doesn't seem to like it. Any ideas? perl -e 'print "@HDtVN:1.0tGO:nonetSO:unknownn"' > $file.ucsc.sam

ADD REPLY
0
Entering edit mode

seems that UCSC is not recognising the bam created, have you tried to create a sam with no header and then use the ucsc indexed reference to create the bam like samtools -bt ucsc_ref.fai your_sam > your_bam?

ADD REPLY
6
Entering edit mode
13.5 years ago
Bio_X2Y ★ 4.4k

This feels like a risky thing to do. As Pablo points out, there are subtle differences between the two references (e.g. names of the unplaced and unlocalised contigs, masked PAR regions, etc.). Depending on your goals, these may or may not be important.

Perhaps the trickiest difference is the mitochondrial sequences - Ensembl's "MT" sequence is based on the revised Cambridge Sequence (genbank NC_012920), while UCSC's "chrM" uses a different sequence (genbank NC_001807). AFAIK, there is no trivial way to convert alignments between the two of these.

If this concerns you, I suggest you create your new BAM by realigning your reads against the UCSC reference rather than retrofitting the Ensembl BAM.

ADD COMMENT
4
Entering edit mode
7.6 years ago
balwierz ▴ 40

Note: I am writing this answer 6 years later. Now it is easy with 'samtools reheader'

First generate the header:

samtools view -H file.bam | perl -lpe 's/SN:([0-9]+|[XY]|MT)\b/SN:chr$1/' >header.ucsc.sam

And edit header.ucsc.sam to correct manually chrMT to chrM (sorry for no perl oneliner that does that!)

Then use reheader function

samtools reheader header.ucsc.sam file.bam > file.ucsc.bam
ADD COMMENT
1
Entering edit mode

it would be great if there was a toolkit for ID conversions like this

ADD REPLY
2
Entering edit mode
13.5 years ago

Transform your bam to sam and use 'sed'.

samtools view mouse.bam | sed 's/\t/\tchr/2' > mouse.sam
samtools view -b -t mouse.fa -S -o mouse2.bam mouse.sam

I don't have any bam in front of me at the time of this writing. you might also have to change the @Header and to check what happens for the unmapped reads.

ADD COMMENT

Login before adding your answer.

Traffic: 1935 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6