Question

Alternatives to LifeScope for realigning SOLiD BAM files

10.3 years ago

roysomak4 ▴ 40

I have BAM files from whole exome sequencing on a SOLiD 5500 platform. The XSQ files were aligned with LifeScope against the UCSC hg19 reference. I am having trouble using the GATK tools because of a contig mismatch, and one of the recommendations is to realign to the b37 human genome reference. I am unable to use LifeScope due to several issues (including an incompatible OS: my machine runs Ubuntu 12.04 LTS). I also do not have access to the XSQ files.

I am relatively new to handling data from the SOLiD platform and wanted to see if someone could share their experience with alternative methods for realigning BAM files generated by LifeScope.

Thank you!!

roy

BAM realignment SOLiD • 4.7k views

ADD COMMENT • link updated 2.5 years ago by Ram 45k • written 10.3 years ago by roysomak4 ▴ 40

I had success using bfast to align SOLiD data, so that is what I would recommend for you :)

ADD REPLY • link 10.2 years ago by Raony Guimarães ★ 1.5k

Hi Raony,

I do not have much experience with bfast, but I will give it a try. The PDF man pages appear a bit dated (2011). If there is a more recent manual, would you mind sharing the link?

Another question: can bfast take BAM files generated by LifeScope or other SOLiD aligners and realign them to a different human reference genome?

Thank you!

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by roysomak4 ▴ 40

I thought you had figured out the header problem. If you want to realign the SOLiD reads, you should give SHRiMP2 a try. Unfortunately, it is no longer developed or supported, but I have used it extensively to align SOLiD reads and it has worked pretty well for me.

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by Ashutosh Pandey 12k

Yes, the manual is a bit old, but I don't know of any newer version.

Try with this branch: http://sourceforge.net/projects/bfast/files/bfast%2Bbwa/0.7.0/ or this

Once you have extracted your reads from BAM to FASTQ format, you can try different aligners and any genome reference you want. :)
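To make the BAM to FASTQ step concrete, here is a rough sketch that turns SAM text into FASTQ with plain awk. It is only an illustration under several assumptions: the helper name sam_to_fastq is made up, reads are treated as single-end, and the output is whatever is in the SEQ and QUAL columns (base space), not color space. Dedicated converters are the safer choice for real data.

```shell
# Hypothetical helper: read SAM text on stdin, write FASTQ on stdout.
# Assumes single-end reads and uppercase A/C/G/T/N in the SEQ column.
sam_to_fastq() {
  awk -F'\t' '
    # Reverse-complement a base-space sequence; unknown symbols become N.
    function revcomp(s,   i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : "N")
      }
      return out
    }
    # Reverse a string (used for the quality values).
    function reverse(s,   i, out) {
      out = ""
      for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
      return out
    }
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N" }
    /^@/ { next }                                         # skip header lines
    int($2 / 256) % 2 || int($2 / 2048) % 2 { next }      # skip secondary and supplementary records
    {
      seq = $10; qual = $11
      # FLAG bit 0x10: the read is stored reverse-complemented, so undo that.
      if (int($2 / 16) % 2) { seq = revcomp(seq); qual = reverse(qual) }
      printf "@%s\n%s\n+\n%s\n", $1, seq, qual
    }
  '
}
```

Something like `samtools view sample.bam | sam_to_fastq > sample.fastq` would then feed it (sample.bam is a placeholder name).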

ADD REPLY • link 10.2 years ago by Raony Guimarães ★ 1.5k

Ram · Answer 1 · 2015-04-25

Thank you all for your helpful suggestions.

I revisited the thread referred to by Ashutosh and was able to assemble my own version of hg19 that GATK did not complain about!!!

Here is what I did.

1. I extracted the list of contigs, in their original order, from the hg19.fai that I downloaded from the GATK resource bundle (hg19 liftover).

2. I tweaked the list to move chrM after chrY. This matches the contig order of the reference genome that LifeScope used to align my BAM files.

3. I downloaded chromFa.tar.gz for hg19 from UCSC and uncompressed the individual contigs.

4. I ran the following shell script to read my contig list and concatenate the individual chromosome FASTA files from step 3.

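#!/bin/bash
# Contig list: first argument, defaulting to the file name used originally
FILE=${1:-gatk_ucsc_contig_order.txt}

# Start from an empty hg19.fa (touch alone would keep old contents on a rerun)
: > hg19.fa

# Iterate through the list of contigs and append each FASTA to hg19.fa
while IFS= read -r line
do
  echo "Appending $line.fa..."
  filename=$line.fa
  cat "$filename" >> hg19.fa
done < "$FILE"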

5. I indexed the new genome with bwa and created the sequence dictionary and the FASTA index (.fai).

6. GATK was happy!!!! :-)
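Steps 1 and 2 above could look roughly like this. It is only a sketch, not the exact commands that were run: the helper name reorder_contigs is made up, and it assumes the index is a standard tab-separated .fai with the contig name in the first column.

```shell
# Hypothetical helper: print the contig names from a .fai index,
# with chrM moved to sit directly after chrY. "$1" is the .fai path.
reorder_contigs() {
  cut -f1 "$1" | awk '
    { names[NR] = $1 }                 # remember names in file order
    $1 == "chrM" { has_m = 1 }
    END {
      for (i = 1; i <= NR; i++) {
        if (names[i] == "chrM") continue
        print names[i]
        if (names[i] == "chrY" && has_m) { print "chrM"; placed = 1 }
      }
      if (has_m && !placed) print "chrM"   # no chrY found: put chrM last
    }
  '
}
```

For example, `reorder_contigs ucsc.hg19.fasta.fai > gatk_ucsc_contig_order.txt` would write a list that the script in step 4 can read.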

The downside, though, is that I am unable to use the known hg19 indel/SNP VCFs from GATK. Working on that.

Cheers!

UPDATE (4/29/15): I was able to use VCFsorter to reorder the contigs in the VCF files from the GATK resource bundle to match the hg19 reference above, and then indexed the VCF files with the igvtools index command. This let me use the VCFs as known sites for indel realignment and BQSR in GATK.
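For anyone curious what the reordering amounts to, here is a minimal sketch in plain shell. It is not how VCFsorter works internally (I have not checked); it just sorts records by a given contig order and then by position. The helper name reorder_vcf is made up, the VCF is assumed to be uncompressed, and records on contigs missing from the list are dropped.

```shell
# Hypothetical helper: reorder VCF records to follow a contig list.
# $1 = contig order file (one name per line), $2 = uncompressed VCF.
reorder_vcf() {
  grep '^#' "$2"                           # header lines pass through unchanged
  grep -v '^#' "$2" | awk -F'\t' -v OFS='\t' '
      NR == FNR { rank[$1] = FNR; next }   # first input: contig name -> rank
      ($1 in rank) { print rank[$1], $0 }  # prefix each record with its rank
    ' "$1" - |
    sort -s -t"$(printf '\t')" -k1,1n -k3,3n |   # by rank, then by POS
    cut -f2-                                     # drop the rank column again
}
```

Note that the ##contig lines in the header keep their original order in this sketch, so the header may still need attention before other tools accept the file.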

Having said all this, the best option, as biocyberman mentioned, is to use the b37 reference from the GATK resource bundle for the primary alignment. Most downstream steps are then taken care of.

Thank you all once again for helping me out.

Ram · Answer 2 · 2015-04-23

10.3 years ago

biocyberman ▴ 870

Hello @roysomak4,

Congratulations on your new job (maybe), but be prepared to be patient when working with SOLiD data. I have been in your situation. The most reliable and least painful approach here is to create a new reference for LifeScope from the b37 FASTA sequence provided in the GATK resource bundle. Contact Lifetech for instructions on this.

Once you have BAM files generated with the correct reference in LifeScope, GATK will be able to run with some minor adjustments here and there.

Alternatively, novocraft.com has a novoalignCS aligner which works with BAM, CSFASTQ and XSQ files.

The less preferred way (because I haven't tried it much or looked into the code) is to use CrossMap to "lift" your BAM file from hg19 to b37.

Hope this helps.

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by biocyberman ▴ 870

Hi @biocyberman

I also have a problem with SOLiD data. I would like to realign my SOLiD 5500XL BAM file from LifeScope to b37 (my XSQ file was aligned with the hg19 library from Lifetech). Should I first convert the BAM to FASTQ and then align to the b37 FASTA with BWA?

thanks

ADD REPLY • link 10.1 years ago by nutechu • 0

@nutechu

It is simpler if you can obtain the XSQ or CSFASTQ file. If that is not possible, many aligners accept BAM files as input. If using BAM files is not an option either, you can try the BAM to FASTQ conversion on one sample and see how it goes, that is, find out whether the conversion gives color-space sequences or converted base-space reads. I would prefer the original color-space reads for alignment, to minimize unknown problems with the conversion. Alternatively, to repeat what I answered to others, you can try CrossMap to lift your BAM from hg19 to b37.
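One quick way to see whether color-space information survived in a BAM is to look for the CS:Z tag, which the SAM specification reserves for the color read sequence. The sketch below counts it in SAM text; the helper name is made up, and whether a given LifeScope BAM carries these tags at all is an assumption to check, not a given.

```shell
# Hypothetical helper: read SAM text on stdin and report how many
# alignment records carry a CS:Z (color read sequence) tag.
count_colorspace_records() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines
    {
      total++
      # optional tags start at column 12
      for (i = 12; i <= NF; i++) if ($i ~ /^CS:Z:/) { cs++; break }
    }
    END { printf "%d of %d records carry a CS:Z tag\n", cs, total }
  '
}
```

Piping the first few thousand records through it, e.g. `samtools view sample.bam | head -n 10000 | count_colorspace_records` (sample.bam being a placeholder), should be enough to tell.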

Hope that helps

ADD REPLY • link 10.1 years ago by biocyberman ▴ 870

Ram · Answer 3 · 2015-04-22

10.3 years ago

Zaag ▴ 870

The easiest way is to get your hands on the UCSC hg19 reference and the files you need from the GATK resource bundle; see here for more information:

http://gatkforums.broadinstitute.org/discussion/1213/whats-in-the-resource-bundle-and-how-can-i-get-it

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by Zaag ▴ 870

Ram · Answer 4 · 2015-04-22

Hi Zaag,

I tried that option but GATK still gives the contig mismatch error. See below

sudo java -Xmx20g -jar ~/biotools/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ~/biotools/human_genome/gatk_hg19/ucsc.hg19.fasta -I 14_0064.combo.bam -o 14_0064.combo.bam.intervals

MESSAGE: Input files reads and reference have incompatible contigs: Relative ordering of overlapping contigs differs, which is unsafe.

##### ERROR   reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]
##### ERROR   reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]

Thanks!
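The error above is about the relative order of the contigs that the BAM and the reference share. A small check like the following can confirm that before running GATK. It is a sketch under assumptions: the helper name same_relative_order is made up, the BAM header is expected as SAM text saved to a file (for instance from `samtools view -H`), and the reference is represented by its .fai index.

```shell
# Hypothetical helper: succeed (exit 0) if the contigs shared by a BAM
# header and a reference index appear in the same relative order.
# $1 = SAM header text file, $2 = reference .fai file.
same_relative_order() {
  bam_names=$(mktemp); ref_names=$(mktemp)
  # Contig names from the @SQ lines, in header order.
  awk -F'\t' '$1 == "@SQ" {
      for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" > "$bam_names"
  # Contig names from the reference index, in reference order.
  cut -f1 "$2" > "$ref_names"
  # Keep only names present in both lists, each in its own original order.
  shared_bam=$(awk 'NR == FNR { seen[$1]; next } ($1 in seen)' "$ref_names" "$bam_names")
  shared_ref=$(awk 'NR == FNR { seen[$1]; next } ($1 in seen)' "$bam_names" "$ref_names")
  rm -f "$bam_names" "$ref_names"
  [ "$shared_bam" = "$shared_ref" ]
}
```

With the header and reference from the error message, the check would fail because chrM comes first in the bundle reference but last in the BAM.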