Alternatives to LifeScope for realigning SOLiD BAM files
9.7 years ago
roysomak4 ▴ 40

I have BAM files from whole-exome sequencing on a SOLiD 5500 platform. The XSQ files were aligned with LifeScope using the UCSC hg19 reference. I am having trouble using the GATK tools because of a contig mismatch, and one of the recommendations is to realign to the b37 human genome reference. I am unable to use LifeScope for several reasons (including an incompatible OS: my machine runs Ubuntu 12.04 LTS). I also do not have access to the XSQ files.

I am relatively new to handling data from the SOLiD platform and would like to hear from anyone who can share their experience with alternative methods for realigning BAM files generated by LifeScope.

Thank you!!

roy

BAM realignment SOLiD • 4.0k views

I have had success using BFAST to align SOLiD data, so that's what I would recommend to you :)


Hi Raony,

I do not have much experience with BFAST, but I will give it a try. The PDF man pages appear a bit dated (2011). If there is a more recent manual, would you mind sharing the link?

Another question: can BFAST take BAM files generated by LifeScope (or other SOLiD aligners) and realign them to a different human reference genome?

Thank you!


I thought you had figured out the header problem. If you want to realign the SOLiD reads, you could give SHRiMP2 a try. Unfortunately it is no longer developed or supported, but I have used it extensively to align SOLiD reads and it has worked well for me.


Yes the manual is a bit old, but I don't know about any new version.

Try with this branch: http://sourceforge.net/projects/bfast/files/bfast%2Bbwa/0.7.0/ or this

Once you have extracted your reads from the BAM into FASTQ format, you can try different aligners and any reference genome you want. :)
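A minimal sketch of the extraction step, assuming single-end, base-space reads that carry a real QUAL string: samtools view decodes the BAM into SAM text, and the hypothetical sam_to_fastq function below writes one FASTQ record per alignment, reverse-complementing reads whose flag has bit 0x10 set so they return to their original orientation. For real data, samtools fastq or Picard SamToFastq do this more completely (including paired reads); note that this recovers base-space reads only, not the original color-space reads.

```shell
#!/bin/bash
# Hypothetical helper: convert SAM records (stdin) to FASTQ (stdout).
# Assumes single-end reads and a QUAL column that is not "*".
sam_to_fastq() {
  awk -F'\t' '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }
    /^@/ { next }                        # skip SAM header lines
    {
      name = $1; flag = $2; seq = $10; qual = $11
      if (int(flag / 16) % 2 == 1) {     # bit 0x10: SEQ is stored reverse-complemented
        rs = ""; rq = ""
        for (i = length(seq); i >= 1; i--) {
          b = substr(seq, i, 1)
          c = (b in comp) ? comp[b] : "N"  # bases other than ACGT become N
          rs = rs c
          rq = rq substr(qual, i, 1)
        }
        seq = rs; qual = rq
      }
      print "@" name; print seq; print "+"; print qual
    }'
}
```

Usage would look like: samtools view -F 0x900 14_0064.combo.bam | sam_to_fastq > 14_0064.fq (the -F 0x900 filter drops secondary and supplementary alignments so each read is written once).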

9.7 years ago
roysomak4 ▴ 40

Thank you all for your helpful suggestions.

I revisited the thread referred to by Ashutosh and was able to assemble a version of hg19 that GATK did not complain about!!!

Here is what I did.

1. I extracted the list of contigs, in their original order, from hg19.fai, which I downloaded from the GATK resource bundle (hg19 liftover).

2. I tweaked the list to move chrM after chrY. This is exactly the order used by the LifeScope reference genome my BAM files were aligned to.

3. Downloaded chromFa.tar.gz for hg19 from UCSC and extracted the individual contig files.

4. Ran the following shell script, which reads my contig list and concatenates the individual chromosome FASTA files from step 3 in that order.

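#!/bin/bash
# Build hg19.fa by concatenating per-contig FASTA files (e.g. chr1.fa)
# in the order given by a contig list, one name per line.
# Usage: bash this_script.sh [contig_order_file]
ORDER_FILE=${1:-gatk_ucsc_contig_order.txt}

# create (or truncate) the output so a re-run does not append to an old hg19.fa
: > hg19.fa

# iterate through the list of contigs and append each one to hg19.fa
while IFS= read -r contig
do
  [ -z "$contig" ] && continue            # skip blank lines
  echo "Appending $contig.fa..."
  cat "$contig.fa" >> hg19.fa || exit 1   # stop if a contig file is missing
done < "$ORDER_FILE"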

5. Indexed the new genome with BWA and created the sequence dictionary and the FASTA index (.fai).

6. GATK was happy!!!! :-)
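Steps 1 and 2 can be sketched in a few lines, assuming the contig names are the first column of the .fai file (as in any samtools-style FASTA index). The function name reorder_chrM_after_chrY is hypothetical; it keeps every other contig in place and only moves chrM so it directly follows chrY, matching the LifeScope order shown in the GATK error further down the thread.

```shell
#!/bin/bash
# Hypothetical helper: print contig names from a .fai (first column)
# with chrM moved to directly follow chrY; all other contigs keep their order.
reorder_chrM_after_chrY() {
  cut -f1 "$1" | awk '
    $1 == "chrM" && !seen_y { held = 1; next }   # hold chrM if it appears before chrY
    { print }
    $1 == "chrY" { seen_y = 1; if (held) { print "chrM"; held = 0 } }
    END { if (held) print "chrM" }               # no chrY in the list: append chrM last
  '
}
```

For example, reorder_chrM_after_chrY ucsc.hg19.fasta.fai > gatk_ucsc_contig_order.txt would produce the list read by the script in step 4, assuming a chromosome FASTA file from step 3 exists for every listed contig.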

The downside, however, is that I cannot use the known hg19 indel/SNP VCFs from GATK. Working on that.

Cheers!

UPDATE (4/29/15): I used VCFsorter to reorder the contigs in the VCF files from the GATK resource bundle to match the hg19 reference above, and then indexed the VCF files with the igvtools index command. This let me use the VCFs as known sites for indel realignment and BQSR in GATK.
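For readers without VCFsorter, the same reordering idea can be sketched with plain awk and sort. This is a swapped-in stand-in, not VCFsorter itself, and the function name is hypothetical. It assumes an uncompressed VCF and a contig list with one name per line (such as the list from step 1 and 2); header lines are copied unchanged, so the ##contig lines keep their original order (GATK may check those too), and records on contigs missing from the list are dropped. The output would still need indexing before GATK uses it.

```shell
#!/bin/bash
# Hypothetical helper: order VCF records by a contig list, then by POS.
# Usage: sort_vcf_by_contig_order contig_order.txt input.vcf > output.vcf
sort_vcf_by_contig_order() {
  local order_file=$1 vcf=$2 tab
  tab=$(printf '\t')
  grep '^#' "$vcf"                        # header lines pass through unchanged
  awk -F'\t' -v OFS='\t' '
    NR == FNR { rank[$1] = NR; next }     # first file: remember each contig position
    /^#/ { next }                         # second file: skip header lines
    ($1 in rank) { print rank[$1], $0 }   # prefix each record with its contig rank
  ' "$order_file" "$vcf" \
    | sort -t "$tab" -k1,1n -k3,3n \
    | cut -f2-                            # strip the rank prefix again
}
```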

Having said all this, the best option, as biocyberman mentioned, is to use the b37 reference from the GATK resource bundle for the primary alignment; most downstream steps are then taken care of.

Thank you all once again for helping me out.


Glad to hear you've overcome the trouble.


Thanks for sharing... I found that BFAST takes too much time to build its index, so I followed this method instead.

9.7 years ago
biocyberman ▴ 870

Hello @roysomak4,

Congratulations on your new job (maybe), but be prepared to need patience when working with SOLiD data. I have been in your situation. The most reliable and least painful approach here is to create a new LifeScope reference from the b37 FASTA sequence provided in the GATK resource bundle. Contact Life Technologies for instructions on how to do this.

Once you have BAM files generated against the correct reference in LifeScope, GATK will run with only minor adjustments here and there.

Alternatively, Novocraft (novocraft.com) offers novoalignCS, an aligner that works with BAM, CSFASTQ and XSQ files.

A less preferred option (only because I haven't tried it much or looked into its code) is to use CrossMap to "lift" your BAM file from hg19 to b37.

Hope this helps.


Hi @biocyberman

I also have a problem with SOLiD data. I would like to realign my SOLiD 5500xl BAM files from LifeScope to b37 (my XSQ files were aligned to the hg19 library from Life Technologies). Should I first convert the BAM to FASTQ and then realign the reads to the b37 FASTA with BWA?

thanks


@nutechu

It is simpler if you can obtain the XSQ or CSFASTQ files. If that is not possible, many aligners accept BAM files as input. If using BAM files is not an option either, try converting one sample from BAM to FASTQ first to find out whether the conversion gives color-space sequences or converted base-space reads. I would prefer the original color-space reads for alignment, to minimize unknown problems introduced by the conversion. Alternatively, to repeat what I answered others: you can try CrossMap to lift your BAM from hg19 to b37.
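One quick way to run that check on a sample, sketched under an assumption: the SAM optional-field specification defines a CS:Z tag for the color read sequence on the original strand (with its CQ quality companion), and BAMs from SOLiD pipelines often carry it, but whether your LifeScope BAMs do is exactly what this tests. The function name count_colorspace_reads is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count SAM records (stdin) carrying a CS:Z color-space tag.
# Prints "<records with CS tag> <total records>".
count_colorspace_reads() {
  awk -F'\t' '
    /^@/ { next }                        # skip header lines
    {
      total++
      for (i = 12; i <= NF; i++)         # optional tags start at column 12
        if ($i ~ /^CS:Z:/) { cs++; break }
    }
    END { printf "%d %d\n", cs, total }
  '
}
```

For example, samtools view sample.bam | head -n 10000 | count_colorspace_reads: if the first number is 0, only base-space reads can be recovered from that BAM.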

Hope that helps

9.7 years ago
Zaag ▴ 870

The easiest way is to get the UCSC hg19 reference and the other files you need from the GATK resource bundle; see here for more information:

http://gatkforums.broadinstitute.org/discussion/1213/whats-in-the-resource-bundle-and-how-can-i-get-it

9.7 years ago
roysomak4 ▴ 40

Hi Zaag,

I tried that option, but GATK still gives the contig mismatch error. See below:

sudo java -Xmx20g -jar ~/biotools/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ~/biotools/human_genome/gatk_hg19/ucsc.hg19.fasta -I 14_0064.combo.bam -o 14_0064.combo.bam.intervals

MESSAGE: Input files reads and reference have incompatible contigs: Relative ordering of overlapping contigs differs, which is unsafe.

##### ERROR   reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]
##### ERROR   reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]

Thanks!


You need to reorder your reference files so that the chromosome order is the same as in the BAM file, or create a new reference FASTA file. See this thread for some help: Karyotypically Ordered Hg19
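Before rerunning GATK, the order can be checked up front. GATK's message says the "relative ordering of overlapping contigs differs", so the sketch below compares only the contigs present in both lists, each in its own file's order. It is an approximation of that idea, not GATK's actual code, and both function names are hypothetical.

```shell
#!/bin/bash
# Print contig names (SN: values) from SAM/BAM header lines on stdin.
contigs_from_header() {
  awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Succeed (exit 0) if the contigs present in both lists appear in the same relative order.
# $1, $2: files with one contig name per line.
overlap_order_matches() {
  local a b
  a=$(awk 'NR == FNR { other[$1] = 1; next } ($1 in other)' "$2" "$1")
  b=$(awk 'NR == FNR { other[$1] = 1; next } ($1 in other)' "$1" "$2")
  [ "$a" = "$b" ]
}
```

Usage might be: samtools view -H 14_0064.combo.bam | contigs_from_header > reads_contigs.txt, then cut -f1 ucsc.hg19.fasta.fai > ref_contigs.txt, and finally overlap_order_matches reads_contigs.txt ref_contigs.txt && echo "order OK". With the lists from the error above, the check fails because chrM comes first in the reference but last in the reads.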


I could send you .fai and .dict files for use with GATK that work with the LifeScope reference, but biocyberman's solution is probably better.
