Hi all,
I have short reads from the SOLiD 5500xl sequencing platform. The reads are in '.xsq' format. I used the XSQ Tools from Life Technologies http://www.lifetechnologies.com/fi/en/home/technical-resources/software-downloads/xsq-software.html to convert the .xsq file to .csfasta and .qual files, as shown below:
xsqconvert -c FRAG_BC_01_Can19.xsq
which results in
FRAG_BC_01_Can19_F3.csfasta and FRAG_BC_01_Can19_F3.qual files.
Then I used the 'qualfa2fq.pl' script from bwa to convert these to FASTQ format, as shown below:
qualfa2fq.pl FRAG_BC_06_Can19_F3.csfasta FRAG_BC_06_Can19_F3.QV.qual
The FASTQ file still has the reads in color-space format. My goal is to detect SNPs by aligning to a reference genome. For this purpose I need the data in base-space format. Could someone help me do this?
Any help is highly valuable.
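For reference, color space encodes each pair of adjacent bases as one digit (0-3), anchored by a known primer base at the start of the read. A minimal sketch of naive decoding follows, assuming the usual 2-bit coding A=0, C=1, G=2, T=3, in which a color is the XOR of two neighbouring base codes. The function name is made up for illustration. Note that a single miscalled color corrupts every base after it, which is why decoding raw reads like this before alignment is generally a poor idea for SNP calling.

```python
# 2-bit codes: A=0, C=1, G=2, T=3; a color is the XOR of two adjacent base codes.
BASES = "ACGT"

def decode_colorspace(read: str) -> str:
    """Naively translate a color-space read (primer base + colors) to bases.

    Hypothetical helper for illustration; not part of any SOLiD tool.
    """
    primer, colors = read[0], read[1:]
    prev = BASES.index(primer)
    out = []
    for c in colors:
        if c not in "0123":
            # '.' marks a missing color call; the rest cannot be decoded
            out.append("N" * (len(colors) - len(out)))
            break
        prev ^= int(c)          # next base code = previous base code XOR color
        out.append(BASES[prev])
    return "".join(out)

print(decode_colorspace("T0123"))  # TGAT
```

The primer base itself belongs to the adapter, so it is used only to start the chain and is not emitted.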
Thank you. Prior to alignment, quality control to filter low-quality bases seems like a sensible step for SNP detection. Could you suggest some thresholds and tools for this? Also, could you suggest some tools to convert aligned reads from color space to base space?
I wrote a tool to convert mapped colorspace reads to base-space, but I'm not sure if it works anymore. I'll look into it. Bioscope should be able to do it, of course, but it's really crappy.
I DO NOT recommend quality-trimming SOLiD reads because, unlike Illumina reads, the quality profile varies by the position modulo 5 rather than by the raw position. Thus, low-quality bases are scattered throughout the read and trimming the ends is not effective.
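One way to check this on your own data is to average the quality values by position modulo 5. The sketch below assumes a plain .qual file with a ">" header line per read followed by space-separated integer qualities, and assumes negative values mark missing calls. The function name and the example values are made up for illustration.

```python
from collections import defaultdict

def mean_quality_by_mod5(qual_lines):
    """Return {position % 5: mean quality} over all reads in .qual-style lines."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in qual_lines:
        line = line.strip()
        if not line or line.startswith(">") or line.startswith("#"):
            continue  # skip read names, comment lines and blank lines
        for pos, q in enumerate(line.split()):
            q = int(q)
            if q < 0:
                continue  # assumption: negative values mark missing calls
            totals[pos % 5] += q
            counts[pos % 5] += 1
    return {k: totals[k] / counts[k] for k in sorted(counts)}

# Made-up example record, for illustration only
example = [">1_1_1_F3", "30 28 25 20 10 31 27 24 19 9"]
print(mean_quality_by_mod5(example))
```

To run it on a real file, pass an open file handle instead of the list, e.g. `mean_quality_by_mod5(open("FRAG_BC_01_Can19_F3.qual"))`.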
You can use the solid-trimmer.py tool from brentp: https://github.com/brentp/bio-playground/blob/master/solidstuff/solid-trimmer.py. I have used it a lot in my research.
You can also use SHRiMP2, a color-space read aligner. I have used it extensively for aligning csfastq or csfasta/qual files. Once you have the aligned SAM/BAM file, you can use any variant caller that takes BAM files.
Thank you. Is it necessary to filter low-quality bases before aligning with SHRiMP2? If so, would you suggest a threshold value? I normally use Q20 for Illumina. Is it the same for SOLiD reads? Finally, is the BAM from SHRiMP2 in color space or base space?
You can use Q15 or Q20 as a threshold. The BAM file will contain both base-space and color-space sequences. The base-space sequence will be represented in the 10th column (the SEQ field), and the color-space sequence will be part of the tags in the last column. This BAM file can then be used with almost all of the tools that work with Illumina BAM files.
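To see both representations for one alignment, you can pull the SEQ field and the color-space tags out of a SAM line. This is a minimal sketch, assuming the aligner writes the standard SAM optional tags CS (color read sequence) and CQ (color read quality); check your own SAM header and records to confirm. The function name and the example record are made up.

```python
def seq_and_color_tags(sam_line: str):
    """Return (SEQ, {tag: value}) with the CS/CQ tags of one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    seq = fields[9]                      # 10th column: base-space sequence
    tags = {}
    for opt in fields[11:]:              # optional fields look like TAG:TYPE:VALUE
        tag, _type, value = opt.split(":", 2)
        if tag in ("CS", "CQ"):          # assumption: standard color-space tags
            tags[tag] = value
    return seq, tags

# Made-up record, for illustration only
line = "\t".join(["read1", "0", "chr1", "100", "60", "4M", "*", "0", "0",
                  "TGAT", "IIII", "CS:Z:T0123", "CQ:Z:IIII"])
print(seq_and_color_tags(line))
```

For a BAM file, the same lines can be obtained by piping the output of `samtools view` into a script like this.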
Hello again. I used SHRiMP2 to align and came across the error "my_realloc error: realloc failed". I could not find an effective solution elsewhere. Could you help if you have faced this error?
Hi, below is what I have done:
SHRiMP_2_2_3/bin/gmapper-cs Can19.fastq canFam3.fa -N 24 -Q --qv-offset 33 > Can19.sam
The reference sequence in the above command is in letter-space format and the reads are in color-space format. Should the reference also be given in color-space format, or does SHRiMP handle a letter-space reference when aligning color-space reads?
With the above command, I got the following error:
my_realloc error: realloc failed
Has anyone come across this error?
I am getting the same error. What is weird is that I used the same command to run 10 samples in parallel; 2 of them seem to have finished just fine, while the other 8 gave me the "realloc error". It seems to occur during the genome-loading step. Maybe a memory error?
Have you managed to solve this issue?