Hello biostars,
I know that this issue has been widely discussed across several threads, but I can't figure out how to do the conversion. I've downloaded some SRA datasets and converted them to .csfasta and .qual using abi-dump. So I have these two files:
.csfasta file:
#
# Title: solid0065_201006
#
>SRR407311.1 2_21_490_F3
T3.23121101332.0133.2221.23.2.2103.330320302..32320
.qual file:
#
# Title: solid0065_201006
#
>SRR407311.1 2_21_490_F3
31 0 12 24 17 20 29 21 16 18 30 22 24 0 24 10 26 22 0 19 26 23 14 0 27 26 0 13 0 20 6 10 11 0 15 30 19 15 22 4 18 31 4 0 0 33 14 9 8 5
I want to convert them to base-space FASTQ format. I've tried the solid2fastq.pl script from BWA, but I get an empty FASTQ file. I've also tried solid2fastq from the bfast aligner, but I get:
@SRR407311.1 2_21_490
T3.23121101332.0133.2221.23.2.2103.330320302..32320
+
@!-925>613?79!9+;7!4;8/!<;!.!5'+,!0?407%3@%!!B/*)&
I've already read the threads about "not doing this conversion", but I need to map the files with STAR, and STAR does not accept colorspace data.
Thanks in advance,
Why do you need to align this with STAR? The results will largely be complete crap (the '.' bases will completely destroy the conversion).
So, should I use another RNA mapper to map these reads without doing the conversion? I "need" to map them with STAR because I'm running tests with different RNA-seq datasets in order to improve the annotation of a genome. Since I used STAR for the other (non-SOLiD) datasets, I wanted to use it for this dataset too, to prevent bias related to the mapper used. I hope I've explained myself.
That is not a good way to improve anything - using the exact same tool regardless of whether it is appropriate or not.
In general I have observed this tendency, and I am not criticizing you in particular but rather the field in general: many bioinformaticians confound the repeatability and reliability of a result with using the exact same tool with the same parameters. To me that is completely backwards. How much stock should anyone put into an annotation that is produced only when one uses one particular tool? If only one tool can produce a result, I would much sooner assume that there is a flaw in the way the tool works than that it is super effective.
I'm in complete agreement with Istvan on this one. Use a color-space aware aligner with this dataset to get the best results. The bias of using likely poor-quality results will dwarf any difference due to an aligner-effect.
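To make the point about the '.' calls concrete, here is a minimal sketch of a naive color-space to base-space translation. It assumes the usual SOLiD two-base encoding (with A=0, C=1, G=2, T=3, each color is the XOR of two adjacent bases); the function name decode_colorspace is just made up for this illustration and is not taken from BWA, bfast or any other tool.

```python
BASES = "ACGT"  # index = 2-bit code: A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Naively translate a color-space read (primer base + colors) to bases.

    Each color only encodes the transition from the previous base, so one
    missing call ('.') leaves every later base undetermined.
    """
    prev = BASES.index(read[0])  # leading primer base, not part of the read itself
    out = []
    for color in read[1:]:
        if prev is None or color == ".":
            prev = None          # the chain is broken from here on
            out.append("N")
            continue
        prev ^= int(color)       # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)


# A clean read decodes fine, but a single wrong color changes every base after it.
print(decode_colorspace("T0123"))  # TGAT
print(decode_colorspace("T0023"))  # TTCG

# The read from the question has a '.' at the second color call.
read = "T3.23121101332.0133.2221.23.2.2103.330320302..32320"
print(decode_colorspace(read))
```

With the read from the question, only the first base can be decoded and everything after it comes out as N. A color-space aware aligner avoids this by aligning the colors themselves and translating to bases only after alignment.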
I agree with you two, thanks a lot. I'll discuss these issues with my supervisor. Thank you all, again :)