Align fastq SOLiD data
3.2 years ago
Vic ▴ 100

Hello everyone,

I have downloaded some data from the Sequence Read Archive (SRA) using the SRA Toolkit. It is SOLiD data. I have seen people use LifeScope (Life Technologies) to align such reads, as I presume it works for this type of data. Unfortunately, I can't get anyone with administrator permissions on our cluster to help, so I'm looking for alternative ways to do this.

My data, which was acquired using fastq-dump, looks like this:

fastq example

I have seen people mention BFAST's solid2fastq or BWA's solid2fastq.pl.

My question is: once I have converted the data using one of these, can I proceed with the bwa aligner, then filter for mapping quality and continue? Or should I align using bfast? What I need is a resulting BAM file, as I would like to merge the lanes (each sample is spread across 5 or so), edit the read groups, and sort the files so they can be variant-called together with another data set.

Thanks for your time!

fastq aligner solid • 1.7k views
3.2 years ago

SOLiD is way out of date now.

  • Are you interested in quantitative counting data (RNA-seq, metagenomics, etc.)? --> Don't use it: PCR errors.

  • Are you interested in WGS or WES (SNVs)? --> Don't use it: very different from Illumina, with coverage bias and dropouts.

Alignment is tricky: bowtie1 does NOT do gapped alignment.

There are a couple of other alignment programs for SOLiD listed here: https://en.wikipedia.org/wiki/List_of_sequence_alignment_software

We found the commercial novoalign-cs was by far the best, but slow.

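Definitely align the reads to the genome in color space rather than converting them to base-space FASTQ first. Otherwise a single color-call error changes every base decoded after it (the "magical" color-space frameshift), which messes everything up.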
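To make the frameshift concrete, here is a minimal sketch of naive color-to-base decoding in POSIX shell. The function name decode_colorspace and the example reads are made up for illustration. The transition table is the standard SOLiD two-base encoding (0 keeps the base, 1 swaps A/C and G/T, 2 swaps A/G and C/T, 3 swaps A/T and C/G). This is not how any aligner is implemented; it only demonstrates why one bad color call changes every base after it.

```shell
# decode_colorspace READ
# READ is a primer base followed by colors, e.g. T012301 (hypothetical example).
# Prints the decoded bases, excluding the primer base.
# Variables are global because POSIX sh has no "local".
decode_colorspace() {
  rest=$1
  base=${rest%"${rest#?}"}   # first character: the primer base
  rest=${rest#?}             # remaining characters: the colors
  out=
  while [ -n "$rest" ]; do
    color=${rest%"${rest#?}"}
    rest=${rest#?}
    # Each base depends on the previous base plus the current color.
    case "$base$color" in
      A0|C1|G2|T3) base=A ;;
      A1|C0|G3|T2) base=C ;;
      A2|C3|G0|T1) base=G ;;
      A3|C2|G1|T0) base=T ;;
      *)           base=N ;;  # missing color call; everything after it is unknown
    esac
    out=$out$base
  done
  printf '%s\n' "$out"
}

decode_colorspace T012301   # prints TGATTG
decode_colorspace T022301   # same read with one wrong color: prints TCTAAC
```

Changing only the second color (1 to 2) turns TGATTG into TCTAAC: the first base survives and every later base is wrong. An aligner that works in color space directly can treat the same error as a single mismatch.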

Avoid it if you can!

Unfortunately, I work on a non-model species and I need to incorporate this data into my study. It was generated in 2014 and has been used in a recent study (2018). Is your suggestion just to not use it?

My plan is to align this data to a reference genome that I am converting to a colorspace reference using the command ./bowtie-build -C.

Is it WGS, WES, or RNA-seq? If you can't avoid it, then treat it with extreme caution. Sure, you're on the right track, but as I pointed out, bowtie1 is not a great aligner, since it cannot align reads with indels. Have fun...

Apologies, the data is WGS.

3.2 years ago
ATpoint 85k

This is so-called colorspace data. Bowtie could align colorspace data up to version 1.2.3; support was dropped in version 1.3.0. Hence I would get version 1.2.3 and align with it. The easiest way is probably to install it with conda:

conda install -c bioconda bowtie=1.2.3

You will first have to build a colorspace index from the reference genome (or use one provided on the bowtie website). Then align with bowtie, which can optionally output a SAM file (see its manual) that you can later convert to BAM with samtools.
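The steps described above could be chained roughly as follows. This is a sketch under assumptions: bowtie 1.2.3 and samtools are on the PATH, the file names are placeholders, and the run helper and align_colorspace function are hypothetical names introduced here. By default it only prints the commands so they can be reviewed; set RUN=1 in the environment to execute them. Note that -C is passed both to bowtie-build and to bowtie, since colorspace reads have to be aligned against a colorspace index.

```shell
# Dry-run helper: prints the command unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# align_colorspace REF INDEX READS PREFIX
# Builds a colorspace index, aligns in colorspace with SAM output,
# then sorts to BAM and indexes the result.
align_colorspace() {
  ref=$1 index=$2 reads=$3 prefix=$4
  run bowtie-build -C "$ref" "$index" &&
  run bowtie -C -S "$index" "$reads" "$prefix.sam" &&
  run samtools sort -o "$prefix.sorted.bam" "$prefix.sam" &&
  run samtools index "$prefix.sorted.bam"
}

# Placeholder file names; replace with your own.
align_colorspace myreference.fa new_ref my_data.fastq my_data
```

Keeping the intermediate SAM file avoids a pipe and makes each step easy to rerun, at the cost of some disk space.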

I've installed it via conda as you suggested.

I need to build a colorspace index from the reference, specifying -C:

./bowtie-build -C myreference.fa new_ref

Then align using bowtie, again passing -C so the reads are treated as colorspace, and specify SAM output with -S:

./bowtie -C -S new_ref my_data.fastq my_data.sam

then off to samtools, which I'm used to using. Thanks so much ATpoint, you are very helpful!
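For the samtools part (read groups and merging lanes, as asked in the original question), one possible sketch is shown here. It assumes each lane has already been aligned and coordinate-sorted into a file named SAMPLE.LANE.sorted.bam, which is a hypothetical naming convention, as are the run, tag_lane and merge_sample helpers and the read-group values. As written it only prints the commands; set RUN=1 to execute them.

```shell
# Dry-run helper: prints the command unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# tag_lane SAMPLE LANE
# Attaches a read group (ID, SM, PL) to one lane's sorted BAM.
tag_lane() {
  run samtools addreplacerg -r "ID:$1.$2" -r "SM:$1" -r "PL:SOLID" \
    -o "$1.$2.rg.bam" "$1.$2.sorted.bam"
}

# merge_sample SAMPLE LANE...
# Tags every lane, merges the lanes into one BAM, and indexes it.
merge_sample() {
  sample=$1; shift
  inputs=""
  for lane in "$@"; do
    tag_lane "$sample" "$lane" || return 1
    inputs="$inputs $sample.$lane.rg.bam"
  done
  # $inputs is deliberately unquoted so it splits into one argument per file
  # (this assumes file names without spaces).
  run samtools merge "$sample.merged.bam" $inputs &&
  run samtools index "$sample.merged.bam"
}

# Placeholder sample and lane names.
merge_sample sampleA lane1 lane2
```

Because the inputs are already coordinate-sorted, the merged BAM stays sorted, and each lane keeps its own read-group ID under a shared sample name.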
