Question

Align fastq SOLiD data

3.4 years ago

Vic ▴ 110

Hello everyone,

I have downloaded some data from the Short Read Archive (SRA) using the sratoolkit. The data is SOLiD data. I have seen people use LifeScope (Life Technologies) to align such reads, as I presume it works for this type of data. Unfortunately, I can't get anyone to help with administrator permissions on our cluster, so I'm looking for alternative ways to do this.

My data, acquired with fastq-dump, looks like this:

[image: fastq example]
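A quick way to check which kind of reads fastq-dump produced (a sketch, not something from this thread): SOLiD colorspace reads are usually written as a primer base followed by colors 0-3, with '.' for a missing call, while base-space reads contain only nucleotides. The helper name fastq_space and its regex are assumptions, and it expects plain, unwrapped 4-line FASTQ records.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count FASTQ sequence lines that look like SOLiD colorspace
# (a primer base, then colors 0-3 or '.') versus anything else (assumed base space).
# Assumes unwrapped 4-line FASTQ records, so sequences sit on lines 2, 6, 10, ...
fastq_space() {
  awk 'NR % 4 == 2 {
         if ($0 ~ /^[ACGT][0-3.]+$/) cs++
         else bs++
       }
       END { printf "colorspace=%d basespace=%d\n", cs + 0, bs + 0 }' "$1"
}
```

Usage: fastq_space my_data.fastq prints counts such as colorspace=N basespace=M; a file that is almost entirely colorspace needs a colorspace-aware aligner.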

I have seen people mention bfast's solid2fastq or bwa's solidtofastq.pl for converting the data.

My question is: once I have converted the data using one of these, can I proceed with the bwa aligner, then filter for mapping quality and continue? Or should I align using bfast? What I need is a BAM file, as I would like to merge the lanes (each sample is spread across 5 or so), edit the read groups, and sort the files so they can be variant-called together with another data set.

Thanks for your time!

score 3 · Answer 1 · 2021-09-15

3.4 years ago

colindaven 7.0k

SOLiD is way out of date now.

Are you interested in quantitative counting data (RNA-seq, metagenomics, etc.)? --> Don't use it: PCR errors.

Are you interested in WGS or WES (SNVs)? --> Don't use it: very different from Illumina, with coverage bias and dropouts.

Alignment is tricky: bowtie1 does NOT do gapped alignment, so reads with indels will not align.

There are a couple of other alignment programs for SOLiD listed here: https://en.wikipedia.org/wiki/List_of_sequence_alignment_software

We found the commercial novoalign-cs was by far the best, but slow.

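Definitely align the color-space data directly to the genome rather than converting it to base-space FASTQ first; otherwise you'll run into the magical color-space "frameshifts", where one miscalled color corrupts every base after it and messes everything up.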

Avoid it if you can!
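To make the color-space "frameshift" concrete, here is a small illustrative sketch of SOLiD 2-base encoding. It assumes the standard mapping A=0, C=1, G=2, T=3, with each color being the XOR of two adjacent base codes, and it simplifies real reads by using the first read base as the leading base instead of the adapter primer base. encode_cs and decode_cs are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Illustration of SOLiD 2-base ("color-space") encoding, assuming the usual
# mapping A=0, C=1, G=2, T=3 where each color = code(base1) XOR code(base2).
# Simplification: the leading base is the first read base, not the adapter
# primer base real SOLiD reads start with. Only A/C/G/T are handled. Needs bash >= 4.
declare -A CODE=([A]=0 [C]=1 [G]=2 [T]=3)
BASES="ACGT"

# encode_cs SEQ: print the first base, then one color per adjacent base pair.
encode_cs() {
  local seq=$1 out=${1:0:1} i a b
  for (( i = 1; i < ${#seq}; i++ )); do
    a=${CODE[${seq:i-1:1}]}
    b=${CODE[${seq:i:1}]}
    out+=$(( a ^ b ))
  done
  printf '%s\n' "$out"
}

# decode_cs CS: turn colors back into bases, starting from the leading base.
# Each base depends on the previous one, so a single wrong color
# corrupts every base after it.
decode_cs() {
  local cs=$1 out=${1:0:1} prev i
  prev=${CODE[${cs:0:1}]}
  for (( i = 1; i < ${#cs}; i++ )); do
    prev=$(( prev ^ ${cs:i:1} ))
    out+=${BASES:prev:1}
  done
  printf '%s\n' "$out"
}

ref=ACGTTGCA
cs=$(encode_cs "$ref")
bad=${cs:0:3}2${cs:4}   # replace the third color with a wrong call
echo "reference: $ref"
echo "colors:    $cs"
echo "decoded:   $(decode_cs "$cs")"
echo "1 error:   $(decode_cs "$bad")"
```

Running it prints the colors A1310131, which decode back to ACGTTGCA; with a single miscalled color (A1320131) the decoded read becomes ACGAACGT, so every base after the error is wrong. Aligned in color space, the same error is just one color mismatch, which is why aligning the colors directly is safer than decoding to FASTQ first.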

Answer by colindaven, 3.4 years ago

Unfortunately, I work on a non-model species and I need to incorporate this data into my study. It was used in a recent study (2018) but was generated in 2014. Is your suggestion simply not to use it?

My plan is to align this data to a reference genome that I am converting to a colorspace reference using the command ./bowtie-build -C.

Reply by Vic, 3.4 years ago

Is it WGS? WES? RNA-seq? If you can't avoid it, treat it with extreme caution. Sure, you're on the right track, but as I pointed out, bowtie1 is not a great aligner, since it cannot align reads with indels. Have fun...

Reply by colindaven, 3.4 years ago

Apologies, the data is WGS.

Reply by Vic, 3.4 years ago

score 2 · Answer 2 · 2021-09-15

3.4 years ago

ATpoint 86k

This is so-called colorspace data. Bowtie (version 1) could align colorspace data until version 1.3.0, when support for it was dropped. Hence I would get version 1.2.3 and align with that. The easiest way is probably to install it with conda:

conda install -c bioconda bowtie=1.2.3

You will first have to build a colorspace index from the reference genome (or use one provided on the bowtie website). Then align with bowtie in colorspace mode (-C); it can optionally output a SAM file (-S, see its manual), which you can later convert to BAM with samtools.
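A minimal sketch of those steps as a shell function, assuming bowtie 1.2.3 and samtools are on PATH. The helper names (run, align_colorspace), the DRY_RUN switch and the file names are hypothetical; the tool options used are -C (colorspace index for bowtie-build, colorspace reads for bowtie), -S (SAM output), samtools sort -o and samtools index.

```bash
#!/usr/bin/env bash
# Sketch of the colorspace workflow: bowtie 1.2.3 (last release with colorspace
# support) plus samtools, both assumed to be on PATH.
# run/DRY_RUN and align_colorspace are hypothetical helpers, not part of either tool.

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# align_colorspace REF.fa READS.fastq PREFIX
# The ( ... ) body runs in a subshell, so set -e stops at the first failing
# step without affecting the calling shell.
align_colorspace() (
  set -e
  ref=$1 reads=$2 prefix=$3
  # 1. Colorspace index: -C encodes the reference as colors.
  run bowtie-build -C "$ref" "${prefix}_cs"
  # 2. Alignment: -C = reads and index are colorspace, -S = write SAM.
  run bowtie -C -S "${prefix}_cs" "$reads" "${prefix}.sam"
  # 3. SAM -> coordinate-sorted BAM, plus an index for downstream tools.
  run samtools sort -o "${prefix}.bam" "${prefix}.sam"
  run samtools index "${prefix}.bam"
)
```

After sourcing the file, DRY_RUN=1 align_colorspace myreference.fa my_data.fastq sample1 only prints the four commands; without DRY_RUN it runs them and leaves sample1.bam plus its index.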

Answer by ATpoint, 3.4 years ago

I've installed it via conda as you suggested.

First I need to build a colorspace index, specifying -C for colorspace:

bowtie-build -C myreference.fa new_ref

Then align using bowtie, again with -C (colorspace reads against a colorspace index) and -S for SAM output:

bowtie -C -S new_ref my_data.fastq my_data.sam

Then off to samtools, which I'm used to using. Thanks so much, ATpoint, you are very helpful!
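The samtools part of the plan from the question (filter on mapping quality, give each lane its own read group, merge the ~5 lanes per sample, keep everything sorted) might look roughly like this sketch. It assumes each lane is already a coordinate-sorted BAM; the MAPQ cutoff of 20, the ID/SM naming scheme, and the helper names run and merge_lanes are hypothetical choices, not recommendations from this thread.

```bash
#!/usr/bin/env bash
# Sketch of the post-alignment steps: MAPQ filter, one read group per lane,
# merge all lanes of a sample into a single sorted, indexed BAM.
# Assumes samtools on PATH and that each lane is already a coordinate-sorted BAM
# (e.g. lane1.bam ... lane5.bam). MIN_MAPQ=20, the ID/SM naming, run/DRY_RUN and
# merge_lanes are hypothetical choices.

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# merge_lanes SAMPLE LANE.bam...
merge_lanes() (
  set -e
  sample=$1; shift
  min_mapq=${MIN_MAPQ:-20}
  rg_bams=()
  for bam in "$@"; do
    lane=$(basename "$bam" .bam)
    # -q drops reads with mapping quality below min_mapq; -b keeps BAM output.
    run samtools view -b -q "$min_mapq" -o "${sample}.${lane}.filt.bam" "$bam"
    # Unique read group ID per lane, shared SM so variant callers see one sample.
    run samtools addreplacerg -r "@RG\tID:${sample}.${lane}\tSM:${sample}\tPL:SOLID" \
      -o "${sample}.${lane}.rg.bam" "${sample}.${lane}.filt.bam"
    rg_bams+=("${sample}.${lane}.rg.bam")
  done
  # Filtering and tagging keep the sort order, so merge yields a sorted BAM.
  run samtools merge "${sample}.merged.bam" "${rg_bams[@]}"
  run samtools index "${sample}.merged.bam"
)
```

For example, merge_lanes sample1 lane1.bam lane2.bam lane3.bam lane4.bam lane5.bam; set DRY_RUN=1 first to see the commands. Giving every lane the same SM tag is what lets a variant caller treat the merged lanes as one sample, while the distinct IDs keep lane-level information.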

Reply by Vic, 3.4 years ago