I have downloaded some data from the Sequence Read Archive (SRA) using the SRA Toolkit. It is SOLiD data. I have seen people use LifeScope (Life Technologies) to align such reads, presumably because it works for this type of data. Unfortunately, I can't get anyone to help with the administrator permissions needed on our cluster, so I'm looking for alternative ways to do this.
My data, acquired using fastq-dump, looks like this:
My question is: once I have converted the data using one of these, can I proceed with the bwa aligner, then filter for mapping quality and continue? Or should I align using BFAST? What I need is a resulting BAM file, as I would like to merge the lanes (each sample is spread across 5 or so), edit the read groups and sort the files so they can be variant called together with another data set.
This is so-called colorspace data. Bowtie (version 1) could align colorspace data until version 1.3.0, which dropped support for it. Hence I would get version 1.2.3 and align with that. The easiest way is probably to install it with conda:
conda install -c bioconda bowtie=1.2.3
You will first have to build a colorspace index from the reference genome (or use one provided on the bowtie website). Then align with bowtie, which can optionally output a SAM file (see its manual); you can later convert that to BAM with samtools.
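As a rough sketch of those steps (not a tested pipeline), something like the following could work. The file names (reference.fa, lane1.csfastq, ...), the sample name and the thread count are made up for illustration, and I am assuming fastq-dump gave you one colorspace FASTQ file per lane. Adding read groups with samtools addreplacerg and merging with samtools merge is just one way to do it. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names: replace with your own files and sample.
REF_FA="reference.fa"      # base-space reference FASTA
INDEX="reference_cs"       # prefix for the colorspace bowtie index
SAMPLE="sampleA"
LANES=(lane1 lane2)        # assumes <lane>.csfastq exists for each lane
THREADS=4

# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# 1. Build a colorspace index (-C) with bowtie 1.2.3.
run bowtie-build -C "$REF_FA" "$INDEX"

RG_BAMS=()
for lane in "${LANES[@]}"; do
    # 2. Align colorspace reads (-C) and write SAM (-S).
    run bowtie -C -S -p "$THREADS" "$INDEX" "${lane}.csfastq" "${lane}.sam"

    # 3. Coordinate-sort and convert to BAM.
    run samtools sort -o "${lane}.sorted.bam" "${lane}.sam"

    # 4. Attach one read group per lane so lanes stay distinguishable after merging.
    run samtools addreplacerg -r "ID:${lane}" -r "SM:${SAMPLE}" -r "PL:SOLID" \
        -o "${lane}.rg.bam" "${lane}.sorted.bam"

    RG_BAMS+=("${lane}.rg.bam")
done

# 5. Merge the lanes of the sample and index the result.
run samtools merge "${SAMPLE}.merged.bam" "${RG_BAMS[@]}"
run samtools index "${SAMPLE}.merged.bam"
```

Run it once as-is to check that the printed commands look sensible for your files, then rerun with DRY_RUN=0.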
Unfortunately, I work on a non-model species and I need to incorporate this data in my study. It was used in a recent study (2018) but was generated in 2014. Is your suggestion just not to use it?
My plan is to align this data to a reference genome that I am converting to a colorspace reference using the command:
./bowtie-build -C
Is it WGS? WES? RNA-seq? If you can't avoid using it, then treat it with extreme caution. Sure, you're on the right track, but as I pointed out, bowtie1 is not a great aligner, since it cannot align reads with indels. Have fun...
Apologies, the data is WGS.