Hello everyone,
I inherited some colorspace data for various samples that I am trying to make use of. I don't have access to LifeScope/BioScope, and finding open-source tools that handle colorspace data well seems like a challenge. What I have are *.bam files of mapped/unmapped reads as well as the *.csfasta and *.quals files. I would like to perform variant analysis to see if I can find SNPs/indels in my data, both across samples and in comparison to the reference genome.
I am thinking about using the GATK pipeline, but I wanted to ask if there is anything 'better' for doing what I'd like to do. The bam files were generated using the hg18 reference genome, and common aligners like bwa don't seem to support colorspace anymore. From what I understand, due to differences in how errors are handled, converting the csfasta/quals files to fastq files isn't recommended, although it seems there are some people who analyze colorspace data this way.
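To make the conversion concern concrete, here is a minimal Python sketch of what a naive csfasta-to-bases translation does. It assumes the standard SOLiD two-base encoding (each color is the XOR of the 2-bit codes of adjacent bases, with A=0, C=1, G=2, T=3); the function name decode_colorspace and the two example reads are made up for illustration and do not come from any particular tool.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3

def decode_colorspace(cs_read):
    """Naively translate a csfasta read (primer base + colors) into bases.

    Each color is the XOR of the codes of two adjacent bases, so every
    decoded base depends on all of the colors that came before it.
    """
    prev = BASES.index(cs_read[0])  # leading primer base, not part of the read
    out = []
    for color in cs_read[1:]:
        if color not in "0123":
            # '.' marks a missing color call; the rest cannot be decoded
            out.append("N" * (len(cs_read) - 1 - len(out)))
            break
        prev ^= int(color)
        out.append(BASES[prev])
    return "".join(out)

# Hypothetical reads that differ by a single color call (2 -> 3)
good = decode_colorspace("T0120310")
bad = decode_colorspace("T0130310")
mismatches = sum(a != b for a, b in zip(good, bad))
print(good, bad, mismatches)
```

In this toy case one miscalled color changes every base downstream of it, which is why converting to fastq before alignment throws away the error-correcting property that colorspace-aware aligners rely on.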
Can anyone here recommend a pipeline that takes my RNA-seq data and either 1) re-aligns it to a newer reference genome or 2) uses the existing *.bam files to perform variant analysis and find sequence differences?
Thanks, and all the best.
Keep in mind that most aligners being discussed here are not splice-aware. Using a non-splice-aware aligner is likely to make your job of finding variants in RNA-seq even harder (and it is a hard problem to begin with).