I realize that almost no one uses SOLiD sequencing anymore, but unfortunately this paper does, and for my (chicken-related) project it would be useful to include as many relevant datasets as possible in my analysis. Since a new chicken reference genome assembly came out in January, I need to take those SOLiD reads and align them to the new reference genome.
Nowhere in the paper that I linked does it say whether the reads are paired-end or single-end, only that they are 35 bp long. Does anyone know how I can tell which they are? I am just assuming they are paired-end because there are two runs per sample on SRA.
There is very little documentation available online for SNP calling from SOLiD reads. From reading forums, I think BFAST should be used for aligning, after which I can convert the alignment to BAM format and use GATK to carry on with my analysis. However, I haven't found any documentation on which trimmers to use, or on whether trimming is needed at all since SOLiD reads are so short. Additionally, the BFAST website provides neither instructions nor the creator's contact information, and the manual that I found online is outdated and incomplete.
So here are my questions:
1) Does anyone know when BFAST is advantageous over BFAST-BWA and vice versa? In what situations would you use one over the other?
2) There are three options for alignment (match, easyalign, and localalign). How do I determine which one to use?
I know this is quite a lot for one person to answer. If anyone could just point me to an online resource or tutorial, I would really appreciate it. Thank you!
Thanks for your help! For your dataset, could you tell which reads were forward and which ones were reverse by looking at the headers of each read? Since the paper that I linked didn't give any clear information, I am now trying to figure out what kind of reads I have by looking at the header directly.
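A minimal sketch of that header check, in case it is useful to anyone else. It assumes the original SOLiD tag names (F3, R3, F5-P2, F5-BC) survive at the end of each read name, which SRA-converted files may not guarantee. The function names and the example headers in the usage note are made up for illustration.

```python
import re
from collections import Counter

# Assumption: read names end in one of the usual SOLiD tags.
# Adjust the pattern if your headers look different.
TAG_RE = re.compile(r"(?:^|[_/\s])(F3|R3|F5-P2|F5-BC)(?=\s|$)")


def count_read_tags(header_lines):
    """Tally which SOLiD tag (if any) appears in each header line."""
    counts = Counter()
    for line in header_lines:
        match = TAG_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def fastq_headers(path, limit=10000):
    """Yield up to `limit` header lines (every 4th line) from a FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i // 4 >= limit:
                break
            if i % 4 == 0:
                yield line.rstrip("\n")
```

Running `count_read_tags(fastq_headers("run1.fastq"))` on each run (the file name is a placeholder) shows which tags each run carries. As far as I know, F3 alone suggests fragment (single-end) reads, F3 with R3 suggests mate-pair, and F3 with F5 suggests paired-end, but a count dominated by "unknown" just means the tags were stripped and the headers cannot settle it.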
No worries, but I haven't worked with SOLiD for four years now, so I can't really help. What I would do, if the read lengths are the same, is map a sample of each read set to a decent reference genome and check the orientations manually.
Try mapping a) as paired end and b) each read of the pair separately; you should gain a lot of information just by checking in a genome browser. Also, the paired-end mapping rate should be very low if you wrongly specify single-end reads as being part of a pair.
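To put a number on the paired-end mapping rate rather than only eyeballing it in a browser, something like the sketch below could work. It assumes the alignment has been converted to text SAM (for example with `samtools view -h`) and relies only on the standard SAM FLAG bits. The function names are my own, and what counts as a "very low" rate is left to your judgment.

```python
from collections import Counter

# Standard SAM FLAG bits.
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
REVERSE, MATE_REVERSE, FIRST_IN_PAIR = 0x10, 0x20, 0x40
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def summarize_pairs(sam_lines):
    """Count pairs, proper pairs, and relative strand orientation."""
    stats = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.rstrip("\n").split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only look at primary alignments
        if not flag & PAIRED or not flag & FIRST_IN_PAIR:
            continue  # count each pair once, via its first read
        stats["pairs"] += 1
        if flag & (UNMAPPED | MATE_UNMAPPED):
            stats["not_both_mapped"] += 1
            continue
        if flag & PROPER_PAIR:
            stats["proper"] += 1
        same_strand = bool(flag & REVERSE) == bool(flag & MATE_REVERSE)
        stats["same_strand" if same_strand else "opposite_strand"] += 1
    return stats


def proper_pair_rate(stats):
    """Fraction of pairs the aligner flagged as properly paired."""
    return stats["proper"] / stats["pairs"] if stats["pairs"] else 0.0
```

Usage would be along the lines of `stats = summarize_pairs(open("sample.sam"))` followed by `proper_pair_rate(stats)`, where `sample.sam` is a placeholder. The same-strand versus opposite-strand split is only a hint, since the expected orientation depends on the library type and on how the aligner defines a proper pair.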