While running abyss-pe, I often get this message printed to stdout:
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
Mateless 0
Unaligned 9840128 12.3%
Singleton 20579451 25.7%
FR 28709736 35.9%
RF 20287 0.0253%
FF 130290 0.163%
Different 20752726 25.9%
Total 80032618
Ambiguous paths: 86
Merged: 48
No paths: 0
Too many paths: 4
Too complex: 1
Dissimilar: 33
I thought that these numbers represented the percentage of original reads mapped to the scaffolds produced by ABySS, but the total is greater than the number of original reads. Also, I would appreciate any information or literature regarding the meaning of FR, RF, FF and Singleton.
Thanks a lot!
I agree with most of this, but I'm not sure about your explanation for 'singleton'. ABySS uses the aligned pairs to build a distribution of insert sizes, so to me it would not make much sense to already limit the mapping to an expected insert size. I therefore think that singleton only refers to the read pairs where only one read actually aligns to a sequence.
Yes, you are right, Lieven. At this stage in the pipeline, ABySS is looking at read pairs that align to the same contig in order to build an estimate of the fragment size distribution. So:
Thx for resolving this, Ben. One related question: is it correct to assume, then, that it is mainly the reads from the 'different' category that will contribute to the contig and scaffold building stage?
Yes, exactly :-) The pairs that map to different unitigs/contigs are the ones that provide the linking information.
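For anyone who wants to sanity-check the numbers in the question, here is a minimal Python sketch that parses the histogram lines quoted above, confirms that the categories add up to `Total`, and recomputes the percentages. The function name `parse_counts` is just illustrative and not part of ABySS. Grouping FR/RF/FF as "both reads on the same contig" is my reading of this thread rather than something taken from the ABySS documentation.

```python
# Histogram lines copied from the abyss-pe output quoted in the question.
LOG = """\
Mateless 0
Unaligned 9840128 12.3%
Singleton 20579451 25.7%
FR 28709736 35.9%
RF 20287 0.0253%
FF 130290 0.163%
Different 20752726 25.9%
Total 80032618
"""


def parse_counts(text):
    """Return {category: count} from lines shaped like 'Name count [percent]'."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


counts = parse_counts(LOG)
total = counts.pop("Total")

# The categories partition the total: each record is counted exactly once.
assert sum(counts.values()) == total

# Recompute each category's share of the total.
for name, n in counts.items():
    print(f"{name:<10} {n:>10} {100 * n / total:.3g}%")

# Assumption (from this thread): FR/RF/FF are pairs aligned to the same
# contig, used for the fragment size estimate; 'Different' pairs map to
# different contigs and provide the linking information.
same_contig = counts["FR"] + counts["RF"] + counts["FF"]
linking = counts["Different"]
print(f"same contig:       {100 * same_contig / total:.1f}%")
print(f"different contigs: {100 * linking / total:.1f}%")
```

So the percentages are each category's share of `Total`, not a mapping rate against the final scaffolds.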