Help Interpreting Abyss-pe Character Occurrence Table
8.5 years ago

While running abyss-pe, I would often get this message printed to stdout.

Building the suffix array... 
Building the Burrows-Wheeler transform... 
Building the character occurrence table... 
Mateless          0 
Unaligned   9840128  12.3%
Singleton  20579451  25.7% 
FR         28709736  35.9% 
RF            20287  0.0253% 
FF           130290  0.163% 
Different  20752726  25.9% 
Total      80032618 
Ambiguous paths: 86 
Merged:          48 
No paths:        0 
Too many paths:  4
Too complex:     1 
Dissimilar:      33

I thought that these represented the percentage of original reads mapped to the scaffold produced by abyss, but the total is greater than the number of original reads. Also, I would appreciate any information or literature regarding the meaning of FR, RF, FF and Singleton.

Thanks a lot!

abyss assembly • 1.8k views
8.5 years ago
mastal511 ★ 2.1k

Singleton means that only one read of the pair aligned, or that the two reads of the pair didn't align as a pair, that is, within the expected distance of each other. F means forward and R means reverse; RF, FR and FF refer to the relative orientation of the two reads of a pair in the alignment.


I agree with most of this, but I'm not sure about your explanation of 'singleton'. ABySS uses the aligned pairs to build a distribution of insert sizes, so to me it would not make much sense to limit the mapping to an expected insert size already at this point. I therefore think that singleton only refers to the read pairs where only one read actually aligns to a sequence.


Yes, you are right, Lieven. At this stage in the pipeline, ABySS is looking at read pairs that align to the same contig in order to build an estimate of the fragment size distribution. So:

  • Unaligned = both reads in pair unmapped
  • Singleton = only one read in pair mapped to the assembly
  • FR = pairs that mapped to the same contig in the Forward-Reverse orientation
  • RF = pairs that mapped to the same contig in the Reverse-Forward orientation
  • FF = pairs that mapped to the same contig in the Forward-Forward orientation
  • Different = pairs where each read mapped to a different contig (orientation unknown)
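
To make the categories above concrete, here is a minimal Python sketch of how a pair could be sorted into them. It is not ABySS's own code: the `Hit` record and `classify_pair` function are hypothetical names made up for illustration, and the FR/RF rule (look at the strand of the leftmost read on the contig) is an assumed convention. Treating two reverse-strand reads as FF is also an assumption, and the Mateless category is left out because the thread does not define it.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical alignment record for one read (not an ABySS type)."""
    contig: str
    strand: str  # '+' for forward, '-' for reverse
    pos: int     # leftmost aligned position on the contig


def classify_pair(r1: Optional[Hit], r2: Optional[Hit]) -> str:
    """Assign a read pair to one of the categories listed above."""
    if r1 is None and r2 is None:
        return "Unaligned"   # neither read mapped
    if r1 is None or r2 is None:
        return "Singleton"   # only one read mapped
    if r1.contig != r2.contig:
        return "Different"   # reads landed on different contigs
    if r1.strand == r2.strand:
        return "FF"          # same strand (assumption: '-'/'-' counts here too)
    # Opposite strands on the same contig: assumed convention is that the
    # strand of the leftmost read decides between FR and RF.
    left = min(r1, r2, key=lambda h: h.pos)
    return "FR" if left.strand == "+" else "RF"


pairs = [
    (None, None),
    (Hit("c1", "+", 10), None),
    (Hit("c1", "+", 10), Hit("c1", "-", 300)),
    (Hit("c1", "-", 10), Hit("c1", "+", 300)),
    (Hit("c1", "+", 10), Hit("c1", "+", 300)),
    (Hit("c1", "+", 10), Hit("c2", "-", 50)),
]
tally = Counter(classify_pair(a, b) for a, b in pairs)
for category, count in tally.items():
    print(f"{category:<10}{count:>4}  {100 * count / len(pairs):.3g}%")
```

The percentages are taken against the total number of pairs, which matches the output in the question: there, the seven category counts add up exactly to the Total line.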

Thanks for resolving this, Ben. One related question: is it correct to assume, then, that it is mainly the reads from the 'Different' category that will contribute to the contig and scaffold building stage?


Yes, exactly :-) The pairs that map to different unitigs/contigs are the ones that provide the linking information.
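
As a rough illustration of what "linking information" means here, the sketch below counts how many pairs connect each pair of contigs. It is a simplified stand-in, not ABySS's actual scaffolding procedure, which also has to consider read orientation and estimated distances. The `count_links` function and the contig names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def count_links(
    pairs: Iterable[Tuple[Optional[str], Optional[str]]]
) -> Counter:
    """Count pairs whose two reads map to two different contigs.

    Each item is (contig of read 1, contig of read 2); None means unmapped.
    """
    links: Counter = Counter()
    for c1, c2 in pairs:
        if c1 is None or c2 is None or c1 == c2:
            continue  # unaligned, singleton, or same-contig pair: no link
        # Sort the two names so (a, b) and (b, a) count as the same link.
        links[tuple(sorted((c1, c2)))] += 1
    return links


example = [
    ("c1", "c2"),
    ("c2", "c1"),
    ("c1", "c1"),   # same contig: used for the fragment size estimate instead
    ("c3", None),   # singleton
    ("c2", "c3"),
]
links = count_links(example)
for (a, b), n in sorted(links.items()):
    print(f"{a} -- {b}: {n} supporting pair(s)")
```

A contig pair supported by many such read pairs is a stronger candidate for being adjacent in a scaffold than one supported by a single pair.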
