ABySS Output File Formats
14.6 years ago
Rob Syme ▴ 540

ABySS produces a series of output files that describe how the contigs and scaffolds were constructed. The types of files seem to differ from version to version, so I thought it would be useful to have an exhaustive list of the file format definitions.

Files I've seen include:

  • .adj
  • .dist
  • .path
  • .pair
  • overlap.fa

What data do these files contain, and what format do they take?

Tags: assembly, abyss

Hi @Rob, please consider editing this into a question format. As it is, your post reads like a comment, and it is hard to see how we can contribute or help you here. Cheers

14.6 years ago
Rob Syme ▴ 540

I'll start, but it won't be a complete answer. These descriptions are based on an answer from Shaun on the ABySS mailing list in October 2009.

dist

From DistanceEst --help: "Estimate distances between contigs using paired-end alignments."

21 : | 420877,-30,47,5.5,0
39 : 125053,-30,17,9.2,1 | 10061,-30,25,7.6,1 132861,47,16,9.5,0 219341,45,15,9.8,0 294681,44,16,9.5,0 300788,43,17,9.2,0 419176,44,16,9.5,0 426650,46,15,9.8,0
110 : 44048,-30,30,6.9,0 |
170 : 397327,9,20,8.5,0 410536,23,43,5.8,1 50658,-5,11,11.4,0 | 161076,51,11,11.4,1 283551,-27,14,10.1,0 324935,-13,25,7.6,1 4522,81,10,12.0,0
171 : 298574,39,10,12.0,1 313291,-7,20,8.5,1 | 134144,22,27,7.3,1
173 : 119549,-15,59,4.9,0 129210,-30,26,7.4,1 | 181775,-11,37,6.2,1 200470,-30,13,10.5,1
  • id0 : id1,distance,number-of-pairs,error,strand | id2,...
  • id1 comes after id0
  • id2 comes before id0

I'm not sure what it means when one side contains several space-separated groups (e.g. the 170 entry).
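To make the dist layout above concrete, here is a minimal parsing sketch. It follows the bullets (groups before "|" come after id0, groups after "|" come before id0) and assumes, without confirmation, that each space-separated group on a side is a separate neighbouring contig. The names parse_dist_line and DistEstimate are hypothetical, not part of ABySS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DistEstimate:
    contig: int     # id of the neighbouring contig
    distance: int   # estimated distance in bases (negative values mean overlap)
    num_pairs: int  # number of read pairs supporting the estimate
    error: float    # error of the distance estimate
    strand: int     # 0 or 1, relative strand of the neighbour


def _parse_side(text: str) -> List[DistEstimate]:
    # ASSUMPTION: each space-separated group is one neighbouring contig.
    estimates = []
    for group in text.split():
        contig, distance, pairs, error, strand = group.split(",")
        estimates.append(
            DistEstimate(int(contig), int(distance), int(pairs), float(error), int(strand))
        )
    return estimates


def parse_dist_line(line: str) -> Tuple[int, List[DistEstimate], List[DistEstimate]]:
    """Split 'id0 : after... | before...' into (id0, after, before)."""
    head, rest = line.split(":", 1)
    after, before = rest.split("|", 1)
    return int(head), _parse_side(after), _parse_side(before)


line = "171 : 298574,39,10,12.0,1 313291,-7,20,8.5,1 | 134144,22,27,7.3,1"
contig_id, after, before = parse_dist_line(line)
print(contig_id, [e.contig for e in after], [e.contig for e in before])
# prints: 171 [298574, 313291] [134144]
```

Empty sides, as in the 21 and 110 entries, simply parse to empty lists.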

adj

From AdjList --help: "Find all contigs that overlap by exactly k-1 bases."

0 [ 396267,1 ] [ 55802,0 ]
1 [ 35373,0 102691,0 ] [ 249534,0 ]
2 [ 277705,0 ] [ 350054,1 ]
3 [ 132932,0 ] [ 271245,0 182962,0 ]
  • id0 length [ id1,strand ... ] [ id2,strand ... ]
  • id1 comes after id0
  • id2 comes before id0
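A short sketch of reading one adj line under the layout above. Because the example lines show no length column even though the format description lists one, the sketch treats the length as optional; that is an assumption. The function name parse_adj_line is hypothetical.

```python
import re
from typing import List, Optional, Tuple

Neighbour = Tuple[int, int]  # (contig id, strand)


def _parse_neighbours(text: str) -> List[Neighbour]:
    pairs = []
    for token in text.split():
        contig, strand = token.split(",")
        pairs.append((int(contig), int(strand)))
    return pairs


def parse_adj_line(line: str):
    """Return (id0, length or None, contigs after id0, contigs before id0)."""
    groups = re.findall(r"\[([^\]]*)\]", line)
    if len(groups) != 2:
        raise ValueError(f"expected two [ ... ] groups: {line!r}")
    head = line.split("[", 1)[0].split()
    id0 = int(head[0])
    # ASSUMPTION: the length column is optional, since the examples omit it.
    length: Optional[int] = int(head[1]) if len(head) > 1 else None
    return id0, length, _parse_neighbours(groups[0]), _parse_neighbours(groups[1])


print(parse_adj_line("1 [ 35373,0 102691,0 ] [ 249534,0 ]"))
# prints: (1, None, [(35373, 0), (102691, 0)], [(249534, 0)])
```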

overlap.fa

From Overlap --help: "Find overlaps between blunt contigs that have negative distance estimates. Output the small contigs that fill in the gaps."

>47 25 0 33+ 38+ -15
  • id0 length coverage (always 0) id1 id2 overlap
  • id1 comes before id0
  • id2 comes after id0
  • id1 and id2 overlap by the specified number of bases (negative distance indicates overlap)
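The overlap.fa header fields above can be pulled apart as follows. This is a sketch that assumes the trailing "+" or "-" on id1 and id2 marks each contig's orientation, which the notes above do not state explicitly. The function name parse_overlap_header is hypothetical.

```python
from typing import Dict, Tuple


def _split_oriented(token: str) -> Tuple[int, str]:
    # ASSUMPTION: the trailing '+' or '-' is the contig's orientation.
    if token[-1] not in "+-":
        raise ValueError(f"expected a trailing + or -: {token!r}")
    return int(token[:-1]), token[-1]


def parse_overlap_header(header: str) -> Dict[str, object]:
    if not header.startswith(">"):
        raise ValueError("not a FASTA header")
    id0, length, coverage, left, right, overlap = header[1:].split()
    return {
        "id0": int(id0),
        "length": int(length),
        "coverage": int(coverage),        # always 0 according to the notes above
        "before": _split_oriented(left),  # id1, comes before id0
        "after": _split_oriented(right),  # id2, comes after id0
        "overlap": int(overlap),          # negative: id1 and id2 overlap
    }


print(parse_overlap_header(">47 25 0 33+ 38+ -15"))
# prints: {'id0': 47, 'length': 25, 'coverage': 0, 'before': (33, '+'), 'after': (38, '+'), 'overlap': -15}
```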

path

From SimpleGraph --help: "Find paths through contigs using distance estimates."

@ 19,1 -> 416835,0
@ 37,0 -> 123886,1
@ 38,0 -> 401731,1 4851,0
@ 109,0 -> 43659,0
@ 168,0 -> 50189,0 69838,0 202180,1 117665,1 393524,0 406599,1
  • id0,direction -> id1,strand ...
  • direction = 0: id1 comes after id0
  • direction = 1: id1 comes before id0
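A minimal sketch of reading a path line, mapping the direction flag to a side exactly as the bullets describe. It does not try to infer the order of multiple contigs on the "before" side, since that is not documented here. The function name parse_path_line is hypothetical.

```python
from typing import List, Tuple


def parse_path_line(line: str) -> Tuple[int, str, List[Tuple[int, int]]]:
    """Parse '@ id0,direction -> id1,strand ...'.

    Returns (id0, side, neighbours), where side is 'after' for direction 0
    and 'before' for direction 1, following the bullets above.
    """
    if not line.startswith("@"):
        raise ValueError(f"not a path line: {line!r}")
    left, right = line[1:].split("->", 1)
    id0, direction = (int(x) for x in left.strip().split(","))
    side = {0: "after", 1: "before"}[direction]
    neighbours = []
    for token in right.split():
        contig, strand = token.split(",")
        neighbours.append((int(contig), int(strand)))
    return id0, side, neighbours


print(parse_path_line("@ 38,0 -> 401731,1 4851,0"))
# prints: (38, 'after', [(401731, 1), (4851, 0)])
```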
13.3 years ago
Madalina • 0

How did you obtain those files? When I run ABySS, I get only the contig file.


This question was based on an older version of ABySS. Newer versions produce different outputs. I'll probably end up deleting this entry to avoid future confusion.

13.2 years ago
Kaushik • 0

The files explained above are produced when paired-end reads are provided.

12.6 years ago

Here are my guesses after looking at the source code:

.adj files (overlap graph): index length coverage (total count of mapped base pairs) ; contigs connected to the head [-d overlap size] ; contigs connected to the tail

I do not know the convention for head and tail, or for the +/- ends.

3.5 years ago

Output files of ABySS

1. ${name}-contigs.fa: the final contigs in FASTA format
2. ${name}-bubbles.fa: the equal-length variant sequences (FASTA)
3. ${name}-indel.fa: the different-length variant sequences (FASTA)
4. ${name}-contigs.dot: the contig overlap graph in Graphviz format

Intermediate output files of ABySS

5. .adj: contig overlap graph in ABySS adj format
6. .dist: estimates of the distance between contigs in ABySS dist format
7. .path: lists of contigs to be merged
8. .hist: fragment-size histogram of a library
9. coverage.hist: k-mer coverage histogram
