Question

Interpreting abyss output

0

5.1 years ago

goatsrunfaster ▴ 60

I just completed my first genome assembly using ABySS, built from one set of paired-end files, one single-end file, and seven mate-pair sets. ABySS creates a large number of output files, and what I'd like to do now is BLAST my assembly to remove contigs that are potentially there due to contamination. However, I am confused about which .fa file represents my final assembly, as there are 13 different .fa files (see below). What do the numbered files (bowfin-1.fa, bowfin-2.fa, etc.) represent? Will I need to combine them for a final assembly, given that I eventually want to map reads from different individuals to the assembly and get a .vcf file for pop gen analysis? I'm just a little confused as to why there are so many output files...

bowfin-1.dot     bowfin-6.dot          coverage.hist
bowfin-1.fa      bowfin-6.fa           mpc-6.dist.dot
bowfin-1.path    bowfin-6.path         mpc-6.hist
bowfin-2.dot     bowfin-6.path.dot     mpd-6.dist.dot
bowfin-2.dot1    bowfin-7.dot          mpd-6.hist
bowfin-2.fa      bowfin-7.fa           mpe-6.dist.dot
bowfin-2.path    bowfin-7.path         mpe-6.hist
bowfin-3.dist    bowfin-8.dot          mpf-6.dist.dot
bowfin-3.dot     bowfin-8.fa           mpf-6.hist
bowfin-3.fa      bowfin-bubbles.fa     mpg-6.dist.dot
bowfin-3.fa.fai  bowfin-contigs.dot    mpg-6.hist
bowfin-4.dot     bowfin-contigs.fa     mph-6.dist.dot
bowfin-4.fa      bowfin-indel.fa       mph-6.hist
bowfin-4.fa.fai  bowfin-scaffolds.dot  mpi-6.dist.dot
bowfin-4.path1   bowfin-scaffolds.fa   mpi-6.hist
bowfin-4.path2   bowfin-stats          pea-3.dist
bowfin-4.path3   bowfin-stats.csv      pea-3.hist
bowfin-5.dot     bowfin-stats.md       peb-3.dist
bowfin-5.fa      bowfin-stats.tab      peb-3.hist
bowfin-5.path    bowfin-unitigs.fa     slurm.rhea-07.751256.out

Assembly genome • 2.3k views

score 2 · Accepted Answer · 2020-03-09

5.1 years ago

lieven.sterck 15k

The file with the highest number appended to your run name is the final output.

In your case that is bowfin-8.fa (from what I can see). At every step, ABySS takes the previous result and increases the number by one, so the highest-numbered file is the most advanced one.

Unless you need them for a specific purpose, you can ignore all the other, non-.fa files.

There are so many output files because each one represents a distinct analysis step.
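If you would rather not pick the file by eye, the "highest number wins" rule can be sketched in a few lines of Python. This is only an illustration, not part of ABySS: the function name `latest_stage_fasta` is made up here, and it assumes the stage files are named `<run-name>-<N>.fa` as in the listing above.

```python
import re

def latest_stage_fasta(filenames, name):
    """Return the <name>-<N>.fa file with the largest N, or None if none match.

    Hypothetical helper: it only looks at file names, not at file contents.
    """
    # Match e.g. "bowfin-8.fa" but not "bowfin-3.fa.fai" or "bowfin-contigs.fa".
    pattern = re.compile(rf"^{re.escape(name)}-(\d+)\.fa$")
    numbered = []
    for filename in filenames:
        match = pattern.match(filename)
        if match:
            # Compare stages as integers so that 10 would sort after 9.
            numbered.append((int(match.group(1)), filename))
    return max(numbered)[1] if numbered else None

# A subset of the files from the listing in the question.
files = [
    "bowfin-1.fa", "bowfin-2.fa", "bowfin-3.fa", "bowfin-3.fa.fai",
    "bowfin-4.fa", "bowfin-5.fa", "bowfin-6.fa", "bowfin-7.fa",
    "bowfin-8.fa", "bowfin-contigs.fa", "bowfin-scaffolds.fa",
    "bowfin-unitigs.fa", "coverage.hist",
]
print(latest_stage_fasta(files, "bowfin"))  # bowfin-8.fa
```

In a real run you could pass `os.listdir(".")` instead of the hard-coded list. As far as I know, ABySS usually also makes bowfin-scaffolds.fa a link to this same highest-numbered file, but check with `ls -l` in your own output directory before relying on that.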