Getting Number Of Reads In A Contig From Velvet Assembly
4
4
Entering edit mode
13.7 years ago
Abhi ▴ 50

How to get the contig details from the Velvet assembly? I have the AMOS output from the velvet i.e. velvet_asm.afg file.

Thanks -Abhi

contigs velvet • 6.5k views
ADD COMMENT
2
Entering edit mode
13.7 years ago

It's basically these 4 steps:

  • install AMOS if you haven't already
  • create a bank from your afg file
  • select a contig and create a new bank from it
  • extract the reads from the new bank

You can find the details on this blog post if you want to get it done quickly but AMOS also has a nice wiki.

Alternatively, I think that hawkeye does tell you the number of reads in a contig as well.

ADD COMMENT
0
Entering edit mode

Thanks for your quick response. I do have AMOS installed and I can see the bnk in hawkeye and see the number of reads in each contig. But wanted to get that information programmatically. I followed the steps you had mentioned in your blog, but hit the same problem i.e. no {FRG in my .afg file.

I am trying to convert to ACE format and get the details, but 'amos2ace' takes too long. Wondering if there are other quick options.

ADD REPLY
0
Entering edit mode

Can this be done with ABySS as well?

ADD REPLY
0
Entering edit mode

As long as ABySS is able to create a mapping file that AMOS can read (afg, ACE, etc.) it should.

ADD REPLY
1
Entering edit mode
13.7 years ago

from http://www.cbcb.umd.edu/research/contig_representation.shtml

{CTG
iid:1
eid:1
seq:
CCTCTCCTGTAGAGTTCAACCGA-GCCGGTAGAGTTTTATCA
.
qlt:
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
.
{TLE
src:1027
off:0
clr:618,0
gap:
250 612
.
}
}

we can parse this in perl easily enough without getting into all that bank stuff

open( ASM, "<velvet_asm.afg" );
while (<ASM>) {
    if (/\{RED/) { $reads++ }
    elsif (/{SCF/) {
      #i need to ignore TLEs in scaffolds b/c they are contigs
      $inScf=1;
    }
    elsif (/\{CTG/) { $inScf=0;$getCtgID = 1; }
    elsif (/iid:(\d+)/){if($getCtgID){$ctgID=$1;$getCtgID=0; }}
    elsif (/\{TLE/) { if(!$inScf){$inTile = 1; $tiles++;}}
    elsif (/src/) {
        if ($inTile) { 
          $src++;
          if($readSrc{$_}){
            warn "seen $_ before"
          }
          $readSrc{$_} = $ctgID;
          $inTile = 0; 
        }
    }
}
foreach my $read(keys %readSrc){
  print "read $read is in contig ".$readSrc{$read}."\n";
}
print "looks like I used ".scalar(keys %readSrc)." out of $reads total reads\n";

it should be pretty obvious how you can flip that hash to something more contig-centric

ADD COMMENT
0
Entering edit mode

Thanks Jeremy.

It gives a different result than AMOS bank. For my example: your code counted 245572 reads in contigs, whereas hawkeye shows 252454 reads in contigs.

ADD REPLY
0
Entering edit mode

hmmm well a read can have kmers in two or more contigs. i wonder if hawkeye is double counting those? well if you find the culprit lemme know.

ADD REPLY
0
Entering edit mode

what do you see when you type: bank2contig -e -S mybankname.bnk | cut -f3 | sort | uniq | wc -l

ADD REPLY
0
Entering edit mode

It shows 245568

ADD REPLY
1
Entering edit mode
13.7 years ago
Abhi ▴ 50

Used PERL AmosLib API to parse velvet_asm.afg file to get the count of reads in each contig.

use AMOS::AmosLib

my $fileName = "velvet_asm.afg";

open(fileHandle, $fileName);

my $totalReads = 0;

my $totalContigs = 0;

while (my $record = getRecord(\*fileHandle)){

        my ($rec, $fields, $recs) = parseRecord($record);

        if($rec eq "CTG"){

                $totalContigs++;

                my $nReads = 0;

                for(my $r = 0; $r <= $#$recs; $r++){

                        my ($srec, $sfields, $srecs) = parseRecord($$recs[$r]);

                        if($srec eq "TLE"){

                                $nReads++;

                                $totalReads++;

                        }

                }

                print "Number of Reads in contig#", $$fields{iid}, " are ", $nReads, "\n";

        }

}

print "Total # of Contigs :", $totalContigs, "\n";

print "Total # of reads in contigs:", $totalReads, "\n";

close(fileHandle);
ADD COMMENT
0
Entering edit mode

looks good. it would be interesting if you counted distinct read srcs since i honestly don't know how reads get credited when kmers are shared between different contigs

ADD REPLY
0
Entering edit mode
11.5 years ago
flacchy ▴ 40

Hi,

I am trying to figure out how many reads made it into each contig after I run velvet. I used the flag -read_trkg yes and -amos_file yes. I found some advise online on how to process them but I am a bit stuck. So from the amos file I have tried this path:

$ /home/admin/amos-3.1.0/src/Bank/bank-transact -m velvet_asm.afg -b velvet_asm.bnk -c

and then $/home/admin/amos-3.1.0/src/Validation/analyze-read-depth velvet_asm.bnk -i -d -r > output

The problem is that the file output is something like this : 1 1646 1340 1340 108.275 108.275 2 1 97 97 1.03093 1.03093 3 2085 1589 1589 113.904 113.904 4 175 75 75 219.547 219.547 5 187 162 162 105.944 105.944 6 124 76 76 157.171 157.171 7 1023 768 768 112.914 112.914 8 9 99 99 9.09091 9.09091 9 151 168 168 72.8214 72.8214 10 2224 1602 1602 123.306 123.306 11 6582 5312 5312 107.116 107.116 12 2850 2487 2487 98.653 98.653 13 3 105 105 2.62857 2.62857 14 3 103 103 2.91262 2.91262 15 4350 3350 3350 115.28 115.28 16 1788 1268 1268 128.341 128.341 17 1746 1325 1325 116.024 116.024

could anyone tell me what the filed are or if there is a different way to get the information I need??

Thank you,

F.

ADD COMMENT

Login before adding your answer.

Traffic: 2075 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6