Tracking Reads/Contigs In Velvet
13.2 years ago
Abhi ★ 1.6k

Hi All

Does anyone have a script that extracts the exact names of the reads velvet used to build each contig? We need the names and the number of reads that velvet used per contig. Running velvet with the read tracking option generates an .afg file, but I think that file still has to be mined to get the read info into a simple text file.

Just wanted to check before I write anything on my own.

Thanks! -Abhi

velvet read • 4.9k views
13.2 years ago

You need to run velvet with read tracking enabled to produce afg files.

Then you can use AMOS to extract the reads used in each contig; a tutorial is available. Basically, you create a bank with your assembly information and raw reads, and then extract the reads belonging to the contig(s) you are interested in.

bank-transact -c -z -m your_assembly.afg -b your_assembly.bnk
toAmos -x your_raw_reads.xml -o your_raw_reads.afg
bank-transact -c -z -m your_raw_reads.afg -b your_raw_reads.bnk
extractContig your_assembly.bnk 115 contig115.bnk
dumpreads -e contig115.bnk > contig115.fasta

Options may require a little tweaking, but the framework is there.

edit: I actually think there is an error in the tutorial and you need to transact your assembly and your reads to the same bank, if they are not present in the assembly file already. I don't have AMOS installed though, so I can't test it right now.
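If installing AMOS is not an option, the .afg file can also be mined directly with a short script. The following is only a minimal sketch, and it rests on assumptions that should be checked against your own file: that the file follows the generic AMOS message layout, with `{RED` blocks carrying `iid:`/`eid:` fields, `{CTG` blocks containing nested `{TLE` blocks whose `src:` field points at a read iid, and multi-line fields such as `seq:` and `qlt:` terminated by a line holding a single `.`. Whether `eid:` holds the original read name or just a numeric index depends on how the file was written. The function name `reads_per_contig` is made up for this example.

```python
from collections import defaultdict


def reads_per_contig(afg_lines):
    """Return {contig id: [read ids]} from AMOS-style message lines.

    Assumes the layout described above; this is not an official parser.
    """
    read_names = {}                   # read iid -> read eid
    contig_reads = defaultdict(list)  # contig id -> list of read iids
    stack = []                        # currently open blocks: (type, fields)
    in_multiline = False              # inside a seq:/qlt: style field

    for raw in afg_lines:
        line = raw.strip()
        if in_multiline:
            # Multi-line fields end with a line containing only "."
            if line == ".":
                in_multiline = False
            continue
        if line.startswith("{"):
            stack.append((line[1:], {}))
        elif line == "}":
            kind, fields = stack.pop()
            if kind == "RED":
                read_names[fields.get("iid")] = fields.get("eid")
            elif kind == "TLE" and stack and stack[-1][0] == "CTG":
                # Attach the tiled read to the enclosing contig
                stack[-1][1].setdefault("_reads", []).append(fields.get("src"))
            elif kind == "CTG":
                contig_id = fields.get("eid") or fields.get("iid")
                contig_reads[contig_id] = fields.get("_reads", [])
        elif ":" in line and stack:
            key, _, value = line.partition(":")
            if value == "":
                in_multiline = True   # e.g. "seq:" followed by sequence lines
            else:
                stack[-1][1][key] = value

    # Translate read iids to eids where a RED block was seen
    return {
        contig: [read_names.get(iid, iid) for iid in iids]
        for contig, iids in contig_reads.items()
    }
```

Typical use would be something like `with open("your_assembly.afg") as fh: table = reads_per_contig(fh)`, after which `len(table[contig])` gives the number of reads per contig. If the read ids come back as bare numbers, they would still need to be matched to the original read names from the input files.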

13.2 years ago
Swbarnes2 ★ 1.6k

Why not realign the reads against your contigs with an aligner like bwa, and then read the assignments off the .sam file? It won't be exactly what the assembler did, but it shouldn't be too different.
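Reading the assignments off the .sam file could look roughly like the sketch below. It relies only on the standard SAM columns (QNAME, FLAG, RNAME) and flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function name `reads_per_contig_from_sam` is hypothetical, and skipping secondary and supplementary records is a choice made for this example, not something the thread prescribes.

```python
from collections import defaultdict


def reads_per_contig_from_sam(sam_lines):
    """Return {contig name: set of read names} from SAM text lines."""
    contig_reads = defaultdict(set)
    for line in sam_lines:
        # Skip blank lines and header lines (which start with "@")
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        # A set means both mates of a pair, which share a QNAME, count once
        contig_reads[rname].add(qname)
    return dict(contig_reads)
```

Read counts per contig are then `len(names)` for each entry. Note that a read placed in a repeat is assigned to only one contig here, which is one way the result can differ from what velvet actually did.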


Want to avoid any more computation if it can be done with velvet.

-Abhi


Mapping reads requires far fewer compute resources than de novo assembly. This is especially true when you ask velvet to track read placement.


@lh3: Heng, in our case we megablast the contigs against nt to see which contigs are interesting and which could be contamination, depending on what we know about the sample/biology. Since we have to do the assembly and megablast anyhow, if the number of reads per contig could be produced natively during the assembly, then the extra mapping step (agreed: less computation) could be avoided.
