Question

How Can I Align My 454 Reads To A Reference Sequence And Generate A Recruitment Plot?

1

Entering edit mode

13.4 years ago

User 6228 ▴ 110

I'd like to align my trimmed 454 metagenomic reads to a reference sequence and generate plots of the coverage / percent identity / recruitment.

Both the reads file and the reference file are in fasta format. I don't have access to the original .sff files from the 454 sequencer.

I think consed can align reads like this but I cannot get it to work to align 2 fasta files and the documentation is almost no help. If anyone knows how to get consed to do this that would be great.

Otherwise, I am completely open to suggestions of other pieces of software. I am currently aligning the reads to the reference sequence using fr-hit (http://weizhong-lab.ucsd.edu/frhit/) but I don't know how to plot the resulting data into a 'percent identity plot' or 'recruitment plot'. Any help would be greatly appreciated.

Thanks!

fasta • 4.7k views

ADD COMMENT • link updated 7.4 years ago by Biostar 20 • written 13.4 years ago by User 6228 ▴ 110

Istvan Albert · Answer 1 · 2011-07-26

I ended up borrowing (with permission) the Javascript from this site: http://weizhong-lab.ucsd.edu/frv/, then I wrote a tiny php backend that parses Blast and FR-Hit files and sends the data to the Javascript plotter in the right format.

It works very well on my system, but you will have to apply a little elbow grease to get the backend working for yourself.

Here is the code I am using to parse the Blast & FR-Hit files and send it to the Javascript plotter:

    function get ($file)

{ if (preg_match('/blast/', $file)) { $type = 'blast'; } else { $type = 'frv'; }

$lines = file($SERVER['DOCUMENTROOT'].'/fr-view/resources/data/'.$file);

$graphdata = array(); $maxx = 0; $numberofmatches = 0; foreach ($lines as $l) { if ($l[0] == '#' || $l[0] == ' ') continue;

$l = explode("\t", $l); foreach ($l as &$each) { $each = trim($each, "\n\r\t ,%"); }

if ($type == 'frv') { $percentid = $l[7]; $beginning = $l[9]; $end = $l[10]; } else if ($type == 'blast') { $percentid = $l[2]; $beginning = $l[6]; $end = $l[7]; } else { die('Invalid type!'); }

if ($end > $maxx) $maxx = $end;

$graphdata[] = array($beginning, $percentid, $end); $numberofmatches++; } die(jsonencode(array( 'numberofmatches' => $numberofmatches, 'maxx' => $maxx, 'graphdata' => $graph_data ))); }

You can put that in a PHP file, and make an ajax call to that URL in your Javascript to retrieve the data.

If none of that makes any sense, just ignore this answer.

score 1 · Answer 2 · 2011-07-09

1

Entering edit mode

13.4 years ago

Benm ▴ 710

You can use gsMapper(Newbler v2.5.3, it is not open source software) to align your 454 (sff) reads to reference sequence. But I don't find the suitable software to do 'recruitment plot', I don't know if the CAMERA 1.3.2 could help you, I think you can make it with R using the alignment infor. from gsMapper results. Sorry for the weak suggestion.

ADD COMMENT • link 13.4 years ago by Benm ▴ 710

0

Entering edit mode

I can align it with a reference sequence (blast, fr-hit, etc) but I need something to plot it. I will take a look at R.

ADD REPLY • link 13.4 years ago by User 6228 ▴ 110

Istvan Albert · Answer 3 · 2011-07-09

have a look at this [http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0050077 paper, basically you would want to BLASTn you reads against a reference genome. then, parse the BLAST results to extract the hits' coordinates and percent ID. finally, for plotting you would need to create a csv file ("coordinate", "%ID") and use the likes of ggplot's geom_point in R for plotting, e.g:

foo = read.csv('your_results.csv', header=T)
ggplot(foo, aes(x=coord, y=X.id)) +  geom_point(aes(size=1.5, alpha=X.id,legend.position="none")) + scale_colour_manual(value = colours) + opts(theme_bw(), axis.text.x = theme_text(size=8, angle = 90, hjust=1), panel.background = theme_blank(), panel.grid.major = theme_blank(), title= "choose_title")

--OR-- another option is to upload your metagenomic data to the mg-rast server (http://metagenomics.anl.gov/) where a recruitment analysis option is available against genomes available on their server.