Question

Illumina Sequence Analysis

2

13.3 years ago

Pkuku ▴ 20

I have 20 million reads, each 75 bp long. I do not have an annotated reference genome to map these sequences to. How would I build an annotated reference genome, so that I can upload it and do the mapping? Is there a method that combines my reference sequence with an annotated sequence to predict the genes? I am new to the field. Please help me out.

illumina reference sequence short annotation • 2.7k views

updated 13.3 years ago by Vitis ★ 2.6k • written 13.3 years ago by Pkuku ▴ 20

score 1 · Answer 1 · 2011-08-26

1

13.3 years ago

Pasta ★ 1.3k

For the reference genome question, you can make a de novo assembly of your genome using software like Velvet or Edena. You could also use a closely related genome as a reference, if one is available.

BTW, are you working on a prokaryote?
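Genome size matters here because it sets the average coverage the reads provide, and coverage strongly affects how well a short-read de novo assembly comes together. A minimal sketch of that arithmetic follows; the read count and read length come from the question, while the three genome sizes are purely illustrative assumptions, not values for any particular species.

```python
def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


# Values stated in the question.
N_READS = 20_000_000
READ_LENGTH = 75

# Hypothetical genome sizes, chosen only to show the range of outcomes.
ASSUMED_GENOME_SIZES = [
    ("5 Mb (assumed, bacterium-sized)", 5_000_000),
    ("100 Mb (assumed)", 100_000_000),
    ("1 Gb (assumed)", 1_000_000_000),
]

for label, size in ASSUMED_GENOME_SIZES:
    cov = expected_coverage(N_READS, READ_LENGTH, size)
    print(f"{label}: {cov:.1f}x average coverage")
```

The reads total 1.5 Gb of sequence, so this works out to about 300x for a 5 Mb genome, 15x for 100 Mb, and 1.5x for 1 Gb. Lower coverage generally means a more fragmented assembly, so a rough genome size estimate is worth having before choosing an approach.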

0

COMMENT FROM @pkuku: Thank you very much for your reply. We are working with an algal species. I can use a closely related species as a reference, but how could I produce an annotated genome? Can I use these tools to do the annotation too?

13.3 years ago by Eric Normandeau 11k

0

Well, that is another problem :) These tools are made for genome assembly, not genome annotation. Beyond that, it all depends on how closely related your species are.

13.3 years ago by Pasta ★ 1.3k

score 1 · Answer 2 · 2011-08-27

Are the reads from genomic DNA, mRNA, or small RNA? The process depends on the source of your data and your goals.

If you want to characterize this new genome from scratch and your reads are from genomic DNA, I think the best way is to do a de novo assembly, as suggested by Pasta. You can then use standard comparative genomics tools (e.g. MUMmer, BLAST, BLAT) to compare the de novo contigs with your closely related species, annotate, find synteny, etc.

If your reads are from mRNA, you can also do a de novo assembly and treat the resulting contigs as ESTs or cDNAs, so that you can use tools like the PASA pipeline to annotate your genome. Your closely related species would be of great help during this process.

Alternatively, you may map your reads (whether from DNA or RNA) directly to the closely related reference genome, allowing enough mismatches to accommodate the divergence (it is better to have a rough idea of the percentage difference before mapping), and reconstruct the new genome from the alignments.

Again, when facing this kind of high-throughput data, figure out your goals first, then pick your tools.
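For the mapping route, a rough way to turn an estimated percentage difference into a mismatch allowance is a simple binomial model. This is only a sketch under loud assumptions: substitutions are treated as independent and evenly spread along the read, indels and sequencing errors are ignored, and the 5% divergence used below is a made-up example value, not a figure from this thread. The function names are hypothetical helpers, and how a given mapper exposes its mismatch limit is not covered here.

```python
from math import comb


def mismatch_cdf(read_length: int, divergence: float, max_mismatches: int) -> float:
    """P(a read has at most max_mismatches differences) under a binomial model."""
    return sum(
        comb(read_length, k)
        * divergence**k
        * (1 - divergence) ** (read_length - k)
        for k in range(max_mismatches + 1)
    )


def min_mismatches_for(read_length: int, divergence: float, target: float = 0.95) -> int:
    """Smallest mismatch allowance expected to retain at least `target` of the reads."""
    for m in range(read_length + 1):
        if mismatch_cdf(read_length, divergence, m) >= target:
            return m
    return read_length


READ_LENGTH = 75           # from the question
ASSUMED_DIVERGENCE = 0.05  # hypothetical 5% sequence difference

expected = READ_LENGTH * ASSUMED_DIVERGENCE
allowance = min_mismatches_for(READ_LENGTH, ASSUMED_DIVERGENCE)
print(f"Expected mismatches per read: {expected:.2f}")
print(f"Allowance needed to keep ~95% of reads: {allowance}")
```

The point of the sketch is that the allowance has to sit above the average number of differences per read, otherwise a large share of genuinely matching reads is discarded. Real divergence is usually uneven (conserved genes versus intergenic regions), so treat the result as a starting point and check how many reads actually map.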