Illumina Sequence Analysis
13.3 years ago
Pkuku ▴ 20

I have 20 million reads, each 75 bp long. I do not have an annotated reference genome to map these sequences to. How would I build an annotated reference genome that I can upload and use for the mapping? Is there any method that combines my reference sequence and annotated sequences to predict the genes? I am new to the field. Please help me out.
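For a sense of scale: 20 million reads of 75 bp is 1.5 Gb of sequence in total. Whether that is enough for a de novo assembly depends on the genome size, which the post does not state, so this sketch uses hypothetical genome sizes and the standard mean-coverage formula C = N × L / G (number of reads × read length / genome size). It ignores uneven coverage, adapter trimming and quality filtering, all of which lower the usable depth.

```python
def sequencing_yield_bp(n_reads: int, read_len: int) -> int:
    """Total number of sequenced bases."""
    return n_reads * read_len


def mean_coverage(n_reads: int, read_len: int, genome_size_bp: int) -> float:
    """Expected mean depth, C = N * L / G."""
    return sequencing_yield_bp(n_reads, read_len) / genome_size_bp


if __name__ == "__main__":
    n_reads, read_len = 20_000_000, 75
    print(f"Total yield: {sequencing_yield_bp(n_reads, read_len) / 1e9:.2f} Gb")
    # Hypothetical genome sizes: the actual size of this algal genome is not given.
    for genome_mb in (15, 50, 120, 500):
        cov = mean_coverage(n_reads, read_len, genome_mb * 1_000_000)
        print(f"{genome_mb:>4} Mb genome -> ~{cov:.1f}x mean coverage")
```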

Tags: illumina, reference, sequence, short, annotation
13.3 years ago
Pasta ★ 1.3k

For the reference genome question, you can make a de novo assembly of your genome using software such as Velvet or Edena. You could also use a closely related genome as a reference, if one is available.
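Velvet is built around a de Bruijn graph: reads are cut into k-mers, each k-mer becomes an edge between its overlapping (k-1)-mers, and contigs are read off as unbranched paths through the graph. The toy Python sketch that follows shows only that core idea on error-free, single-strand reads. The function names are hypothetical, and it leaves out everything a real assembler needs (reverse complements, error correction, coverage filtering, repeat resolution, paired reads), so it is an illustration rather than a substitute for Velvet.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Toy de Bruijn graph: each k-mer adds an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths as contigs.

    Simplifications: single strand only, no error handling, and isolated
    cycles (where every node has exactly one edge in and one out) are not reported.
    """
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def is_internal(node):
        # Exactly one edge in and one edge out: the path passes straight through.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in sorted(nodes):
        if is_internal(start):
            continue
        # Start a contig along every edge leaving a branch point or path end.
        for nxt in sorted(graph.get(start, ())):
            contig = start + nxt[-1]
            node = nxt
            while is_internal(node):
                node = next(iter(graph[node]))
                contig += node[-1]
            contigs.append(contig)
    return contigs


if __name__ == "__main__":
    reads = ["ACGTTGCA", "TGCATGTC"]  # overlapping, error-free toy reads
    print(unitigs(build_de_bruijn(reads, k=4)))
```

With these two toy reads and k = 4 the graph is a single linear path, so the output is one contig, ACGTTGCATGTC. A stretch of k - 1 or more bases occurring twice in the genome would typically make the graph branch and split the result into several contigs, which is one reason the choice of k matters in real assemblers.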

BTW, are you working with a prokaryote?


COMMENT FROM @pkuku: Thank you very much for your reply. We are working with an algal species. I can use a closely related species as a reference, but how could I produce an annotated genome? Can I use these tools for the annotation too?


Well, that is another problem :) These tools are made for genome assembly, not genome annotation. Beyond that, it all depends on how closely related your species are.

13.3 years ago
Vitis ★ 2.6k

Are the reads from genomic DNA, mRNA, or small RNA? The process depends on the source of your data and on your goals.

If you want to characterize this new genome from scratch and your reads are from genomic DNA, I think the best way is to do a de novo assembly, as Pasta suggested. Then you can use the standard comparative genomics tools (MUMmer, BLAST or BLAT, for example) to compare the de novo contigs with your closely related species, annotate genes, find synteny, etc.

If your reads are from mRNA, you can also do de novo assemblies and treat the resulting contigs as ESTs or cDNAs, so that you can use tools like the PASA pipeline to annotate your genome. Of course, your closely related species would be of great help during this process.

Alternatively, you can map your reads (whether from DNA or RNA) directly to the closely related reference genome, allowing enough mismatches to accommodate the divergence (it is best to have a rough idea of the percentage difference before mapping), and reconstruct the new genome from the mapping alignments.

Again, when facing high-throughput data like these, you need to figure out your goals first, then pick the tools.
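On the point about knowing the divergence before mapping: if one assumes the differences between the two species are scattered independently along the genome at a per-base rate d (an idealization that ignores indels, sequencing errors and divergence that varies between regions), the number of mismatches in a 75 bp read follows a Binomial(75, d) distribution. The hypothetical helper in this sketch picks the smallest mismatch allowance that would let a chosen fraction of reads map under that assumption.

```python
from math import comb


def mismatch_cdf(max_mismatches, read_len, divergence):
    """P(X <= max_mismatches) for X ~ Binomial(read_len, divergence)."""
    return sum(
        comb(read_len, i) * divergence**i * (1 - divergence) ** (read_len - i)
        for i in range(max_mismatches + 1)
    )


def mismatches_to_allow(read_len, divergence, target_fraction=0.95):
    """Smallest mismatch allowance expected to let target_fraction of reads map.

    Assumes independent, uniformly distributed substitutions (no indels,
    no sequencing errors), so real data will need a somewhat larger allowance.
    """
    for m in range(read_len + 1):
        if mismatch_cdf(m, read_len, divergence) >= target_fraction:
            return m
    return read_len


if __name__ == "__main__":
    # Hypothetical divergence levels; the real value for this alga and its relative is unknown.
    for d in (0.005, 0.01, 0.02, 0.05):
        print(f"divergence {d:.1%}: allow {mismatches_to_allow(75, d)} mismatches per 75 bp read")
```

For example, at 1% divergence about 96% of 75 bp reads carry two or fewer differences, so an allowance of 2 meets the 95% target; sequencing errors push the required allowance higher in practice.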
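Reconstructing the new genome from the mapping alignments amounts, at its simplest, to calling a consensus base at each reference position. This is a minimal majority-vote sketch with hypothetical function names; it assumes ungapped alignments given as (0-based start, read sequence) pairs, i.e. mismatches only, and writes N wherever coverage is too low. Real pipelines work from SAM/BAM files, weigh base qualities, handle indels and heterozygosity, and, when the reference is fairly divergent, may need several rounds of mapping and consensus calling.

```python
from collections import Counter


def consensus_from_alignments(ref, alignments, min_depth=3):
    """Majority-vote consensus over ungapped read alignments.

    ref: reference sequence from the related species (only its length is used here).
    alignments: iterable of (start, read_seq), start being 0-based on ref.
    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    columns = [Counter() for _ in ref]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(ref):  # ignore read overhangs past either end
                columns[pos][base] += 1

    out = []
    for col in columns:
        depth = sum(col.values())
        out.append("N" if depth < min_depth else col.most_common(1)[0][0])
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTACGT"
    alignments = [(0, "ACTTA"), (0, "ACTTA"), (2, "TTACG"), (3, "TACGT")]
    print(consensus_from_alignments(ref, alignments, min_depth=2))
```

In this toy example every read covering reference position 2 shows T where the reference has G, so the consensus records that difference instead of copying the reference base.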
