Genome Annotation
13.2 years ago
Charsonic_Wu ▴ 30

Hi all,

I am completely new to sequencing. I am a computer science student but I am working on a bioinformatics project on whole genome functional annotation.

My data is in csfasta format. How do I change this to fasta format? I am also very confused: what is the difference between the F3.csfasta file and the F5.csfasta file?

Additionally, I have been told that the data is in clc format. What does this mean?

How do I go about doing a whole genome annotation? Does anyone know of any good tools to do whole genome functional annotations?

I am extremely desperate and very very confused. Any information would be very much appreciated.

Thank you.

function genome • 5.2k views
13.2 years ago
Carson ▴ 30

I can't help with the csfasta conversion, but I can with the annotation portion. There are basically two types of annotation that you might be referring to: de novo annotation or variant annotation. I'll try to describe both.

If this is a newly sequenced organism and you are doing de novo annotation (i.e., there is no existing reference genome), you can use MAKER for structural annotation, and MAKER together with InterProScan for functional annotation. Also look at gmod.org for other annotation tools from the Generic Model Organism Database (GMOD) project.

If this is a human genome (or an organism with an existing reference genome), and you want to annotate functional variants, use BWA to align to the reference, and GATK or samtools to identify variants (SNPs and indels). Then use VAAST or ANNOVAR to classify and prioritize the variants.


FYI: There are two workshops on MAKER in the next month or so:

Sept 28-30, Genome Annotation course at UC Davis http://gmod.org/wiki/News/UC_Davis_Courses_this_September

Oct 14 at OICR in Toronto: http://gmod.org/wiki/October_2011_GMOD_Meeting#Scheduled_Satellite_Meetings


+1 for MAKER - makes life easy!

13.2 years ago
Barry ▴ 40

Also, to follow up on Carson's reply: if this is ABI data for a novel genome and you're hoping to annotate the genome, you'll need to assemble it somehow first. There are plenty of tools out there for this sort of task, and which one you choose will depend on a number of factors. Google will lead you to plenty of discussion. I'd have a look at ABySS (http://www.bcgsc.ca/platform/bioinfo/software/abyss) and then read a few threads like this one (http://seqanswers.com/forums/archive/index.php/t-1424.html) to get a flavor for some of the issues involved. Coming from CS, you'll feel right at home with all the technical details of the De Bruijn and Euler graphs involved in these tools. It's fun stuff!
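To give a feel for the De Bruijn graphs mentioned above, here is a minimal Python sketch. It is only an illustration of the idea, not how ABySS or any other assembler is actually implemented: the function name `de_bruijn_graph` and the toy reads are made up for this example, and real tools also deal with sequencing errors, reverse complements and memory limits.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a toy De Bruijn graph from reads.

    Nodes are (k-1)-mers. Every k-mer found in a read adds one edge
    from its (k-1)-base prefix to its (k-1)-base suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two overlapping toy reads (hypothetical data, for illustration only).
reads = ["ACGTC", "CGTCA"]
graph = de_bruijn_graph(reads, k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Edges that appear more than once (here CG -> GT and GT -> TC) come from k-mers seen in both reads. An assembler then looks for a path through the graph that uses the edges, which is where the Eulerian path idea comes in.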

13.2 years ago
Mdeng ▴ 530

Hey,

If you have data where the filename is like "_F3.csfasta", there should be a corresponding "_F3.qual" file. Both files together are your reads as they come off the sequencer. Now, depending on which sequencing platform has been used, you have to create or apply your "pipeline". If you are working on a whole genome project, the data should be whole genome seq. The infix F3 marks the first (forward) read tag; on its own it means single-end reads. A second tag would be R3 for a mate-pair library or F5 for a paired-end library, so an F5.csfasta file next to the F3 file suggests paired-end data.
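As a rough illustration of how the two files belong together, here is a small Python sketch that walks a csfasta stream and a qual stream in parallel. It assumes each record is a ">" header line followed by exactly one data line, that "#" lines are comments, and that both files list the reads in the same order. The function names and the toy records are made up for this example, so check them against your real files.

```python
import io


def read_records(handle):
    """Yield (name, data_line) pairs, skipping blank and '#' comment lines."""
    name = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def pair_reads(csfasta_handle, qual_handle):
    """Yield (name, colours, qualities) for reads present in both streams."""
    records = zip(read_records(csfasta_handle), read_records(qual_handle))
    for (name, colours), (qual_name, quals) in records:
        if name != qual_name:
            raise ValueError(f"read order differs: {name} vs {qual_name}")
        yield name, colours, [int(q) for q in quals.split()]


# Toy in-memory stand-ins for "_F3.csfasta" and "_F3.qual" (hypothetical data).
csfasta = io.StringIO("# toy header\n>1_1_1_F3\nT0120\n>1_1_2_F3\nT3321\n")
qual = io.StringIO("# toy header\n>1_1_1_F3\n20 25 30 18\n>1_1_2_F3\n12 14 22 9\n")

paired = list(pair_reads(csfasta, qual))
for name, colours, quals in paired:
    print(name, colours, quals)
```

With real data you would pass open file handles instead of the `io.StringIO` objects. Note that the colour string carries a leading primer base, so it is one character longer than the list of qualities.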

First of all, you should search for a pipeline that matches your read type (single end or paired), your seq platform and whole genome seq. You will find some ;)

So the steps would be:

  • Map your data to a reference (search for "hg18" or "hg19" for the human genome; hg19 is newer), for example with BWA
  • Call your SNPs with GATK or samtools
  • Annotate your SNPs. Like the mapping, this is a science in itself. At the moment I am using NGS-SNP.

These are the really basic steps.


The official names of the human reference genome assemblies are NCBI36 and GRCh37, respectively (NCBI36 = hg18, GRCh37 = hg19).


Thank you very very much. That makes things clearer^^

13.2 years ago
Rob Syme ▴ 540

I've not had to mess around with colour space data before, but I'm pretty sure that the instrument manufacturer, ABI, shares software to do that sort of conversion. The software is Corona-lite, which can be downloaded from here.

You'll need to register with ABI, but I think it's free.
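To show what such a conversion does under the hood, here is a minimal Python sketch of colour space decoding. It is not Corona-lite's code, and the function name `decode_colourspace` is made up for this example. It assumes the usual two-base encoding where each colour (0 to 3) describes the transition between two adjacent bases, and where the read starts with a known primer base that is not part of the sequenced fragment.

```python
# With A=0, C=1, G=2, T=3, the colour between two adjacent bases is the XOR
# of their codes, so the next base is: previous base code XOR colour.
BASES = "ACGT"


def decode_colourspace(read):
    """Translate a colour space read such as 'T0120' into bases.

    The first character is the primer base. It is used to decode the first
    colour but is not emitted. A missing colour call (e.g. '.') makes all
    later bases unknown, so the rest of the read is filled with 'N'.
    """
    previous = read[0]
    colours = read[1:]
    decoded = []
    for i, colour in enumerate(colours):
        if colour not in "0123":
            decoded.append("N" * (len(colours) - i))
            break
        previous = BASES[BASES.index(previous) ^ int(colour)]
        decoded.append(previous)
    return "".join(decoded)


print(decode_colourspace("T0120"))
print(decode_colourspace("T3.21"))
```

Because every base depends on the one before it, a single wrong colour call corrupts everything after it in the read. That is why it is usually better to align in colour space with a tool that supports it and only translate to bases afterwards, rather than converting the raw reads like this.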


That's what they recommend over at SeqAnswers too.


Thank you very very much~
