I've been given a sequence and asked to write a report on it as if I were a bioinformatician working with a research group that recently obtained this sequence as part of a genome sequencing project.
I've divided the report into two parts. PLAN:
Part 1 - Annotating the sequence
1) Find the longest open reading frame (ORF) via https://www.ncbi.nlm.nih.gov/orffinder/, take its translation as the amino acid sequence of the encoded protein, and BLAST it.
2) Find potential motifs in the protein (not sure how I would do this or with what bioinformatic tool).
3) Possibly multiple sequence alignments (also not sure how).
4) Calculating SNPs in the sequence (unsure).
5) Identifying read coverage based on statistical base frequencies, residue frequencies and CpG islands (unsure).
6) Determining properties such as molecular weight, isoelectric point, transmembrane regions and hydrophobicity (unsure).
Part 2 - Analysis of the likely function of the protein
1) Identify homologs of the protein based on amino acid sequence, and predict function from the similarity with other proteins that share a high sequence identity with the protein of interest.
2) Not sure what else I should do here.
Any help on how I would do this would be really appreciated, thank you.
As this is an assignment/homework, you will need to show the efforts you have made toward achieving this, and the specific things you are stuck on, before we will be able to provide any help. We are not here to do your work for you, but rather to help you understand.
As a general hint to get you going though, you can pretty much google any of the terminology you need to find an appropriate tool. Try searching google for "protein motif prediction" or "multiple sequence alignment tools".
A useful starting site will probably be omictools.com.
Just a quick tip for using omictools.com: if you land on the annoying "subscribe for free" page, just use private navigation.
I completely get you, and you're right that I do need to show my efforts. I've tried contacting tutors and so on without getting anywhere, and I'm really struggling to comprehend what's going on in the module itself. I wanted to find a report by someone who has actually done this before, so that I could work out what I need to do systematically when writing this report, as I'm really clueless.
I'll start on protein motif prediction now, as well as using omictools.com.
Appreciate your response :)
Bioinformatics is a trial-and-error discipline anyway. There is no harm in throwing tools at the data and taking a little time to try to understand what comes out the other end.
Scientific assignments are usually less about getting the right answer, and more about showing that you know what the data says (if it's a good assignment anyway). Your reasoning is more important than your answer.
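To make the "try it and see what comes out" idea concrete for step 1 of the plan, here is a minimal sketch of a six-frame longest-ORF search in plain Python. It is not how ORFfinder itself works internally; it assumes the standard genetic code, ATG as the only start codon, and it ignores ORFs that run off the end of the sequence without a stop codon. The function names `reverse_complement` and `longest_orf` are made up for this illustration.

```python
# Standard genetic code (NCBI translation table 1), built from the
# conventional T, C, A, G ordering of the three codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def longest_orf(dna):
    """Return the protein of the longest ATG...stop ORF over all six frames.

    Assumption: only complete ORFs (with an in-frame stop codon) are counted.
    Codons containing ambiguous bases are translated as 'X'.
    """
    dna = dna.upper()
    best = ""
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            protein = []
            in_orf = False
            for pos in range(frame, len(strand) - 2, 3):
                aa = CODON_TABLE.get(strand[pos:pos + 3], "X")
                if not in_orf:
                    if aa == "M":          # first ATG opens an ORF
                        in_orf = True
                        protein = ["M"]
                elif aa == "*":            # stop codon closes the ORF
                    if len(protein) > len(best):
                        best = "".join(protein)
                    in_orf = False
                else:
                    protein.append(aa)
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGGCTAAATGACC"))
```

Comparing what this gives with the ORFfinder output for the same sequence is a reasonable sanity check, and any difference (alternative start codons, a different genetic code, partial ORFs) is itself worth discussing in the report.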
You must provide more details to get more specific help. Your PLAN implies you have been given, in addition to one or a few contigs, raw sequencing reads (Identifying read coverage), but you didn't clearly say so. Also, you may already know what the organism is, or at least whether it is a prokaryote or a eukaryote; this information is important for deciding which software or approach is most appropriate.
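Unlike read coverage, base frequencies and CpG content can be computed from the sequence alone, so that part of step 5 does not depend on having the raw reads. A minimal sketch, assuming a plain DNA string as input; the function names are hypothetical, and the observed/expected CpG ratio used here is the commonly quoted one, (number of CG dinucleotides x sequence length) / (number of C x number of G):

```python
from collections import Counter


def base_composition(dna):
    """Return the fraction of each of A, C, G, T (other symbols are ignored)."""
    counts = Counter(b for b in dna.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {b: counts[b] / total for b in "ACGT"}


def gc_content(dna):
    """Return the G+C fraction of the unambiguous bases."""
    comp = base_composition(dna)
    return comp.get("G", 0.0) + comp.get("C", 0.0)


def cpg_obs_exp(dna):
    """Return the observed/expected CpG ratio for the whole sequence."""
    dna = dna.upper()
    c, g = dna.count("C"), dna.count("G")
    if c == 0 or g == 0:
        return 0.0
    return dna.count("CG") * len(dna) / (c * g)


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(base_composition(seq), gc_content(seq), cpg_obs_exp(seq))
```

Widely used criteria for calling a CpG island are a GC content above 50% and an observed/expected ratio above 0.6 over a window of at least 200 bp, so in practice these functions would be applied to sliding windows rather than to the whole sequence at once. CpG islands are also mainly relevant for vertebrate genomes, which is another reason why knowing the organism matters.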