Annotating Gene Sequences
2.6 years ago
Laura ▴ 50

Hi there,

I've been tasked with annotating genetic sequences for a work project. I have never done this before so I have been looking into different methods, and I'm feeling a bit overwhelmed! I'm hoping someone here can help point me in the right direction.

In other posts about annotation, I've seen recommendations for Galaxy, BLAST, Ensembl, AUGUSTUS, and more. I've tried all of these programs but still haven't managed to get the annotations I need.

I have around 1,000 regions with sequences in L1 subfamilies. The goal of this research is to see whether some sub-sequences/motifs are repeated more often than others. I started with a BED file of regions, retrieved the FASTA sequences with twoBitToFa, and ran multiple sequence alignments with Clustal Omega (clustalo). I now have a visualization that I am happy with, except that it is missing annotation information.

My visualization looks like this:

[image]

and I'd like to have bars across the top marking the areas that are protein coding, known motifs, etc. I'd like to end up with something similar to this figure, with the annotations drawn above the sequence information:

[image]

I get errors when I use Ensembl's BioMart. I have the regions in a text file formatted like:

chr12:70406846:70412860:1,chr3:177106559:177112539:1,chr1:174812365:174818381:1

and I subset the data to stay within their suggested maximum of 500 sequences. BioMart works when I submit one sequence at a time, but I have far too many regions for that to be practical.
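
For illustration, here is a minimal Python sketch of how a region list in the format above could be parsed and split into batches of at most 500. The function names (`parse_regions`, `batches`, `format_batch`) are hypothetical, and the sketch assumes every entry has exactly four colon-separated fields (chromosome, start, end, strand). It only prepares the text; it does not talk to BioMart.

```python
from typing import Iterator, List, Tuple

# (chromosome, start, end, strand), as in "chr12:70406846:70412860:1"
Region = Tuple[str, int, int, int]


def parse_regions(text: str) -> List[Region]:
    """Parse comma- or newline-separated 'chrom:start:end:strand' entries."""
    regions: List[Region] = []
    for token in text.replace("\n", ",").split(","):
        token = token.strip()
        if not token:
            continue  # skip empty entries, e.g. from a trailing comma
        chrom, start, end, strand = token.split(":")
        regions.append((chrom, int(start), int(end), int(strand)))
    return regions


def batches(regions: List[Region], size: int = 500) -> Iterator[List[Region]]:
    """Yield consecutive slices of at most `size` regions."""
    for i in range(0, len(regions), size):
        yield regions[i:i + size]


def format_batch(batch: List[Region]) -> str:
    """Turn a batch back into the comma-separated region string."""
    return ",".join(f"{c}:{s}:{e}:{st}" for c, s, e, st in batch)


if __name__ == "__main__":
    example = ("chr12:70406846:70412860:1,"
               "chr3:177106559:177112539:1,"
               "chr1:174812365:174818381:1")
    for n, batch in enumerate(batches(parse_regions(example), size=500), 1):
        print(f"batch {n}: {format_batch(batch)}")
```

Each batch string could then be pasted or uploaded separately, so that no single submission exceeds the 500-region limit.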

Mostly I am wondering: of all these tools, which one should I focus on learning to accomplish my goal? And how do I find out which parts of my sequences are coding and which are non-coding?
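
To make the goal concrete, here is a rough Python sketch of the bookkeeping behind the annotation bars, assuming the annotated features (for example coding regions) are already available as genomic intervals from some annotation source. The feature names and coordinates below are made up, the function names are hypothetical, and all intervals are assumed to be inclusive and in the same coordinate system as the region. The second helper maps a position in the ungapped sequence to its column in a gapped alignment row, since the bars sit on top of an alignment.

```python
from typing import List, Tuple

Feature = Tuple[str, int, int]  # (name, start, end), inclusive genomic coordinates


def clip_features(region_start: int, region_end: int,
                  features: List[Feature]) -> List[Feature]:
    """Keep the parts of features that overlap the region.

    Returned coordinates are 0-based offsets from region_start (inclusive).
    """
    bars: List[Feature] = []
    for name, f_start, f_end in features:
        start = max(f_start, region_start)
        end = min(f_end, region_end)
        if start <= end:  # the feature overlaps the region
            bars.append((name, start - region_start, end - region_start))
    return bars


def offset_to_column(aligned_seq: str, offset: int) -> int:
    """Map a 0-based offset in the ungapped sequence to its alignment column."""
    seen = -1
    for col, ch in enumerate(aligned_seq):
        if ch != "-":  # '-' is assumed to be the gap character
            seen += 1
            if seen == offset:
                return col
    raise ValueError("offset lies beyond the end of the ungapped sequence")


if __name__ == "__main__":
    # Hypothetical features around a hypothetical region 100..200
    feats = [("ORF1", 90, 120), ("ORF2", 150, 250), ("other", 300, 400)]
    print(clip_features(100, 200, feats))
    print(offset_to_column("AC--GT", 2))
```

The clipped offsets give the extent of each bar along one sequence, and `offset_to_column` shifts them to the right place once gaps have been introduced by the alignment.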

Thank you!

galaxy gene blast annotation biomart • 401 views