Best Practices For Naming De Novo Transcriptome Sequences?
11.1 years ago

What are best practices for naming fasta sequences from a de novo transcriptome assembly?

Specifically, I'm thinking about

  • a naming scheme that is intuitive and logical
  • forward compatibility in case of future genome sequencing or additional RNA-seq data
  • names that are useful to other researchers doing database mining and similar work

I realize this may just end up being project-specific, but I'm hoping to avoid the problem of unstructured text in biological databases down the road.

transcriptome fasta • 2.6k views
11.1 years ago

As long as you delimit your headers consistently, any future manipulation to conform to another format should be easy. I would make sure to:

  • Choose a sensible delimiter, one that will never appear in your metadata. Tabs and pipes are common choices.
  • Have the same number of delimited metadata fields in every header.
  • If a metadata field is not applicable or not available, fill it with a placeholder such as "NA" rather than leaving it out.
  • If you use incrementing numbers, pad them with leading zeroes so that all numbers have the same string length. For example: 00001, 00002, 01234, 12345
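
A minimal Python sketch of the points above, for illustration only. The field names, the "TX" prefix, the pipe delimiter, and the five-digit width are all assumptions chosen for the example, not part of any standard:

```python
# Hypothetical header layout: every header carries exactly these fields, in this order.
DELIMITER = "|"
FIELDS = ("id", "species", "tissue", "assembler")


def make_header(index, metadata, prefix="TX", width=5):
    """Build a FASTA header with a zero-padded ID and a fixed number of fields."""
    values = [f"{prefix}{index:0{width}d}"]  # e.g. TX00001
    for field in FIELDS[1:]:
        value = metadata.get(field)
        # Missing or empty metadata gets an explicit placeholder.
        value = "NA" if value in (None, "") else str(value)
        if DELIMITER in value:
            raise ValueError(f"delimiter {DELIMITER!r} found in field {field!r}")
        values.append(value)
    return ">" + DELIMITER.join(values)


def parse_header(header):
    """Split a header back into a dict, checking the field count."""
    values = header.lstrip(">").split(DELIMITER)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


if __name__ == "__main__":
    print(make_header(1, {"species": "Vcor"}))
    print(make_header(1234, {"species": "Vcor", "tissue": "leaf"}))
```

Because the IDs are zero-padded to a fixed width, a plain text sort puts them in numeric order, which would not hold for unpadded names such as TX2 and TX10.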

Padding with zeroes is certainly important!

11.1 years ago
Ann ★ 2.4k

My advice: Create names that can be easily parsed. For example, if your de novo assembly generates multiple transcript variants per locus, then use ".N" suffixes to indicate alternative transcripts coming from the same gene. And if you intend to make the sequences available as part of a searchable Web site, use names that are likely to be unique to your species. For example, for Vaccinium corymbosum (blueberry) you might do something like:

Vc1.1 for gene Vc1, transcript 1.
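
To show what "easily parsed" could mean in practice, here is a small Python sketch that splits names of the form Vc1.1 into gene and transcript number and groups transcripts by gene. The regular expression and the function names are assumptions for illustration; they only cover the simple prefix + number + ".N" pattern described above:

```python
import re
from collections import defaultdict

# Assumed pattern: letters, a gene number, a dot, then the transcript number.
NAME_RE = re.compile(r"^(?P<gene>[A-Za-z]+\d+)\.(?P<transcript>\d+)$")


def split_transcript_name(name):
    """Return (gene, transcript_number) for a name such as 'Vc1.1'."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the gene.N pattern: {name!r}")
    return match.group("gene"), int(match.group("transcript"))


def group_by_gene(names):
    """Map each gene to the sorted list of its transcript numbers."""
    genes = defaultdict(list)
    for name in names:
        gene, transcript = split_transcript_name(name)
        genes[gene].append(transcript)
    return {gene: sorted(numbers) for gene, numbers in genes.items()}


if __name__ == "__main__":
    print(split_transcript_name("Vc1.1"))
    print(group_by_gene(["Vc1.2", "Vc1.1", "Vc2.1"]))
```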

Do a quick Google search to see what your proposed names bring up.


Sensible approach to dealing with transcripts, though in the absence of a genome this is one of the aspects of de novo transcriptome assembly I'm least comfortable with.
