What kind of gene IDs are these?
4.0 years ago
soojinima ▴ 10

What kind of gene IDs are the ones below?

c100000_g1
c100000_g2
c100001_g1
c100002_g1
c100003_g1
c100003_g10
c100003_g11
c100003_g2
c100003_g3
c100003_g4
c100003_g5
c100003_g6
c100003_g7
c100003_g8
c100003_g9
c100004_g1
c100005_g1

I cannot find a reference for them. Thank you.

RNA-Seq genome gene sequence • 1.1k views

These look like IDs from running Trinity (or perhaps CD-HIT or similar).

Anyway, without any additional info, that's as good a guess as any.

From the accompanying paper:

Transcriptome assembly was performed using Trinity software (v2.4.0) with min_kmer_cov set to 2 by default and all other parameters set to default.

So unless they also provided the transcriptome assembly result somewhere, these IDs are rather pointless.

EDIT: and I truly first guessed before I looked up the paper :)


Where do you see these? Is it in an RNAseq experiment (which I'm deducing from your tag, but you should have mentioned in your post)? Please give us more information - as much of it as you can.


https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123550

I am trying to analyse a dataset from the GEO database. The data they provide contain gene_id, read_count, and FPKM values from high-throughput sequencing.

In these data, the gene_id values look like the ones listed above.

4.0 years ago
vkkodali_ncbi ★ 3.8k

These are Trinity transcript identifiers of the kind described here. The GEO link also has a FASTA file with sequence data that uses the same identifiers:

$ zgrep 'c100000_g1' GSE123550_Trinity.fasta.gz
  >c100000_g1_i1 len=1241 path=[11595:0-1240]
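
To relate the FASTA headers back to the gene_id column, here is a minimal Python sketch. It assumes the older Trinity naming pattern seen above, where `c100000_g1` is the gene-level ID and a trailing `_i1` marks an isoform of that gene. The function names and the regular expression are my own, not part of Trinity.

```python
import re

# Assumed pattern for older Trinity IDs such as "c100000_g1_i1":
# component (c), gene (g), and an optional isoform (i) suffix.
TRINITY_ID = re.compile(r"^(c\d+)_(g\d+)(?:_(i\d+))?$")


def split_trinity_id(identifier):
    """Return (gene_id, isoform) for a Trinity-style identifier.

    isoform is None when the identifier is already gene-level.
    """
    match = TRINITY_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised Trinity-style ID: {identifier!r}")
    component, gene, isoform = match.groups()
    return f"{component}_{gene}", isoform


def gene_id_from_header(header_line):
    """Extract the gene-level ID from a FASTA header line."""
    # The transcript ID is the first whitespace-delimited token after ">".
    transcript_id = header_line.strip().lstrip(">").split()[0]
    return split_trinity_id(transcript_id)[0]


header = ">c100000_g1_i1 len=1241 path=[11595:0-1240]"
print(gene_id_from_header(header))  # c100000_g1
```

Grouping the FASTA records by this gene-level ID should give you, for each gene_id in the count table, the assembled isoform sequences that belong to it.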

Thank you for the helpful information.

Can I get a list of these "Trinity transcript identifiers"? I need the gene symbols. Thank you.


Trinity does de novo RNAseq assembly - I don't understand how you can get a list of known identifiers for contigs produced by de novo assembly.

Does a gerbil reference genome/GTF annotation exist? If not, you cannot map the contigs to genes of any kind.


Yes, these are all just transcripts assembled from reads. To find out which gene each one corresponds to, you will have to align these transcripts (use the FASTA file from the GEO web link) to a reference genome for which an annotation exists. I think the Trinity tutorial uses GMAP for alignments, but you can probably use minimap2 as well. Once you have the alignments, you can use other tools like bedtools to map the alignments to genes.
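
To make that last step concrete, here is a rough Python sketch of what "mapping alignments to genes" amounts to: an interval overlap between where each transcript aligned and where the annotated genes sit. This is not the bedtools API. The function name, the gene names, and the coordinates are all hypothetical, and it ignores strand and exon structure.

```python
from collections import defaultdict


def assign_genes(alignments, genes):
    """Map each transcript ID to the annotated genes its alignment overlaps.

    alignments: iterable of (transcript_id, chrom, start, end)
    genes:      iterable of (gene_name, chrom, start, end)
    Coordinates are 0-based and half-open, as in BED files.
    """
    # Index the annotation by chromosome so each alignment is only
    # compared against genes on the same sequence.
    genes_by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        genes_by_chrom[chrom].append((start, end, name))

    hits = {}
    for transcript_id, chrom, start, end in alignments:
        # Two half-open intervals overlap when each starts before the other ends.
        hits[transcript_id] = sorted(
            name
            for g_start, g_end, name in genes_by_chrom[chrom]
            if start < g_end and g_start < end
        )
    return hits


# Hypothetical coordinates and gene names, for illustration only.
genes = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 800, 1200)]
alignments = [
    ("c100000_g1_i1", "chr1", 450, 900),
    ("c100001_g1_i1", "chr2", 10, 50),
]
print(assign_genes(alignments, genes))
```

A transcript that overlaps no annotated gene gets an empty list, and one that spans two genes gets both, so in practice you would still need to decide how to handle ambiguous or unannotated contigs.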
