Question about aligning the transcript_id to the NCBI protein name

Asked 4.5 years ago by 758104598

Dear all

I am new to RNA-Seq. I learned how to run DESeq2 and got my differential expression results, but I can't match my transcript_id values to the protein (or transcript?) names. I could only find the genome and its annotation in the NCBI database, not in Ensembl, so I can't use BioMart to map the IDs. Is there a tool for matching IDs from an NCBI dataset?

Thanks a lot!!!

Here are some of the strange-looking transcript IDs from the annotation file I used for my RNA-Seq reads: M437DRAFT_42372, M437DRAFT_42614, M437DRAFT_60249, M437DRAFT_88113, M437DRAFT_50976, M437DRAFT_43335.
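
BioMart isn't strictly needed here: when an NCBI assembly is annotated, its GFF3 file typically carries the gene ID (as a locus_tag attribute), the protein accession and the product name on the same CDS rows. Below is a minimal Python sketch of building that lookup table. It assumes your IDs appear as `locus_tag` values on CDS rows (worth checking with a quick grep of your own file first). The function names are my own, and the DEMO_ rows are invented for illustration, not real Aureobasidium annotation.

```python
from urllib.parse import unquote


def parse_attributes(field):
    """Split a GFF3 column-9 string ("key=value;key=value") into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            # GFF3 percent-encodes reserved characters such as ';' and '='.
            attrs[key.strip()] = unquote(value)
    return attrs


def locus_to_protein(gff_lines):
    """Map locus_tag -> (protein_id, product) using the CDS rows of a GFF3 file."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature rows
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only CDS rows carry protein_id/product here
        attrs = parse_attributes(cols[8])
        tag = attrs.get("locus_tag")
        if tag and tag not in mapping:
            # A multi-exon CDS is split over several rows; keep the first one.
            mapping[tag] = (attrs.get("protein_id", ""), attrs.get("product", ""))
    return mapping


if __name__ == "__main__":
    # Invented rows for illustration only (hypothetical IDs and accession).
    demo = [
        "##gff-version 3",
        "scaf1\tGenbank\tgene\t1\t900\t.\t+\t.\tID=gene-DEMO_0001;locus_tag=DEMO_0001",
        "scaf1\tGenbank\tCDS\t10\t400\t.\t+\t0\t"
        "ID=cds-P1;locus_tag=DEMO_0001;product=hypothetical protein;protein_id=DEMO_P1.1",
        "scaf1\tGenbank\tCDS\t500\t880\t.\t+\t2\t"
        "ID=cds-P1;locus_tag=DEMO_0001;product=hypothetical protein;protein_id=DEMO_P1.1",
    ]
    for tag, (protein_id, product) in locus_to_protein(demo).items():
        print(tag, protein_id, product, sep="\t")
```

On a real download you would pass an open file handle instead of the demo list, e.g. `locus_to_protein(gzip.open(path, "rt"))` for a gzipped GFF3, where `path` is whatever your NCBI file is called. Note that keeping only the first CDS row per locus collapses alternative isoforms. That is fine for a gene-level lookup, but not if your isoforms have different protein IDs.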

Where did you get the reference and annotation file from? What genome is this referring to?

4.5 years ago by GenoMax

I got it from the NCBI genome assembly database; it is for the fungus Aureobasidium melanogenum.

4.5 years ago by 758104598

I don't see transcript models in the genome entry for this organism in NCBI's genome database. If you are looking at a different entry, can you post the link?

My guess is that these are computationally predicted transcripts (based on "DRAFT" in the name). They may not have any additional information available, and may have no protein name.

4.5 years ago by GenoMax

Thanks for your reply. I can match these names in the NCBI dataset, but many of them are hypothetical proteins with no detailed information. Does that mean I can't use this annotation file to search for the proteins?

4.5 years ago by 758104598
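A "hypothetical protein" entry is still a valid, unique ID, so those genes can stay in the analysis. It can help to keep track of which IDs have a descriptive product name and which don't. Here is a minimal Python sketch that splits an `{ID: product}` table into those two groups. The list of placeholder terms is an assumption based on common NCBI-style wording rather than an official vocabulary, so extend it after looking at your own file. The DEMO_ IDs and products are invented.

```python
# Placeholder product names often used for genes without a functional
# assignment (an assumed, non-exhaustive list; adjust for your annotation).
UNINFORMATIVE_TERMS = ("hypothetical protein", "uncharacterized protein", "predicted protein")


def is_uninformative(product):
    """True if a product name is empty or only a placeholder."""
    text = (product or "").strip().lower()
    return text == "" or any(term in text for term in UNINFORMATIVE_TERMS)


def split_by_annotation(products):
    """Split {id: product} into named genes and a sorted list of IDs needing follow-up."""
    named = {gid: p for gid, p in products.items() if not is_uninformative(p)}
    unnamed = sorted(gid for gid, p in products.items() if is_uninformative(p))
    return named, unnamed


if __name__ == "__main__":
    demo = {  # invented IDs and products, for illustration only
        "DEMO_0001": "hypothetical protein",
        "DEMO_0002": "alpha-amylase",
        "DEMO_0003": "conserved hypothetical protein",
        "DEMO_0004": "",
    }
    named, unnamed = split_by_annotation(demo)
    print(f"{len(named)} named, {len(unnamed)} without a descriptive name: {unnamed}")
```

Substring matching is deliberately loose, so variants such as "conserved hypothetical protein" are caught too.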

"I can't use this annotation file to search for the proteins?"

You can do DE analysis with those IDs, but if you identify any as significant, you will need to do additional work (such as sequence similarity searches) to see whether they can be characterized informatically.

4.5 years ago by GenoMax
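A practical first step after DESeq2 is to pull out only the significant IDs and flag those that lack a descriptive product name. Those are the ones that need the extra work, for example a BLASTp search or a domain scan with InterProScan on the protein sequence. Below is a minimal Python sketch. It assumes the results were saved from R with `write.csv(as.data.frame(res), ...)`, so the row names land in an unnamed first column. It also assumes those row names are the same M437DRAFT_ tags used in the annotation. The padj cutoff of 0.05, the function names, and the DEMO_ rows and numbers are all illustrative choices, not real results.

```python
import csv
import io


def significant_ids(csv_text, alpha=0.05):
    """Return IDs with padj < alpha from a DESeq2 results table saved with write.csv()."""
    reader = csv.DictReader(io.StringIO(csv_text))
    id_col = reader.fieldnames[0]  # write.csv() puts row names in an unnamed first column
    hits = []
    for row in reader:
        padj = row.get("padj", "NA")
        if padj in ("NA", ""):
            continue  # DESeq2 reports NA padj for filtered or outlier genes
        if float(padj) < alpha:
            hits.append(row[id_col])
    return hits


def follow_up_list(hits, products):
    """Pair each significant ID with its product and flag those lacking a real name."""
    rows = []
    for gene in hits:
        product = products.get(gene, "")
        needs_work = product == "" or any(
            term in product.lower() for term in ("hypothetical", "uncharacterized")
        )
        rows.append((gene, product or "(not in annotation)", needs_work))
    return rows


if __name__ == "__main__":
    demo_csv = (  # invented numbers, for illustration only
        '"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"\n'
        '"DEMO_0001",512.3,2.1,0.3,7.0,2.6e-12,1.1e-10\n'
        '"DEMO_0002",40.2,-0.2,0.4,-0.5,0.62,0.81\n'
        '"DEMO_0003",3.1,1.5,1.2,1.25,0.21,NA\n'
        '"DEMO_0004",220.0,-1.8,0.35,-5.14,3.0e-7,4.5e-6\n'
    )
    demo_products = {"DEMO_0001": "hypothetical protein", "DEMO_0004": "alpha-amylase"}
    for gene, product, needs_work in follow_up_list(significant_ids(demo_csv), demo_products):
        print(gene, product, "follow up" if needs_work else "named", sep="\t")
```

The flagged IDs can then be mapped to their protein accessions and sequences for whatever characterization tool you choose.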