4.5 years ago
758104598
Dear all
I am new to RNA-Seq. I learned to run DESeq2 and got my differential expression results, but I can't map my transcript IDs to the corresponding proteins (or transcripts?). The genome and annotation for my organism are only available in the NCBI database, not in Ensembl, so I can't use BioMart to map the IDs. Is there a tool for mapping IDs from an NCBI dataset?
Thanks a lot!
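Without BioMart, one option is to read the mapping straight out of the annotation file that came with the assembly. Below is a minimal sketch that assumes the NCBI annotation is in GFF3 format and that its CDS rows carry `locus_tag`, `protein_id` and `product` attributes. Check your own file: if your IDs sit under a different attribute, change `key`. The function names and the example rows (coordinates, `XP_` accessions, the second product name) are hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field: str) -> dict[str, str]:
    """Split a GFF3 column-9 string such as 'ID=x;product=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def build_id_map(gff_lines, feature_type="CDS", key="locus_tag"):
    """Map each ID (by default the locus tag) to (protein_id, product).

    Assumption: the NCBI GFF3 stores locus_tag, protein_id and product
    on CDS rows. Multi-exon genes repeat the same values, so overwriting is harmless.
    """
    id_map = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_gff3_attributes(cols[8])
        if key in attrs:
            id_map[attrs[key]] = (attrs.get("protein_id", ""), attrs.get("product", ""))
    return id_map


# Hypothetical example rows; accessions and coordinates are invented for illustration.
example = [
    "##gff-version 3",
    "scaffold_1\tRefSeq\tCDS\t100\t400\t.\t+\t0\tID=cds-1;locus_tag=M437DRAFT_42372;protein_id=XP_000001.1;product=hypothetical protein",
    "scaffold_1\tRefSeq\tCDS\t900\t1500\t.\t-\t0\tID=cds-2;locus_tag=M437DRAFT_42614;protein_id=XP_000002.1;product=alcohol dehydrogenase",
]
id_map = build_id_map(example)
print(id_map["M437DRAFT_42614"])
```

On a real download you would pass an open file handle instead, e.g. `with open("genomic.gff") as fh: id_map = build_id_map(fh)` (the file name is an assumption), and then look up each ID from your DESeq2 results in `id_map`.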
These are some of the unusual transcript IDs from the annotation file, which match the RNA-Seq FASTQ data: M437DRAFT_42372, M437DRAFT_42614, M437DRAFT_60249, M437DRAFT_88113, M437DRAFT_50976, M437DRAFT_43335.
Where did you get the reference and annotation file from? What genome is this referring to?
I got it from NCBI Genome Assembly; it is for the fungus Aureobasidium melanogenum.
I don't see transcript models in the genome entry for this organism in NCBI's Genome database. If you are looking at a different entry, can you post the link?
My guess is that these are computationally predicted transcripts (based on "DRAFT" in the name). They may not have any additional information available, and no protein name.
Thanks for your reply. I can find these names in the NCBI dataset, but many of them are hypothetical proteins with no detailed information. Does that mean I can't use this annotation file to search for the proteins?
You can do DE analysis with those IDs, but if any of them turn out to be significant, you will need to do additional work to see whether they can be characterized informatically.
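A first step of that additional work could be to pull out the significant genes that are annotated only as "hypothetical protein" and collect their protein sequences, so they can go into a homology or domain search such as BLASTP or InterProScan. The sketch below assumes the DESeq2 results were exported with `write.csv(as.data.frame(res))` (IDs in the first column, a `padj` column, `NA` for filtered rows), that an ID-to-(protein_id, product) map is already available from the annotation, and that a protein FASTA (for example an NCBI `protein.faa`) is on hand. All numbers, accessions and sequences below are hypothetical placeholders.

```python
import csv
import io


def significant_ids(deseq_csv, padj_cutoff=0.05):
    """Return IDs whose adjusted p-value is below the cutoff.

    Assumes a DESeq2 table written by write.csv: the first column holds the IDs.
    """
    reader = csv.reader(deseq_csv)
    header = next(reader)
    padj_col = header.index("padj")
    hits = []
    for row in reader:
        padj = row[padj_col]
        if padj != "NA" and float(padj) < padj_cutoff:  # DESeq2 reports NA for filtered genes
            hits.append(row[0])
    return hits


def hypothetical_hits(hits, id_map):
    """Protein IDs of significant genes whose only annotation is 'hypothetical protein'."""
    return [id_map[g][0] for g in hits if g in id_map and id_map[g][1] == "hypothetical protein"]


def read_fasta(handle):
    """Yield (identifier, sequence) pairs; the identifier is the first word of each header."""
    name, seq = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:].split()[0], []
        elif line:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


# Hypothetical inputs standing in for real files.
deseq_csv = io.StringIO(
    '"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"\n'
    '"M437DRAFT_42372",850.2,2.1,0.3,7.0,2.6e-12,1.1e-10\n'
    '"M437DRAFT_42614",120.5,-1.4,0.4,-3.5,4.7e-04,0.012\n'
    '"M437DRAFT_60249",15.0,0.2,0.5,0.4,0.69,NA\n'
)
id_map = {
    "M437DRAFT_42372": ("XP_000001.1", "hypothetical protein"),
    "M437DRAFT_42614": ("XP_000002.1", "alcohol dehydrogenase"),
}
protein_faa = io.StringIO(
    ">XP_000001.1 hypothetical protein M437DRAFT_42372\nMKTAYIAKQR\nQISFVKSHFS\n"
    ">XP_000002.1 alcohol dehydrogenase\nMSIPETQKGV\n"
)

hits = significant_ids(deseq_csv)
to_check = set(hypothetical_hits(hits, id_map))

# Write the selected sequences as FASTA, ready for a BLASTP or InterProScan run.
out = io.StringIO()
for name, seq in read_fasta(protein_faa):
    if name in to_check:
        out.write(f">{name}\n{seq}\n")
print(out.getvalue())
```

Genes with an informative product name can be interpreted directly; the FASTA written here holds only the ones that still need characterization.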