17 months ago
Assa Yeroslaviz
Hi,
We plan to run a proteomics experiment with an organism that has almost no entries in UniProt. On the other hand, quite a lot of sequencing data is available for it (WGS, RNA-Seq, or other short reads).
Would it be possible to analyze the sequencing data and gain information about its proteins (or isoforms)?
Does anyone know about a protocol, a paper or a pipeline on this approach? Searching online didn't deliver the expected results.
Thanks in advance.
Assa
Using a combination of HMMER+Pfam might be a good place to start. Also, depending on your organism, you could try comparing closely related species to find orthologous genes using OrthoFinder.
Thanks for the suggestion. I'm not sure I understand what you mean, though.
Do you mean I should use HMMER+Pfam to compare the protein sequences I have against the Pfam database and try to identify homologs/orthologs?
Kinda. You would learn which protein domains your peptides contain, which can make it easier to infer a function (for example, if a peptide has a Homeobox domain, chances are rather high that it is a transcription factor). But I think the better way is to find orthologs and assign functions to most genes that way.
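To make the domain idea a bit more concrete, here is a minimal Python sketch that collects the Pfam domains hit by each protein from an hmmscan `--tblout` table. It assumes the usual whitespace-separated layout (target name in column 1, query name in column 3, full-sequence E-value in column 5). The function name, the E-value cutoff, and the example rows are all made up for illustration and are not real results.

```python
from collections import defaultdict

def domains_per_protein(tblout_lines, max_evalue=1e-5):
    """Map each query protein to the Pfam domains it hits below max_evalue.

    Assumes hmmscan --tblout column order: target name, target accession,
    query name, query accession, full-sequence E-value, ...
    """
    hits = defaultdict(list)
    for line in tblout_lines:
        # Skip blank lines and the '#' header/footer lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits[query].append((target, evalue))
    return dict(hits)

# Invented rows, for illustration only (protein IDs and scores are hypothetical).
example = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "Homeodomain PF00046 prot_001 - 1.2e-20 70.1 0.3",
    "zf-C2H2 PF00096 prot_002 - 0.5 10.2 0.1",
]
print(domains_per_protein(example))
```

With the default cutoff only the strong Homeodomain hit is kept, and the weak zf-C2H2 hit is dropped. The cutoff is a judgment call and should be tuned to your data.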
In addition to what biofalconch suggests, you could do a de novo transcriptome assembly with the best-quality RNAseq reads you can find. If you only have short reads, you can use something like Trinity, but if you can find ONT or PacBio RNAseq reads, you can use a more powerful approach like IsoQuant.
And then, along the lines of what biofalconch says, you can annotate the hypothetical protein sequences using InterProScan, and cluster them with those of a well-annotated species using something like OrthoFinder or OrthoMCL.
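One step that sits between the assembly and the annotation is turning assembled transcripts into hypothetical protein sequences. In practice a dedicated ORF predictor would normally do this. The sketch below only illustrates the idea: translate each transcript in all six frames with the standard genetic code and keep the longest stretch that starts with a methionine. The function names are hypothetical, and the sketch ignores things a real tool handles (minimum ORF length, alternative genetic codes, partial ORFs at transcript ends).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def longest_orf(transcript):
    """Return the longest M-initiated peptide over all six reading frames."""
    seq = transcript.upper()
    strands = (seq, seq.translate(COMPLEMENT)[::-1])  # forward, reverse complement
    best = ""
    for strand in strands:
        for frame in range(3):
            # Stop codons ('*') split the translation into candidate segments.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best

print(longest_orf("CCATGGCTAAATTTGGATGACC"))  # toy sequence, prints MAKFG
```

The resulting peptides are what you would then feed into InterProScan and OrthoFinder or OrthoMCL, and they can also serve as a search database for the proteomics data itself.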