Question

WGS sequencing data based proteomics analysis

2.0 years ago

Assa Yeroslaviz ★ 1.9k

Hi,

We plan to run a proteomics experiment on an organism that has almost no entries in UniProt. On the other hand, quite a lot of sequencing data is available for it (WGS, RNA-Seq, and other short-read data).

Would it be possible to analyze the sequencing data and gain information about its proteins (or isoforms)?

Does anyone know of a protocol, a paper, or a pipeline for this approach? Searching online didn't give the results I was hoping for.

Thanks in advance.

Assa

workflow UniProt proteomics WGS RNA-Seq • 1.1k views

ADD COMMENT • link updated 2.0 years ago by biofalconch ★ 1.3k • written 2.0 years ago by Assa Yeroslaviz ★ 1.9k

Using a combination of HMMER+Pfam might be a good place to start. Also, depending on your organism, you could compare it with closely related species to find orthologous genes using OrthoFinder.

ADD REPLY • link 2.0 years ago by biofalconch ★ 1.3k

Thanks for the suggestion. I'm not sure I understand what you mean, though.

Do you mean I should use HMMER+Pfam to compare the protein sequences I have against the repository and try to identify homologs/orthologs?

ADD REPLY • link 2.0 years ago by Assa Yeroslaviz ★ 1.9k

Kinda. You would learn which protein domains your peptides have, which may make it easier to infer a function (for example, if a peptide has a homeobox domain, chances are rather high that it is a transcription factor). But I think the better way is to find orthologs and assign function to most genes that way.
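To make the domain idea concrete, here is a minimal Python sketch that collects domain hits per protein from the per-domain table that `hmmscan --domtblout` writes when run against Pfam. It assumes the standard whitespace-separated layout of that table (target/family name in column 1, query name in column 4, independent E-value in column 13). The function name `parse_domtblout` and the `1e-5` cutoff are illustrative choices on my part, not recommendations from this thread.

```python
from collections import defaultdict


def parse_domtblout(lines, max_ievalue=1e-5):
    """Collect domain (family) names per query protein from hmmscan --domtblout rows.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The E-value cutoff is an illustrative default, not a tuned value.
    """
    domains = defaultdict(list)
    for line in lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        # 22 fixed columns followed by a free-text description (may contain spaces).
        fields = line.split(maxsplit=22)
        family_name = fields[0]        # target name: the Pfam family for hmmscan
        query_name = fields[3]         # query name: your protein/peptide ID
        i_evalue = float(fields[12])   # independent E-value of this domain hit
        if i_evalue <= max_ievalue:
            domains[query_name].append(family_name)
    return dict(domains)
```

With a real results file you would pass an open file handle (the file name is whatever you gave to `--domtblout`). A protein whose list contains a homeobox-type family would then be a candidate transcription factor, in the sense described above.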

ADD REPLY • link 2.0 years ago by biofalconch ★ 1.3k

In addition to what biofalconch suggests, you could do a de novo transcriptome assembly with the best-quality RNA-Seq reads you can find. If you only have short reads, you can use something like Trinity, but if you can find ONT or PacBio RNA-Seq reads, you can use a more powerful approach like IsoQuant.
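Assembled transcripts are nucleotide sequences, so they need to be translated before any protein-level annotation. As a rough illustration only, the Python sketch below does a naive six-frame translation with the standard genetic code and keeps peptides that run from a methionine to a stop codon. The function names and the `min_aa` length cutoff are my own assumptions, and partial ORFs (no stop codon in the transcript) are simply dropped. Dedicated ORF predictors such as TransDecoder are commonly used for this step and would be preferable for real data.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def candidate_orfs(transcript, min_aa=50):
    """Return M-to-stop peptides (stop excluded) from all six reading frames."""
    seq = transcript.upper()
    peptides = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Every segment except the last one is terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    peptides.append(segment[start:])
    return peptides
```

The resulting peptide list could be written out as FASTA and used both for annotation and as a custom search database for the proteomics data, which is the underlying goal of the question.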

And then, along the lines of what biofalconch says, you can annotate the hypothetical protein sequences using InterProScan and cluster them with a well-annotated species using something like OrthoFinder or OrthoMCL.
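Once the clustering is done, function can be carried over from the annotated species to your own sequences within each orthogroup. The Python sketch below assumes an OrthoFinder-style `Orthogroups.tsv` (tab-separated, one column per species, gene IDs within a cell separated by commas). The function name, the species column names, and the annotation dictionary are hypothetical placeholders, and the transfer rule (copy every annotation found in the group) is a deliberately simple one.

```python
import csv
import io


def transfer_annotations(orthogroups_tsv, target, reference, reference_annotations):
    """Map each target gene to annotations of reference genes in the same orthogroup.

    orthogroups_tsv: text content of an OrthoFinder-style Orthogroups.tsv
    target / reference: column (species) names in that table
    reference_annotations: dict of reference gene ID -> functional description
    """
    transferred = {}
    reader = csv.DictReader(io.StringIO(orthogroups_tsv), delimiter="\t")
    for row in reader:
        # Each species cell holds zero or more comma-separated gene IDs.
        target_genes = [g.strip() for g in (row[target] or "").split(",") if g.strip()]
        reference_genes = [g.strip() for g in (row[reference] or "").split(",") if g.strip()]
        labels = sorted(
            {reference_annotations[g] for g in reference_genes if g in reference_annotations}
        )
        if labels:
            for gene in target_genes:
                transferred[gene] = labels
    return transferred
```

Orthogroups that contain only your species end up without a transferred label. Those are the sequences where domain-level annotation (InterProScan, HMMER+Pfam) is the main remaining source of functional hints.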

ADD REPLY • link 2.0 years ago by dthorbur ★ 3.0k