I have an R dataframe with a column of ENSG IDs.
I believe it contains IDs for non-protein-coding genes, which I do not want.
I only want to keep the rows that correspond to protein-coding mRNAs.
I am looking for a source of ENSG IDs (a list or similar) that contains only IDs corresponding to protein-coding mRNAs.
I don't really need help with the coding; I am just looking for the data source.
The best thing I can think of is to scrape the ENSG IDs from the headers of GENCODE's "Protein-coding transcript sequences" FASTA, but hopefully there is a better way.
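To make that fallback concrete, here is a rough sketch of what I have in mind. It assumes the FASTA headers are pipe-delimited with the versioned ENSG ID in the second field, which is how the file looked when I checked but may not hold for every release. The header lines, the data frame, and the names `extract_ensg` and `ensg` are made up for illustration; in practice the lines would come from `readLines()` on the downloaded file.

```r
# Sample header lines in the pipe-delimited layout I am assuming
# (illustrative only, not copied from a specific release).
fasta_lines <- c(
  ">ENST00000641515.2|ENSG00000186092.7|OTTHUMG00000001094.4|-|OR4F5-201|OR4F5|2618|",
  "ATGAAGAAGGTAACTGCAGAGGCTATTTCCTGGAATGAATCAACGAGTGAAACGAATAAC",
  ">ENST00000342066.8|ENSG00000187634.13|OTTHUMG00000040719.11|-|SAMD11-202|SAMD11|2557|",
  "GCAGAGCCCAGCAGATCCCTGCGGCGTTCGCGAGGGTGGGACGGGAAGCGGGCTGGGAAG",
  ">ENST00000616016.5|ENSG00000187634.13|OTTHUMG00000040719.11|-|SAMD11-209|SAMD11|3465|",
  "GCAGAGCCCAGCAGATCCCTGCGGCGTTCGCGAGGGTGGGACGGGAAGCGGGCTGGGAAG"
)

# Pull the gene ID (second "|" field) out of each header line,
# drop the version suffix, and de-duplicate.
extract_ensg <- function(lines) {
  headers <- lines[startsWith(lines, ">")]
  fields  <- strsplit(headers, "|", fixed = TRUE)
  gene    <- vapply(fields, function(x) x[2], character(1))
  unique(sub("\\..*$", "", gene))
}

pc_ids <- extract_ensg(fasta_lines)

# Hypothetical data frame with an unversioned ENSG column.
df <- data.frame(
  ensg  = c("ENSG00000186092", "ENSG00000223972", "ENSG00000187634"),
  value = c(1.2, 0.4, 3.1),
  stringsAsFactors = FALSE
)

# Keep only the rows whose ID appears in the protein-coding list.
kept <- df[df$ensg %in% pc_ids, ]
```

If my dataframe carried versioned IDs, I would have to strip the versions there too before matching.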
Thank you.