Group Genes by Protein Coding/Non-coding feature
3.6 years ago
asumani ▴ 70

Hi,

I have converted human Ensembl gene IDs to gene symbols using the AnnotationDbi package, so I now have a list of gene names such as "TSPAN6" and "PDE11A". Is there a tool that groups genes into protein-coding and non-coding? A more detailed grouping would also be fine, but this broad split is enough for me.

Thanks in advance.

NON-CODING GENE ID_CONVERSION CODING • 1.5k views

Get the Ensembl GTF file for human (Homo_sapiens.GRCh38.104.chr.gtf.gz in the example below) and uncompress it with gunzip.

Protein coding genes:

$ grep protein_coding Homo_sapiens.GRCh38.104.chr.gtf | awk -F "\t" '{if ($3 == "gene") print $0}' | awk -F "gene_id |;" '{print $2,$4}' | head -10
"ENSG00000284662"  gene_name "OR4F16"
"ENSG00000186827"  gene_name "TNFRSF4"
"ENSG00000186891"  gene_name "TNFRSF18"
"ENSG00000160072"  gene_name "ATAD3B"
"ENSG00000041988"  gene_name "THAP3"
"ENSG00000142611"  gene_name "PRDM16"
"ENSG00000067606"  gene_name "PRKCZ"
"ENSG00000131584"  gene_name "ACAP3"
"ENSG00000169972"  gene_name "PUSL1"
"ENSG00000157911"  gene_name "PEX10"

The same approach works for other groups: filter on a different value of the gene_biotype attribute.
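
To see which gene_biotype values are available before choosing groups, something like the following could work. It is only a sketch: it assumes Ensembl-style attributes (gene_id, gene_name, gene_biotype) in column 9, the function name gtf_gene_table is made up for illustration, and the four GTF rows are toy data with hypothetical IDs, not real annotation.

```shell
#!/usr/bin/env bash
# Sketch: one row per gene with gene_id, gene_name and gene_biotype.
# Assumption: Ensembl-style GTF attributes such as gene_biotype "protein_coding";

gtf_gene_table() {
  # $1 = path to an uncompressed GTF file
  awk -F '\t' '
    # Return the quoted value that follows "key" in the attribute string,
    # or "NA" when the attribute is absent (some genes have no gene_name).
    function attr(key, s) {
      if (match(s, key " \"[^\"]*\""))
        return substr(s, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
      return "NA"
    }
    # Keep only rows whose feature type (column 3) is "gene".
    $3 == "gene" {
      print attr("gene_id", $9) "\t" attr("gene_name", $9) "\t" attr("gene_biotype", $9)
    }
  ' "$1"
}

# Toy GTF with hypothetical IDs and coordinates, for demonstration only.
gtf="$(mktemp)"
trap 'rm -f "$gtf"' EXIT
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 ensembl gene 100 900 . + . 'gene_id "ENSG_TEST1"; gene_version "1"; gene_name "GENEA"; gene_biotype "protein_coding";' \
  1 ensembl transcript 100 900 . + . 'gene_id "ENSG_TEST1"; transcript_id "ENST_TEST1"; gene_name "GENEA"; gene_biotype "protein_coding";' \
  1 ensembl gene 2000 2500 . - . 'gene_id "ENSG_TEST2"; gene_version "1"; gene_name "GENEB"; gene_biotype "lncRNA";' \
  2 ensembl gene 300 380 . + . 'gene_id "ENSG_TEST3"; gene_version "1"; gene_biotype "miRNA";' \
  > "$gtf"

# Full table: gene_id, gene_name, gene_biotype.
gtf_gene_table "$gtf"

# Number of genes per biotype.
gtf_gene_table "$gtf" | cut -f3 | sort | uniq -c
```

On a real file you would call gtf_gene_table Homo_sapiens.GRCh38.104.chr.gtf instead of the toy file. Parsing each attribute by name rather than by position should be a bit more robust than splitting on "gene_id |;", since the order of attributes is not guaranteed to be the same in every GTF.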


Thanks for the awk pipes! But I am not looking to extract the information from a GTF file myself. I assumed there was already a tool that returns the gene_biotype for a given gene symbol.


You could use Ensembl BioMart to get that information then.


Please post example input, expected output, and the code you have tried (if any).

3.6 years ago
EagleEye 7.6k

Extract only geneID and gene symbol from GTF file

In that example, the 4th column contains the gene_type (gene class). You can use this information to group your list.
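
Once you have a symbol-to-biotype table, grouping your own list is a simple lookup. Below is a rough sketch under some assumptions: the table is tab-separated with the symbol in column 1 and the biotype in column 2, the function name annotate_symbols is made up here, and the symbols and biotypes are hypothetical toy values.

```shell
#!/usr/bin/env bash
# Sketch: annotate a list of gene symbols with a biotype and a broad group.
# Assumption: the lookup table has two tab-separated columns, symbol and biotype.

annotate_symbols() {
  # $1 = lookup table (symbol<TAB>biotype), $2 = file with one symbol per line
  awk -F '\t' '
    # First file: load the lookup table into memory.
    NR == FNR { biotype[$1] = $2; next }
    # Second file: look up each symbol and assign a broad group.
    {
      b = ($1 in biotype) ? biotype[$1] : "NA"
      grp = (b == "NA") ? "unknown" : (b == "protein_coding" ? "coding" : "non_coding")
      print $1 "\t" b "\t" grp
    }
  ' "$1" "$2"
}

# Toy inputs with hypothetical symbols, for demonstration only.
table="$(mktemp)"
symbols="$(mktemp)"
trap 'rm -f "$table" "$symbols"' EXIT
printf '%s\t%s\n' GENEA protein_coding GENEB lncRNA GENEC protein_coding GENED miRNA > "$table"
printf '%s\n' GENEB GENEA GENEX > "$symbols"

# Prints: symbol, biotype, group (symbols missing from the table get NA / unknown).
annotate_symbols "$table" "$symbols"
```

Note that this lumps everything other than protein_coding into non_coding, which is a simplification: pseudogenes and IG/TR gene segments would end up there too, so you may want finer groups depending on your analysis. Symbols that are not found are reported as unknown rather than dropped, which makes it easier to spot outdated or ambiguous names.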
