Hi everyone,
I want to extract each gene ID (gene_id) and its associated gene symbol (gene_name) from a GTF file and write them to a file that can be read into R.
My gtf file looks like this:
chr1 HAVANA gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUSG00000102693.1"; gene_type "TEC"; gene_status "KNOWN"; gene_name "RP23-271O17.1"; transcript_type "TEC"; transcript_status "KNOWN"; transcript_name "RP23-271O17.1"; level 2; havana_gene "OTTMUSG00000049935.1";
chr1 HAVANA transcript 3073253 3074322 . + . gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_status "KNOWN"; gene_name "RP23-271O17.1"; transcript_type "TEC"; transcript_status "KNOWN"; transcript_name "RP23-271O17.1-001"; level 2; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";
Can anyone tell me how to write a short script (preferably in Python) to do this?
Any idea would be appreciated!
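One way this could be done in Python is sketched below. It is only a minimal sketch under a few assumptions: the real file is tab-delimited with the attributes in the ninth column (the example lines above are shown with spaces), only lines whose feature type is "gene" are wanted, and the function names (parse_attributes, gene_id_to_name, write_gene_table) are made up for illustration.

```python
import csv
import re

# Matches quoted attribute pairs such as: gene_id "ENSMUSG00000102693.1";
# Unquoted attributes (e.g. level 2;) are deliberately ignored.
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(attr_field):
    """Return a dict of the quoted key/value pairs in a GTF attribute column."""
    return dict(ATTR_RE.findall(attr_field))


def gene_id_to_name(lines):
    """Yield unique (gene_id, gene_name) pairs from 'gene' feature lines."""
    seen = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Assumption: 9 tab-separated columns, feature type in column 3.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = parse_attributes(fields[8])
        pair = (attrs.get("gene_id"), attrs.get("gene_name"))
        if pair[0] is not None and pair not in seen:
            seen.add(pair)
            yield pair


def write_gene_table(gtf_path, out_path):
    """Write a two-column, tab-delimited table with a header row."""
    with open(gtf_path) as gtf, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["gene_id", "gene_name"])
        writer.writerows(gene_id_to_name(gtf))
```

Calling write_gene_table("annotation.gtf", "gene_id_to_name.tsv") (both file names are placeholders) should produce a table that R can read with read.delim(). If the file has no "gene" lines, changing the feature-type check to "transcript" and relying on the duplicate filter would give the same pairs.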
This package was removed from CRAN on 4 March 2019 (https://cran.r-project.org/web/packages/refGenome/index.html).
The link shows that this package is still available. I was able to install it today (07/24/2019).
How can I extract the "genes" subset of the data in CSV or tab-delimited format? Thanks.
In R, use sep="\t" for tab-delimited output.