16 months ago
Alex
Hi all!
How can I import a GTF file into R so that it appears in tabular form, like a data frame?
This is the file name:
GCF_023701775.1_HaSCD2_genomic.gtf
This is the head of the file:
#gtf-version 2.2
#!genome-build HaSCD2
#!genome-build-accession NCBI_Assembly:GCF_023701775.1
#!annotation-source NCBI RefSeq Helicoverpa armigera Annotation Release 101
NC_064776.1 Gnomon gene 4347 6338 . - . gene_id "LOC126054536"; transcript_id ""; db_xref "GeneID:126054536"; description "uncharacterized LOC126054536"; gbkey "Gene"; gene "LOC126054536"; gene_biotype "protein_coding";
It's just a tab-separated file. You can use
read.delim(filename, header = FALSE, sep = "\t", quote = "", skip = n)
where n is the number of header lines. Is that really what you want?

Actually, it's tab-separated with nested columns separated by semicolons, so the rtracklayer route should be preferred, since it takes care of splitting those columns-within-columns.
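If you do want to stay in base R, here is a minimal sketch of what splitting those nested columns involves. The helper names `read_gtf_base` and `parse_attributes` are hypothetical. The sketch assumes, as in the sample line above, that every attribute value is double-quoted, and it keeps only the first value when a key is repeated (for example, multiple `db_xref` entries). The temporary file simply recreates the question's header and record with real tab separators.

```r
# Hypothetical helpers: read the nine GTF columns, then expand the
# semicolon-separated attribute column into one column per key.
parse_attributes <- function(attr) {
  # Assumes every value is double-quoted, e.g. gene_id "LOC126054536";
  pairs <- regmatches(attr, gregexpr('[^ ;]+ "[^"]*"', attr))[[1]]
  keys <- sub(" .*$", "", pairs)               # text before the first space
  vals <- sub('^[^ ]+ "(.*)"$', "\\1", pairs)  # text between the quotes
  keep <- !duplicated(keys)                    # repeated keys: keep the first
  setNames(vals[keep], keys[keep])
}

read_gtf_base <- function(path) {
  lines <- readLines(path)
  lines <- lines[!startsWith(lines, "#")]      # drop #gtf-version, #! lines
  cols <- c("seqname", "source", "feature", "start", "end",
            "score", "strand", "frame", "attribute")
  gtf <- read.delim(text = lines, header = FALSE, sep = "\t", quote = "",
                    col.names = cols,
                    colClasses = c("character", "character", "character",
                                   "integer", "integer", "character",
                                   "character", "character", "character"))
  attr_list <- lapply(gtf$attribute, parse_attributes)
  all_keys <- unique(unlist(lapply(attr_list, names)))
  # One row per record; keys absent from a record become NA
  vals <- unlist(lapply(attr_list, function(a) unname(a[all_keys])))
  attr_df <- as.data.frame(
    matrix(vals, nrow = length(attr_list), byrow = TRUE,
           dimnames = list(NULL, all_keys)),
    stringsAsFactors = FALSE)
  cbind(gtf[setdiff(cols, "attribute")], attr_df)
}

# Example: header and first record from the question, joined with real tabs
example <- c(
  "#gtf-version 2.2",
  "#!genome-build HaSCD2",
  paste("NC_064776.1", "Gnomon", "gene", "4347", "6338", ".", "-", ".",
        paste0('gene_id "LOC126054536"; transcript_id ""; ',
               'db_xref "GeneID:126054536"; ',
               'description "uncharacterized LOC126054536"; ',
               'gbkey "Gene"; gene "LOC126054536"; ',
               'gene_biotype "protein_coding";'),
        sep = "\t")
)
tmp <- tempfile(fileext = ".gtf")
writeLines(example, tmp)
gtf_df <- read_gtf_base(tmp)
str(gtf_df)
```

The regex handles semicolons inside quoted values, but unquoted values (which some GTF producers emit) would be silently skipped, which is one reason a dedicated parser is the safer choice.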
Hence my question to the OP, "Is that really what you want?", since it wasn't clear to me what they were trying to accomplish with a data frame. But yes, I figured the real solution would involve a TxDb or rtracklayer, both of which are great for this.
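A minimal sketch of the rtracklayer route, assuming the Bioconductor package rtracklayer is installed (for example with `BiocManager::install("rtracklayer")`). `import()` parses the GTF into a `GRanges` object with one metadata column per attribute, and `as.data.frame()` flattens that into an ordinary data frame. The temporary file only recreates the record from the question; for the real data you would pass `"GCF_023701775.1_HaSCD2_genomic.gtf"` directly.

```r
# Sketch: let rtracklayer parse the GTF, then flatten the GRanges result.
# Assumes rtracklayer is installed from Bioconductor.
library(rtracklayer)

# Example record from the question, written with real tab separators
example <- c(
  "#gtf-version 2.2",
  paste("NC_064776.1", "Gnomon", "gene", "4347", "6338", ".", "-", ".",
        paste0('gene_id "LOC126054536"; transcript_id ""; ',
               'db_xref "GeneID:126054536"; ',
               'description "uncharacterized LOC126054536"; ',
               'gbkey "Gene"; gene "LOC126054536"; ',
               'gene_biotype "protein_coding";'),
        sep = "\t")
)
tmp <- tempfile(fileext = ".gtf")
writeLines(example, tmp)

gr <- rtracklayer::import(tmp, format = "gtf")  # a GRanges object
gtf_df <- as.data.frame(gr)  # seqnames, start, end, width, strand, then attributes
head(gtf_df)
```

If the goal is querying gene models (transcripts per gene, exons per transcript) rather than browsing a table, a TxDb built with `makeTxDbFromGFF()` is the other option; in recent Bioconductor releases that function lives in the txdbmaker package rather than GenomicFeatures.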