Extract or recode a GTF file based on a gene ID list
7.7 years ago
berge2015 ▴ 110

Hi,

Does anyone here know how to extract lines from a GTF file using a list/subset of gene IDs obtained from the same GTF file? I basically want a 'recoded' (in VCF terminology) GTF file containing information for only those genes I am interested in.

I tried awk (awk 'FNR==NR {a[$0];next} {for (i in a) if (i~$1) print i}') and grep (grep -Fwf), but neither has yielded what I want. Thank you for your help.

gene RNA-Seq SNP gtf • 4.1k views

Can you please paste some sample data?

7.7 years ago

Try the powerful CSV/TSV toolkit csvtk; see the usage of csvtk grep:

csvtk grep -H -t -f 1 -r -P gene_id_list.txt ref_CDS.gtf

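You may need to change the column index (-f 1) to the column where the gene ID is located. In a standard GTF file the gene_id is part of the attributes field in column 9.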

7.7 years ago
berge2015 ▴ 110

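After playing with awk for a while, I came up with a solution:

awk -F'"' 'FNR==NR {block[$0];next} $2 in block' gene_id_list.txt ref_CDS.gtf > out.txt

Note the quote delimiter: with -F'"', $2 is the first double-quoted value on each line, which is the gene ID when gene_id is the first attribute.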

While not the most elegant solution, this does what's asked in the question. Hope it helps anyone else looking for something similar.
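
For anyone whose GTF does not always list gene_id as the first attribute, here is a minimal sketch of a variant that pulls the gene_id out of column 9 before comparing it to the list. It is not the command from the answer above, only one possible alternative. The sample rows, the gene IDs (geneA, geneB, geneC, geneAB) and the temporary directory are made up for illustration; the file names gene_id_list.txt and ref_CDS.gtf are reused from the thread, and the list is assumed to hold one gene ID per line.

```shell
# Hypothetical sample data, created in a temporary directory for illustration only.
workdir=$(mktemp -d)
printf 'geneA\ngeneC\n' > "$workdir/gene_id_list.txt"

# Four tab-separated GTF-like rows; printf reuses the format for every 9 arguments.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 src CDS 100 200 . + 0 'gene_id "geneA"; transcript_id "tA.1";' \
  chr1 src CDS 300 400 . + 0 'gene_id "geneB"; transcript_id "tB.1";' \
  chr2 src CDS 500 600 . - 0 'transcript_id "tC.1"; gene_id "geneC";' \
  chr2 src CDS 700 800 . - 0 'gene_id "geneAB"; transcript_id "tAB.1";' \
  > "$workdir/ref_CDS.gtf"

awk -F'\t' '
  # First file: remember every gene ID, one per line.
  FNR == NR { keep[$0]; next }
  # Second file: locate the gene_id attribute anywhere in column 9.
  match($9, /gene_id "[^"]+"/) {
    # Skip the 9-character prefix and drop the closing quote.
    id = substr($9, RSTART + 9, RLENGTH - 10)
    # Exact lookup, so geneA does not also pull in geneAB.
    if (id in keep) print
  }
' "$workdir/gene_id_list.txt" "$workdir/ref_CDS.gtf" > "$workdir/out.gtf"

cat "$workdir/out.gtf"
```

On this sample only the geneA and geneC rows are kept. The geneC row has transcript_id first, so it would be missed by a test on the first quoted value, and geneAB is skipped because the array lookup is an exact match rather than a substring match.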
