Print the lines of a GTF file that do not match a list of gene IDs
5.5 years ago
Sam ▴ 150

Dear Biostars

I have a GTF file and a list file of gene_ids. I want to exclude every GTF line that contains a gene_id from the list file.

Any help?

Thanks

GTF file:

    Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";

list file:

    MSTRG.26714
    MSTRG.26717
    MSTRG.26704

output:

    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";

Try this:

    grep -v -w -f list_file GTF_file
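For reference, here is what each flag does, as a minimal sketch that rebuilds the example files in a scratch directory. One change is an assumption on my part, not the command above: it adds -F, because without it every line of list_file is a regular expression and the "." in MSTRG.26714 matches any character. The output file name filtered.gtf is hypothetical.

```shell
# Work in a throwaway directory so the example files do not clobber real ones.
cd "$(mktemp -d)" || exit 1

# Sample data from the question (space-separated for readability; a real GTF
# is tab-separated, which does not change how grep treats these lines).
cat > GTF_file <<'EOF'
Chr08 StringTie exon 58908449 58908806 1000 - . gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
Chr08 StringTie exon 58917751 58917790 1000 - . gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
Chr05 StringTie exon 61586279 61586326 1000 + . gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
EOF
printf '%s\n' MSTRG.26714 MSTRG.26717 MSTRG.26704 > list_file

# -F: treat each line of list_file as a fixed string, so "." is a literal dot
# -w: require a whole-word match, so MSTRG.2671 would not hit MSTRG.26714
# -f: read the patterns from list_file, one per line
# -v: print only the lines that match none of the patterns
grep -v -w -F -f list_file GTF_file > filtered.gtf
cat filtered.gtf
```

With the sample data this should keep only the MSTRG.26718 and MSTRG.15742 lines.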
5.5 years ago
Prakash ★ 2.2k

Did the above command work? It didn't work for me. You can try awk instead:

    awk -F'"' 'NR==FNR{a[$1]++;next}!a[$2]' list_file GTF_file
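The awk one-liner works because splitting on double quotes makes $2 the first quoted value of each GTF line, which in these StringTie lines is the gene_id. The sketch below is the same logic spelled out with comments; it assumes gene_id is always the first attribute, and the array name ids and the file filtered.gtf are hypothetical. It also shows why an empty line in list_file is harmless here: it only adds an empty key, which no gene_id equals.

```shell
cd "$(mktemp -d)" || exit 1

# Sample data from the question (space-separated here; awk splits on
# double quotes below, so tabs vs spaces does not matter).
cat > GTF_file <<'EOF'
Chr08 StringTie exon 58908449 58908806 1000 - . gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
Chr08 StringTie exon 58917751 58917790 1000 - . gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
Chr05 StringTie exon 61586279 61586326 1000 + . gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
EOF
# The trailing empty line is deliberate: it does not affect this awk filter.
printf '%s\n' MSTRG.26714 MSTRG.26717 MSTRG.26704 '' > list_file

awk -F'"' '
  # NR == FNR holds only while reading the first file (list_file):
  # store each ID as a key of array ids, then move to the next line.
  NR == FNR { ids[$1]; next }
  # For GTF_file, $2 is the first quoted value (the gene_id here).
  # Print the line only if that gene_id is not a key of ids.
  !($2 in ids)
' list_file GTF_file > filtered.gtf
cat filtered.gtf
```

Using ($2 in ids) instead of a[$2] tests membership without creating new array entries as a side effect; the printed result is the same.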

I used egrep instead of grep and it worked!

What did you get? Check whether list_file contains an empty line: an empty pattern can match every line, so grep -v may print nothing.

Here is what I got:

    $ cat GTF_file
    Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
    $ cat list_file
    MSTRG.26714
    MSTRG.26717
    MSTRG.26704
    $ grep -v -w -f list_file GTF_file
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
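To check for the empty-line problem and fix it without editing list_file by hand, a minimal sketch: count the blank (empty or whitespace-only) lines, write a cleaned copy, and run the same grep on it. The names list_clean and filtered.gtf are hypothetical, and the blank line is planted in the sample list on purpose.

```shell
cd "$(mktemp -d)" || exit 1

# Sample data from the question, with an accidental blank line in list_file.
cat > GTF_file <<'EOF'
Chr08 StringTie exon 58908449 58908806 1000 - . gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
Chr08 StringTie exon 58917751 58917790 1000 - . gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
Chr05 StringTie exon 61586279 61586326 1000 + . gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
EOF
printf '%s\n' MSTRG.26714 '' MSTRG.26717 MSTRG.26704 > list_file

# Report how many blank lines the list contains.
echo "blank lines in list_file: $(grep -c '^[[:space:]]*$' list_file)"

# Keep only non-blank lines, then filter the GTF with the cleaned list.
grep -v '^[[:space:]]*$' list_file > list_clean
grep -v -w -f list_clean GTF_file > filtered.gtf
cat filtered.gtf
```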

You are right, SMK: there actually was an empty line in the file. It's working now. :)


Great!... Thanks for reporting. :-)
