filtered tab delimited file with awk
4.3 years ago

Hello everyone. I have the output of a Cufflinks assembly and I want to delete the transcripts that have FPKM < 0.5. How can I do this using awk? Thanks for your help.

1   Cufflinks   transcript  11218   12435   1   +   .   "gene_id ""Os01g0100200""; transcript_id ""Os01t0100200-01""; FPKM ""0.0000000000""; frac ""0.000000""; conf_lo ""0.000000""; conf_hi ""0.129298""; cov ""0.000000""; full_read_support ""no"";"
1   Cufflinks   exon    11218   12060   1   +   .   "gene_id ""Os01g0100200""; transcript_id ""Os01t0100200-01""; exon_number ""1""; FPKM ""0.0000000000""; frac ""0.000000""; conf_lo ""0.000000""; conf_hi ""0.129298""; cov ""0.000000"";"
1   Cufflinks   exon    12152   12435   1   +   .   "gene_id ""Os01g0100200""; transcript_id ""Os01t0100200-01""; exon_number ""2""; FPKM ""0.0000000000""; frac ""0.000000""; conf_lo ""0.000000""; conf_hi ""0.129298""; cov ""0.000000"";"
1   Cufflinks   transcript  11372   12284   1000    -   .   "gene_id ""Os01g0100300""; transcript_id ""Os01t0100300-00""; FPKM ""0.1303951660""; frac ""1.000000""; conf_lo ""0.000000""; conf_hi ""0.502776""; cov ""0.268453""; full_read_support ""no"";"
1   Cufflinks   exon    11372   12042   1000    -   .   "gene_id ""Os01g0100300""; transcript_id ""Os01t0100300-00""; exon_number ""1""; FPKM ""0.1303951660""; frac ""1.000000""; conf_lo ""0.000000""; conf_hi ""0.502776""; cov ""0.268453"";"
1   Cufflinks   exon    12146   12284   1000    -   .   "gene_id ""Os01g0100300""; transcript_id ""Os01t0100300-00""; exon_number ""2""; FPKM ""0.1303951660""; frac ""1.000000""; conf_lo ""0.000000""; conf_hi ""0.502776""; cov ""0.268453"";"
1   Cufflinks   transcript  12721   15685   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427""; full_read_support ""yes"";"
1   Cufflinks   exon    12721   13813   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""1""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   exon    13906   14271   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""2""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   exon    14359   14437   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""3""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   exon    14969   15171   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""4""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   exon    15266   15685   1000    +   .   "gene_id ""Os01g0100400""; transcript_id ""Os01t0100400-01""; exon_number ""5""; FPKM ""6.3114227816""; frac ""0.716157""; conf_lo ""5.284882""; conf_hi ""7.337964""; cov ""25.312427"";"
1   Cufflinks   transcript  12808   13978   1000    -   .   "gene_id ""Os01g0100466""; transcript_id ""Os01t0100466-00""; FPKM ""5.8063471120""; frac ""0.283843""; conf_lo ""4.329164""; conf_hi ""7.283530""; cov ""23.286784""; full_read_support ""no"";"
1   Cufflinks   exon    12808   13782   1000    -   .   "gene_id ""Os01g0100466""; transcript_id ""Os01t0100466-00""; exon_number ""1""; FPKM ""5.8063471120""; frac ""0.283843""; conf_lo ""4.329164""; conf_hi ""7.283530""; cov ""23.286784"";"
1   Cufflinks   exon    13880   13978   1000    -   .   "gene_id ""Os01g0100466""; transcript_id ""Os01t0100466-00""; exon_number ""2""; FPKM ""5.8063471120""; frac ""0.283843""; conf_lo ""4.329164""; conf_hi ""7.283530""; cov ""23.286784"";"
1   Cufflinks   transcript  2905    10815   1000    +   .   "gene_id ""CUFF.1""; transcript_id ""CUFF.1.1""; FPKM ""4.7439672851""; frac ""0.518769""; conf_lo ""3.876114""; conf_hi ""5.611820""; cov ""18.968843""; full_read_support ""yes"";"
1   Cufflinks   exon    2905    3255    1000    +   .   "gene_id ""CUFF.1""; transcript_id ""CUFF.1.1""; exon_number ""1""; FPKM ""4.7439672851""; frac ""0.518769""; conf_lo ""3.876114""; conf_hi ""5.611820""; cov ""18.968843"";"

I guess FPKM is the 13th column. If so, you can print the transcripts with FPKM values of at least 0.5 as below.

 cat yourfile | awk '{if($13>=0.5) print}' > output

A couple of questions:

  1. This looks like a GTF-ish format: 9 tab-delimited columns, where the 9th column is a ;-delimited list of space-separated key-value pairs. How does your awk account for that?
  2. Even if everything were tab delimited, there is no way FPKM is column #13. And even if it were, the value is quoted (doubly, for some odd reason). How do you compare that directly to a number?
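
To make both points concrete, here is a quick check on a single made-up record. It is only a sketch: the line is modelled on the question's data but assumes real tabs between the 9 columns and ordinary single quoting, and the variable names are hypothetical.

```shell
# One transcript record: 8 tab-separated columns plus the attribute column.
line=$(printf '1\tCufflinks\ttranscript\t11372\t12284\t1000\t-\t.\tgene_id "Os01g0100300"; transcript_id "Os01t0100300-00"; FPKM "0.1303951660";')

# Default whitespace splitting: the attribute column is broken into pieces.
default_split=$(printf '%s\n' "$line" | awk '{print NF, $13, $14}')

# Tab splitting: the attributes stay together in field 9.
tab_split=$(printf '%s\n' "$line" | awk -F '\t' '{print NF}')

echo "$default_split"
echo "$tab_split"
```

With default splitting, field 13 on a transcript line is the literal key FPKM and field 14 is the quoted value, so comparing $13 with 0.5 is a string comparison rather than a numeric filter. On exon lines the extra exon_number pair shifts everything by two fields.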
4.3 years ago
awk -F '[ "\t]+' '{for(i=9;i+1<=NF;i++) if($i=="FPKM" && $(i+1)>=0.5) {print;break;}}' yourfile
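
The one-liner above treats spaces, tabs and quotes all as separators and then scans for the FPKM key. A more explicit variant of the same idea, sketched below, splits on tabs only and pulls the FPKM value out of the 9th column. It assumes the file really is tab-delimited; sample.gtf, filtered.gtf and the min variable are hypothetical names used for illustration, and the three records are trimmed copies of lines from the question.

```shell
# Build a small test file (hypothetical name) with two transcripts and one exon.
printf '1\tCufflinks\ttranscript\t11372\t12284\t1000\t-\t.\tgene_id "Os01g0100300"; transcript_id "Os01t0100300-00"; FPKM "0.1303951660";\n' > sample.gtf
printf '1\tCufflinks\ttranscript\t12721\t15685\t1000\t+\t.\tgene_id "Os01g0100400"; transcript_id "Os01t0100400-01"; FPKM "6.3114227816";\n' >> sample.gtf
printf '1\tCufflinks\texon\t12721\t13813\t1000\t+\t.\tgene_id "Os01g0100400"; transcript_id "Os01t0100400-01"; exon_number "1"; FPKM "6.3114227816";\n' >> sample.gtf

awk -F '\t' -v min=0.5 '
  # Locate the FPKM key and its quoted value inside the attribute column.
  # The "+ also tolerates the doubled quotes seen in the question.
  match($9, /FPKM "+[^";]+/) {
    val = substr($9, RSTART, RLENGTH)   # e.g. FPKM "6.3114227816
    sub(/^FPKM "+/, "", val)            # keep only the number
    if (val + 0 >= min) print           # val + 0 forces a numeric comparison
  }
' sample.gtf > filtered.gtf

cat filtered.gtf
```

Like the one-liner, this keeps both transcript and exon lines whose FPKM is at least the threshold, so the exons of a retained transcript stay with it. Lines without an FPKM attribute are dropped.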

Amazing, as usual.


It worked great. Thank you.
