Question

RNASeq, gtf file

0

4.3 years ago

mdfardin374 ▴ 10

Hello, I am new to bioinformatics and am working with a GTF file. Its lines look like the following:

1   StringTie   transcript  463572  478996  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; reference_id "ENSBTAT00000015319"; ref_gene_id "ENSBTAG00000011528"; ref_gene_name "SMIM11A"; cov "7.464981"; FPKM "8.923761"; TPM "15.880815";
1   StringTie   exon    463572  463746  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; reference_id "ENSBTAT00000015319"; ref_gene_id "ENSBTAG00000011528"; ref_gene_name "SMIM11A"; cov "2.531429";

I want to keep only the transcripts with an FPKM value greater than 2. Please help.

rna-seq sequencing • 1.7k views

ADD COMMENT • link updated 4.3 years ago by Juke34 9.0k • written 4.3 years ago by mdfardin374 ▴ 10

1

C: filtered tab delimited file with awk

ADD REPLY • link 4.3 years ago by Mehmet ▴ 820

0

You can do that in Bash or Python, for example. Here are the steps:

Select the "transcript" lines (third column)
Split the attribute column (ninth column) by ";"
Look for the element starting with "FPKM"
Once found, compare the FPKM value to 2
Keep the line if the value is higher than 2
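
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, assuming the file is tab-delimited with the attributes in the ninth column, as StringTie writes them; the function name filter_transcripts and the file names in the usage comment are hypothetical. It returns transcript lines only, so the exon lines are not carried along.

```python
def filter_transcripts(lines, min_fpkm=2.0):
    """Yield "transcript" lines whose FPKM is strictly greater than min_fpkm."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Step 1: select "transcript" lines (feature type is column 3)
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        # Step 2: split the attribute column by ";"
        for attr in fields[8].split(";"):
            attr = attr.strip()
            # Step 3: look for the element starting with "FPKM"
            if attr.startswith("FPKM"):
                # The value sits between the double quotes: FPKM "8.923761"
                value = float(attr.split('"')[1])
                # Steps 4-5: compare to the threshold and keep the line
                if value > min_fpkm:
                    yield line
                break

# Hypothetical usage (file names are placeholders):
# with open("input.gtf") as src, open("filtered.gtf", "w") as dst:
#     dst.writelines(filter_transcripts(src))
```

Lines are yielded unchanged, so the output can be written straight back to a file.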

ADD REPLY • link 4.3 years ago by Bastien Hervé 6.0k

score 1 · Answer 1 · 2020-08-25

1

4.3 years ago

Juke34 9.0k

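You can use agat_sp_filter_feature_by_attribute_value.pl from AGAT. Features that pass the test are discarded, so --test "<" with --value 2 keeps the features with an FPKM of 2 or more: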

conda create -n agat
conda activate agat
conda install -c bioconda agat
agat_sp_filter_feature_by_attribute_value.pl --gff input.gff --attribute FPKM --value 2 --test "<" -o result_value_over_or_equal=2.gff

ADD COMMENT • link 4.3 years ago by Juke34 9.0k

0

Hello Juke34, I have not tried your suggestion, but I was curious whether this script will also remove all exons associated with the removed transcripts. I am asking because there is no FPKM information available for exons.

ADD REPLY • link 4.3 years ago by Tm ★ 1.1k

2

Yes.

Removing a level1 (e.g. gene) or level2 (e.g. transcript, mRNA) feature automatically removes all linked subfeatures, and removing all children of a feature (e.g. the mRNAs of a gene, or the exons of an mRNA) automatically removes that feature too.
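
To illustrate the idea of removing a transcript together with its exons, here is a minimal Python sketch. It is not how AGAT implements this; it simply assumes a tab-delimited StringTie GTF in which every transcript and exon line carries a transcript_id attribute. The helper names attribute and filter_with_children are hypothetical.

```python
def attribute(attrs, key):
    """Return the quoted value of `key` from a GTF attribute column, or None."""
    for attr in attrs.split(";"):
        attr = attr.strip()
        if attr.startswith(key + " "):
            return attr.split('"')[1]
    return None

def filter_with_children(lines, min_fpkm=2.0):
    """Keep transcripts with FPKM > min_fpkm and every line sharing their transcript_id."""
    # Ignore comments and lines without the nine GTF columns
    rows = [l for l in lines if not l.startswith("#") and l.count("\t") >= 8]

    # First pass: collect the IDs of the transcripts that pass the threshold
    kept = set()
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "transcript":
            continue
        fpkm = attribute(fields[8], "FPKM")
        tid = attribute(fields[8], "transcript_id")
        if fpkm is not None and tid is not None and float(fpkm) > min_fpkm:
            kept.add(tid)

    # Second pass: keep each transcript and exon line linked to a kept ID
    return [
        line for line in rows
        if attribute(line.rstrip("\n").split("\t")[8], "transcript_id") in kept
    ]
```

Two passes are used so that the result does not depend on whether exon lines come before or after their transcript line.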