Remove gene features from a given list
5.5 years ago

Hello,

Is there a tool that can remove all features belonging to a given list of genes? For example, from this GFF3 file I want to remove everything related to g1 (from "# start gene g1" to "# end gene g1").

# start gene g1 
scaffold1size1833262    AUGUSTUS    gene    1   1168    0.56    +   .   ID=g1 
scaffold1size1833262    AUGUSTUS    transcript  1   1168    0.56    +   .   ID=g1.t1;Parent=g1 
scaffold1size1833262    AUGUSTUS    intron  1   563 0.91    +   .   Parent=g1.t1 
scaffold1size1833262    AUGUSTUS    CDS 564 676 0.91    +   2   ID=g1.t1.cds;Parent=g1.t1 
scaffold1size1833262    AUGUSTUS    exon    564 1168    .   +   .   Parent=g1.t1 
scaffold1size1833262    AUGUSTUS    stop_codon  674 676 .   +   0   Parent=g1.t1 
scaffold1size1833262    AUGUSTUS    transcription_end_site  1168    1168    .   +   .   Parent=g1.t1
# protein sequence = [SGFLRPVEADVNLTVCSKDTGKAADKGGSTSFPISM]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 33.3
# CDS exons: 0/1
# CDS introns: 0/1
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 1/1
#      W:   1 
# hint groups fully obeyed: 40
#      W:  40 
# incompatible hint groups: 14
#      W:  14 
# end gene g1
gene annotation gff3 • 3.4k views

Have you looked at some combination of grep -v -w to eliminate lines with g1*?


The problem is that it removes all the lines containing g1 but leaves traces behind, such as the comment lines from "# protein sequence" through "# W: 14". This makes the GFF3 file untidy.
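One way around the leftover comment lines is to drop the whole block between the "# start gene" and "# end gene" markers instead of matching individual lines. Below is a minimal Python sketch of that idea. It assumes the file keeps the AUGUSTUS-style markers shown in the question; the function names `drop_gene_blocks` and `filter_file` are made up for illustration and are not part of any tool.

```python
import re

# Assumption: every gene is wrapped in "# start gene <id>" / "# end gene <id>"
# comment lines, as in the AUGUSTUS output shown in the question.
START_RE = re.compile(r"^#\s*start gene\s+(\S+)")
END_RE = re.compile(r"^#\s*end gene\s+(\S+)")


def drop_gene_blocks(lines, gene_ids):
    """Yield lines, skipping every start/end block whose gene ID is in gene_ids."""
    gene_ids = set(gene_ids)
    skipping = None  # ID of the gene block currently being dropped, if any
    for line in lines:
        if skipping is None:
            m = START_RE.match(line)
            if m and m.group(1) in gene_ids:
                skipping = m.group(1)  # drop the start marker itself
                continue
            yield line
        else:
            m = END_RE.match(line)
            if m and m.group(1) == skipping:
                skipping = None  # drop the end marker, then resume copying


def filter_file(in_path, out_path, gene_ids):
    """Hypothetical helper: copy in_path to out_path without the listed genes."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_gene_blocks(src, gene_ids))
```

Because the gene ID is compared as a whole token, asking for g1 does not also remove g10 or g11, which is the usual pitfall with an unanchored grep pattern.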


Dear @arunprasanna83,

Did you find a solution to your question? I have the same problem. I tried several combinations of grep, but they do not work in my case. Take the example below: I want to remove all of these lines from the original GFF3 file, but my list contains only the transcript ID novel_model_471_5f349842 (without looking in the original GFF3, I cannot tell that its parent gene is novel_gene_467_5f349842). So I am looking for a tool that removes all feature lines based on transcript IDs, which is similar to what you asked for. If you found an answer, would you mind letting me know?

Thanks a lot!

Best,

Xiaofei

Here are the lines in the annotation file:

chrUn . gene 30387395 30387595 . + . ID=novel_gene_467_5f349842;Name=%2A%2A%20NO%20NAME%20ASSIGNED%20%2A%2A
chrUn . mRNA 30387395 30387595 . + . ID=novel_model_471_5f349842;Parent=novel_gene_467_5f349842;Name=%2A%2A%20NO%20NAME%20ASSIGNED%20%2A%2A
chrUn . exon 30387395 30387595 . + . ID=novel_model_471_5f349842.exon1;Parent=novel_model_471_5f349842
chrUn . CDS 30387395 30387595 . + 0 ID=cds.novel_model_471_5f349842;Parent=novel_model_471_5f349842
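The transcript-ID case can be handled by following the ID and Parent attributes rather than matching text. The Python sketch below is only an illustration under stated assumptions: the hierarchy is the usual gene, transcript, exon/CDS layout, the first eight columns contain no spaces (so the sketch accepts tab- or space-separated lines), and the names `parse_attrs` and `remove_transcripts` are hypothetical. A gene is removed only when all of its transcripts are on the list.

```python
def parse_attrs(col9):
    """Parse a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def remove_transcripts(lines, transcript_ids):
    """Return lines without the listed transcripts, their children,
    and any gene left with no surviving transcript."""
    lines = list(lines)
    feats = []  # per line: (feature ID or None, parent IDs), or None for comments
    for line in lines:
        if line.startswith("#") or not line.strip():
            feats.append(None)
            continue
        cols = line.split(None, 8)  # assumption: no spaces in columns 1-8
        attrs = parse_attrs(cols[8]) if len(cols) == 9 else {}
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        feats.append((attrs.get("ID"), parents))

    # Map each parent ID to the IDs of its direct children.
    children = {}
    for feat in feats:
        if feat:
            for parent in feat[1]:
                children.setdefault(parent, set()).add(feat[0])

    kill = set(transcript_ids)
    # Promote a parent gene to the kill set when all its children are killed.
    for feat in feats:
        if feat and feat[0] in kill:
            for parent in feat[1]:
                if children[parent] <= kill:
                    kill.add(parent)

    kept = []
    for line, feat in zip(lines, feats):
        if feat:
            fid, parents = feat
            # Drop listed features and features whose parents are all listed.
            if fid in kill or (parents and all(p in kill for p in parents)):
                continue
        kept.append(line)
    return kept
```

For the four lines above, passing novel_model_471_5f349842 as the only transcript ID removes the mRNA, its exon and CDS, and also novel_gene_467_5f349842, since that gene has no other transcript. Comment lines are passed through unchanged, so AUGUSTUS-style comment blocks would still need separate handling.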

4.2 years ago
Juke34 8.9k

You can use agat_sp_filter_feature_from_kill_list.pl from AGAT, which removes the features whose IDs appear in a kill list you provide.
