5.6 years ago
arunprasanna83
Hello,
Is there a tool that can remove all the features for a given list of genes? For example, from this GFF3 file I want to remove everything related to g1, from the start of its block to the end.
# start gene g1
scaffold1size1833262 AUGUSTUS gene 1 1168 0.56 + . ID=g1
scaffold1size1833262 AUGUSTUS transcript 1 1168 0.56 + . ID=g1.t1;Parent=g1
scaffold1size1833262 AUGUSTUS intron 1 563 0.91 + . Parent=g1.t1
scaffold1size1833262 AUGUSTUS CDS 564 676 0.91 + 2 ID=g1.t1.cds;Parent=g1.t1
scaffold1size1833262 AUGUSTUS exon 564 1168 . + . Parent=g1.t1
scaffold1size1833262 AUGUSTUS stop_codon 674 676 . + 0 Parent=g1.t1
scaffold1size1833262 AUGUSTUS transcription_end_site 1168 1168 . + . Parent=g1.t1
# protein sequence = [SGFLRPVEADVNLTVCSKDTGKAADKGGSTSFPISM]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 33.3
# CDS exons: 0/1
# CDS introns: 0/1
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 1/1
# W: 1
# hint groups fully obeyed: 40
# W: 40
# incompatible hint groups: 14
# W: 14
# end gene g1
Have you looked at some combination of grep -v -w to eliminate lines with g1*?
The problem is that it removes all the lines containing g1 but leaves traces behind, such as the lines from # protein sequence through # W: 14. That would leave the GFF3 file untidy.
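One way around the leftover comment lines is to skip whole blocks rather than matching individual lines. Below is a minimal sketch, assuming the file keeps the "# start gene X" / "# end gene X" markers shown in the question (AUGUSTUS-style output). The function name drop_gene_blocks is made up for illustration and is not part of any existing tool.

```python
import re

# Assumption: every gene is wrapped in "# start gene X" / "# end gene X" comments,
# as in the AUGUSTUS output shown in the question.
START = re.compile(r"^#\s*start gene\s+(\S+)")
END = re.compile(r"^#\s*end gene\s+(\S+)")


def drop_gene_blocks(lines, gene_ids):
    """Yield lines, skipping each start/end block whose gene ID is in gene_ids."""
    gene_ids = set(gene_ids)
    skipping = None  # ID of the gene block currently being skipped, if any
    for line in lines:
        if skipping is None:
            m = START.match(line)
            if m and m.group(1) in gene_ids:
                skipping = m.group(1)  # drop the start marker and enter the block
                continue
            yield line
        else:
            m = END.match(line)
            if m and m.group(1) == skipping:
                skipping = None  # drop the end marker and leave the block
```

Because the whole ID after "start gene" is compared, g1 does not accidentally match g10 or g11. To use it on a file, pass an open file handle as lines and write the yielded lines to a new file.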
Dear @arunprasanna83,
Did you ever solve this? I have the same problem. I tried several combinations of grep, but they do not work in my case. Take the example below. I want to remove all of these lines from the original GFF3 file, but my list only contains the transcript ID novel_model_471_5f349842 (without checking the original GFF3, I cannot know that the parent is novel_gene_467_5f349842). So I am looking for a tool that removes all feature lines based on transcript IDs, which is similar to what you asked. If you found an answer to your question, would you mind letting me know?
Thanks a lot!
Best,
Xiaofei
Here are the lines in the annotation file:
chrUn . gene 30387395 30387595 . + . ID=novel_gene_467_5f349842;Name=%2A%2A%20NO%20NAME%20ASSIGNED%20%2A%2A
chrUn . mRNA 30387395 30387595 . + . ID=novel_model_471_5f349842;Parent=novel_gene_467_5f349842;Name=%2A%2A%20NO%20NAME%20ASSIGNED%20%2A%2A
chrUn . exon 30387395 30387595 . + . ID=novel_model_471_5f349842.exon1;Parent=novel_model_471_5f349842
chrUn . CDS 30387395 30387595 . + 0 ID=cds.novel_model_471_5f349842;Parent=novel_model_471_5f349842
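For the transcript-ID case, the Parent attributes can be followed in both directions: drop the listed transcripts, drop everything whose parents are all gone, and drop a gene once all of its children are gone. The following is only a rough sketch of that idea, assuming a tab-separated GFF3 file with ID= and Parent= in column 9. The names parse_attrs and drop_transcripts are hypothetical, and this is not an existing tool.

```python
def parse_attrs(col9):
    """Split a GFF3 attribute column ("ID=x;Parent=y") into a dict."""
    attrs = {}
    for item in col9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def drop_transcripts(lines, transcript_ids):
    """Return lines without the given transcripts, their children,
    and any parent (e.g. gene) left with no children at all."""
    lines = list(lines)
    doomed = set(transcript_ids)  # feature IDs scheduled for removal
    feats = {}                    # line index -> (ID or None, list of parent IDs)
    for i, line in enumerate(lines):
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue  # comments and malformed lines are kept untouched
        a = parse_attrs(cols[8])
        parents = a["Parent"].split(",") if "Parent" in a else []
        feats[i] = (a.get("ID"), parents)

    kids = {}  # parent ID -> indices of the lines that name it as Parent
    for i, (_, parents) in feats.items():
        for p in parents:
            kids.setdefault(p, []).append(i)

    dropped = set()
    changed = True
    while changed:  # repeat until no further line can be dropped
        changed = False
        for i, (fid, parents) in feats.items():
            if i in dropped:
                continue
            orphaned = bool(parents) and all(p in doomed for p in parents)
            emptied = fid in kids and all(k in dropped for k in kids[fid])
            if fid in doomed or orphaned or emptied:
                dropped.add(i)
                if fid is not None:
                    doomed.add(fid)
                changed = True
    return [line for i, line in enumerate(lines) if i not in dropped]
```

With the four lines above and the list ["novel_model_471_5f349842"], the mRNA, exon, and CDS go first, and the gene line follows because it has no transcripts left. A gene that still has another transcript is kept. One deliberate simplification: a feature shared by several parents is only dropped when all of them are removed, and its Parent attribute is not rewritten.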