I have a genome FASTA file with, say, 10 chromosomes and 1000 scaffolds. After gene prediction, there are gene/mRNA/CDS features predicted on the chromosomes as well as on the scaffolds.
Now, a particular requirement is to get the genome FASTA as well as the GFF for the chromosomes alone.
For the genome, the command below was run with a list of chromosomes and it worked fine.
faSomeRecords genome.fasta chrs.list genome.chrs-only.fasta
But for the gff3, I would like to know the better ways/tools to include or exclude entries from a list.
I could manage the filtering using grep -w -f chrs.list
But I would prefer to use a more robust and recommended method of doing this filtering.
Please advise.
Try AGAT toolkit: https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_feature_by_attribute_presence
In fact, AGAT was the first place I looked for this requirement. That particular script seems to be based on the 9th column, whereas the requirement is on the first column, which in fact should be much simpler. Unfortunately, I still could not land on the correct tool for the purpose.
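Since the requirement is only on the first column (the seqid), one possible approach is an exact-match filter on that column with awk. This is a minimal sketch, not a recommended tool: filter_gff_by_seqid is a hypothetical helper name, and it assumes chrs.list has one name per line, is not empty, and matches the GFF3 seqids exactly. Compared with grep -w -f, it avoids matching a chromosome name that happens to appear in another column, or as a word inside a longer seqid such as chr1.1.

```shell
# Hypothetical helper: keep GFF3 records whose seqid (column 1) is in a list.
# $1 = list of sequence names, one per line; $2 = GFF3 file.
filter_gff_by_seqid() {
    awk -F'\t' '
        FNR == NR { keep[$1]; next }   # first file: remember each listed name
        /^#/ { print; next }           # pass header and comment lines through
        $1 in keep                     # print records with an exact column-1 match
    ' "$1" "$2"
}
```

It could be called as
filter_gff_by_seqid chrs.list genome.gff3 > genome.chrs-only.gff3
Note that all lines starting with # are kept, so any ##sequence-region lines for scaffolds would remain in the output.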
Just thinking, but can bedtools intersect work? It takes a GFF3 file as input. The BED file can be
chrname 0 chrlen
for each of the required chromosomes.
Thanks. That should work.
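For the bedtools route, the chrname 0 chrlen lines could be generated from a FASTA index rather than typed by hand. This is a rough sketch under assumptions: make_chr_bed is a hypothetical helper name, and the index is assumed to be the .fai file written by samtools faidx (name in column 1, length in column 2).

```shell
# Hypothetical helper: write "name<TAB>0<TAB>length" BED lines for listed sequences.
# $1 = list of names, one per line; $2 = FASTA index (.fai) file.
make_chr_bed() {
    awk -F'\t' -v OFS='\t' '
        FNR == NR { keep[$1]; next }     # first file: remember each listed name
        $1 in keep { print $1, 0, $2 }   # index line: emit a whole-sequence interval
    ' "$1" "$2"
}
```

A possible sequence of commands would then be
samtools faidx genome.fasta
make_chr_bed chrs.list genome.fasta.fai > chrs.bed
bedtools intersect -header -u -a genome.gff3 -b chrs.bed > genome.chrs-only.gff3
Here -u should report each GFF3 record once if it overlaps any interval, and -header should keep the GFF3 header lines; it is worth checking both against the bedtools version in use.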
@Juke34 Thank you. I was indeed looking for such a straightforward way.