inclusion or exclusion in gff by list of chromosomes/scaffolds
4.3 years ago
Jeffin Rockey ★ 1.3k

There is a genome FASTA file with, say, 10 chromosomes and 1000 scaffolds. After gene prediction, there are gene/mRNA/CDS predictions on the chromosomes as well as on the scaffolds.

Now, a particular requirement is to get the genome FASTA as well as the GFF for the chromosomes alone.

For the genome, the command below was run with a list of chromosomes and it worked fine.

faSomeRecords genome.fasta chrs.list genome.chrs-only.fasta

But for the GFF3, I would like to know the better ways/tools to include or exclude entries based on a list.

I could manage the filtering using grep -w -f chrs.list

But I would prefer a more robust and recommended method for this filtering.

Please advise.

In fact, AGAT was the first place I looked for this requirement. That particular script seemed to work on the 9th column, whereas the requirement here is on the first column, which in fact should be much simpler. Unfortunately, I still could not land on the correct tool for the purpose.
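
Since the requirement only concerns column 1, one possible approach is an exact-match lookup with awk. This is a minimal sketch, not a recommended tool: the demo.* file names and the toy records are made up for illustration, and the list is assumed to hold one sequence name per line with no trailing whitespace.

```shell
# Toy inputs (hypothetical names and records) so the sketch runs on its own.
printf 'chr1\nchr2\n' > demo.chrs.list
{
  printf '##gff-version 3\n'
  printf 'chr1\tpred\tgene\t1\t100\t.\t+\t.\tID=g1\n'
  printf 'chr10\tpred\tgene\t5\t50\t.\t-\t.\tID=g2\n'
  printf 'scaffold_7\tpred\tgene\t1\t90\t.\t+\t.\tID=g3;Note=near chr1\n'
  printf 'chr2\tpred\tgene\t10\t80\t.\t+\t.\tID=g4\n'
} > demo.gff3

# Include: load the list into an array while reading the first file,
# then keep comment lines and features whose column 1 is in the list.
awk -F'\t' 'NR == FNR { keep[$1]; next } /^#/ || ($1 in keep)' \
    demo.chrs.list demo.gff3 > demo.chrs-only.gff3

# Exclude: same lookup, inverted, to keep everything NOT in the list.
awk -F'\t' 'NR == FNR { drop[$1]; next } /^#/ || !($1 in drop)' \
    demo.chrs.list demo.gff3 > demo.scaffolds-only.gff3
```

In the toy data, grep -w -f would also keep the scaffold_7 line because "chr1" appears as a whole word in its attributes, while the column-1 lookup does not. Two caveats of this sketch: the NR == FNR idiom misbehaves if the list file is empty, and directives such as ##sequence-region for dropped sequences are passed through unchanged.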

Just thinking, but could bedtools intersect work? It takes a GFF3 file as input. The BED file can be chrname 0 chrlen for each required chromosome.
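
The bedtools idea could be sketched roughly as follows. It assumes a FASTA index (.fai, as produced by samtools faidx) is available to supply the chromosome lengths; the demo.* file names and toy values are hypothetical, and the bedtools step only runs if bedtools is installed.

```shell
# Toy inputs (hypothetical): a chromosome list, a .fai-style index
# (name, length, offset, line bases, line width) and a small GFF3.
printf 'chr1\nchr2\n' > demo.chrs.list
{
  printf 'chr1\t1000\t6\t60\t61\n'
  printf 'chr2\t800\t1030\t60\t61\n'
  printf 'scaffold_7\t90\t1850\t60\t61\n'
} > demo.genome.fasta.fai
{
  printf '##gff-version 3\n'
  printf 'chr1\tpred\tgene\t1\t100\t.\t+\t.\tID=g1\n'
  printf 'scaffold_7\tpred\tgene\t1\t90\t.\t+\t.\tID=g2\n'
} > demo.gff3

# Build "chrname 0 chrlen" BED lines for the listed chromosomes only.
awk -F'\t' 'BEGIN { OFS = "\t" }
     NR == FNR { keep[$1]; next }
     ($1 in keep) { print $1, 0, $2 }' \
    demo.chrs.list demo.genome.fasta.fai > demo.chrs.bed

# Keep each GFF3 feature once (-u) if it overlaps a listed chromosome;
# -header carries over the header lines of the -a file.
if command -v bedtools >/dev/null 2>&1; then
  bedtools intersect -header -u -a demo.gff3 -b demo.chrs.bed \
      > demo.bedtools.chrs-only.gff3
fi
```

Because each BED interval spans a whole chromosome, every feature on a listed chromosome overlaps it, so this amounts to a column-1 filter done through coordinates.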

Thanks. That should work.

@Juke34 Thank you. I was indeed looking for such a straightforward way.
4.3 years ago
Juke34 8.9k

You can use agat_sq_keep_annotation_from_fastaSeq.pl from AGAT.

The prerequisite is to prepare a FASTA file containing only the chromosomes.
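
Put together, the two steps might look like the sketch below. The demo.* files are hypothetical toy data, and the awk command is only a stand-in for faSomeRecords so that the example runs without the UCSC tools. The AGAT option names (--gff, --fasta, -o) are given as I understand them from the script's usage text, so check its --help before relying on them; that step is skipped if AGAT is not installed.

```shell
# Toy inputs (hypothetical): chromosome list, genome FASTA and GFF3.
printf 'chr1\nchr2\n' > demo.chrs.list
{
  printf '>chr1\nACGTACGT\n'
  printf '>scaffold_7\nGGCCGGCC\n'
  printf '>chr2 some description\nTTAATTAA\n'
} > demo.genome.fasta
{
  printf '##gff-version 3\n'
  printf 'chr1\tpred\tgene\t1\t8\t.\t+\t.\tID=g1\n'
  printf 'scaffold_7\tpred\tgene\t1\t8\t.\t+\t.\tID=g2\n'
} > demo.gff3

# Step 1: FASTA restricted to the listed chromosomes.
# (Stand-in for: faSomeRecords genome.fasta chrs.list genome.chrs-only.fasta)
awk 'NR == FNR { keep[$1]; next }
     /^>/ { printing = (substr($1, 2) in keep) }
     printing' \
    demo.chrs.list demo.genome.fasta > demo.genome.chrs-only.fasta

# Step 2: keep only annotations whose sequence is present in that FASTA.
if command -v agat_sq_keep_annotation_from_fastaSeq.pl >/dev/null 2>&1; then
  agat_sq_keep_annotation_from_fastaSeq.pl \
      --gff demo.gff3 \
      --fasta demo.genome.chrs-only.fasta \
      -o demo.agat.chrs-only.gff3
fi
```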