Retrieve specific fasta sequences from a group of assemblies
22 months ago
SushiRoll ▴ 140

Hi all,

Sorry if this question has been addressed before, but I haven't been able to find a solution. I have a lot of assemblies (around 800) and I would like to retrieve the FASTA sequence of a specific housekeeping gene, which should (in theory) be present in all of them. Is there a tool that takes a FASTA assembly as input and retrieves a specific gene while tolerating a certain percentage of sequence variation, so that the gene is still found even if it carries mutations? Alternatively, the tool could take a GBK or GFF3 file as input and use the gene annotation as the retrieval criterion.

Thanks a lot!

CDS gene sequence • 946 views

I don't know of a ready-made tool. You will need to align the gene to your assemblies; after that it is a matter of parsing the results and retrieving the sequence you need with samtools faidx or a similar tool.
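A minimal sketch of that align-then-extract idea, in case it helps. It assumes BLAST+ `blastn` as the aligner (the reply above does not name one) with its default tabular output (`-outfmt 6`), and it slices the hit out of the FASTA in plain Python rather than calling samtools faidx. The names `blastn_command`, `best_hit`, `read_fasta` and `extract` are hypothetical helpers made up for this illustration, and the 80% identity cutoff is an arbitrary placeholder.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    contig: str      # subject sequence ID (contig in the assembly)
    pident: float    # percent identity of the alignment
    length: int      # alignment length
    sstart: int      # 1-based start on the contig
    send: int        # 1-based end on the contig (smaller than sstart on the minus strand)
    bitscore: float


def blastn_command(gene_fasta: str, assembly_fasta: str,
                   min_identity: float = 80.0) -> List[str]:
    """Build (but do not run) a blastn call; the cutoff is an assumed placeholder."""
    return ["blastn", "-query", gene_fasta, "-subject", assembly_fasta,
            "-outfmt", "6", "-perc_identity", str(min_identity)]


def best_hit(blast_lines: Iterable[str], min_length: int = 0) -> Optional[Hit]:
    """Return the highest-bitscore hit from outfmt 6 lines, or None if there is none."""
    best = None
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch
        # gapopen qstart qend sstart send evalue bitscore
        f = line.rstrip("\n").split("\t")
        hit = Hit(f[1], float(f[2]), int(f[3]), int(f[8]), int(f[9]), float(f[11]))
        if hit.length < min_length:
            continue
        if best is None or hit.bitscore > best.bitscore:
            best = hit
    return best


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence ID: sequence}; ID is the first word of the header."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {k: "".join(v) for k, v in parts.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract(seqs: Dict[str, str], hit: Hit) -> str:
    """Slice the aligned region; reverse-complement it for minus-strand hits."""
    lo, hi = sorted((hit.sstart, hit.send))
    sub = seqs[hit.contig][lo - 1:hi]  # BLAST coordinates are 1-based, inclusive
    if hit.sstart > hit.send:
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub
```

For each assembly you would run the command from `blastn_command` (e.g. with `subprocess.run`), pass the output lines to `best_hit`, and write `extract(read_fasta(open(assembly)), hit)` to a per-assembly FASTA record. Note that this returns only the aligned region, so a partial hit yields a truncated gene; filtering with `min_length` is one way to catch that.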

Great, I'll give it a shot.

Thanks!

Personally, I have never worked on a similar task, so unfortunately I can't provide you with a polished solution, but what you are trying to do here is find orthologous genes. Searching with this keyword, you should find tools suited to the task; e.g. OrthoFinder showed up in a quick search.

Thanks Matthias, that's a great starting point, I'll check what's out there.
