Question

Too Many Protein Sequences To Align, Advice?

0

3.1 years ago

Jant • 0

I'm trying to build a multiple sequence alignment (MSA) for a protein so I can find conserved regions and later test any surface-exposed ones for immunogenicity. The problem is that the protein has over 19 thousand entries in UniProt. I can't find a tool that can align that many sequences and also show me the conserved regions, although I'm very new to bioinformatics, so it's quite possible I'm just missing something. Is there a tool I can use for this? ClustalW on Galaxy gave up after about two days of processing, and it was the only tool I found that even accepted the FASTA file. Alternatively, is there a way to heavily trim down the number of sequences I'm aligning without losing important information? Thanks in advance.

Fasta MSA • 1.4k views

updated 3.1 years ago by Andrzej Zielezinski 11k • written 3.1 years ago by Jant • 0

score 2 · Answer 1 · 2022-03-10

2

3.1 years ago

Andrzej Zielezinski 11k

I think Clustal Omega (not ClustalW) should handle 19,000 proteins without trouble. Alternatively, you can reduce the number of input sequences by clustering highly similar ones (for example with CD-HIT) and aligning only one representative per cluster.
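As a cheap first pass before any real clustering, it can help to collapse sequences that are exactly identical, since large UniProt downloads often contain many of them. The sketch below is not CD-HIT and does no similarity clustering; it only groups identical sequences. The function names (`read_fasta`, `collapse_identical`) and the tiny example records are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_identical(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct sequence (case-insensitive) to the headers sharing it."""
    groups: Dict[str, List[str]] = {}
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return groups


# Hypothetical toy input; in practice pass an open file handle instead.
example = """>A
MKT
AYI
>B
mktayi
>C
MKTAYV
""".splitlines()

groups = collapse_identical(read_fasta(example))
for seq, headers in groups.items():
    # Keep the first header as the representative and record the group size.
    print(f">{headers[0]} n={len(headers)}")
    print(seq)
```

Anything that survives this step can still be passed to CD-HIT for clustering by similarity.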


0

Assuming I'm looking at the right place ( https://www.ebi.ac.uk/Tools/msa/clustalo/ ), it can't accept more than 4000 sequences or files over 4MB. The FASTA file I'm trying to submit exceeds both limits by a fair bit, so it won't run. I'll give CD-HIT a look and see how that works for me, though. Thank you!

Reply • 3.1 years ago by Jant • 0

1

You will need to download Clustal Omega and run it from the command line.
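For anyone following along, a local run might look roughly like the sketch below, which builds the `clustalo` command and hands it to `subprocess`. It assumes the Clustal Omega binary is installed and on your PATH under the name `clustalo`; the file names and the thread count are placeholders, and the helper names are made up for illustration.

```python
import shutil
import subprocess
from typing import List


def clustalo_command(fasta_in: str, aln_out: str, threads: int = 4) -> List[str]:
    """Build the argument list for a Clustal Omega run with FASTA output."""
    return [
        "clustalo",
        "-i", fasta_in,           # unaligned input sequences
        "-o", aln_out,            # where to write the alignment
        "--outfmt=fasta",         # aligned FASTA is easy to parse later
        f"--threads={threads}",   # use several CPU cores
        "--force",                # overwrite the output file if it exists
        "-v",                     # print progress messages
    ]


def run_clustalo(cmd: List[str]) -> None:
    """Run the command, failing early if the binary cannot be found."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("clustalo was not found on PATH")
    subprocess.run(cmd, check=True)


# Placeholder file names; replace with your own paths.
cmd = clustalo_command("proteins.fasta", "proteins.aln.fasta", threads=8)
print(" ".join(cmd))
# run_clustalo(cmd)  # uncomment once the input file exists
```

The printed line is the same command you would type in a terminal, so the Python wrapper is optional.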

Reply • 3.1 years ago by Andrzej Zielezinski 11k

0

It took some help from a friend who is much more tech-savvy than me (I had trouble even finding the button to make a post here), but I got it working and now have an alignment! However, I don't actually know how to get the conserved regions from it. I tried the MSA Viewer at NCBI, but it choked and died on the file. I also tried a program called Gblocks that seemed promising, but it choked and died even faster. I'm starting to feel like I've bitten off more than I can chew on this.
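One way to get at conserved regions without a viewer is to score each alignment column directly. The sketch below is only one simple option, not what Gblocks or the NCBI viewer do: it scores a column by the fraction of sequences that carry its most common residue, then reports runs of consecutive high-scoring columns. The thresholds, function names, and toy alignment are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Tuple


def column_conservation(seqs: List[str], max_gap_fraction: float = 0.5) -> List[float]:
    """Per column: fraction of sequences with the most common residue.

    Columns whose gap fraction exceeds max_gap_fraction score 0.0.
    """
    n = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in seqs if s[col] not in "-.")
        gaps = n - sum(counts.values())
        if not counts or gaps / n > max_gap_fraction:
            scores.append(0.0)
        else:
            scores.append(counts.most_common(1)[0][1] / n)
    return scores


def conserved_regions(scores: List[float], threshold: float = 0.9,
                      min_length: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) column ranges scoring >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    if start is not None and len(scores) - start >= min_length:
        regions.append((start + 1, len(scores)))
    return regions


# Toy alignment; real input would be the aligned FASTA from Clustal Omega.
alignment = [
    "MKTAYIAK-QR",
    "MKTAYLAKGQR",
    "MKTAYIAR-QR",
    "MKTAYVAKGQK",
]
scores = column_conservation(alignment)
print(conserved_regions(scores, threshold=0.75, min_length=3))
```

Note that the reported positions are alignment columns, not residue numbers in any one protein. To map a region back onto a reference sequence, count that sequence's non-gap characters up to each column.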

Reply • 3.1 years ago by Jant • 0