14 months ago
PamCraven
Hi there, is there any way to detect just the Fab region of an antibody from a protein FASTA file using Biopython? I was originally considering using a regex to find conserved regions of the hinge and cutting there, but I was wondering whether Biopython already has a feature that does this. Or is there a less heavy-handed way to achieve this? We are mainly looking at IgGs and are interested only in the Fab region for analysis. Thanks!
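A minimal sketch of the regex idea, assuming each input is a single heavy-chain protein sequence and that a core-hinge motif of the form C-P-P/S-C-P is an acceptable cut point. The pattern (`CORE_HINGE`) and the function name are hypothetical: the pattern is meant to cover the human IgG1/IgG2 (CPPCP) and IgG4 (CPSCP) cores, but IgG3's extended hinge and other species would need their own patterns, so check against IMGT before relying on it.

```python
import re
from typing import Optional

# Hypothetical core-hinge pattern (an assumption, not from any library):
# CPPCP (human IgG1/IgG2) or CPSCP (human IgG4).
CORE_HINGE = re.compile(r"CP[PS]CP")


def fab_heavy_by_regex(heavy_chain: str, pattern: re.Pattern = CORE_HINGE) -> Optional[str]:
    """Return the heavy chain up to (not including) the first core-hinge hit.

    Returns None when the motif is not found, so such sequences can be
    flagged for a manual look instead of being kept whole by accident.
    """
    match = pattern.search(heavy_chain.upper())
    if match is None:
        return None
    # Everything before the core hinge: VH, CH1 and the upper hinge.
    return heavy_chain[: match.start()]
```

With Biopython this can be applied per record, e.g. `fab_heavy_by_regex(str(record.seq))` for each `record` from `SeqIO.parse(path, "fasta")`. An exact-motif regex is brittle against variants, which is one reason to consider alignment instead.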
There's nothing specialized for antibody sequences in Biopython, and I'm not sure which tool would do this for you out of the box. You could definitely just do pairwise alignments against the conserved sequence and go from there (Biopython has very flexible features for custom pairwise alignments), though you could also try more specialized antibody software.
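A rough sketch of the pairwise-alignment route using Biopython's `PairwiseAligner` in local mode with BLOSUM62. The reference motif `IGG1_HINGE` (my understanding of the human IgG1 hinge), the gap penalties, the `min_score` cutoff and the function names are all assumptions for illustration; swap in the hinge for your subclass/species and tune the cutoff on sequences where you already know the answer.

```python
from typing import Optional

from Bio.Align import PairwiseAligner, substitution_matrices

# Hypothetical reference motif: human IgG1 hinge (verify against IMGT).
IGG1_HINGE = "EPKSCDKTHTCPPCP"


def make_aligner() -> PairwiseAligner:
    """Local protein aligner; the gap penalties are illustrative guesses."""
    aligner = PairwiseAligner()
    aligner.mode = "local"
    aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
    aligner.open_gap_score = -10
    aligner.extend_gap_score = -0.5
    return aligner


def fab_heavy_by_alignment(
    heavy_chain: str,
    hinge: str = IGG1_HINGE,
    min_score: float = 40.0,  # hypothetical cutoff; tune on known sequences
    aligner: Optional[PairwiseAligner] = None,
) -> Optional[str]:
    """Return the heavy chain up to the start of the best local hinge hit."""
    if aligner is None:
        aligner = make_aligner()
    seq = heavy_chain.upper()
    if aligner.score(seq, hinge) < min_score:
        return None  # no convincing hinge match
    best = aligner.align(seq, hinge)[0]
    # best.aligned[0] holds the (start, end) blocks on the heavy chain
    # (the target); the first block's start is where the hinge hit begins.
    cut = int(best.aligned[0][0][0])
    return heavy_chain[:cut]
```

Unlike an exact regex, the local alignment tolerates substitutions in the hinge, at the cost of needing a sensible score cutoff.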
In the context of sequences (versus the Fab/Fc in the resulting proteins), would you frame it as looking for the variable region through the first domain of the constant region? (Did I get that right? I'm fuzzy on the details of what goes on in the constant region.) You could consider using IgBLAST and extracting the V(D)JC details to start with. There's a command-line version that works pretty well once you get the databases and such set up correctly. If you're working with one of the species with built-in support, it's not too bad. I'd suggest using -outfmt 19 for AIRR-formatted TSV and then extracting whatever columns you need.

But then, I think you'd still need to separate the parts of the constant region to finish it. IMGT publishes separate sequences for the domains of each constant gene (for human, see these diagrams and these links), but I've gotten in way over my head trying to understand what's going on there between different classes. Any chance that helps?
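A sketch of pulling columns out of the AIRR-formatted TSV with the standard library. It assumes the file has the AIRR Rearrangement columns `sequence_id`, `sequence`, `v_call`, `v_sequence_start` and `j_sequence_end` (1-based, inclusive coordinates); check the header of your own IgBLAST output, since I'm not certain every version/input type fills them. The function name is hypothetical, and this only recovers V through J; the CH1 portion still has to be handled separately.

```python
import csv
from typing import Dict, Iterable, Iterator


def v_to_j_segments(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the V-through-J slice of each row of an AIRR TSV.

    `lines` can be an open file handle or any iterable of text lines.
    Rows missing either coordinate (e.g. no V or J assigned) are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        start = row.get("v_sequence_start") or ""
        end = row.get("j_sequence_end") or ""
        if not start or not end:
            continue
        seq = row["sequence"]
        # AIRR coordinates are 1-based and inclusive, hence start - 1.
        yield {
            "sequence_id": row["sequence_id"],
            "v_call": row.get("v_call") or "",
            "v_to_j": seq[int(start) - 1 : int(end)],
        }
```

Usage would look like `with open(path, newline="") as handle: rows = list(v_to_j_segments(handle))`, where `path` points at the `-outfmt 19` output.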