Hello
I am trying to reconstruct the heavy chain and the light chain from two DNA sequences. For both, I run the same igblastn command:
database="my_path/to/db/"
optional_file="my_path/to/optional_file/"
igblastn -germline_db_V $database/human_igh_v -germline_db_D $database/human_igh_d -germline_db_J $database/human_igh_j -auxiliary_data $optional_file/human_gl.aux -domain_system imgt -ig_seqtype Ig -organism human -outfmt '7 std qseq sseq btop' -query example.fa -out example.fmt7
When I compare the heavy-chain results with those from the IgBLAST web tool (https://www.ncbi.nlm.nih.gov/igblast/igblast.cgi), they are identical. For the light chain, however, they are different. What could be the reason? The database was created following the instructions here:
https://changeo.readthedocs.io/en/version-0.3.11---igblast-junction-fix/examples/igblast.html.
Command-line output for an example light chain (the heavy-chain output is correct, so I will skip it):
# IGBLASTN 2.5.1+
# Query: RL0575_B2_no210_RL0575_B2_positive_LC
# Database: /site/ne/home/i0439277/statistical_analysis/sequences_basic/Primer_Cocktail/blastn/kleinstein-immcantation-4425cb7a6101/scripts/database//human_igh_v /site/ne/home/i0439277/statistical_analysis/sequences_basic/Primer_Cocktail/blastn/kleinstein-immcantation-4425cb7a6101/scripts/database//human_igh_d /site/ne/home/i0439277/statistical_analysis/sequences_basic/Primer_Cocktail/blastn/kleinstein-immcantation-4425cb7a6101/scripts/database//human_igh_j
# Domain classification requested: imgt
# V-(D)-J rearrangement summary for query sequence (Top V gene match, Top J gene match, Chain type, stop codon, V-J frame, Productive, Strand). Multiple equivalent top matches having the same score and percent identity, if present, are separated by a comma.
IGHV3-47*01,IGHV3-47*02 N/A VL No N/A N/A +
# V-(D)-J junction details based on top germline gene matches (V end, V-J junction, J start). Note that possible overlapping nucleotides at VDJ junction (i.e, nucleotides that could be assigned to either rearranging gene) are indicated in parentheses (i.e., (TACT)) but are not included under the V, D, or J gene itself
ATTGT N/A N/A
# Alignment summary between query and top germline V gene hit (from, to, length, matches, mismatches, gaps, percent identity)
FR3-IMGT 240 276 37 27 10 0 73
Total N/A N/A 37 27 10 0 73
# Hit table (the first field indicates the chain type of the hit)
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, gaps, q. start, q. end, s. start, s. end, evalue, bit score, query seq, subject seq, BTOP
# 3 hits found
V RL0575_B2_no210_RL0575_B2_positive_LC IGHV3-47*01 72.973 37 10 0 0 240 276 249 285 0.098 28.3 CAGCCTCCAGTCTGAGGATGAGGCTGACTATTATTGT CAGCCTGATAGCTGAGGACATGGCTGTGTATTATTGT 6CGCAATGATG7TCGAAT5ATCG9
V RL0575_B2_no210_RL0575_B2_positive_LC IGHV3-47*02 72.973 37 10 0 0 240 276 249 285 0.098 28.3 CAGCCTCCAGTCTGAGGATGAGGCTGACTATTATTGT CAGCCTGATAGCTGAGGACATGGCTGTGTATTATTGT 6CGCAATGATG7TCGAAT5ATCG9
V RL0575_B2_no210_RL0575_B2_positive_LC IGHV3-30-2*01 80.000 25 5 0 0 196 220 111 135 0.85 25.2 TTCTCAGGCTCCAGTTCTGGGGCTG TTCCCAGGCTCCAGGGAAGGGGCTG 3TC10TGTGCATA7
Total queries = 1
Total identifiable CDR3 = 0
Total unique clonotypes = 0
# BLAST processed 1 queries
Web tool output for the same light chain (the heavy-chain output is correct, so I will skip it):
Database: imgt.Homo_sapiens.V.f.orf.p; imgt.Homo_sapiens.D.f.orf;
imgt.Homo_sapiens.J.f.orf
600 sequences; 158,627 total letters
Query= RL0575_B2_no210_RL0575_B2_positive_LC
Length=334
Score E
Sequences producing significant alignments: (Bits) Value
IGLV4-69*01 germline gene 391 7e-111
IGLV4-69*02 germline gene 388 6e-110
IGLV4-60*03 germline gene 310 2e-86
IGLJ1*01 germline gene 66.1 5e-15
IGLJ6*01 germline gene 35.3 8e-06
IGLJ2*01 germline gene 29.5 5e-04
Domain classification requested: imgt
V-(D)-J rearrangement summary for query sequence (multiple equivalent top matches, if present, are separated by a comma):
Top V gene match Top J gene match Chain type stop codon V-J frame Productive Strand
IGLV4-69*01 IGLJ1*01 VL No In-frame Yes +
V-(D)-J junction details based on top germline gene matches:
V region end V-J junction* J region start
CATTC C TGTCT
*: Overlapping nucleotides may exist at V-D-J junction (i.e, nucleotides that could be assigned
to either rearranging gene). Such nucleotides are indicated inside a parenthesis (i.e., (TACAT))
but are not included under the V, D or J gene itself.
Sub-region sequence details:
Nucleotide sequence Translation Start End
CDR3 CAGACCTGGGGCACTGGCATTCCTGTC QTWGTGIPV 277 303
Alignment summary between query and top germline V gene hit:
from to length matches mismatches gaps identity(%)
FR1-IMGT 5 75 71 67 4 0 94.4
CDR1-IMGT 76 96 21 20 1 0 95.2
FR2-IMGT 97 147 51 43 8 0 84.3
CDR2-IMGT 148 168 21 20 1 0 95.2
FR3-IMGT 169 276 108 100 8 0 92.6
CDR3-IMGT (germline) 277 298 22 22 0 0 100
Total 294 272 22 0 92.5
Any recommendation would be really appreciated.
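The germline databases were not configured appropriately for the light chain. The "# Database:" line in your command-line output lists only human_igh_v, human_igh_d and human_igh_j, i.e. heavy-chain germlines, whereas the web tool searches the full IMGT V, D and J sets, which include the IGL genes it reports (IGLV4-69, IGLJ1). With only IGH germlines to search, a lambda light chain can only produce weak, spurious hits such as IGHV3-47 (37 bp aligned, 73% identity), and no J gene or CDR3 is found. Try configuring the databases as per the instructions so that they include the light-chain germlines, and point -germline_db_V and -germline_db_J at those databases.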
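A quick way to confirm which germline loci a run actually searched is to read the "# Database:" line of the -outfmt 7 report. The sketch below is only illustrative: loci_in_report and run_light_chain are hypothetical helper names, the pattern assumes database files named like human_igh_v or human_igl_v, and the human_igl_* paths are placeholders for light-chain databases you would have to build yourself. The D database is kept because, as far as I know, igblastn expects one even for light-chain queries.

```shell
# Hypothetical helper: list the germline loci named on the "# Database:" line
# of an IgBLAST -outfmt 7 report. Assumes database files are named like
# human_igh_v, human_igk_v, human_igl_v; adjust the pattern to your naming.
loci_in_report() {
    grep -m1 '^# Database:' "$1" \
        | grep -oiE 'ig[hkl]_[vdj]' \
        | tr '[:upper:]' '[:lower:]' \
        | sort -u \
        | tr '\n' ' ' \
        | sed 's/ $//'
}

# Sketch of a light-chain run (defined here, not executed).
# The human_igl_v / human_igl_j names are placeholders, not files known to
# exist in your setup; the remaining flags are copied from the question.
run_light_chain() {
    database="my_path/to/db"
    optional_file="my_path/to/optional_file"
    igblastn \
        -germline_db_V "$database/human_igl_v" \
        -germline_db_D "$database/human_igh_d" \
        -germline_db_J "$database/human_igl_j" \
        -auxiliary_data "$optional_file/human_gl.aux" \
        -domain_system imgt -ig_seqtype Ig -organism human \
        -outfmt '7 std qseq sseq btop' \
        -query example.fa -out example_light.fmt7
}
```

Running loci_in_report example.fmt7 on the report from the question should print igh_d igh_j igh_v, which shows that no IGK or IGL germlines were searched. After rebuilding the databases, the same check on the new report should list the light-chain loci as well.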