Please let me know of any problems with the pipeline. Also keep in mind that IgBlast is relatively slow, so basically there are two cases when you want to use it: 454 and MiSeq 300bp paired-end data.
Hi! I want to compile reference sequences for chicken TCR to use with MiTCR or the IgBlast wrapper. Could you provide the detailed method and the command-line usage? Thank you!
Hello. I am not sure I understand your question. There are currently insufficient references in IMGT to extract the CDR3 region for chicken TCRs; actually, only Joining segments of the TCRalpha chain are present there (http://www.imgt.org/IMGTrepertoire/index.php?section=LocusGenes&repertoire=genetable&species=Chicken&group=TRAJ). If you can provide me with a list of Variable and Joining segment sequences for the TCR chain you're interested in, I could try to compile them into a database.
The currently available references are listed here; they are the ones that have "1" in the last ("VJ") column.
Thank you very much for your reply! I am analyzing NGS sequences of chicken and duck TCRB surrounding the CDR3. I want to use MiTCR or the IgBlast wrapper to extract CDR3s. Would you please help compile the chicken and duck Variable and Joining segment sequences into a database and pack it into a MiTCR jar? Besides, when I use the IgBlast wrapper, the 3' ends of some CDR3 sequences are not accurate and do not end with an F, and the clonotype count is smaller than the true number. Should I replace the files in data/internal_data with chicken and duck sequences? Thank you!
Hi, thank you for introducing MiTCR. I have found it very powerful and useful. But I am analyzing TRB sequences of Macaca mulatta, a species that MiTCR doesn't support. I want to analyze the CDR3 region with MiTCR. Would you please compile the TRB genes of Macaca mulatta into MiTCR? All the TRB genes are in the IMGT database.
I am also aware of MiGEC through this post and found that it can analyze TRB sequences of macaque. Your library preparation method and analysis rely on unique molecular identifier tags (UMIs). Does MiGEC support data without UMIs? My library doesn't have them. Thank you.
By the way, if MiTCR could analyze macaque sequences, what would be the differences in functionality and results between MiTCR and MiGEC?
As I've already mentioned, there is a problem with adding new species to MiTCR, as they are somewhat hard-coded into the binaries. The problem originates from the way the IMGT database is organized, i.e. all those IMGT gaps, etc., while it would be far better just to provide feature (CDR3 start, CDR2, CDR1, ...) coordinates. So adding new species requires a lot of manual work.
Indeed, MiGEC has all species/receptor chains from IMGT for which both V and J segments are available. MiGEC can be used separately for tasks like de-multiplexing, read overlapping and CDR3 extraction/clonotype assembly. You should just use the CdrBlast module as is; check out this readme section and the command-line help by running java -jar migec.jar CdrBlast -h. MiGEC is slower than MiTCR, as it was designed to handle BCR sequences containing hypermutations, but it is still faster than any alternative (it is no problem to process a HiSeq lane on commodity hardware). The results are highly consistent between these tools.
So you can give it a try, and tell me if everything works fine.
We are sequencing TCRs from bovine samples and are interested in using MiTCR. Is it possible to replace the built-in human and mouse database with a bovine database? We are building the bovine database. One of my colleagues has done some reverse engineering of the tool and found the database. We were analyzing it and saw that the conserved Cys and Phe information is a bit confusing. The Cys position is where the Cys codon starts. But when looking at the J gene, we found that the position is shifted by -2 nucleotides from where the Phe codon starts. We are not sure which position the algorithm uses to mark the CDR3 flanking amino acids. If you could give us a few tips, it would be very helpful.
It is quite hard to re-assemble the binary database file for MiTCR, but I can easily add your references to MiGEC/CdrBlast (https://github.com/mikessh/migec, MIGEC: towards error-free profiling of immune repertoires), as they are stored in tabular format there. It has some data for Bos taurus, but it is incomplete. If you have both V and J references for your chain of interest (say, TRB), you can mail them to me (my nickname at biostars at gmail.com) and I'll incorporate them into a new release.
As far as I recall, the convention for the Cys/Phe reference points was: for V, the coordinate of the first base after the conserved Cys; for J, the coordinate of the first base before the Phe; both 0-based.
The MiTCR user manual shows the pipeline for single-end read analysis:
mitcr -pset flex in.fastq.gz result.txt
or, in general: mitcr <options> <input file name> <output file name>
But I want to use paired-end data (split into R1.fastq and R2.fastq) as input. Could you please show me how to do that? As far as I know, MiTCR can also analyze Illumina output.
The usage depends on the structure of your library:
1) Usually, after de-multiplexing, one gets oriented reads. In this case only the FASTQ file that contains the CDR3 should be specified.
2) In some cases one wants to overlap reads. Note that it is not the best practice to rely on overlapping when the CDR3 is spread across both reads, but with most recent protocols (e.g. Illumina HiSeq 150+150) it is not a problem to read the entire CDR3 in one read. However, for some protocols a read-through situation can occur, so that the CDR3 is fully present in both reads. In such a case one can either overlap the reads or proceed to 3).
3) If the library is non-oriented, you can just concatenate the FASTQ files.
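For the non-oriented case, concatenating the two files could look like the minimal Python sketch below. The function name concat_fastq and the file names are only illustrative, and the sketch assumes uncompressed FASTQ files; it is not part of MiTCR itself.

```python
def concat_fastq(inputs, output):
    """Append several uncompressed FASTQ files into one output file.

    Assumption: inputs are plain-text FASTQ (not gzipped). A newline is
    added after a file that lacks a trailing one, so that the last record
    of one file does not run into the first header of the next.
    """
    with open(output, "wb") as out:
        for path in inputs:
            last = b"\n"  # an empty input file needs no extra newline
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks to keep memory use low on large files.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")


# Hypothetical usage:
# concat_fastq(["R1.fastq", "R2.fastq"], "combined.fastq")
```

The combined file can then be passed as the single input file, following the same pattern as the single-end command quoted above.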
Just an observation: always use the name of the tool in any announcement, mention, etc. It helps to establish context. I will edit the title to adhere to this.
Thanks a lot for the correction.
Hi, thank you for introducing MiTCR. I analyzed TCR data using the MiTCR software and found it very powerful and useful. But now I have run into a difficulty: when I input my TCR FASTQ file, an error occurs in the analysis pipeline (java.lang.RuntimeException: Error while parsing quality). I tried switching between phred33 and phred64, but it did not help. I am confused by this and would deeply appreciate any advice you could give me.
Hello!
MiTCR accepts Phred quality scores in the 0-40 range. New HiSeq runs produce quality values up to Phred 50, which I'm pretty sure is the issue. You can manually fix those files by replacing all quality values above 40 with 40 (see this script for an example: https://github.com/mikessh/mageri-paper/blob/master/processing/FixQual.groovy).
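If you prefer not to run the Groovy script, the same idea can be sketched in a few lines of Python. This is not a port of FixQual.groovy, just an illustration of the capping step; the function names are made up here, and it assumes Phred+33 encoding and strict four-line FASTQ records (no wrapped sequence lines).

```python
def cap_quality_line(qual, max_q=40, offset=33):
    """Replace every quality character above max_q with the max_q character."""
    cap = chr(max_q + offset)  # 'I' for Phred+33 and max_q=40
    return "".join(c if c <= cap else cap for c in qual)


def cap_fastq(lines, max_q=40, offset=33):
    """Yield FASTQ lines with the quality line of each record capped.

    Assumption: each record is exactly four lines, so the quality string
    is every 4th line (0-based index 3, 7, 11, ...).
    """
    for i, line in enumerate(lines):
        if i % 4 == 3:
            ending = "\n" if line.endswith("\n") else ""
            yield cap_quality_line(line.rstrip("\r\n"), max_q, offset) + ending
        else:
            yield line


# Hypothetical usage on an uncompressed file:
# with open("in.fastq") as src, open("fixed.fastq", "w") as dst:
#     dst.writelines(cap_fastq(src))
```

For gzipped input, the same generator should work on a file opened in text mode with gzip.open(path, "rt").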
I just tried this software and found it useful. But how can I change the "overrides target species" setting? I want to do some analysis on pig. How can I do that?
Sorry for the late reply.
What kind of data are you analyzing? T-cell libraries or immunoglobulins?
For analysis, MiTCR requires V/J reference sequences, which are compiled from IMGT data. This is not quite straightforward, due to the great complexity of the IMGT database organization. So currently only TRa/b for human and mouse are supported. There will be a patch with TRgamma/delta available soon.
Full functionality for a spectrum of species is currently being developed, and it will be available within a new tool (also supporting IGh/k/l).
Still, looking at the IMGT database, I see that sequences for pig exist only partially (for IGh/k/l and TRa-J & TRd-J). If you could share your reference germline sequences for V/J segments with the conserved Cys/Phe/Trp residues marked, this could help to speed it up.
Hello, I want to use MiTCR to analyze my IGH data. Is it possible to integrate those references into MiTCR, or to provide a parameter for a reference FASTA file? Thanks for your attention!
Hello! The MiTCR software doesn't allow the integration of immunoglobulin loci, as its internal search algorithm is not robust to hypermutations and we could not guarantee optimal performance. We're now working on a software tool that could be used for high-throughput full-length antibody sequencing and would have performance characteristics similar to or better than MiTCR. I will announce it upon release, which should happen in several months.
To analyze your IGH data you can use our recent MiGEC software, see this post. While its scope is a little bit different (it works with unique molecular identifier-tagged data), it provides fast IGH CDR3 extraction and V/J determination.
If you need whole-length analysis with hypermutations, you can use the wrapper for IgBlast software by NCBI, which is available here. This one is somewhat slower and less-documented.
Please let me know if you have any problems or questions during the analysis. In that case, please also describe your library structure.
Hi Mikhail - excuse my ignorance, but what is the exact interpretation of the tilde (~) character in the amino acid sequence of the CDR3 regions I extract using mitcr? All the best, A.
~ indicates a frameshift. In case of a frameshift, the V -> J and J -> V translations are both performed, and the central incomplete codon is marked as ~.
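To illustrate the idea, here is a rough Python sketch of such a two-sided translation. It is not MiTCR's actual code: the function name is made up, and the choice of where the incomplete codon falls (as close to the middle as possible) is an assumption made only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def _translate(nt):
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt), 3))


def translate_cdr3(nt):
    """Translate a CDR3; out-of-frame sequences get '~' for the middle codon.

    Assumption: the 5' part is read forward from the V side, the 3' part is
    read in the J frame (aligned to the 3' end), and the leftover 1-2 bases
    between them are shown as '~'.
    """
    nt = nt.upper()
    n_codons, leftover = divmod(len(nt), 3)
    if leftover == 0:
        return _translate(nt)
    left = (n_codons + 1) // 2      # full codons read from the V side
    right = n_codons - left         # full codons read from the J side
    head = nt[:3 * left]
    tail = nt[len(nt) - 3 * right:]
    return _translate(head) + "~" + _translate(tail)


print(translate_cdr3("TGTGCCAGCAGTTTC"))   # in frame
print(translate_cdr3("TGTGCCAGCAAGTTTC"))  # one extra base in the middle
```

With this sketch the in-frame sequence gives CASSF, while the sequence with one extra base gives CAS~SF: both ends stay readable in their own frame, and the ~ marks where the frames clash.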
Thanks. How should we interpret this, though? I guess we don't expect a functional TCR product from a sequence that contains an incomplete codon?
Indeed, the CDR3 amino acid sequence is mostly meaningless here. However, when looking at your data tables manually, it can sometimes help to spot sequencing errors, frameshift hypermutations in the case of antibody data, etc. So consider this an aesthetic feature of the CDR3aa column.