Hey,
I have bulk mouse RNA-seq data from an external lab and am hoping to analyse the TCR-mapping reads contained within. The library preparation pipeline produced 3 FASTQ files: two 75 bp paired-end read files (R1 and R3) and a UMI file (R2).
I was hoping to use MixCR for this; however, the MixCR pipeline does not incorporate the UMI information. One approach would be to deduplicate the FASTQs and then run MixCR on the result. I've seen other posts recommending against FASTQ-level deduplication, but I feel it may be the best option here.
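To make the FASTQ-level deduplication idea concrete, here is a minimal Python sketch. It assumes the three files list reads in the same order, that R2 holds only the UMI sequence, and that an exact match on UMI plus the first bases of R1 and R3 is an acceptable definition of "duplicate". The function names, the `prefix_len` parameter and the file paths are hypothetical and not part of any existing tool. It keeps the first read pair seen for each key and does not correct sequencing errors in the UMI.

```python
import gzip
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator line, qualities


def open_fastq(path: str, mode: str = "rt") -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records with trailing newlines stripped."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:  # end of file
            return
        yield (lines[0], lines[1], lines[2], lines[3])


def dedup_triplets(r1_in: str, umi_in: str, r3_in: str,
                   r1_out: str, r3_out: str,
                   prefix_len: int = 30) -> Tuple[int, int]:
    """Write the first read pair seen for each (UMI, R1 prefix, R3 prefix) key.

    Returns (total_pairs_read, pairs_kept). All keys are held in memory.
    """
    seen = set()
    total = kept = 0
    with open_fastq(r1_in) as f1, open_fastq(umi_in) as fu, \
            open_fastq(r3_in) as f3, \
            open_fastq(r1_out, "wt") as o1, open_fastq(r3_out, "wt") as o3:
        # Assumption: record N in each file belongs to the same fragment.
        # zip() stops at the shortest file, so check the counts beforehand.
        for rec1, rec_umi, rec3 in zip(read_fastq(f1), read_fastq(fu),
                                       read_fastq(f3)):
            total += 1
            key = (rec_umi[1], rec1[1][:prefix_len], rec3[1][:prefix_len])
            if key in seen:
                continue  # same UMI and same read starts: treat as duplicate
            seen.add(key)
            kept += 1
            o1.write("\n".join(rec1) + "\n")
            o3.write("\n".join(rec3) + "\n")
    return total, kept
```

Calling `dedup_triplets("R1.fastq.gz", "R2.fastq.gz", "R3.fastq.gz", "R1.dedup.fastq.gz", "R3.dedup.fastq.gz")` (placeholder paths) would leave two deduplicated files that can be passed on as ordinary paired-end input. Keying on a sequence prefix rather than the full read is a design choice to tolerate errors towards the 3' end; a shorter prefix collapses more aggressively.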
The external lab has provided mapped and unmapped BAM files that were corrected for duplicates during mapping. I have tried BAM-to-FASTQ conversion, as well as TCR pipelines that accept BAM as input, but with no success (the BAM files themselves seem rather non-canonical).
Any suggestions for FASTQ-level deduplication tools that will accept 3 FASTQs in that format, or alternative solutions to my issue, would be greatly appreciated!
Thanks in advance! Gordon
You can check the fgbio toolkit, specifically the AnnotateBamWithUmis tool. fgbio is a command-line toolkit, and this tool adds UMI information from a FASTQ file to a BAM file. Tool description from its page:
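After this, you can use the CallMolecularConsensusReads tool to call consensus reads from the annotated BAM. Note that it expects reads that have already been grouped by UMI, which fgbio's GroupReadsByUmi tool does.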
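For intuition about what consensus calling does, here is a deliberately simplified Python sketch: reads sharing a UMI are collapsed into one sequence by per-position majority vote. This is not fgbio's algorithm. fgbio also takes mapping position and base qualities into account, both of which are ignored here. The function name and the `min_reads` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def consensus_by_umi(reads: Iterable[Tuple[str, str]],
                     min_reads: int = 1) -> Dict[str, str]:
    """Collapse (umi, sequence) pairs into one majority-vote sequence per UMI."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus: Dict[str, str] = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to trust this UMI family
        # Only compare positions that every read in the family covers.
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            counts = Counter(s[i] for s in seqs)
            # most_common(1) returns the most frequent base; on a tie it
            # returns the tied base that was seen first.
            bases.append(counts.most_common(1)[0][0])
        consensus[umi] = "".join(bases)
    return consensus
```

Raising `min_reads` drops UMI families supported by only a few reads, which trades sensitivity for fewer error-driven consensus sequences.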