Question

FASTQ-level UMI correction for TCR analysis

3.9 years ago

gordo2b ▴ 10

Hey,

I have bulk mouse RNA-seq data from an external lab and am hoping to analyse the TCR-mapping reads it contains. The library preparation produced three FASTQ files: two 75 bp paired-end read files (R1 and R3) and a UMI file (R2).

I was hoping to use MiXCR for this; however, the MiXCR pipeline does not incorporate the UMI information. One approach would be to deduplicate the FASTQs and then run the MiXCR analysis. I've seen other posts recommending against FASTQ-level deduplication, but I feel it may be the best option here.

The external lab has provided mapped and unmapped BAM files in which duplicates have already been corrected (during mapping). I have tried converting the BAMs back to FASTQ, and I have tried TCR pipelines that accept BAM as input, but neither worked (the BAM files themselves seem rather non-canonical).

Any suggestions for FASTQ-level deduplication tools that will accept three FASTQs in that format, or alternative solutions to my issue, would be greatly appreciated!
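For illustration only, here is a minimal sketch of what exact-match FASTQ-level deduplication over the three files could look like. It assumes the three files list reads in the same order, that read names match up to the first whitespace, and that a duplicate is a triplet with an identical UMI, R1 sequence and R3 sequence. The names `read_fastq`, `open_fastq` and `dedup_triplets` are made up for this example. It does no error correction, so a single sequencing error in the UMI or in either read leaves a duplicate uncollapsed.

```python
import gzip
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, ...]  # (header, sequence, plus line, quality)


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzipped FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:  # end of file
            return
        yield tuple(lines)


def dedup_triplets(
    r1: Iterable[Record], umi: Iterable[Record], r3: Iterable[Record]
) -> Iterator[Tuple[Record, Record, Record]]:
    """Keep the first triplet seen for each (UMI, R1 seq, R3 seq) key."""
    seen = set()
    for rec1, rec_umi, rec3 in zip(r1, umi, r3):
        # Assumption: names agree up to the first whitespace in the header.
        names = {rec[0].split()[0] for rec in (rec1, rec_umi, rec3)}
        if len(names) != 1:
            raise ValueError(f"read names out of sync: {sorted(names)}")
        key = (rec_umi[1], rec1[1], rec3[1])
        if key in seen:
            continue  # exact duplicate of a triplet already kept
        seen.add(key)
        yield rec1, rec_umi, rec3
```

The kept R1 and R3 records can then be written back out as two FASTQ files for the downstream TCR tool. Memory use grows with the number of unique triplets, because every key is held in a set.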

Thanks in advance! Gordon

UMI fastq TCR RNA-Seq Mixcr • 1.3k views

You can check the fgbio toolkit, specifically the AnnotateBamWithUmis tool. fgbio is a command-line toolkit, and this tool adds UMI information from a FASTQ file to a BAM file. The tool description from its documentation:

Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file. Takes an existing BAM file and a FASTQ file consisting of UMI reads, matches the reads between the files based on read names, and produces an output BAM file where each record is annotated with an optional tag (specified by attribute) that contains the read sequence of the UMI.
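To make the name-matching idea in that description concrete, here is a small sketch that works on SAM text lines rather than on BAM. It is not fgbio's implementation. It assumes the UMI FASTQ fits in memory, that the read name is everything between `@` and the first whitespace, and that every alignment record has a matching UMI read. The helpers `load_umis` and `annotate_sam` are hypothetical names for this example; `RX` is the standard SAM tag for a UMI sequence.

```python
from typing import Dict, Iterable, Iterator


def load_umis(fastq_lines: Iterable[str]) -> Dict[str, str]:
    """Map read name -> UMI sequence from the lines of a UMI FASTQ."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    umis: Dict[str, str] = {}
    for i in range(0, len(lines) - 3, 4):
        name = lines[i][1:].split()[0]  # drop '@' and any trailing comment
        umis[name] = lines[i + 1]       # second line of the record is the UMI
    return umis


def annotate_sam(
    sam_lines: Iterable[str], umis: Dict[str, str], tag: str = "RX"
) -> Iterator[str]:
    """Append a tag:Z:UMI field to each alignment line, matched by read name."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        qname = line.split("\t", 1)[0]
        if qname not in umis:
            raise KeyError(f"no UMI read found for {qname}")
        yield f"{line}\t{tag}:Z:{umis[qname]}"
```

Both mates of a pair share one read name, so both receive the same UMI tag.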

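After this, you can use the CallMolecularConsensusReads tool to call consensus reads from the annotated BAM. Note that it expects reads that have already been grouped by UMI, which fgbio's GroupReadsByUmi tool does, so that step goes in between.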

3.9 years ago by sysboolean ▴ 90