Question

Removing umi split off as a separate fastq (RNA-seq)

22 months ago

YJ • 0

Hi,

I used xGen UDI-UMI adapters from IDT to prepare my RNA-seq libraries and ran a single-end RNA-seq experiment. I received two FASTQ files for each sample: R1 from my 100 bp SE run, and R2 containing the 9 bp UMI sequence split off into its own file. I normally analyze my RNA-seq experiments by aligning to the transcriptome with STAR and calculating expression with RSEM. I would like to incorporate UMI-based deduplication into this workflow.

I've tried a few methods.

1. I ignored R2 and used umi_tools extract with --bc-pattern NNNNNNNNN, as instructed on the website, followed by STAR alignment and umi_tools dedup. I obtained deduplicated files, but they were reduced to about 1/20 of the original size.
2. I converted my R1 FASTQ file into an unmapped BAM with the Picard FastqToSam function, then incorporated the UMIs from the FASTQ with the fgbio AnnotateBamWithUmis function. I converted the uBAM, with UMIs marked in the RX tag, back to FASTQ; at this point I could see all my UMIs tagged with RX in the BAM file. I then proceeded with STAR alignment to the transcriptome. After alignment, I ran umi_tools dedup with the options --extract-umi-method=tag --tag=RX. However, I then get a warning that at least one read is missing a UMI and/or cell tag, and I'm left with a much smaller file than the original BAM.

Does anyone have experience with this situation? I guess I could also try Picard MarkDuplicates with the REMOVE_DUPLICATES=TRUE option instead of umi_tools dedup, but I'm concerned that I'm losing a big chunk of the file. I would like to stick to my already established STAR-RSEM pipeline as much as possible. I would appreciate any help! Thank you very much in advance!

umi_tools RNA-seq STAR • 1.6k views

ADD COMMENT • link updated 22 months ago by Ram 44k • written 22 months ago by YJ • 0

For specialized kits like this, you should follow the recommendations from IDT to analyze the data (Appendix G here). You may be doing this already, but I wanted to check.

ADD REPLY • link 22 months ago by GenoMax 147k

Interestingly, that manual doesn't mention UMIs...

ADD REPLY • link 22 months ago by i.sudbery 20k

They are extended adapters that can be used with the xGen RNA kit, I think: https://www.idtdna.com/pages/support/faqs/how-do-i-sequence-the-umi-in-the-xgen-udi-umi-adapters

ADD REPLY • link 22 months ago by GenoMax 147k

score 1 · Answer 1 · 2023-01-10

Here is what I would do:

You need to use umi_tools to extract the UMI from read 2 and add it to the header of read 1. To do this, use umi_tools extract as follows:

$ umi_tools extract --stdin=R2.fastq.gz --read2-in=R1.fastq.gz --stdout=discard.fastq --read2-out=map_this_one.fastq.gz --bc-pattern=NNNNNNNNN
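Before aligning, it may be worth checking that the UMIs actually ended up in the read names. The sketch below assumes the default umi_tools behaviour of appending the UMI to the read name after an underscore (e.g. @READ_ACGTACGTA); the function name count_umi_headers is made up for illustration and is not part of umi_tools.

```shell
# Hypothetical helper (not part of umi_tools): count FASTQ records whose
# read name ends in _<UMI> with a UMI of the expected length.
# Reads uncompressed FASTQ on stdin; prints "<reads_with_umi> <total_reads>".
count_umi_headers() {
  awk -v umi_len="${1:-9}" '
    NR % 4 == 1 {                       # header line of each 4-line FASTQ record
      total++
      n = split($1, parts, "_")         # $1 is the read name, up to the first whitespace
      umi = parts[n]                    # text after the last underscore
      if (n > 1 && length(umi) == umi_len && umi ~ /^[ACGTN]+$/) with_umi++
    }
    END { printf "%d %d\n", with_umi, total }
  '
}
```

Something like `gzip -cd map_this_one.fastq.gz | count_umi_headers 9` should then print two equal numbers if every read received a 9 bp UMI.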

You can then follow up with STAR and umi_tools dedup. Note that you will see a decrease in the size of your BAM as duplicates are removed.

Alternatively, your second approach should have worked, but the option is --umi-tag, not --tag. You can find reads that don't have a UMI tag as follows:

$ samtools view mapped_reads.bam | grep -v 'RX:'
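If you want counts rather than the reads themselves, a slightly stricter variant of the grep above is sketched here. It only looks at the optional fields (column 12 onwards) for a tag that starts with RX:Z:, so an "RX:" that happens to appear elsewhere in a record is not counted. The function name count_rx_tags is hypothetical, and mapped_reads.bam is the same placeholder file name as above.

```shell
# Hypothetical helper: tally SAM records with and without an RX:Z: tag.
# Reads SAM text on stdin (e.g. from `samtools view`).
count_rx_tags() {
  awk -F '\t' '
    /^@/ { next }                       # skip SAM header lines, if present
    {
      found = 0
      for (i = 12; i <= NF; i++)        # optional tags start at column 12
        if ($i ~ /^RX:Z:/) { found = 1; break }
      if (found) tagged++; else untagged++
    }
    END { printf "tagged=%d untagged=%d\n", tagged, untagged }
  '
}
```

Usage would be `samtools view mapped_reads.bam | count_rx_tags`. If untagged is 0, rerunning umi_tools dedup with --extract-umi-method=tag --umi-tag=RX should no longer give the missing-tag warning.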