Extracting UMI sequences from paired-end reads
7.4 years ago
Javad ▴ 150

Hello,

I have paired-end FASTQ files from an experiment designed so that each read pair carries exactly one barcode (UMI). The barcode may be on the forward read or on the reverse read, but never on both, and it is flanked by fixed sequences that help identify it. So some forward reads contain the barcode and some do not, and the same is true for reverse reads.

The barcode is located at the beginning of the read (after preprocessing and trimming, of course) and looks like this: GTC NNN NNN G
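To make the layout above concrete, here is a minimal sketch in plain Python of how the UMI could be pulled from whichever mate carries it. It is not taken from any existing tool. The function names, the (sequence, quality) tuple representation, the choice to accept only A/C/G/T in the six random positions, and the decision to treat "both mates match" as ambiguous are all assumptions made for illustration.

```python
import re

# Assumed layout: fixed GTC, six random bases (the UMI), fixed G,
# anchored at the very start of the trimmed read.
UMI_PATTERN = re.compile(r"^GTC([ACGT]{6})G")


def extract_umi(seq, qual):
    """Return (umi, trimmed_seq, trimmed_qual); umi is None if no match."""
    match = UMI_PATTERN.match(seq)
    if match is None:
        return None, seq, qual
    end = match.end()  # first base after the trailing G
    return match.group(1), seq[end:], qual[end:]


def extract_pair_umi(r1, r2):
    """Take the UMI from whichever mate carries it.

    r1 and r2 are (sequence, quality) tuples (hypothetical representation).
    Returns (umi, r1_out, r2_out). If neither mate or both mates match,
    umi is None and the reads are returned unchanged.
    """
    umi1, seq1, qual1 = extract_umi(*r1)
    umi2, seq2, qual2 = extract_umi(*r2)
    if umi1 is not None and umi2 is None:
        return umi1, (seq1, qual1), r2
    if umi2 is not None and umi1 is None:
        return umi2, r1, (seq2, qual2)
    return None, r1, r2
```

In a real pipeline the returned UMI would typically be appended to the read name of both mates before alignment, so that it survives into the BAM file.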

Does anyone know a reliable tool for extracting UMI sequences under this experimental design and for quantifying the number of unique UMIs aligned to each gene?
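For the quantification part, the simplest possible counting looks roughly like the sketch below. It assumes each aligned read pair has already been reduced to a (gene, UMI) tuple by some upstream step; that input format and the function name are hypothetical. It counts exact UMI matches only and makes no attempt to correct sequencing errors within the UMI, so it should be read as an illustration of the idea rather than a replacement for a dedicated tool.

```python
from collections import defaultdict


def count_unique_umis(assignments):
    """Count distinct UMIs per gene.

    assignments: iterable of (gene, umi) tuples, one per aligned read pair
    (hypothetical input format). Returns a dict mapping gene -> count.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in assignments:
        umis_per_gene[gene].add(umi)  # duplicates collapse in the set
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}
```

Read pairs sharing the same gene and UMI are counted once, which is the point of using UMIs to remove PCR duplicates.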

Thanks in advance

RNA-Seq next-gen • 6.4k views

See: UMI-Tools 0.5, now with tools for cell barcoded scRNA-seq

Please do not forget to upvote/accept or otherwise validate answers you receive in threads that you create. That is the only way we can know if any/all solutions suggested worked or not.


Seems like I was a minute late. If you see fit, please retire my answer. Thanks.

7.4 years ago
ivivek_ngs ★ 5.2k

I reckon these are scRNA-Seq data. You should probably also mention the sequencing chemistry (10x Chromium, Fluidigm, etc.), since some of them do have such issues.

However, coming to your question: Ian Sudbery and Tom Smith at CGAT Oxford have provided a pretty handy Python script to extract UMI information from both single-end and paired-end data, with an optional way of extracting cell barcodes as well. Check below:

Extract UMI from Fastq


Thanks for your answer. I have already tried their tool, but the problem is that for paired-end reads, UMI-tools needs barcodes on both the forward and reverse reads, which is not the case in my experiment. Unfortunately, their tool is not appropriate for this type of experimental design.


See some other possibilities here: Tools for demultiplexing a large fastq file based on random in-line barcodes and a tool called sabre.

Please be specific about where the barcode is expected to be in your reads. I suggest that you edit the original post and add this information there since it is critically important.


Thanks. The post is updated.
