Problem: "Paired-end" reads in scRNA-seq data (Drop-seq)
6.0 years ago

With Drop-seq, we have paired-end data:

Read 1: Cell barcode + UMI (unique molecular identifier)

Read 2: The transcript information
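To make the Read 1 layout concrete, here is a minimal Python sketch that splits a Read 1 sequence into its two parts. It assumes the usual Drop-seq layout of a 12-base cell barcode followed by an 8-base UMI; those lengths are not stated in this thread, and `split_read1` is a hypothetical helper name, so adjust both to your protocol.

```python
from typing import NamedTuple

# Assumed layout (usual Drop-seq convention, not stated in this thread):
# bases 1-12 = cell barcode, bases 13-20 = UMI.
CELL_BARCODE_LEN = 12
UMI_LEN = 8


class Read1Parts(NamedTuple):
    cell_barcode: str
    umi: str


def split_read1(sequence: str) -> Read1Parts:
    """Split a Read 1 sequence into its cell barcode and UMI."""
    needed = CELL_BARCODE_LEN + UMI_LEN
    if len(sequence) < needed:
        raise ValueError(
            f"Read 1 is {len(sequence)} bases; expected at least {needed}"
        )
    return Read1Parts(
        cell_barcode=sequence[:CELL_BARCODE_LEN],
        umi=sequence[CELL_BARCODE_LEN:needed],
    )


if __name__ == "__main__":
    # Made-up 20-base sequence, for illustration only.
    parts = split_read1("ACGTACGTACGTTTGGCCAA")
    print(parts.cell_barcode, parts.umi)
```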

But I have a problem/doubt with the sample I am working on.

The sample I am using is the following:

https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR6261587
(Check the "Reads" tab)

As you know, Drop-seq is paired-end, so we expect to see two reads per spot. Although this sample is labeled paired-end, it has only one read per spot.

For comparison, here is a link to a different scRNA-seq run where you can see two reads per spot, as expected.

Example sample:

https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8086553 (Check the "Reads" tab)

Where am I going wrong?

RNA-Seq next-gen alignment • 1.8k views
6.0 years ago

The submitters screwed up and didn't submit the second read. I checked to see if the submitters for some bizarre reason interleaved the two reads together in the one fastq file, but they didn't. If you want the whole data set, you'll have to contact the submitters directly.
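For anyone who wants to repeat the interleaving check on another file, here is a rough Python sketch. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and that both mates of a pair carry the same read ID apart from an optional `/1` or `/2` suffix; `read_ids` and `looks_interleaved` are hypothetical helper names, not part of any toolkit.

```python
from itertools import islice
from typing import Iterable, Iterator


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read ID from each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each record
            header = line.strip()
            if not header.startswith("@"):
                raise ValueError(f"Malformed FASTQ header: {header!r}")
            name = header[1:].split()[0]
            # Drop a trailing /1 or /2 mate suffix if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def looks_interleaved(lines: Iterable[str], max_pairs: int = 1000) -> bool:
    """Return True if every checked pair of consecutive records shares an ID."""
    ids = list(islice(read_ids(lines), 2 * max_pairs))
    if len(ids) < 2:
        return False
    pairs = zip(ids[0::2], ids[1::2])
    return all(first == second for first, second in pairs)
```

For a gzipped download, you could open the file with `gzip.open(path, "rt")` and pass the file handle straight to `looks_interleaved`. A `False` result only says the records do not pair up by name; it does not tell you which read was submitted.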


I will try to get the raw paired-end fastq data files.

6.0 years ago
paulranum11 ▴ 80

Hmmmm very strange. Here is a link to the European Nucleotide Archive, where you can download the raw .fastq dataset archived from GEO: FASTQ Download Location. Like you said, it looks like they only archived one file... I don't know why this would be the case if they truly used paired-end sequencing. Is it possible they mistakenly wrote "paired end sequencing" and actually performed single-end sequencing?

My next step would probably be to download the raw data and see how it looks and/or to email the authors and ask for clarification. Good luck!


Drop-seq experiments involve paired-end sequencing, and I am sure they did paired-end sequencing here too. It looks like they did some processing before submitting the data, but I could not figure out how far the data has been processed. I also wonder why they didn't submit the proper raw reads. I might be missing something as well.

I asked one of the main authors of the paper. The following is the reply I got :

"I recommend that you download the aligned BAM files that are hosted in the same GEO record. Read 1 is already processed into the cell and UMI barcodes and held as custom tags (XC and XM) in the BAM files. The cells are already barcode-corrected, so if you use those files, your cell barcodes will line up with mine; if you start from FASTQs, they will not. For most aligners, you can just use the BAM file as input to realign. (It has all reads, even those that did not align.)"

But I could not find any "XM" or "XC" tags in the BAM file :(
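One thing worth ruling out: BAM is a compressed binary format, so searching the file itself for "XC" or "XM" as text will not find them even if they are there. A more reliable check is to decode the records to SAM text first (for example with `samtools view`) and look at the optional fields, which appear as `TAG:TYPE:VALUE` after the 11 mandatory columns. Below is a minimal sketch of that check; `count_tags` is a hypothetical helper, and the example record is made up.

```python
from collections import Counter
from typing import Iterable


def count_tags(sam_lines: Iterable[str]) -> Counter:
    """Count optional-field tags across SAM alignment lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # The first 11 columns are mandatory; optional fields follow.
        for field in fields[11:]:
            tag = field.split(":", 1)[0]
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Made-up unaligned record carrying XC and XM tags, for illustration only.
    example = [
        "@HD\tVN:1.6",
        "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
        "\tXC:Z:ACGTACGTACGT\tXM:Z:TTGGCCAA",
    ]
    for tag, n in sorted(count_tags(example).items()):
        print(tag, n)
```

On real data, you could feed it the first few thousand lines of `samtools view` output for the downloaded BAM. If XC and XM never show up in the counts, the file is probably not the tagged BAM the author describes.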

To understand his reply, you have to be familiar with the Drop-seq processing steps. Link: https://github.com/broadinstitute/Drop-seq/files/2425535/Drop-seqAlignmentCookbookv1.2Jan2016.pdf

It looks like they submitted partially processed data. I am trying to use the data starting at different processing steps, but I still cannot tell how far it has been processed.
