7.5 years ago
ognjen011
I have recently started working with barcoded reads, and it seems I'm missing some of the basic concepts I need before I can tackle anything more advanced. I started by downloading an SRA run from a paper that used Unique Molecular Identifiers (UMIs), but I couldn't find any UMIs in the reads. My questions:
- Am I correct to assume that if most reads are fully matched, my reads do not have adapters?
- SRA has an option to add the spot group (barcode) to the read name via $sg (as shown here https://edwards.sdsu.edu/research/fastq-dump-options/). In this case it is only 12 bases long. What EXACTLY is this barcode? Standard Illumina adapters seem to be a bit longer than that, and this should also contain a UMI. How come this sequence is so short?
- Who trimmed these adapters and why? Is this customary in storing reads in archives?
I apologize if the questions seem trivial, but I couldn't find the answers anywhere.
EDIT: The samples are from this study: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4907374/
Define what you mean by "fully matched" in this context.
For example, a CIGAR of 96M, or any alignment in which no soft clipping occurs.
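To make that definition concrete, here is a minimal sketch of the check on a CIGAR string. The function names are made up for illustration, and it only looks at soft clipping (the `S` operation), as described above.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Return the total number of soft-clipped (S) bases in a CIGAR string."""
    if cigar == "*":
        raise ValueError("no CIGAR available (unmapped read?)")
    ops = _CIGAR_OP.findall(cigar)
    # Reject anything that is not a clean sequence of <length><op> pairs.
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op == "S")

def is_fully_matched(cigar):
    """'Fully matched' in the sense used here: no soft clipping at all."""
    return soft_clipped_bases(cigar) == 0
```

With this, `is_fully_matched("96M")` is true, while a read aligned as `5S91M` is not fully matched.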
The UMIs aren't part of the read in these datasets; they're part of the barcodes, and several different strategies were used throughout the paper (look at the methods). You'll find the UMIs in the read-name lines.
Yes, as I said, I got them into the read names by explicitly requesting them with SRA fastq-dump. I was hoping to understand what these barcodes are, using this dataset as a worked example.
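For anyone following along, this is roughly how the barcodes can be pulled back out of the read names and tallied by length. It is only a sketch: it assumes the FASTQ was dumped with a custom defline such as `--defline-seq '@$ac.$si.$sg'`, so that the spot group is the last dot-separated field of the read name. That layout, the function name, and the placeholder accession in the example are assumptions, not something taken from the paper.

```python
from collections import Counter

def spot_groups(fastq_lines, sep="."):
    """Yield the spot group (barcode) from each FASTQ header line.

    Assumes plain 4-line FASTQ records and a read name whose last
    `sep`-separated field is the spot group (hypothetical defline layout).
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # every fourth line, starting at 0, is a header
            name = line.strip().split()[0]  # drop anything after whitespace
            yield name.rsplit(sep, 1)[-1]

# Tiny made-up example; SRR0000000 is a placeholder accession.
example = [
    "@SRR0000000.1.ACGTACGTACGT\n", "ACGT\n", "+\n", "IIII\n",
    "@SRR0000000.2.ACGTACGTTTGA\n", "TTGA\n", "+\n", "IIII\n",
]
barcodes = list(spot_groups(example))
length_counts = Counter(len(bc) for bc in barcodes)
```

On a real file you would pass an open file handle instead of the list; `length_counts` then shows whether every barcode really is 12 bases long.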
I take that back: in at least some of the cases described in the paper, part of the UMI is in the read itself. It's completely unclear whether they moved the UMI into the barcode, although that is the only explanation I can see for it being 12 bases, since the index read was apparently 8 bases. It would have been nicer if they'd just provided the fastq files as they came off the machine, without altering the read names and such.
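One rough way to probe the "8-base index plus UMI" idea is to compare how many distinct values the first 8 and the last 8 bases of the barcodes take. A sample index should take very few values within a run, while a UMI should vary a lot. This is only a sketch under that assumption; the function name and the layout labels are hypothetical, and nothing here is confirmed by the paper.

```python
def split_diversity(barcodes, index_len=8):
    """Count distinct candidate index sequences for two possible layouts.

    'index_first' assumes the barcode is index + UMI;
    'index_last' assumes it is UMI + index. Both are guesses to be tested.
    """
    prefixes = {bc[:index_len] for bc in barcodes}
    suffixes = {bc[-index_len:] for bc in barcodes}
    return {"index_first": len(prefixes), "index_last": len(suffixes)}

# Made-up barcodes: constant first 8 bases, variable last 4.
toy = ["AAAACCCCGGTA", "AAAACCCCTTAG", "AAAACCCCACGT"]
result = split_diversity(toy)
```

If one layout gives a handful of distinct values and the other gives thousands, that hints at where the index sits; if both are highly variable, the 8 + 4 explanation probably does not hold for that run.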
Exactly. I really wasn't lazy, I read the whole paper, I fiddled around with the barcodes, but I can't get to the bottom of this. Thank you for taking a look though!
Yeah, they made a real mess out of that upload. I suspect you'll need to contact the authors. Sorry I don't have better news there :(
Unfortunately, I already did that two weeks ago, but I got no reply from either corresponding author. I try to exhaust all my options before taking up the community's time. Thanks, though!