Question

barcode in the fastq files


2.6 years ago

Wakala ▴ 20

Hello everyone, I have two questions about barcodes in FASTQ files:

1. Where is the barcode in the FASTQ file?
2. How can I get the barcode length?

Thank you very much

For example:

@SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51
NGGATACTAGGAGGAGTATTGATAACTGCCATTCATGGAACACCTGTGAAT
+SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51
#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJA<FJJ

RNA-seq Single cell • 2.9k views



Can you elaborate a bit? What kind of single cell platform do you use? What was the sequencing method?

ADD REPLY • link 2.6 years ago by Bertalan_Takacs ▴ 100


scATAC-seq, sequenced on Illumina NextSeq 500 and Illumina HiSeq 4000.

The data is from GSE184462, and I want to run the cellranger-atac pipeline, which needs input files in the standard Cell Ranger layout. However, fasterq-dump splits the file uploaded by the authors into only two FASTQ files, so I wonder whether the barcode is in either of these two files.

ADD REPLY • link 2.6 years ago by Wakala ▴ 20


The SRR number led me here: SRR15999465. At the bottom of their paragraph, it gives the read structure as

Libraries were sequenced on a NextSeq500 or HiSeq4000 sequencer (Illumina) using custom sequencing primers with following read lengths: 50 + 10 + 12 + 50 (Read1 + Index1 + Index2 + Read2).

The SRR*.1 leads me to believe this is Read 1. There is some info in the header:

@SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:

The sequence GCGGATCGATGATACGCCGTAG is exactly as long as Index1 + Index2 (22 nt). I think it's a fair assumption to say Index1 is GCGGATCGAT and Index2 is GATACGCCGTAG. Without knowing more about how the libraries are prepped, that's about as far as I can go.
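The split described above can be sketched in a few lines of Python. This is only an illustration: the helper name `split_header_barcode` is made up for this reply, and the 10 + 12 split and the header layout (barcode as the first colon-separated item after the read ID) are assumptions taken from the read structure and the single example header, not something confirmed by the submitters.

```python
def split_header_barcode(header, index1_len=10, index2_len=12):
    """Split the sequence embedded in a FASTQ header into (Index1, Index2).

    Assumed layout: "@<read id> <barcode>:<instrument>:<run>:... length=N".
    The default lengths come from the stated read structure (10 + 12).
    """
    fields = header.rstrip().split(" ")
    if len(fields) < 2:
        raise ValueError("header has no description field after the read ID")
    # The barcode is assumed to be everything before the first colon.
    barcode = fields[1].split(":", 1)[0]
    if len(barcode) != index1_len + index2_len:
        raise ValueError(
            f"expected {index1_len + index2_len} nt, got {len(barcode)}"
        )
    return barcode[:index1_len], barcode[index1_len:]


header = (
    "@SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:"
    "HCYCLBBXY:7:1101:21684:1156 length=51"
)
index1, index2 = split_header_barcode(header)
print(index1, index2)  # GCGGATCGAT GATACGCCGTAG
```

Whether the two pieces really correspond to Index1 and Index2 in that order still depends on how the libraries were prepared.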

ADD REPLY • link 2.6 years ago by Trivas ★ 1.8k

score 0 · Answer 1 · 2022-05-11

These samples are already demultiplexed so there will be no indexes in these sequence files. If you look under the Data Access tab then you will realize that they have uploaded already demultiplexed files.

If you are referring to cell barcodes/UMIs, then they should be in the R1 file if this is a 10x dataset.

Note: As indicated by @Trivas, the submitters may have moved these into the FASTQ headers. This would be a highly non-standard way of storing them.
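To check whether the headers really carry a barcode, and how long it is, one option is to tally the length of the first colon-separated item after the read ID across the file. A minimal sketch, assuming a plain four-line-per-record FASTQ and the header layout seen in the question; `barcode_length_counts` is a hypothetical helper name, and the in-memory record below is just the example read from the question:

```python
from collections import Counter
import io


def barcode_length_counts(handle):
    """Count lengths of barcode-like strings found in FASTQ header lines.

    Assumes four lines per record and a header of the form
    "@<read id> <barcode>:<rest of description>".
    """
    counts = Counter()
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:
            continue  # only the first line of each record is a header
        fields = line.rstrip().split(" ")
        if len(fields) < 2:
            continue  # no description field, so nothing to inspect
        candidate = fields[1].split(":", 1)[0]
        # Only count it if it looks like a nucleotide sequence.
        if candidate and set(candidate) <= set("ACGTN"):
            counts[len(candidate)] += 1
    return counts


record = "\n".join([
    "@SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51",
    "NGGATACTAGGAGGAGTATTGATAACTGCCATTCATGGAACACCTGTGAAT",
    "+SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51",
    "#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJA<FJJ",
]) + "\n"

print(barcode_length_counts(io.StringIO(record)))  # Counter({22: 1})
```

For a real file, pass an open file handle (or `gzip.open(path, "rt")` for a compressed one) instead of the `io.StringIO` object. A single dominant length across all reads would support the idea that the barcode was moved into the header.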