barcode in the fastq files
2.5 years ago
Wakala ▴ 20

Hello everyone, I have two questions about barcodes in FASTQ files:

  1. Where is the barcode located in the FASTQ file?
  2. How can I determine the barcode length?

Thank you very much.

For example:

@SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51
NGGATACTAGGAGGAGTATTGATAACTGCCATTCATGGAACACCTGTGAAT
+SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51
#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJA<FJJ
RNA-seq Single cell • 2.8k views

Can you elaborate a bit? Which single-cell platform did you use, and what was the sequencing method?


scATAC-seq, sequenced on Illumina NextSeq 500 and Illumina HiSeq 4000.

The data are from GSE184462. I want to run the cellranger-atac pipeline, which needs the standard Cell Ranger input files, but the files uploaded by the authors only split into two FASTQ files with fasterq-dump, so I wonder whether the barcode is in these two files.


The SRR number led me here: SRR15999465. At the bottom of their paragraph, it gives the read structure as:

Libraries were sequenced on a NextSeq500 or HiSeq4000 sequencer (Illumina) using custom sequencing primers with following read lengths: 50 + 10 + 12 + 50 (Read1 + Index1 + Index2 + Read2).

The SRR*.1 leads me to believe this is Read 1. There is some info in the header:

@SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:

The sequence GCGGATCGATGATACGCCGTAG is exactly as long as Index1 + Index2 (22 nt). I think it's a fair assumption to say Index1 is GCGGATCGAT and Index2 is GATACGCCGTAG. Without knowing more about how the libraries are prepped, that's about as far as I can go.
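To make that split concrete, here is a minimal Python sketch that pulls the barcode-like field out of the header and cuts it into the two pieces. It assumes the 10 + 12 layout from the quoted read structure and assumes Index1 comes first in the header; neither is confirmed by the submitters. The function name `split_header_barcode` and the regular expression are my own, not part of any tool.

```python
import re

# Assumption: lengths come from the quoted read structure (Index1 = 10 nt, Index2 = 12 nt),
# and Index1 precedes Index2 in the header field. Neither is confirmed by the submitters.
INDEX1_LEN = 10
INDEX2_LEN = 12

# Read ID, whitespace, then a run of bases terminated by the first colon.
HEADER_RE = re.compile(r"^@(\S+)\s+([ACGTN]+):")


def split_header_barcode(header, index1_len=INDEX1_LEN, index2_len=INDEX2_LEN):
    """Return (read_id, index1, index2) parsed from one FASTQ header line."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("no barcode-like field found in header")
    read_id, barcode = match.groups()
    if len(barcode) != index1_len + index2_len:
        raise ValueError(f"unexpected barcode length: {len(barcode)}")
    return read_id, barcode[:index1_len], barcode[index1_len:]


header = (
    "@SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51"
)
read_id, index1, index2 = split_header_barcode(header)
print(read_id, index1, index2, len(index1) + len(index2))
```

For the header in the question this gives GCGGATCGAT and GATACGCCGTAG, 22 nt in total, which matches the guess above.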

2.5 years ago
GenoMax 147k

These samples are already demultiplexed, so there will be no indexes in these sequence files. If you look under the Data Access tab, you will see that the uploaded files are already demultiplexed.

If you are referring to cell barcodes/UMIs, then they should be in the R1 file if this is a 10x dataset.

Note: As indicated by @Trivas the submitters may have moved these into the fastq headers. This would be a highly non-standard way.
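One way to check whether the barcodes really were moved into the headers is to tally the length of the barcode-like field across all records. The sketch below is only an illustration: it assumes plain four-line FASTQ records and the header layout seen in the question, and the function name `header_barcode_lengths` is made up for this example. A count under length 0 means a header had no such field.

```python
from collections import Counter
import io
import re

# Assumption: headers look like "@<read id> <bases>:<rest>", as in the example record.
BARCODE_RE = re.compile(r"^@\S+\s+([ACGTN]+):")


def header_barcode_lengths(handle):
    """Tally barcode-like field lengths over FASTQ headers (every 4th line)."""
    counts = Counter()
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:
            continue  # skip sequence, separator, and quality lines
        match = BARCODE_RE.match(line)
        counts[len(match.group(1)) if match else 0] += 1
    return counts


# The single record from the question, used here as stand-in input.
demo_record = (
    "@SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51\n"
    "NGGATACTAGGAGGAGTATTGATAACTGCCATTCATGGAACACCTGTGAAT\n"
    "+SRR15999465.1 GCGGATCGATGATACGCCGTAG:K00168:267:HCYCLBBXY:7:1101:21684:1156 length=51\n"
    "#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJA<FJJ\n"
)
lengths = header_barcode_lengths(io.StringIO(demo_record))
print(dict(lengths))
```

On a real file you would pass an open file handle instead of the `io.StringIO` stand-in. If nearly every header reports the same length, that supports the idea that a fixed-length barcode sits in the header.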
