How to align 10X R1 and R2 fastqs?
5.4 years ago

Hi guys, I am new to bioinformatics. I'll keep my question simple. I have downloaded a single-cell FASTQ dataset from 10X. It has R1 and R2 files for lane 1 and lane 2. How can I align them and combine them into one FASTQ file?

Example:

Input:

neurons_mouse_L001_R1.fastq
neurons_mouse_L002_R1.fastq
neurons_mouse_L001_R2.fastq
neurons_mouse_L002_R2.fastq

Expected output:

neurons_mouse.fastq
fastq alignment RNA-Seq 10x • 6.2k views

What kind of 10x data is this?


28 bp read 1 (16 bp Chromium barcode and 12 bp UMI) = R1, 91 bp read 2 (transcript) = R2, and 8 bp I7 sample barcode
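To make that R1 layout concrete, here is a minimal sketch that splits a read into its 16 bp barcode and 12 bp UMI with awk. The file name `demo_R1.fastq` and the sequence in it are made up for illustration. Real pipelines such as Cell Ranger or Alevin do this step internally, so this is only meant to show what R1 holds.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical one-read R1 file: 16 bp barcode + 12 bp UMI = 28 bp.
workdir=$(mktemp -d)
printf '@read1\nAAAACCCCGGGGTTTTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIII\n' \
  > "$workdir/demo_R1.fastq"

# Sequence lines are every 4th line, starting at line 2.
# awk's substr() is 1-based: bases 1-16 are the barcode, bases 17-28 the UMI.
barcode_umi=$(awk 'NR % 4 == 2 { print substr($0, 1, 16) "\t" substr($0, 17, 12) }' \
  "$workdir/demo_R1.fastq")

# Prints the barcode and the UMI separated by a tab.
printf '%s\n' "$barcode_umi"
```

Since R1 carries no transcript sequence at all, it is never aligned to the genome itself; only R2 is.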

What are you planning to do next? The same sample was run on two lanes, so the files can be concatenated.


I'm planning to build a matrix output so I can analyze it with Seurat. Can I cat R1 and R2 together? Could you please explain?

Thank you so much!


The link you posted above also has analyzed data, so unless you want to recreate the analysis, you could just get the result files from there directly.


Please first get a background in (sc)RNA-seq analysis by reading tutorials, e.g.

http://research.fhcrc.org/content/dam/stripe/sun/software/scRNAseq/scRNAseq.html

https://bioconductor.org/packages/devel/workflows/vignettes/simpleSingleCell/inst/doc/intro.html

https://davetang.org/muse/2018/08/09/getting-started-with-cell-ranger/

There are many more available, just search the web.

Standard tools for alignment or quantification are CellRanger or Alevin, the latter being a more recent development that relies on the quantification strategy of salmon plus a couple of other cool features (see the docs). There is extensive documentation available for both.


You can align paired-end data directly (lane by lane, or merged into one file per read if that is appropriate for your data) or transform it to single-end data (which might lose the pairing information), like

@read1/1
xxx
xxx
xxx
......
@read1/2
xxx
xxx
xxx

Use a command like

zcat sample_1.fq.gz | awk '{if(NR%4==1) print $0"/1"; else print $0}' > sample_onefile.fq
zcat sample_2.fq.gz | awk '{if(NR%4==1) print $0"/2"; else print $0}' >> sample_onefile.fq

If it is single-end, the order of reads is not important.


I will move this to a comment as it does not really answer the 10X-related question. In 10X you should never merge paired files into one, as R1 contains the barcode/UMI information, which is processed separately from the R2 (cDNA) sequence. If you want to merge files (this is called an interleaved fastq), better use something like seqtk, which does it more conveniently. You also should not append /1 or /2 as (to my knowledge) most aligners expect identical read names for the two mates.
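As a rough illustration of the read-name point, the sketch below checks that R1 and R2 list the same read names in the same order. The file names and reads are invented for the demo, and the assumption is that the read name is the header text up to the first space, as in typical Illumina headers. It needs bash for the `<(...)` process substitution.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
# Hypothetical two-read files; only the part after the space differs between mates.
printf '@r1 1:N:0\nAAAA\n+\nIIII\n@r2 1:N:0\nCCCC\n+\nIIII\n' > "$workdir/demo_R1.fastq"
printf '@r1 2:N:0\nGGGG\n+\nIIII\n@r2 2:N:0\nTTTT\n+\nIIII\n' > "$workdir/demo_R2.fastq"

# Print the read name (header up to the first whitespace) of every record.
read_names() { awk 'NR % 4 == 1 { print $1 }' "$1"; }

# paste puts the R1 and R2 names side by side; count the rows where they differ.
# A file with fewer reads yields empty fields, which also count as mismatches.
mismatches=$(paste <(read_names "$workdir/demo_R1.fastq") \
                   <(read_names "$workdir/demo_R2.fastq") \
  | awk -F '\t' '$1 != $2 { n++ } END { print n + 0 }')

echo "mismatched read names: $mismatches"
```

A non-zero count would suggest the two files are out of sync, for example because the lanes were concatenated in a different order for R1 than for R2.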

5.4 years ago

I have downloaded a fastq single cell dataset from 10X. It has R1 and R2 of lane 1 and lane 2. How can i align them and make it to one fastq file?

You don't make one fastq file. You can and should concatenate the data from two lanes into one R1 and one R2 file, but cellranger takes both of them separately as input. Did you look at the guides for using cellranger on the 10XGenomics website?
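A minimal sketch of that concatenation, using the file names from the question. The tiny input files are stand-ins created only so the snippet runs on its own, and the output names `neurons_mouse_R1.fastq` and `neurons_mouse_R2.fastq` are my own choice, not anything a tool requires.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical one-read stand-ins for the four files in the question.
printf '@a\nAAAA\n+\nIIII\n' > neurons_mouse_L001_R1.fastq
printf '@b\nCCCC\n+\nIIII\n' > neurons_mouse_L002_R1.fastq
printf '@a\nGGGG\n+\nIIII\n' > neurons_mouse_L001_R2.fastq
printf '@b\nTTTT\n+\nIIII\n' > neurons_mouse_L002_R2.fastq

# Concatenate per read type, with the lanes in the SAME order for R1 and R2,
# so that each mate stays at the same position in both output files.
cat neurons_mouse_L001_R1.fastq neurons_mouse_L002_R1.fastq > neurons_mouse_R1.fastq
cat neurons_mouse_L001_R2.fastq neurons_mouse_L002_R2.fastq > neurons_mouse_R2.fastq

# Sanity check: both files should contain the same number of reads (4 lines each).
r1_reads=$(( $(wc -l < neurons_mouse_R1.fastq) / 4 ))
r2_reads=$(( $(wc -l < neurons_mouse_R2.fastq) / 4 ))
echo "R1 reads: $r1_reads, R2 reads: $r2_reads"
```

Gzipped files (`.fastq.gz`) can be concatenated with plain `cat` in the same way, without decompressing first. Note that cellranger expects its own FASTQ file naming scheme, so check the 10x documentation before choosing the final file names.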
