Hi guys, I am new to bioinformatics. I'll keep my question simple. I have downloaded a single-cell FASTQ dataset from 10x. It has R1 and R2 files for lane 1 and lane 2. How can I align them and combine them into one FASTQ file?
Example:
Input:
neurons_mouse_L001_R1.fastq
neurons_mouse_L002_R1.fastq
neurons_mouse_L001_R2.fastq
neurons_mouse_L002_R2.fastq
Expected output:
neurons_mouse.fastq
What kind of 10x data is this?
This is mouse brain single cell data. https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/neuron_1k_v2
What are you planning to do next? The same sample was run on two lanes, so the files can be concatenated.
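To make the concatenation concrete, here is a minimal Python sketch. It assumes uncompressed files named as in the question (`<prefix>_<lane>_<read>.fastq`); `concat_lanes` and `merge_sample` are hypothetical helper names, not part of any 10x tool. Lanes are joined per read, so R1 and R2 end up as two separate merged files and are never mixed with each other.

```python
import shutil
from pathlib import Path


def concat_lanes(lane_files, merged_path):
    """Append the lane files, in the given order, into one output file."""
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_sample(prefix, lanes=("L001", "L002"), directory="."):
    """Merge each read (R1, R2) across lanes; R1 and R2 stay separate files."""
    directory = Path(directory)
    merged = {}
    for read in ("R1", "R2"):
        # Using the same lane order for R1 and R2 keeps the mates in sync.
        inputs = [directory / f"{prefix}_{lane}_{read}.fastq" for lane in lanes]
        output = directory / f"{prefix}_{read}.fastq"
        concat_lanes(inputs, output)
        merged[read] = output
    return merged
```

Calling `merge_sample("neurons_mouse")` in the download directory would write `neurons_mouse_R1.fastq` and `neurons_mouse_R2.fastq`. Note that Cell Ranger can also read several lanes of one sample directly when the files follow its naming scheme, so merging may not be needed at all.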
I'm planning to build a matrix output so I can analyze it with Seurat. Can I cat R1 and R2? Could you please explain?
Thank you so much!
The link you posted above also has analyzed data so unless you want to recreate the analysis you could just get the result files from there directly.
Please first get a background in (sc)RNA-seq analysis by reading tutorials, e.g.
http://research.fhcrc.org/content/dam/stripe/sun/software/scRNAseq/scRNAseq.html
https://bioconductor.org/packages/devel/workflows/vignettes/simpleSingleCell/inst/doc/intro.html
https://davetang.org/muse/2018/08/09/getting-started-with-cell-ranger/
There are many more available, just search the web.
Standard tools for alignment or quantification are CellRanger or Alevin, the latter being a more recent development that relies on the quantification strategy of salmon plus a couple of other cool features (see the docs). There is extensive documentation available for both.
You can align paired-end data directly (as two lanes, or merged into one if that is possible according to your knowledge), or transform it into single-end data (which might lose the pairing information) using a command like:
If it is single-end, the order of reads is not important.
I will move this to a comment as it does not really answer the 10X-related question. In 10X there is no way you should merge paired files into one, as R1 contains the barcode/UMI information that is processed separately from the R2 (cDNA) sequence. If you want to merge files (this is called an interleaved fastq), better use something like seqtk, which does it more conveniently. You also should not append /1 or /2, as (to my knowledge) most aligners expect identical read names for the two mates.
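For readers wondering what an interleaved fastq looks like in practice, here is a rough Python sketch of the idea for generic paired-end data (again, not something to do with 10X reads). It assumes plain-text files with unwrapped four-line records and mates in the same order in both files; `read_records`, `read_name` and `interleave` are illustrative names only. It checks that the mate names agree and writes the headers unchanged, without adding /1 or /2.

```python
from itertools import zip_longest


def read_records(handle):
    """Yield FASTQ records as lists of four lines (unwrapped records assumed)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return
        yield record


def read_name(header):
    """Read name: header up to the first whitespace, minus any trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def interleave(r1_path, r2_path, out_path):
    """Write mate 1 then mate 2 of every pair to out_path; return the pair count."""
    pairs = 0
    with open(r1_path) as r1, open(r2_path) as r2, open(out_path, "w") as out:
        for rec1, rec2 in zip_longest(read_records(r1), read_records(r2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            if read_name(rec1[0]) != read_name(rec2[0]):
                raise ValueError(
                    f"mate names differ: {rec1[0].strip()} vs {rec2[0].strip()}"
                )
            out.writelines(rec1)  # headers are written as-is, nothing appended
            out.writelines(rec2)
            pairs += 1
    return pairs
```

In real use a dedicated tool such as seqtk is the better choice; the sketch is only meant to show that interleaving alternates the two mates record by record rather than appending one file to the other.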