Hello everyone
I have to analyse Visium spatial transcriptomics (ST) sequencing data (2 x 150 bp). I want to extract the spatial barcode and UMI from Read 1 in order to reduce its length from 150 bp to 28 bp (16 bp spatial barcode and 12 bp UMI). One method I found is "umi_tools", which has been used in various single-cell studies.
Steps for barcode and UMI extraction:
#1 Build a whitelist of barcodes (pattern: 16 bp barcode = 16 x C, 12 bp UMI = 12 x N)
umi_tools whitelist --stdin R1.fastq.gz \
--bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNNN \
--log2stderr > whitelist.txt;
#2 Extract barcode and UMI from Read 1, keeping only whitelisted barcodes
umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNNN \
--stdin R1.fastq.gz \
--stdout R1_extracted.fastq.gz \
--read2-in R2.fastq.gz \
--read2-out=R2_extracted.fastq.gz \
--whitelist=whitelist.txt;
I have not done this analysis before. Please correct me if I am doing something wrong here. I would appreciate any suggestions.
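Before running a full extraction, it may help to eyeball how the 16 bp barcode + 12 bp UMI layout described above maps onto the first bases of Read 1. The snippet below is only a rough sketch: `split_bc_umi` is a made-up helper name (not a umi_tools command), and the file name in the usage comment is assumed.

```shell
# Print the barcode (bases 1-16) and UMI (bases 17-28) of every read
# in a FASTQ stream, tab-separated. Hypothetical helper for inspection only.
split_bc_umi() {
  # In a 4-line FASTQ record, the sequence is the 2nd line.
  awk 'NR % 4 == 2 { print substr($0, 1, 16) "\t" substr($0, 17, 12) }'
}

# Usage (assumed file name): look at the first two reads
# gzip -dc R1.fastq.gz | head -n 8 | split_bc_umi
```

If the second column does not look like a 12-base UMI, the pattern length passed to `--bc-pattern` probably does not match the library layout.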
I'm assuming this is the 10X kit? If so, you can use the unmodified fastq files as input to Space Ranger. It's already set up to do the proper processing and QC for 10X data.
Thanks, rpolicastro, for the quick reply.
Yes, it is the 10X kit. I already analysed the data in Space Ranger with the unmodified fastq files, but the mapping rate to the transcriptome (< 30%) is very low against the human genome. I was wondering if it had something to do with the R1 read length.
I used the following parameters:
Under the fastq parameters, I gave the path to both R1 and R2, which also includes the L1 and L2 files.
I also did some quality filtering; the mapping rate increased from 22% to 27%, which is still very low.
I would appreciate any suggestions.
The solution I found is to set the desired Read 1 length in Space Ranger with --r1-length=28 (or more).
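With Space Ranger itself, the option is simply passed to `spaceranger count` as `--r1-length=28`, so no manual trimming is needed. For anyone who wants to inspect hard-trimmed reads outside Space Ranger, the following is a minimal sketch of what such a trim amounts to. `trim_r1` is a hypothetical helper name and the file names in the usage comment are assumed; this is not how Space Ranger implements the option internally.

```shell
# Hard-trim every sequence and quality line of a FASTQ stream to N bases
# (default 28). Hypothetical helper, shown for illustration only.
trim_r1() {
  awk -v n="${1:-28}" '
    # Lines 2 and 4 of each 4-line record hold the bases and qualities.
    NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, n); next }
    { print }
  '
}

# Usage (assumed file names):
# gzip -dc R1.fastq.gz | trim_r1 28 | gzip > R1_28bp.fastq.gz
```

Trimming the sequence and quality lines together keeps each record valid, since both must have the same length.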