Hello everyone,
I am trying to split my Visium spatial fastq file into many fastq files according to barcodes, so that I end up with one fastq file per barcode. My barcode.txt file is something like this...
sp1 AAACAACGAATAGTTC
sp2 AAACAAGTATCTCCCA
sp3 AAACAATCTACTAGCA
sp4 AAACACCAATAACTGC
...the R2 file is something like this...
@SRR19762149.14266 NB552055:200:HWCH5BGXH:1:11101:26830:2925 length=90
AATGCAAACAGTACCTAACAAACCCACAGGTCCTAAACTACCAAACCTGCATTAAAAATTTCGGTTGGGGCGACCTCGGAGCAGAACCCA
+SRR19762149.14266 NB552055:200:HWCH5BGXH:1:11101:26830:2925 length=90
AAAAAE/AEEEEEEEAEAEEEE<EEEEE//<EEE/EEEEEEEEEAEEEAEEE<AEEAE////6EAAA/EE/EA<EEAE/<EEEE//EEA/
...while the R1 file is something like this:
@SRR19762149.13421 NB552055:200:HWCH5BGXH:1:11101:23424:2818 length=28
CTCCGAGTAAATCCGCTCCTCAGTTGAC
+SRR19762149.13421 NB552055:200:HWCH5BGXH:1:11101:23424:2818 length=28
AAAAAEEEEEEEEEEEEEEEEEEEEEAE
I tried this command in Linux from https://github.com/Debian/fastx-toolkit/tree/debian/unstable (with both --eol and --bol):
zcat R2.fastq.gz | ./fastx_barcode_splitter.pl --bcfile barcodes.txt --eol --exact --prefix /output/ --suffix "_R2.fastq" --debug
But unfortunately it keeps on saying:
"matched barcode: unmatched"
I also tried https://bitbucket.org/princeton_genomics/barcode_splitter/src/master/ but again no luck :(
Could you kindly help me to find a solution, please?
Thank you so much in advance!
Matteo
Spatial barcodes are not going to be present in R2 since that is the RNA read. Are you referring to the barcodes (and UMI) which are present in R1? What is the reason you want to do this?
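To illustrate the point about R1, here is a minimal Python sketch, not a tested pipeline. It assumes the spot barcode is the first 16 bases of R1 (the 28-base R1 read above would fit a 16-base barcode followed by a 12-base UMI), that R1 and R2 list the reads in the same order, and that barcodes must match exactly. The function names and the R1.fastq.gz file name are hypothetical.

```python
import gzip
from collections import defaultdict
from itertools import islice

BARCODE_LEN = 16  # assumption: spot barcode = first 16 bases of R1


def read_barcodes(lines):
    """Parse 'name barcode' lines (as in barcodes.txt) into {barcode: name}."""
    barcodes = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            name, seq = parts
            barcodes[seq] = name
    return barcodes


def fastq_records(handle):
    """Yield FASTQ records as lists of 4 lines (no multi-line sequences)."""
    handle = iter(handle)
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def split_by_r1_barcode(r1_handle, r2_handle, barcodes):
    """Group R2 records by the exact-matching barcode at the start of R1."""
    groups = defaultdict(list)
    for r1, r2 in zip(fastq_records(r1_handle), fastq_records(r2_handle)):
        # Read IDs must agree, otherwise the two files are out of sync.
        if r1[0].split()[0] != r2[0].split()[0]:
            raise ValueError("R1 and R2 are not in the same order")
        barcode = r1[1][:BARCODE_LEN]
        name = barcodes.get(barcode, "unmatched")
        groups[name].append("".join(r2))
    return groups


def demultiplex(barcode_path, r1_path, r2_path, suffix="_R2.fastq"):
    """Hypothetical driver: write one R2 FASTQ file per spot name."""
    with open(barcode_path) as fh:
        barcodes = read_barcodes(fh)
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        groups = split_by_r1_barcode(r1, r2, barcodes)
    for name, records in groups.items():
        with open(f"{name}{suffix}", "w") as out:
            out.writelines(records)
```

Calling demultiplex("barcodes.txt", "R1.fastq.gz", "R2.fastq.gz") would write files such as sp1_R2.fastq plus unmatched_R2.fastq in the current directory. The sketch keeps every read in memory to avoid holding thousands of output files open at once, so a full run may need to be processed in chunks, and it does no barcode error correction.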
Thank you for your reply, GenoMax! Yes, you are right, only R1. I aim to perform analyses that extend beyond spatial information, which is why I'd like to write the reads of each spatial spot to its own fastq file. What do you suggest to resolve these errors? Thanks again for your help!
You could try sabre.
GenoMax, thank you again for your reply! I tried https://github.com/najoshi/sabre, but unfortunately, again, only unknown_barcode.fastq was generated :( So we decided to go back to the BAM file, and we found a reply of yours in "Separate single cell BAM file by the cell barcode" that uses the program sinto: https://timoast.github.io/sinto/basic_usage.html#filter-cell-barcodes-from-bam-file
Let's see how it goes! I will write here again as soon as I have something.