Dear all, I made a set of RNA-seq libraries without an embedded Illumina index, but with an inner barcode right after the read 1 sequences. They have now been sequenced on a NovaSeq platform together with other libraries. Where should I look for my data: in the undetermined reads, or in the reads assigned to the GGGGGG index? Thanks a lot.
So where exactly is your barcode?
Like this?:
If so, how long are the cDNA fragments, and what was the read length of your run?
My humble recommendation, after several long, unresolved discussions with sequencing staff, is to search for your internal barcode in both the "undetermined" reads and the G8 reads. For 8-base codes at a precisely known location, the error probability is low.
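A minimal sketch of that search, in case it helps. It is not a tested pipeline: the barcode sequences, the sample names, and the `offset` of the barcode within the read are all placeholders you would replace with your own design. It assumes all barcodes have the same length, allows a small number of mismatches, and leaves a read unassigned when two barcodes match equally well.

```python
from collections import Counter
from itertools import islice


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read_seq, barcodes, offset=0, max_mismatches=1):
    """Return the sample whose barcode matches the read at `offset`, else None.

    `barcodes` maps a (hypothetical) sample name to its barcode sequence.
    `offset` is the 0-based position of the barcode in the read (an assumption;
    set it according to your own library design).
    """
    length = len(next(iter(barcodes.values())))
    observed = read_seq[offset:offset + length]
    if len(observed) < length:
        return None  # read too short to contain the barcode
    hits = sorted((hamming(observed, bc), name) for name, bc in barcodes.items())
    best_dist, best_name = hits[0]
    if best_dist > max_mismatches:
        return None  # no barcode is close enough
    if len(hits) > 1 and hits[1][0] == best_dist:
        return None  # ambiguous: two barcodes are equally close
    return best_name


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from an iterator of FASTQ lines."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield tuple(line.rstrip("\n") for line in record)


def count_assignments(fastq_lines, barcodes, offset=0, max_mismatches=1):
    """Count how many reads are assigned to each sample, or left unassigned."""
    counts = Counter()
    for _header, seq, _plus, _qual in read_fastq(iter(fastq_lines)):
        sample = assign_barcode(seq, barcodes, offset, max_mismatches)
        counts[sample or "unassigned"] += 1
    return counts
```

You could run `count_assignments` once on the undetermined file and once on the G8 file (for example with `gzip.open(path, "rt")` as the line source) and compare how many reads each file yields per sample.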
Yes, the barcode location is right. My insert size is ~180-375 bp, and the read length is PE150. The sequencing facility sent me the G8 data split by my barcode, but the unique mapping rate is low (~47%) and the multi-alignment rate is ~50%. I usually get 90% unique mapping for Arabidopsis samples on the HiSeq 2500. So I am now splitting the undetermined data as well, as Devon suggested. I am not sure which set should be used. Or should I merge both?