Hello,
I am looking at this experiment
It looks like the authors uploaded a multiplexed dataset with no information on the barcodes. Is there any way to infer them? Running FastQC gives a rough idea of what some of the barcodes might be, but is there a better way?
I downloaded the SRA file and dumped it with --split-3, which gave me two files for the paired-end reads.
Here is an excerpt from one of the files:
@SRR2046220.999996 HISEQ:108:C24D5ACXX:4:1101:5712:37823 length=102
AAGGATGCAATTCCTGGTGGTGCCATGGAGGTAAAGTCATAGTATTTTTATGATTTATATTTACATATTTTTACACTTCATAGTCATTTTTATAAAACTTTN
+SRR2046220.999996 HISEQ:108:C24D5ACXX:4:1101:5712:37823 length=102
CCCFFFFFHHHHHJJJIFGGFHIIHIIJAHIEGHJHFGGFIIBGIIJJJJJIJJIJJJJIJJJJJJIJIIJJJIIHHIIJIHHHFHHEFFFFFFEDDECEE#
@SRR2046220.999997 HISEQ:108:C24D5ACXX:4:1101:5536:37823 length=102
CAGGACGAAAATGAAGGTTTGGTTTTAACATTTGATCTGAGTTTATAGTATAGAAAGAGATCTATATTGACTCAGCTTTGCATATAAATCATACATTCTAGN
+SRR2046220.999997 HISEQ:108:C24D5ACXX:4:1101:5536:37823 length=102
######################################################################################################
@SRR2046220.999998 HISEQ:108:C24D5ACXX:4:1101:5653:37824 length=102
TAACTCTCTATTCACGAAAATCTGATCAATTGGATGACGGCTCGAAGAGCTTGATTCTACCAGATAGTACAGTTACATCAGGATGAAGTGCAGAAACGCTTN
+SRR2046220.999998 HISEQ:108:C24D5ACXX:4:1101:5653:37824 length=102
0;8@##################################################################################################
@SRR2046220.999999 HISEQ:108:C24D5ACXX:4:1101:5739:37825 length=102
CTCAATTCAATTCGGAGCTTCGTCCCCTACAGGACCTCACCCTTCGATCAAACTAAATTATTATTCTTTTTCCAATATTACAATATCAACAATATGTACGTN
+SRR2046220.999999 HISEQ:108:C24D5ACXX:4:1101:5739:37825 length=102
1++44=BDF>FFFEEG1CFCGIDGHFH@G>FGEHDGGIIIIID;;FFHEG8@FG@AGHH;CAEFFEHHHEEEDDE>CCEDCA5>>CDEC?@?CCCC>@CB<#
@SRR2046220.1000000 HISEQ:108:C24D5ACXX:4:1101:5569:37827 length=102
TCTCTTACAATTCCAAAAGATATAGATAAGGCAATTTATTGGTATGAAGAATCTGCTAAACAAGGAAATCAAGGTGCACAAAATAGTTTAGAAGGACTTCAN
+SRR2046220.1000000 HISEQ:108:C24D5ACXX:4:1101:5569:37827 length=102
+++22?@A+?CCCCBBBCBCABBBBBCBBBBCCBBBBBBBBBABABBBBBBBBBBBBBBBBBBBBBABBBBABB>=ABAAAA<>>@?@@B>@@@@==;???#
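A slightly more systematic alternative to eyeballing FastQC is to tally the first few bases of every read 1 and look for a small set of prefixes that dominate. The sketch below assumes fixed-length inline barcodes of k bases at the 5' end of read 1 (k=8 is an arbitrary guess, not taken from the study) and a plain or gzipped FASTQ file; the function names are hypothetical.

```python
import gzip
import sys
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file for text reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def prefix_counts(sequences, k=8, max_reads=1_000_000):
    """Count the first k bases of up to max_reads sequences.

    Assumption: barcodes are inline, fixed-length, and start at base 1.
    Sequences shorter than k are skipped.
    """
    counts = Counter()
    for seq in islice(sequences, max_reads):
        if len(seq) >= k:
            counts[seq[:k]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_fastq(sys.argv[1]) as fh:
        counts = prefix_counts(read_sequences(fh))
    total = sum(counts.values()) or 1
    # Print the 20 most frequent prefixes with their share of counted reads.
    for prefix, n in counts.most_common(20):
        print(f"{prefix}\t{n}\t{n / total:.2%}")
```

If the barcodes are variable in length, a single k will smear each barcode across several prefixes, so it may help to rerun with a few different values of k and compare.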
Based on the SRA entry, this is ddRAD-seq data. ENA also provides only the paired-end FASTQ files. These must be EcoRI-MspI digested fragments.
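If the library really is EcoRI-MspI ddRAD, the barcode can be anchored on the EcoRI cut-site remnant instead of guessing a fixed length. This sketch assumes read 1 starts with an inline barcode followed directly by the remnant `AATTC` (EcoRI cuts G^AATTC; this layout is typical for EcoRI-side adapters but is not confirmed for this dataset) and that barcodes are 4 to 10 bases long. Both bounds and all function names are assumptions.

```python
import sys
from collections import Counter

# Assumption: read 1 looks like <barcode><AATTC><genomic sequence>.
ECORI_REMNANT = "AATTC"


def barcode_candidates(sequences, remnant=ECORI_REMNANT, min_len=4, max_len=10):
    """Tally the bases in front of the cut-site remnant in each read.

    The remnant must start at a 0-based position between min_len and
    max_len, so each putative barcode is min_len..max_len bases long.
    Reads without the remnant in that window are skipped.
    """
    counts = Counter()
    for seq in sequences:
        # str.find only reports matches lying entirely inside seq[start:end].
        pos = seq.find(remnant, min_len, max_len + len(remnant))
        if pos != -1:
            counts[seq[:pos]] += 1
    return counts


def fastq_sequences(path):
    """Yield the sequence line of each record in an uncompressed FASTQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = barcode_candidates(fastq_sequences(sys.argv[1]))
    # Columns: candidate barcode, its length, number of reads.
    for barcode, n in counts.most_common(30):
        print(f"{barcode}\t{len(barcode)}\t{n}")
```

A handful of candidates covering most reads, with the rest looking like single-base variants of them, would point to the barcode set. A chance `AATTC` inside the first few bases can produce spurious candidates, so low-count entries are best treated with suspicion.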
How can I detect whether a given SRA RNA-seq FASTQ file has been demultiplexed?
Check the index sequences in the FASTQ headers. If more than one distinct index appears, the file is likely not demultiplexed.
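A minimal sketch of that check, assuming Casava 1.8+ style headers in which the second whitespace-separated field is `<read>:<is filtered>:<control number>:<index>`. SRA often rewrites headers when dumping (the excerpt above, for example, has no index field), in which case this finds nothing and the check is inconclusive. In some pipelines the last field may hold a sample number rather than a sequence. The function names are hypothetical.

```python
import sys
from collections import Counter


def index_from_header(header):
    """Return the index field of a Casava 1.8+ style FASTQ header, or None.

    Expected shape (assumption):
      @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    fields = header.lstrip("@").split()
    if len(fields) < 2:
        return None
    parts = fields[1].split(":")
    if len(parts) == 4 and parts[1] in ("Y", "N"):
        return parts[3] or None
    return None


def index_counts(headers):
    """Count headers per index value, ignoring headers without an index."""
    return Counter(idx for idx in map(index_from_header, headers) if idx is not None)


def fastq_headers(path):
    """Yield the header line (1st of every 4 lines) of an uncompressed FASTQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:
                yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = index_counts(fastq_headers(sys.argv[1]))
    if not counts:
        print("No index field found; headers may have been rewritten.")
    for idx, n in counts.most_common(10):
        print(f"{idx}\t{n}")
```

Sequencing errors in the index read usually show up as many low-count variants of the real index, so it is the number of dominant indices, not the raw number of distinct values, that suggests multiplexing.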