Question

PICARD ExtractIlluminaBarcodes with NO BARCODES ILLUMINA LIBRARY

4.5 years ago

dishasharma35 • 0

Dear all

I would like to convert Illumina basecalls to SAM, so I need to run Picard ExtractIlluminaBarcodes first. Our library has adapters and index sequences: there are 2 adapters of ~34 base pairs each, and the i5 and i7 index sequences are 8 base pairs each. I have no barcode information. Please help me run the two commands shown below.

What would the read structure be when the read length is 300 base pairs and the run had 300 cycles? What should barcode.txt contain? What would RUN_BARCODE be?

java -jar picard.jar ExtractIlluminaBarcodes \
     BASECALLS_DIR=/BaseCalls/ \
     LANE=1 \
     READ_STRUCTURE=25T8B25T \
     BARCODE_FILE=barcodes.txt \
     METRICS_FILE=metrics_output.txt 

java -jar picard.jar IlluminaBasecallsToSam \
      BASECALLS_DIR=/BaseCalls/ \
      LANE=001 \
      READ_STRUCTURE=25T8B25T \
      RUN_BARCODE=run15 \
      IGNORE_UNEXPECTED_BARCODES=true \
      LIBRARY_PARAMS=library.params

sequencing illumina picard bcl2fastq basecalling • 2.3k views

ADD COMMENT • link updated 4.5 years ago by Ram 44k • written 4.5 years ago by dishasharma35 • 0

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

ADD REPLY • link 4.5 years ago by Ram 44k

Thank you so much. I didn't know that before.

ADD REPLY • link 4.5 years ago by dishasharma35 • 0

READ_STRUCTURE=25T8B25T

Should be

READ_STRUCTURE=300T8B8B300T

in your case. You will also need to know the barcodes expected in this dataset and put them in a file that looks like the one below (this is the file referred to as barcodes.txt in your example). If you don't have this information, you may not be able to use this method.

barcode_name     library_name       barcode_1    barcode_2
Bar1             Sample_1           ATCGCTAG     CGCTGATC
Bar2             Sample_2           TCGATCGT     CAGCTAGC
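
As a quick sanity check on a read structure such as the 300T8B8B300T suggested above, the segments can be added up and compared with the number of cycles recorded for the run (for example in RunInfo.xml). The sketch below is a minimal illustration, not part of Picard; the function names are made up here, and it assumes the usual operators T (template), B (sample barcode), S (skip) and M (molecular barcode).

```python
import re

def parse_read_structure(read_structure):
    """Split a Picard-style read structure into (length, operator) segments."""
    segments = re.findall(r"(\d+)([TBSM])", read_structure)
    # Reject empty input or strings with characters the pattern did not consume.
    if not segments or "".join(n + op for n, op in segments) != read_structure:
        raise ValueError(f"Malformed read structure: {read_structure!r}")
    return [(int(n), op) for n, op in segments]

def summarize(read_structure):
    """Report total cycles and how many template/barcode reads are declared."""
    segments = parse_read_structure(read_structure)
    return {
        "total_cycles": sum(n for n, _ in segments),
        "template_reads": sum(1 for _, op in segments if op == "T"),
        "barcode_reads": sum(1 for _, op in segments if op == "B"),
    }

print(summarize("300T8B8B300T"))
# {'total_cycles': 616, 'template_reads': 2, 'barcode_reads': 2}
```

The number of barcode reads reported here is also the number of barcode columns the barcode file needs to supply per sample.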

That said, do you have access to the full data folder for this run? If you do, running bcl2fastq without a samplesheet can give you all reads from this run. Once you have those files, you can determine the predominant index combinations present in them by using the code here: C: Demultiplexing reads with index present in the labels. With this method you do not need to know a priori which indexes are expected in your data.
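
The idea of tallying index combinations from FASTQ headers can be sketched roughly as follows. This is not the code from the linked post; it is a minimal illustration that assumes each header ends with the index field after the last colon (for example "1:N:0:ATTACTCG+TATAGCCT"), and the records in `demo` are made up.

```python
from collections import Counter

def count_index_combinations(fastq_lines, top=10):
    """Tally the index field found at the end of each FASTQ header line."""
    counts = Counter()
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 == 0:  # first line of every 4-line record is the header
            # Assumed header layout: "... 1:N:0:<index1>+<index2>"
            counts[line.strip().rsplit(":", 1)[-1]] += 1
    return counts.most_common(top)

# Tiny made-up records, only to show the expected input shape.
demo = [
    "@M1:1:FC:1:1101:1000:2000 1:N:0:ATTACTCG+TATAGCCT", "ACGT", "+", "IIII",
    "@M1:1:FC:1:1101:1001:2001 1:N:0:ATTACTCG+TATAGCCT", "TTGA", "+", "IIII",
    "@M1:1:FC:1:1101:1002:2002 1:N:0:TCCGGAGA+ATAGAGGC", "GGCA", "+", "IIII",
]
print(count_index_combinations(demo))
# [('ATTACTCG+TATAGCCT', 2), ('TCCGGAGA+ATAGAGGC', 1)]
```

For a real file, any iterable of lines works, so a handle from `gzip.open(path, "rt")` can be passed in directly.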

ADD REPLY • link 4.5 years ago by GenoMax 147k

I have the index combinations for each sample. Are these same as barcodes?
I can run bcl2fastq, but then I would need to convert to uBAM, which adds one more step to the analysis, so I was interested in converting directly from BCL to SAM.

ADD REPLY • link 4.5 years ago by dishasharma35 • 0

Are these same as barcodes?

Yes. If for some reason you need to stick with the Picard workflow, you can use the method posted above.

If you have the index/barcode combinations, create an appropriate samplesheet with that information (use Illumina Experiment Manager software, Windows only). You can then demultiplex the data directly to generate FASTQ files. No Picard needed. Again, for this to work you will need access to the full raw flowcell folder.

ADD REPLY • link 4.5 years ago by GenoMax 147k

I have access to the raw folder. I ran bcl2fastq, which completed successfully. The only issue is with Picard IlluminaBasecallsToSam, as the barcode.txt is not able to pick multiple combinations. Below are my index sequence combinations. When I provide this file as barcode.txt, it says entries should be unique. barcode_1 is the i5 index and barcode_2 is the i7 index.

barcode_name    library_name    barcode_1    barcode_2
barcode1        Sample1         ATTACTCG     TATAGCCT
barcode2        Sample2         ATTACTCG     ATAGAGGC
barcode3        Sample3         ATTACTCG     CCTATCCT
barcode4        Sample4         ATTACTCG     GGCTCTGA
barcode5        Sample5         ATTACTCG     AGGCGAAG
barcode6        Sample6         ATTACTCG     TAATCTTA
barcode7        Sample7         ATTACTCG     CAGGACGT
barcode8        Sample8         ATTACTCG     GTACTGAC
barcode9        Sample9         TCCGGAGA     TATAGCCT
barcode10       Sample10        TCCGGAGA     ATAGAGGC
barcode11       Sample11        TCCGGAGA     CCTATCCT
barcode12       Sample12        TCCGGAGA     GGCTCTGA
barcode13       Sample13        TCCGGAGA     AGGCGAAG
barcode14       Sample14        TCCGGAGA     TAATCTTA
barcode15       Sample15        TCCGGAGA     CAGGACGT
barcode16       Sample16        TCCGGAGA     GTACTGAC
barcode17       Sample17        CGCTCATT     TATAGCCT
barcode18       Sample18        CGCTCATT     ATAGAGGC
barcode19       Sample19        CGCTCATT     CCTATCCT
barcode20       Sample20        CGCTCATT     GGCTCTGA
barcode21       Sample21        CGCTCATT     AGGCGAAG
barcode22       Sample22        CGCTCATT     TAATCTTA
barcode23       Sample23        CGCTCATT     CAGGACGT
barcode24       Sample24        CGCTCATT     GTACTGAC
barcode25       Sample25        GAGATTCC     TATAGCCT
barcode26       Sample26        GAGATTCC     ATAGAGGC
barcode27       Sample27        GAGATTCC     CCTATCCT
barcode28       Sample28        GAGATTCC     GGCTCTGA
barcode29       Sample29        GAGATTCC     AGGCGAAG
barcode30       Sample30        GAGATTCC     TAATCTTA
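
One possible reading of the uniqueness complaint, offered as a guess rather than confirmed Picard behaviour: if only the first barcode column is taken into account (as might happen with a read structure that declares a single 8B segment), values such as ATTACTCG repeat across samples, while the barcode_1 + barcode_2 pair is unique. The sketch below only illustrates that difference; `duplicated_keys` is a hypothetical helper and the rows are a subset of the table above.

```python
from collections import Counter

def duplicated_keys(rows, columns):
    """Return barcode keys (joined over `columns`) that occur more than once."""
    keys = Counter("-".join(row[c] for c in columns) for row in rows)
    return sorted(key for key, n in keys.items() if n > 1)

# Three rows copied from the table above.
rows = [
    {"barcode_name": "barcode1", "barcode_1": "ATTACTCG", "barcode_2": "TATAGCCT"},
    {"barcode_name": "barcode2", "barcode_1": "ATTACTCG", "barcode_2": "ATAGAGGC"},
    {"barcode_name": "barcode9", "barcode_1": "TCCGGAGA", "barcode_2": "TATAGCCT"},
]

print(duplicated_keys(rows, ["barcode_1"]))               # ['ATTACTCG']
print(duplicated_keys(rows, ["barcode_1", "barcode_2"]))  # []
```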

ADD REPLY • link updated 4.5 years ago by GenoMax 147k • written 4.5 years ago by dishasharma35 • 0

If you just need an unaligned BAM file, you can produce one with reformat.sh from the BBMap suite. Something like reformat.sh in1=R1.fq.gz in2=R2.fq.gz out=unaligned.bam will work. Make sure you have samtools available in your $PATH.

as the barcode.txt is not able to pick multiple combinations.

What does that mean? Are those not recognized as valid indexes? Check whether you have one or both of them reverse-complemented. It is a common mistake people make.
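
To make the reverse-complement check concrete, the following sketch lists the four orientation combinations of an index pair so they can be compared against what is actually observed in the reads. The helper names are made up for illustration; the example pair is the first row of the table posted above.

```python
# Translation table for complementing DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def orientation_candidates(index_1, index_2):
    """All four as-is / reverse-complemented combinations of an index pair."""
    return [
        (a, b)
        for a in (index_1, reverse_complement(index_1))
        for b in (index_2, reverse_complement(index_2))
    ]

for pair in orientation_candidates("ATTACTCG", "TATAGCCT"):
    print("-".join(pair))
# ATTACTCG-TATAGCCT
# ATTACTCG-AGGCTATA
# CGAGTAAT-TATAGCCT
# CGAGTAAT-AGGCTATA
```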

ADD REPLY • link 4.5 years ago by GenoMax 147k

That error was resolved when I changed the read structure to what you suggested. Below is the output. Why are barcode_1 and barcode_2 merged into a single barcode? It also shows that only a very small percentage of reads have barcodes. Is this output correct?

BARCODE BARCODE_WITHOUT_DELIMITER     BARCODE_NAME      LIBRARY_NAME    READS   PF_READS    PERFECT_MATCHES  PF_PERFECT_MATCHES ONE_MISMATCH_MATCHES    PF_ONE_MISMATCH_MATCHES PCT_MATCHES      RATIO_THIS_BARCODE_TO_BEST_BARCODE_PCT  PF_PCT_MATCHES  PF_RATIO_THIS_BARCODE_TO_BEST_BARCODE_PCT  PF_NORMALIZED_MATCHES
ATTACTCG-TATAGCCT ATTACTCGTATAGCCT    barcode1  IND19050444     3       3  0    0     3 3   0   0.057692  0  0.057692   0.280374
ATTACTCG-ATAGAGGC ATTACTCGATAGAGGC    barcode2  IND19050446     7       7  0    0     7 7   0   0.134615  0  0.134615   0.654206
ATTACTCG-CCTATCCT ATTACTCGCCTATCCT    barcode3  IND19050482     4       4  2    2     2 2   0   0.076923  0  0.076923   0.373832
ATTACTCG-GGCTCTGA ATTACTCGGGCTCTGA    barcode4  IND19050488     8       8  0    0     8 8   0   0.153846  0  0.153846   0.747664
ATTACTCG-AGGCGAAG ATTACTCGAGGCGAAG    barcode5  IND19050489     4       4  0    0     4 4   0   0.076923  0  0.076923   0.373832
ATTACTCG-TAATCTTA ATTACTCGTAATCTTA    barcode6  IND19050716     17      17 0    0     17    17  0       0.326923   0    0.326923  1.588785
ATTACTCG-CAGGACGT ATTACTCGCAGGACGT    barcode7  IND19050718     2       2  0    0     2 2   0   0.038462  0  0.038462   0.186916
ATTACTCG-GTACTGAC ATTACTCGGTACTGAC    barcode8  IND19050720     1       1  0    0     1 1   0   0.019231  0  0.019231   0.093458
TCCGGAGA-TATAGCCT TCCGGAGATATAGCCT    barcode9  IND19050736     3       3  0    0     3 3   0   0.057692  0  0.057692   0.280374
TCCGGAGA-ATAGAGGC TCCGGAGAATAGAGGC    barcode10 IND19050741     1       1  0    0     1 1   0   0.019231  0  0.019231   0.093458
TCCGGAGA-CCTATCCT TCCGGAGACCTATCCT    barcode11 IND19051101     5       5  0    0     5 5   0   0.096154  0  0.096154   0.46729
TCCGGAGA-GGCTCTGA TCCGGAGAGGCTCTGA    barcode12 IND19051105     13      13 0    0     13    13  0       0.25 0  0.25    1.214953
TCCGGAGA-AGGCGAAG TCCGGAGAAGGCGAAG    barcode13 IND19051106     4       4  0    0     4 4   0   0.076923  0  0.076923   0.373832
TCCGGAGA-TAATCTTA TCCGGAGATAATCTTA    barcode14 IND19051108     14      14 0    0     14    14  0       0.269231   0    0.269231  1.308411
TCCGGAGA-CAGGACGT TCCGGAGACAGGACGT    barcode15 IND19051109     4       4  1    1     3 3   0   0.076923  0  0.076923   0.373832
TCCGGAGA-GTACTGAC TCCGGAGAGTACTGAC    barcode16 IND19051119     12      12 0    0     12    12  0       0.230769   0    0.230769  1.121495
CGCTCATT-TATAGCCT CGCTCATTTATAGCCT    barcode17 IND19051128     2       2  0    0     2 2   0   0.038462  0  0.038462   0.186916
CGCTCATT-ATAGAGGC CGCTCATTATAGAGGC    barcode18 IND19051129     2       2  0    0     2 2   0   0.038462  0  0.038462   0.186916
CGCTCATT-CCTATCCT CGCTCATTCCTATCCT    barcode19 IND19051130     11      11 0    0     11    11  0       0.211538   0    0.211538  1.028037
CGCTCATT-GGCTCTGA CGCTCATTGGCTCTGA    barcode20 IND19051167     10      10 0    0     10    10  0       0.192308   0    0.192308  0.934579
CGCTCATT-AGGCGAAG CGCTCATTAGGCGAAG    barcode21 IND19051180     0       0  0    0     0 0   0   0       0 0  0
CGCTCATT-TAATCTTA CGCTCATTTAATCTTA    barcode22 IND19051186     52      52 1    1     51    51  0       1 0  1  4.859813
CGCTCATT-CAGGACGT CGCTCATTCAGGACGT    barcode23 IND19051187     4       4  0    0     4 4   0   0.076923  0  0.076923   0.373832
CGCTCATT-GTACTGAC CGCTCATTGTACTGAC    barcode24 IND19051190     3       3  0    0     3 3   0   0.057692  0  0.057692   0.280374
GAGATTCC-TATAGCCT GAGATTCCTATAGCCT    barcode25 IND19051193     22      22 1    1     21    21  0       0.423077   0    0.423077  2.056075
GAGATTCC-ATAGAGGC GAGATTCCATAGAGGC    barcode26 IND19051196     21      21 0    0     21    21  0       0.403846   0    0.403846  1.962617
GAGATTCC-CCTATCCT GAGATTCCCCTATCCT    barcode27 IND19051197     22      22 0    0     22    22  0       0.423077   0    0.423077  2.056075
GAGATTCC-GGCTCTGA GAGATTCCGGCTCTGA    barcode28 IND190511103    19      19 0    0     19    19  0       0.365385   0    0.365385  1.775701
GAGATTCC-AGGCGAAG GAGATTCCAGGCGAAG    barcode29 IND190511104    21      21 2    2     19    19  0       0.403846   0    0.403846  1.962617
GAGATTCC-TAATCTTA GAGATTCCTAATCTTA    barcode30 IND19050447     30      30 0    0     30    30  0       0.576923   0    0.576923  2.803738
NNNNNNNN-NNNNNNNN NNNNNNNNNNNNNNNN              3101327150      3101327150 0    0     0 0   1   59640906.730769 1  59640906.730769    0
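
To put a number on how few reads were assigned, the READS column can be split into reads matched to a named barcode and reads in the all-N (unmatched) row. The sketch below is only an illustration with a hypothetical helper, using three rows copied from the table above; in the table itself nearly all assigned reads are one-mismatch rather than perfect matches.

```python
def assigned_fraction(reads_by_barcode):
    """Return (assigned reads, total reads, assigned / total)."""
    def is_unmatched(barcode):
        # The unmatched bin is reported with a barcode made only of N's.
        return set(barcode) <= {"N", "-"}

    assigned = sum(n for b, n in reads_by_barcode.items() if not is_unmatched(b))
    total = sum(reads_by_barcode.values())
    return assigned, total, (assigned / total if total else 0.0)

# READS values copied from three rows of the metrics table above.
reads = {
    "CGCTCATT-TAATCTTA": 52,
    "GAGATTCC-TAATCTTA": 30,
    "NNNNNNNN-NNNNNNNN": 3101327150,
}
assigned, total, fraction = assigned_fraction(reads)
print(assigned, total, f"{fraction:.2e}")
```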

ADD REPLY • link 4.5 years ago by dishasharma35 • 0