4.5 years ago
dishasharma35
Dear all,
I would like to convert Illumina basecalls to SAM, so I need to run Picard ExtractIlluminaBarcodes first. Our library has adapters and index sequences: two adapters of ~34 base pairs each, and 8 base pair i5 and i7 index sequences. I have no barcode information. Please help me run the two commands below.
What would the read structure be when the read length is 300 base pairs and the run was 300 cycles? What should barcode.txt contain? What would RUN_BARCODE be?
java -jar picard.jar ExtractIlluminaBarcodes \
BASECALLS_DIR=/BaseCalls/ \
LANE=1 \
READ_STRUCTURE=25T8B25T \
BARCODE_FILE=barcodes.txt \
METRICS_FILE=metrics_output.txt
java -jar picard.jar IlluminaBasecallsToSam \
BASECALLS_DIR=/BaseCalls/ \
LANE=001 \
READ_STRUCTURE=25T8B25T \
RUN_BARCODE=run15 \
IGNORE_UNEXPECTED_BARCODES=true \
LIBRARY_PARAMS=library.params
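A quick sanity check for any candidate READ_STRUCTURE is that its cycle counts add up to the number of cycles actually sequenced, index reads included. The helper below is only a sketch: `read_structure_cycles` is a made-up name, and it simply sums the numbers in the string without validating the operator letters.

```shell
# Sum the cycle counts in a Picard-style READ_STRUCTURE string.
# Operators: T = template, B = sample barcode, S = skip, M = molecular index.
# Hypothetical helper: it only adds up the numbers, it does not validate the letters.
read_structure_cycles() {
    printf '%s\n' "$1" | grep -oE '[0-9]+' | awk '{ total += $1 } END { print total + 0 }'
}

read_structure_cycles 25T8B25T   # prints 58
```

The example value `25T8B25T` in the commands above describes only 58 cycles, so it would need to be replaced with a structure that matches the real run.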
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
Thank you so much. I didn't know that before.
Should be
in your case. You will also need to know the expected barcodes present in this dataset. You will need to make up a file that looks like this (this is the file referred to as barcodes.txt in your example). If you don't have this information then you may not be able to use this method.
That said, do you have access to the full data folder for this run? If you do, then running bcl2fastq without a samplesheet can give you all reads from this run. Once you have those files, you can determine the predominant index combinations present in them by using the code here: C: Demultiplexing reads with index present in the labels. With this method you do not need to know a priori which indexes are expected to be present in your data.
I have the index combinations for each sample. Are these the same as barcodes?
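For the idea of finding the predominant index combinations in the bcl2fastq output, here is a minimal sketch. It is not the code from the linked post. It assumes 4-line FASTQ records whose header lines end with the index sequence(s) after the last colon, and `count_index_combos` is a hypothetical helper name.

```shell
# Tally the index string at the end of each FASTQ header line.
# Assumes 4-line FASTQ records and Illumina-style headers such as
#   @M00001:1:000000000-ABCDE:1:1101:1000:2000 1:N:0:ATTACTCG+TATAGCCT
# Reads from the files given as arguments, or from stdin if there are none.
count_index_combos() {
    awk 'NR % 4 == 1 { n = split($0, f, ":"); print f[n] }' "$@" |
        sort | uniq -c | sort -k1,1nr -k2,2
}

# Example call (the file name is illustrative):
#   zcat Undetermined_S0_L001_R1_001.fastq.gz | count_index_combos | head -n 20
```

The most frequent combinations appear first, so the top of the list can be compared against the indexes you expect for each sample.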
I can run bcl2fastq, but then I would need to convert to uBAM, which adds one more step to the analysis, so I was interested in converting directly from BCL to SAM.
Yes. If for some reason you need to stick with the Picard workflow then you can use the method posted above.
If you have the index/barcode combinations then create an appropriate samplesheet (use the Illumina Experiment Manager software, Windows only) with that information. You can then directly demultiplex the data to generate fastq files. No Picard needed. Again, for this to work you will need access to the full raw flowcell folder.
I have access to the raw folder. I ran bcl2fastq, which completed successfully. The only issue is with Picard IlluminaBasecallsToSam, as barcode.txt is not able to pick up multiple combinations. Below is my index sequence combination. When I provide this file as barcode.txt, it says entries should be unique. barcode_1 is the i5 index and barcode_2 is the i7 index.
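One thing worth ruling out with a "should be unique" error is a repeated i5+i7 pair in the file. The sketch below assumes a tab-delimited file with a header row and the two index sequences in columns 1 and 2. That layout and the helper name `duplicate_barcode_pairs` are assumptions, so adjust the column numbers to your file.

```shell
# Print every i5+i7 pair that occurs more than once in a tab-delimited barcode file.
# Assumptions: first line is a header, index sequences are in columns 1 and 2.
# Reads from the file given as an argument, or from stdin if there is none.
duplicate_barcode_pairs() {
    awk -F '\t' 'NR > 1 { print $1 "+" $2 }' "$@" | sort | uniq -d
}

# Example call (the file name is illustrative):
#   duplicate_barcode_pairs barcode.txt
```

Empty output means every pair is distinct. If the error persists in that case, the cause is more likely how the columns are named or how the read structure splits the barcode cycles.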
If you just need an unaligned BAM file then you can produce that with reformat.sh from the BBMap suite. Something like `reformat.sh in1=R1.fq.gz in2=R2.fq.gz out=unaligned.bam` will work. Make sure you have samtools available in your $PATH.
What does that mean? Are those not recognized as valid indexes? Check if you have one or both of them reverse-complemented. It is a common mistake people make.
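To check the reverse-complement possibility, a small helper like the one below can flip an index sequence so it can be compared against what is actually observed in the data. `revcomp` is a hypothetical name, and the sketch assumes `rev` and `tr` are available and that the sequence contains only A, C, G and T.

```shell
# Reverse-complement a DNA sequence given as the first argument.
# Assumes only A/C/G/T (either case); other characters pass through unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ATTACTCG   # prints CGAGTAAT
```

Which index, if either, needs flipping depends on the instrument and workflow, so treat the output as something to compare against the observed indexes.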
That error was resolved when I changed the read structure to what you suggested. That's the output. Why are the barcodes merged for barcode_1 and barcode_2? And it shows such a low percentage of reads with barcodes. Is that the correct output?