6.4 years ago
salamandra
1- When we remove adapters with Trimmomatic, for example, are we also removing barcodes? Or is there another command to remove barcodes?
2- I heard that some sequencing providers already remove barcodes from their samples before delivering sequences to the client. Is that the case for Illumina?
3- Does Illumina remove adapters from reads before providing them to the client?
1) Trimmomatic removes adapter sequences based on the sequences you provide. Removal of "barcodes" (you probably mean sequencing indices) is called demultiplexing and is not supported by Trimmomatic.
2) Illumina is only the company behind the sequencing technology. Whether you get demultiplexed files depends on the sequencing center you work with. Typically that is the case. If you download from NCBI or ENA, the data is (as far as I know) always already demultiplexed.
3) Again, it depends on the facility. If you book this service, they might do it. Typically they only demultiplex. Use FastQC to check for adapter content (which I always recommend, not because I do not trust the bioinformaticians at the facilities, but because in the end it is you as the analyst who must confirm that the data quality is good, no matter what the facility said).
2) My reads are separated into different files, which might indicate they were demultiplexed. Does this mean the barcodes were also removed from the reads, or are the barcodes still in the sequences even though the reads were split into different files by sample? In the latter case, which tool allows removal of barcode sequences?
3) Is it enough to look at 'adapter content' in FastQC? I ask because in some samples there were no warnings in the 'adapter content' module, but the 'overrepresented sequences' module had some sequences called Illumina index 'something'.
See my comment below. Index sequences (barcodes) are moved to the headers of the FASTQ records as part of the demultiplexing process.
It is not enough to just look at the FastQC report. You should always scan (and trim) your data with a proper program like bbduk.sh or Trimmomatic. There can be low-level adapter contamination in your sequences that FastQC can miss. FastQC does not look at every read in the dataset when it does QC (only parts of the data are used for various tests, and that is generally OK).
In case the samples are not demultiplexed, which tool can be used to demultiplex?
Since you likely don't have access to the original flowcell data folder, you may need to use deML or demuxbyname.sh from the BBMap suite. For the BBMap option, you will need to know which index sequences belong to which samples.
What do you mean by barcode, the primer? Most of the time the adapters are already trimmed. This is done during the basecalling/demultiplexing of the raw data. Terminology is always confusing, and I am also not sure I am using the right words now.
I mean the sequences that identify the different samples.
Ah OK, clear. If you got them back as separate files, you can open a file and check whether all the sequences start with the same bases.
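A quick way to do that check without scrolling through the file is to count the leading bases of the reads. The following is only a minimal sketch in Python: the function names, the prefix length of 8 and the example records are made up for illustration, and it assumes plain 4-line FASTQ records. If one prefix dominates, an inline barcode may still be present. If the prefixes are diverse, the reads probably do not start with a shared barcode.

```python
from collections import Counter
from itertools import islice


def read_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def prefix_counts(lines, prefix_len=8, max_reads=10000):
    """Count the leading prefix_len bases over the first max_reads reads."""
    seqs = islice(read_sequences(lines), max_reads)
    return Counter(seq[:prefix_len] for seq in seqs)


# Tiny made-up records, for illustration only.
example = """@r1
ACGTACGTTTGACC
+
IIIIIIIIIIIIII
@r2
GGCATTCAGGATCA
+
IIIIIIIIIIIIII
@r3
ACGTACGTCCATGA
+
IIIIIIIIIIIIII
""".splitlines()

counts = prefix_counts(example, prefix_len=8)
for prefix, n in counts.most_common(3):
    print(prefix, n)
```

On a real file you would pass an open file handle instead of the example list, for instance one obtained from `gzip.open(path, "rt")` for a compressed FASTQ.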
I would say yes. The manual says "ILLUMINACLIP: Cut adapter and other illumina-specific sequences from the read", so I assume that also covers the Nextera labels etc. Manual: http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf
But it is easy to check for yourself. Just run Trimmomatic on a subsample and see whether everything you wanted trimmed off is gone.
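One rough way to do that before/after comparison is to count how many reads still contain the adapter. The sketch below is in Python and rests on assumptions: the function name and example reads are invented, and the adapter string is the commonly cited start of the Illumina TruSeq adapters, so substitute whatever sequence your library prep actually used. It looks only for exact, full-length matches, so a partial adapter at the very end of a read will not be counted.

```python
# Assumed adapter: the commonly cited start of Illumina TruSeq adapters.
# Replace it with the sequence that matches your own library prep.
ADAPTER = "AGATCGGAAGAGC"


def adapter_hits(lines, adapter=ADAPTER):
    """Return (reads containing the adapter, total reads) for 4-line FASTQ."""
    hits = 0
    total = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of each record
            total += 1
            if adapter in line:
                hits += 1
    return hits, total


# Made-up records standing in for a subsample before and after trimming.
before = """@r1
ACGTACGTAGATCGGAAGAGCACACG
+
IIIIIIIIIIIIIIIIIIIIIIIIII
@r2
TTGACCATGCATGCA
+
IIIIIIIIIIIIIII
""".splitlines()

after = """@r1
ACGTACGT
+
IIIIIIII
@r2
TTGACCATGCATGCA
+
IIIIIIIIIIIIIII
""".splitlines()

print("before:", adapter_hits(before))
print("after: ", adapter_hits(after))
```

Because this scans every read it is given, it can pick up low-level contamination that a sampled QC report might not flag.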
Index sequences (or barcodes) are not the same thing as adapters. Index sequences are always read independently in Illumina tech and are never part of the main reads. ILLUMINACLIP is cutting adapter sequences.
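As mentioned earlier in the thread, demultiplexing usually leaves the index in the FASTQ header rather than in the read. A minimal Python sketch for pulling it out is shown below. It assumes the Casava 1.8+ style header, where the part after the space is `read:filtered:control:index`. The function name and the example headers are made up, and some pipelines write a sample number instead of the index sequence in that last field, so treat the result with care.

```python
from collections import Counter


def index_from_header(header):
    """Return the last comment field of a Casava 1.8+ style header, or None."""
    parts = header.strip().split(" ", 1)
    if len(parts) != 2:
        return None  # no comment part after the read name
    fields = parts[1].split(":")
    if len(fields) != 4:
        return None  # not the expected read:filtered:control:index layout
    return fields[3]


# Made-up header lines, for illustration only.
headers = [
    "@SIM:1:FCX:1:15:6329:1045 1:N:0:ATCACG",
    "@SIM:1:FCX:1:15:6330:1046 1:N:0:ATCACG",
    "@SIM:1:FCX:1:15:6331:1047 1:N:0:ATCACG+TTAGGC",
    "@read_without_comment",
]

index_counts = Counter(index_from_header(h) for h in headers)
for index, n in index_counts.most_common():
    print(index, n)
```

In a demultiplexed file you would expect one index (or one index pair) to account for nearly all reads.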