Confusion about Illumina PE reads structure
3
1
8.7 years ago
Picasa ▴ 650

Based on these figures:

http://www.illumina.com/content/dam/illumina-marketing/images/products/truseq_custom_amplicon_workflow.jpg

https://sites.google.com/a/brown.edu/bioinformatics-in-biomed/qiime-tutorial-3

I don't understand the final structure of the reads in a paired-end FASTQ:

1) In the second link, a forward read seems to be composed of a barcode sequence and a primer sequence, but a reverse read doesn't have a barcode sequence?

2) Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic) says that it can remove adapters, but the second link says there are no adapters?

illumina paired end • 6.1k views
ADD COMMENT
0

OK, but is the primer present in a read file?

ADD REPLY
0

The barcode reads are present in a separate file from R1 and R2.

ADD REPLY
0

1) Are barcodes and indexes the same thing?

According to this link, the index can be in the same file as the reads (in the read ID): https://en.wikipedia.org/wiki/FASTQ_format

ADD REPLY
1

Yes, and they're usually only in the read ID. While it's possible to include them in a separate file, this is almost never done (it's a surefire way to both confuse people and break a LOT of automated pipelines).
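
As a rough illustration of "the index is in the read ID", here is a minimal Python sketch that splits a header line in the Casava 1.8+ layout described on the Wikipedia FASTQ page linked above. The function name `parse_illumina_header` and the example header are illustrative assumptions; headers written by other pipeline versions can be laid out differently, so treat this as a sketch rather than a general parser.

```python
def parse_illumina_header(header):
    """Split a Casava 1.8+ style FASTQ header into named fields.

    Assumed layout (from the Wikipedia FASTQ page):
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    """
    if not header.startswith("@"):
        raise ValueError("a FASTQ header line starts with '@'")
    # The part before the first space locates the cluster on the flow cell;
    # the part after it describes the read itself.
    location, _, description = header[1:].strip().partition(" ")
    loc_fields = location.split(":")
    desc_fields = description.split(":", 3)
    if len(loc_fields) != 7 or len(desc_fields) != 4:
        raise ValueError("header does not follow the assumed layout")
    names = ["instrument", "run", "flowcell", "lane", "tile", "x", "y",
             "read_number", "is_filtered", "control", "index"]
    return dict(zip(names, loc_fields + desc_fields))


# Illustrative header only, not taken from the poster's data.
example = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
info = parse_illumina_header(example)
print(info["read_number"], info["index"])
```

The `read_number` field says whether the record comes from R1 or R2, and `index` holds the barcode that the demultiplexer assigned to the read.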

ADD REPLY
0

Not unless the fragment length was shorter than the read length (e.g., when sequencing miRNAs).

ADD REPLY
0

So a read always starts with the target sequence? (5' position)

ADD REPLY
0

99.99% of the time, yes. It occasionally happens that you get adapter dimers on the end. In these cases the adapter sequence will either be soft-clipped (so it doesn't matter) or the read won't align (again, that's OK).

ADD REPLY
0

1) So in the first figure, the "Custom probe" is the adapter, right? And what about P7 and P5?

2) In the second figure, is what they call the primer sequence in fact the adapter?

ADD REPLY
0

1) So in the first figure, the "Custom probe" is the adapter, right? And what about P7 and P5?

Yes. The custom probe is the adapter. The figure is for custom amplicon sequencing. Are you using the same method?

P7 and P5 are the oligomers linked to the read+adapter, along with the index/barcode, by PCR. These sequences attach to the complementary oligos on the Illumina flow cell, where the actual sequencing occurs.

2) In the second figure, is what they call the primer sequence in fact the adapter?

Looks like that.

ADD REPLY
0

The terms primer/adapter are often used interchangeably, as are index/barcode.

ADD REPLY
1
8.7 years ago
mastal511 ★ 2.1k

Depending on what version of the Illumina technology reads come from, they may have barcodes of varying lengths, and on both reads of a pair, or only on one read of a pair.

Note, though, that with Illumina technology the barcodes are read in a separate read or reads (the index reads), not as part of the R1 and R2 sequences.

However, when the DNA insert is shorter than the read length, you will read through into the first part of the adapter sequence, and eventually into the barcode sequence, so some reads may have adapter sequence and maybe also barcode sequence towards the 3' end of the read. This is what Trimmomatic removes.

In the link you gave from Brown University, the sequencing primer hybridises to the green part of the sequence, and you start getting the sequence of the blue part of the line (sequencing-by-synthesis). If the blue part is shorter than the number of sequencing cycles, you read into the yellow part (the reverse adapter), and you get the sequence of this adapter on the 3' end of your read.
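
The read-through case can be sketched in a few lines of Python. This is only a simplified illustration of 3' adapter clipping, not Trimmomatic's actual algorithm; the function name `trim_3prime_adapter`, its parameters, and the toy insert are assumptions made for the example. The adapter string is the start of a common TruSeq adapter, so substitute the sequence that matches your own library.

```python
def trim_3prime_adapter(read, adapter, min_overlap=5, max_mismatch_rate=0.1):
    """Clip adapter read-through from the 3' end of a read.

    Scans left to right for the first position where the rest of the read
    matches the start of the adapter, allowing a small fraction of mismatches.
    Overlaps shorter than min_overlap are ignored to limit chance matches.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - start, len(adapter))
        window = read[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]  # keep the 5' part, drop adapter onwards
    return read  # no adapter found: the insert was longer than the read


adapter = "AGATCGGAAGAGC"        # assumed adapter start; use your library's
insert = "ACGTTGCAAGGCTA"        # toy 14 bp insert, shorter than the read
read = insert + adapter[:8]      # a 22-cycle read runs 8 bases into the adapter

print(trim_3prime_adapter(read, adapter))
```

With an insert longer than the number of cycles, no adapter is found and the read comes back unchanged.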

ADD COMMENT
0

Can there be barcode sequence, or any part of the adapter, at the 5' end of a reverse read? I see this in 75 of the 7 million reads in my Illumina paired-end data. Should these reads be deleted? What could cause such contamination? (Adapter dimers would put the 3' adapter at the 5' end of the read, but I see the 5' adapter at the 5' end of the read. Why?)

ADD REPLY
0

You can get some reads that are entirely adapter sequences. How long are your reads, and for the reverse reads where you see adapter sequences at the 5' end, what do the corresponding forward reads look like?

ADD REPLY
0

The reads are 100 bp long. I get 40 hits for exact matches to the 5' adapter (63 bp) at the 5' end of reads in the reverse reads file.

eg: reverse read: 5' CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCAAGCAGAAGACGGCATACGAGATCGTGATGGGACT 3'

forward read for same: 5' CAAGCAGAAGACGGCATACGAGATCGTGATGGGACTGGAGTTCAGACGTGGGCTCTTGCGATCTGAAGCAGAAGACGGGATACGAGATCGGGAGAGGAGA 3'

What does this mean? Why is there an identical sequence at the 5' end of both reads?
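
One way to quantify how similar the two reads are at their 5' ends is to count mismatches over their first k bases. The following is a minimal Python sketch; the helper name `prefix_mismatches` and the chosen values of k are assumptions for illustration, and the two sequences are copied from the pair quoted above.

```python
def prefix_mismatches(read_a, read_b, k):
    """Count positions that differ within the first k bases of two reads."""
    if k > min(len(read_a), len(read_b)):
        raise ValueError("k is longer than the shorter read")
    return sum(x != y for x, y in zip(read_a[:k], read_b[:k]))


# The read pair quoted in the post above.
reverse_read = ("CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCC"
                "GATCTCAAGCAGAAGACGGCATACGAGATCGTGATGGGACT")
forward_read = ("CAAGCAGAAGACGGCATACGAGATCGTGATGGGACTGGAGTTCAGACGTGGGCTCTTGC"
                "GATCTGAAGCAGAAGACGGGATACGAGATCGGGAGAGGAGA")

# Compare progressively longer 5' prefixes of the two mates.
for k in (24, 30, 64):
    print(k, prefix_mismatches(forward_read, reverse_read, k))
```

A low mismatch count over a long prefix means both mates begin with essentially the same sequence, which is the pattern described in the question.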

ADD REPLY
0

Another case: forward 5' CAAGCAGAAGACGGCATAGGAGATCGTGATGTGAATGGAGTTCAGACGTGTGCTATTCCGATCTCAAGCATAAGACGGGATACGGGATGGGAAGAGCAAC 3'

reverse 5' CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCGAGCAGAATACTGTAGATGAGATCGGGAGAGGACT 3'

The 3' end of the forward read is the reverse complement of the 5' end!
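
An observation like this can be checked programmatically instead of by eye. Below is a small Python sketch; the helper names `reverse_complement` and `tail_vs_revcomp_head` are hypothetical, and the short sequences in the usage lines are toy examples rather than the reads above.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tail_vs_revcomp_head(tail_read, head_read, k):
    """Mismatches between the last k bases of tail_read and the reverse
    complement of the first k bases of head_read (0 means a perfect match)."""
    if k > min(len(tail_read), len(head_read)):
        raise ValueError("k is longer than the shorter read")
    tail = tail_read[-k:]
    expected = reverse_complement(head_read[:k])
    return sum(a != b for a, b in zip(tail, expected))


# Toy example: the last 4 bases are the reverse complement of the first 4.
toy = "AACGTTTTCGTT"
print(reverse_complement("AACG"))            # CGTT
print(tail_vs_revcomp_head(toy, toy, 4))     # 0
```

Passing the forward read as both arguments tests it against its own 5' end; passing the forward read and the reverse read tests one mate against the other.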

ADD REPLY
0

And what about the cases (about 5000) where the 5' adapter matches only 5-10 bases at the 5' end of a read? Should these reads be discarded, or can they be kept?

ADD REPLY
1
8.7 years ago

There generally aren't adapters, except at the 3' end if your fragment size was shorter than the read length. The first link you showed didn't include the location of the sequencing primers, which likely would have clarified this for you.

ADD COMMENT
1
8.7 years ago
Shyam ▴ 150

Check this link: https://ngsc.med.upenn.edu/Experiments/Public_Data/Media/Public/FASTQ-Files.html

Barcodes are part of the 3' adapter and are used to separate reads from different libraries that were pooled and sequenced together. If your inserts are larger than your read length (say 180-300 bp), you should not expect to see adapter sequences, or parts of them, in your reads. But nothing is perfect in sequencing, and some of your reads may still carry a few bases of adapter sequence at the 3' end. When using Trimmomatic, supply the adapter sequences specific to the type of library you prepared. If there is adapter sequence at the end of a read, Trimmomatic will trim it and keep the 5' part of the read.

ADD COMMENT
