3.0 years ago
jonas.andersson
Hi!
As the title says, how does the sequencing machine know which strand is the forward/reverse?
Let's say I have a FASTQ file from a WGS run: is it presented as a single strand, with the forward-strand reads followed by the reverse complement of the reverse-strand reads?
help, I'm confused!
best Jonas
Thanks for your answer! But when they decided on the reference genome, how did they know which was the forward and which the reverse strand from the beginning? Was that just arbitrarily decided by humans?
It is mostly arbitrary, although by convention the forward strand (a.k.a. the Watson strand or plus strand) is the strand of a chromosome that has its 5'-end at the short-arm telomere and its 3'-end at the long-arm telomere. Obviously, this is only relevant for linear genomes.
More details on the usage of the reference DNA strand in this paper: The multiple personalities of Watson and Crick strands
Great, thank you so much! Exactly the information I was lacking :) Now I understand!
Just one last question that I hope you can answer :) The WGS I have been involved in uses Illumina, and the DNA is fragmented by sonication. What technique was first used to determine the forward/reverse strands when, e.g., the human reference genome was established? Obviously, the library can't have been prepared with sonication, because then it would be impossible to distinguish which fragment belongs to which strand. So how was this done?
Haaaa, interesting question! You should read about the Human Genome Project; it is a very interesting part of the recent history of biology and bioinformatics.
Actually, two methods were used for the first human whole-genome sequencing: hierarchical shotgun sequencing by the publicly funded Human Genome Project (an international consortium) and the more "brute force" whole-genome shotgun sequencing by their competitor, the private firm Celera Genomics led by Craig Venter (who tried to patent the human genome). Fortunately, politics stepped in and declared the human genome public (so that nobody could patent it), which ended the race and led the consortium and Celera to cooperate in the end.
In both methods, the DNA is sheared into small fragments (as with sonication) and sequencing occurs from both ends of each fragment (so you get partial information for each strand). At that point, we still don't know which strand is which. It is only after careful assembly of all the reads, like a really big jigsaw puzzle, that strands are attributed. Note that one strand carries exactly the same information as the other (it is its reverse complement), and that fragments are made of both strands.
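To make the "same information, reverse complemented" point concrete, here is a minimal Python sketch. The function name `reverse_complement` and the example sequence are made up for illustration; it only handles A, C, G, T and N.

```python
# Complement table for the four DNA bases plus N (unknown base).
# Illustrative only: other IUPAC ambiguity codes are not handled.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, written 5'->3'."""
    # Complement every base, then reverse so the result reads 5'->3'.
    return seq.translate(_COMPLEMENT)[::-1]


forward = "ATGCAAGT"  # hypothetical forward-strand sequence
reverse = reverse_complement(forward)

print(reverse)                      # ACTTGCAT
print(reverse_complement(reverse))  # ATGCAAGT, i.e. the original sequence
```

Applying the function twice gives back the original sequence, which is why knowing one strand is enough to reconstruct the other.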
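The alignment file (e.g., SAM/BAM) would have the strand information: once reads are mapped to the reference, each alignment records which strand the read matched.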
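As a rough sketch of where that information lives: in the SAM format, the second column is a bitwise FLAG, and bit 0x10 means the read was aligned as its reverse complement (bit 0x4 means the read is unmapped). The helper name `strand_from_sam_line` and the three example records below are hypothetical, not taken from a real file.

```python
UNMAPPED_FLAG = 0x4         # SAM FLAG bit: read is unmapped
REVERSE_STRAND_FLAG = 0x10  # SAM FLAG bit: read aligned as reverse complement


def strand_from_sam_line(line: str) -> str:
    """Return '+', '-' or '.' (unmapped) for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second tab-separated column
    if flag & UNMAPPED_FLAG:
        return "."
    return "-" if flag & REVERSE_STRAND_FLAG else "+"


# Hypothetical, shortened SAM records (read name, FLAG, reference, position).
records = [
    "read1\t99\tchr1\t1000",   # 99  = paired, proper pair, mate reverse, first in pair
    "read1\t147\tchr1\t1200",  # 147 = paired, proper pair, read reverse, second in pair
    "read2\t4\t*\t0",          # 4   = unmapped
]

strands = [strand_from_sam_line(r) for r in records]
print(strands)  # ['+', '-', '.']
```

For real data, a library such as pysam exposes the same bit on each alignment, so there is normally no need to parse the text by hand.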