Hello, I have a question about paired-end sequencing. Given the read 1 and read 2 FASTQ files from a paired-end run, is it correct to assume that if read X is at position 1 in the R1 file, then its mate is at position 1 in the R2 file? If this is not guaranteed, you would need to check the names of millions of sequences, which would be very time consuming. If even one read is missing or out of order in R1 or R2, you will end up with incorrectly paired reads. Could this happen? Also, do aligners check read names when they align paired reads, or do they rely only on the position of the reads? Thank you.
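Checking names does not have to be slow: both files can be streamed side by side and compared one header at a time, which is a single linear pass with constant memory. Below is a minimal Python sketch under stated assumptions: four-line FASTQ records (plain or gzip-compressed), and a pairing key taken as the part of the header before the first whitespace, with any trailing `/1` or `/2` removed. The function names are hypothetical and do not come from any particular tool.

```python
import gzip
from itertools import zip_longest
from typing import Iterator, Optional, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_names(handle: TextIO) -> Iterator[str]:
    """Yield the pairing key of each record in a four-line FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        fields = line[1:].split()  # drop '@', split off any comment
        name = fields[0] if fields else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # older Illumina-style mate suffix
        yield name


def first_mismatch(r1: TextIO, r2: TextIO) -> Optional[int]:
    """Return the 0-based index of the first out-of-sync record, or None."""
    pairs = zip_longest(read_names(r1), read_names(r2))
    for idx, (name1, name2) in enumerate(pairs):
        # Differing names, or one file ending early (name is None), both count.
        if name1 != name2:
            return idx
    return None
```

Called on two handles from `open_fastq`, `first_mismatch` returns `None` when every record index carries the same name in both files. Records are located by line position (every fourth line) rather than by looking for `@`, because a quality string can itself begin with `@`.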
Best
Hi, does this also hold for the original files that the sequencer outputs? Thanks
If no trimming has been done on the data, the files should be in sync. If in doubt, run `repair.sh` to be sure. If the data is in sync, nothing should appear in the singletons file.

Hi, I don't think that necessarily follows: an empty singletons file does not by itself mean your R1 and R2 files were in sync. I have good-quality sequencing data that required no trimming, and the first reads were in sync. However, a few reads in the middle of the file were out of sync. You can't always tell at face value whether R1 and R2 are in sync until you hit an error during the alignment step, which is as follows:
A more general question that comes to mind, and that I haven't found an answer to, is whether this is a sequencing defect or whether something went haywire during demultiplexing, because I have this issue for all the samples that were run on a single flow cell. Quite strange!
That should not happen if you are using the `repair.sh` tool: if your files are not in sync, it should flag those reads.

If you have reads where the relevant part of the identifier (e.g. `1:Y:18:ATCACG`) has been stripped from the FASTQ headers, it would be difficult for any program to tell whether reads are out of sync.

Are you referring to the original read files? Had no manipulation been done to them after they came off the sequencer/demultiplexing, before you started these alignments?
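Regarding the header identifiers mentioned above: in the Illumina CASAVA 1.8+ layout, the text before the first space identifies the cluster and is the same for both mates, while the comment after it begins with the read number (`1` or `2`), followed by the filter flag, control number and index, as in `1:Y:18:ATCACG`. Below is a small illustrative parser, assuming either that layout or the older style where the name ends in `/1` or `/2`; `HeaderInfo` and `parse_header` are hypothetical names, not part of any tool.

```python
from typing import NamedTuple, Optional


class HeaderInfo(NamedTuple):
    name: str            # pairing key, expected to be identical for R1 and R2
    read: Optional[int]  # mate number 1 or 2 when recoverable, else None
    comment: str         # everything after the first whitespace, if any


def parse_header(header: str) -> HeaderInfo:
    """Split a FASTQ header into pairing key, mate number and comment.

    Assumes either '@<cluster id> <read>:<filtered>:<control>:<index>'
    (CASAVA 1.8+) or '@<cluster id>/1' style; anything else gives read=None.
    """
    text = header.rstrip("\n")
    if text.startswith("@"):
        text = text[1:]
    parts = text.split(maxsplit=1)  # name, then the optional comment
    name = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    read: Optional[int] = None
    if name.endswith(("/1", "/2")):
        read = int(name[-1])
        name = name[:-2]
    elif comment[:2] in ("1:", "2:"):
        read = int(comment[0])
    return HeaderInfo(name, read, comment)
```

If a header has lost both the comment and any `/1`/`/2` suffix, `read` comes back as `None`, so a tool can no longer tell from the header alone which mate a record is.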
Precisely. I have files where no manipulation was done after demultiplexing. When I align them to the reference, I get the error `paired reads have different names`. I used the `repair.sh` script to reorder the files. My singletons files (for all 4 samples) are empty, yet after running `repair.sh` the error disappears.

Since I didn't perform any preprocessing on the FASTQ files and went straight to alignment, I suspect something went wrong during demultiplexing. But I don't have any evidence or explanation for why that would happen during demultiplexing.
So `repair.sh` does work as intended. There are very rare errors like this in the output of `bcl2fastq`. One speculation I have is that these files were written to a file system that was not performant enough, and it may not have kept up properly with the processes writing the output files.

But your point is well taken: in this specific instance, the singletons files will be empty after `repair.sh` does its job.
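For illustration, the idea behind name-based re-pairing can be sketched like this: reads from both files are consumed side by side, a read whose mate has not appeared yet waits in a dictionary keyed by name, and a pair is emitted as soon as both mates have been seen; whatever is still waiting at the end is a singleton. This is a minimal Python sketch of the general technique, not BBTools' actual `repair.sh` implementation; the `Record` tuple and `repair_pairs` function are hypothetical, and names are assumed to be already normalised (no `/1`, `/2` or comment).

```python
from itertools import zip_longest
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record type: (pairing_name, sequence, quality)
Record = Tuple[str, str, str]


def repair_pairs(
    r1: Iterable[Record], r2: Iterable[Record]
) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Re-pair mates by name, returning (pairs, singletons)."""
    waiting1: Dict[str, Record] = {}  # R1 reads still looking for their mate
    waiting2: Dict[str, Record] = {}  # R2 reads still looking for their mate
    pairs: List[Tuple[Record, Record]] = []

    def take(read: Optional[Record], own: Dict[str, Record],
             other: Dict[str, Record], is_r1: bool) -> None:
        if read is None:  # this file is already exhausted
            return
        mate = other.pop(read[0], None)
        if mate is None:
            own[read[0]] = read  # mate not seen yet: keep this read waiting
        elif is_r1:
            pairs.append((read, mate))  # always store as (R1, R2)
        else:
            pairs.append((mate, read))

    for read1, read2 in zip_longest(r1, r2):
        take(read1, waiting1, waiting2, True)
        take(read2, waiting2, waiting1, False)

    # Anything still waiting never met its mate.
    singletons = list(waiting1.values()) + list(waiting2.values())
    return pairs, singletons
```

Files that are merely out of order produce an empty singleton list, because every read eventually meets its mate; only a read whose mate is genuinely absent is left over. Memory use grows with how far apart mates are in the two files, since unmatched reads are held until their mates arrive.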