I am working on data (RNA-Seq) that was sequenced as 151 bp PE using NovaSeq.
Looking at the QC generated after library construction (the library was constructed using TruSeq RNA Access), the average fragment size is given as ~300 bp. Since this was measured after library construction, I understand that it includes the adapters as well.
I'm still new to bioinformatics, but I completely fail to understand the logic behind sequencing 151 bp paired-end on a fragment that, after adapter removal, should be only around 170 bp. As far as I understand, R2 would mostly overlap with R1. While 151 bp single-read sequencing could make sense, I fail to see the logic behind paired-end sequencing of such a fragment.
The answer I received from the company as to why 150 bp paired-end sequencing was performed is:
"From what we checked, 150PE sequencing is agreed on the quotation and we followed the settings. As the fragments size is longer than 150bp, 150PE sequencing seems acceptable. Additionally the sequencing result shows high quality(e.g. Q30 is higher than 90). If you want it to be 100PE sequencing, please let me know. We will trim the data and send it back to you."
1) Am I correct that sequencing paired-end (as opposed to single-read) was completely useless in this case?
2) Does the person to whom the data belongs have a moral case for asking for a refund? While he might not have a legal case (as the company states, 150 PE was agreed on in the quotation), I think they definitely should have warned him that 150 bp paired-end is not needed if the biological fragment is 170 bp long. He could have saved part of the money (a large part, I guess, though I don't know) by sequencing single-end.
That question does not have a right answer. Could one have made do with just single-end sequencing? Sure. But that is 20/20 hindsight.
Depends. Who made the libraries: the submitter or the sequencing provider? If the submitter submitted pre-made libraries, then it is their fault that the inserts are not long. If the provider made the libraries, you could request that they re-make them, provided they advertise/guarantee a 300-400 bp insert size. This would also depend on the initial material that was submitted. If it was not of good quality/intact, this is the best result you are going to get.
The sequencing provider most definitely knew that the fragment is ~300 bp including adapters. I think that rpolicastro's comment is correct, and R1 and R2 would be redundant almost "by definition", so to speak, not just in hindsight.
I do not know who made the libraries, but even if it was the submitter, the sequencing provider should have at least warned him that it is senseless (and costly!!) to sequence 150 bp from each end of the fragment. I assume that the shorter the read length, the lower the cost.
Cost-effectiveness with NovaSeq comes from pooling samples on the same flowcell together with many other libraries, and 2x150 is a common read length setup. The other samples (e.g. exomes, WGS...) might have benefited from that setup, and they simply included your libraries since there was free space.
Was this 150bp for both the forward and reverse read (for a total of 300bp), or was it 150bp split in half between the forward and reverse read (so 75bp in either direction)?
The first. 150 for each of the reads, not split between them.
If the average fragment size was ~170 bp, then it did not make sense to sequence 150bp in both directions. The R1 and R2 reads would be nearly identical in the best case, and would be reading into the adapter on the other side in the worst case.
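To put rough numbers on this, here is a minimal Python sketch of the overlap arithmetic. It assumes a single fixed insert length with adapters already excluded, whereas a real library has a distribution of insert sizes around the ~170 bp average mentioned in the thread. The function name `overlap_summary` and its output keys are made up for illustration and do not come from any tool.

```python
def overlap_summary(insert_len: int, read_len: int) -> dict:
    """Summarise how R1 and R2 cover one insert (adapters excluded).

    Hypothetical helper: assumes one fixed insert length and that
    both reads have the same length.
    """
    # Bases of each read that fall on the insert itself
    on_insert = min(read_len, insert_len)
    # Bases each read runs past the insert, i.e. into the opposite adapter
    adapter_readthrough = max(0, read_len - insert_len)
    # Insert bases sequenced by both R1 and R2
    overlap = max(0, 2 * on_insert - insert_len)
    # Distinct insert bases sequenced by at least one read
    covered = min(insert_len, 2 * on_insert)
    return {
        "overlap": overlap,
        "adapter_readthrough": adapter_readthrough,
        "covered": covered,
        "gain_over_single_end": covered - on_insert,
    }


# Values taken from the thread: ~170 bp insert, 151 bp reads
print(overlap_summary(170, 151))
# The provider's suggested trim to 100 bp reads
print(overlap_summary(170, 100))
# A shorter-than-average insert, where reads run into the adapter
print(overlap_summary(120, 151))
```

Under these assumptions, a 170 bp insert read at 2x151 has 132 bp covered by both reads, so R2 adds only 19 bp beyond what a single 151 bp read already covers. Inserts shorter than the read length gain nothing from R2 and each read continues into the adapter.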