RNA sequencing
2.6 years ago
aranyak111 • 0

I have a fundamental question for the bioinformatics community in general.

Two different sequencing platforms produce reads, prepared in the same way by the same protocol, that have to be aligned to a reference genome/transcriptome. One set of reads is 70 bp long and the other is around 150 bp. How do you know whether one alignment is more accurate than the other?

Any information regarding the query will be useful.

Genomics • 1.4k views

Could this be homework? The way you phrased it sounds like you have no idea about the answer, or at least don't know how to defend the answer. If you do, showing some effort may get others to chime in.


I have been working in genomics sequencing for quite some time, on both mRNA and small RNA sequencing. Usually, for reads in the range of 70-75 nucleotides aligned with splice-aware software like STAR, what matters is the proportion of reads aligned, as given in the alignment summary report. I guess that for specific questions about a locus of interest, or for SNV or CNV detection, alignments of 70 to 150 nucleotide reads may not be informative, as spurious alignments may generate many non-specific reads. I think the sequencing technique and the question we are trying to answer are the vital things to consider.
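
As a rough illustration of comparing two runs by the proportion of reads aligned, here is a minimal Python sketch that tallies primary records from SAM text. The function name `summarize_sam`, the `min_mapq` threshold and the toy SAM lines are all made up for illustration. MAPQ scales are aligner-specific (STAR, for example, reports 255 for uniquely mapped reads), so the threshold is an assumption that would need adjusting per aligner.

```python
def summarize_sam(lines, min_mapq=30):
    """Summarise primary alignment records from an iterable of SAM text lines.

    min_mapq is a hypothetical cut-off for a "confident" alignment;
    MAPQ scales differ between aligners, so adjust it accordingly.
    """
    total = mapped = confident = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary records
        total += 1
        if flag & 0x4:
            continue  # read is unmapped
        mapped += 1
        if mapq >= min_mapq:
            confident += 1
    return {
        "records": total,
        "fraction_mapped": mapped / total if total else 0.0,
        "fraction_confident": confident / total if total else 0.0,
    }


# Toy SAM lines, invented purely for illustration (not real data).
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t70M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t500\t3\t70M\t*\t0\t0\t*\t*",
    "r2\t272\tchr2\t900\t3\t70M\t*\t0\t0\t*\t*",  # secondary alignment of r2
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",           # unmapped read
    "r4\t0\tchr2\t40\t255\t70M\t*\t0\t0\t*\t*",
]
summary = summarize_sam(toy_sam)
print(summary)
```

Running the same tally on the 70 bp and the 150 bp alignments gives comparable numbers, although a higher mapped fraction on its own does not show that the reads were placed at the correct locus. For paired-end data each mate is counted as a separate record here.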

2.6 years ago

For common experiments, the difference between the two will be negligible, i.e., not noticeable. You would have to sequence the [c]DNA to an extremely high depth of coverage to have the statistical power to detect a difference between them, in which case it can be inferred that the 150 bp reads would align more accurately.

However, ultimately, assuming that you are referring to Next Generation Sequencing, both read lengths, and the informatics tools used to process them, would be terrible at faithfully representing the [c]DNA sequence that they are supposed to decode.

Kevin


To extend on what I mean:

Next Generation Sequencing (NGS) allowed us to generate data more rapidly, but this came at multiple important costs. For one, the reads used in most NGS applications, including 150 bp reads, are too short, meaning that, for a decent proportion of the genome, we have zero chance of faithfully aligning these reads to infer position / locus. Thus, for many RNA-, ChIP-, and DNA-seq experiments, the results are summarised and lack accuracy, irrespective of the read length.

*long-read sequencers face other important obstacles
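
As a rough illustration of the read-length point, the following Python sketch builds a toy reference containing one exact 100 bp repeat and asks what fraction of start positions yield a read-length substring that occurs exactly once. Everything here (the function names, the seeded random reference, the repeat size) is an assumption chosen for illustration. It ignores the reverse strand, mismatches and sequencing errors, so it is far simpler than a real mappability calculation.

```python
import random
from collections import Counter


def unique_fraction(reference, read_length):
    """Fraction of start positions whose substring of read_length occurs exactly once."""
    kmers = [reference[i:i + read_length]
             for i in range(len(reference) - read_length + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


def random_dna(rng, n):
    """Return a random DNA string of length n (toy data only)."""
    return "".join(rng.choice("ACGT") for _ in range(n))


rng = random.Random(0)            # fixed seed so the toy reference is reproducible
repeat = random_dna(rng, 100)     # hypothetical 100 bp repeat element
reference = (random_dna(rng, 300) + repeat +
             random_dna(rng, 300) + repeat +
             random_dna(rng, 300))

for read_length in (70, 150):
    print(read_length, unique_fraction(reference, read_length))
```

In this toy case a 70 bp read can fall entirely inside the 100 bp repeat and so cannot be placed unambiguously, whereas a 150 bp read always reaches into unique flanking sequence. Once a repeat is longer than 150 bp, both read lengths fail in the same way, which is the situation described above for a decent proportion of the genome.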

NGS --in particular the Sequencing-By-Synthesis method that Illumina purchased [yes, they purchased it and don't develop anything themselves]-- also has high error rates. The data is just very messy.

Conclusion: there are countless regions of the genome that require customised assays in order for these regions to be analysed with confidence.


Just wondering: what if the question asked by @aranyak111 concerns small RNA-seq, e.g. where one is interested in studying miRNAs (21-25 bp)? Does read length play a significant role in mapping to those miRNAs, and how do small RNA-seq pipelines deal with it?


Could be, indeed.
