I am obtaining quotes for transcriptome sequencing. I see that researchers often use both PacBio and Illumina to obtain and correct their transcriptome, but I only have the funds to do one of these.
My aim is to produce a de novo assembly for a eukaryote with no reference genome - a protist with a very large genome.
There seems to be a lot of conflicting advice out there. I have been told that PacBio Iso-Seq is great for transcriptomes as it produces whole transcripts. I am puzzled about why it is recommended above Illumina for de novo assembly, as it seems that the high indel rate will make it difficult to predict and annotate genes. I am concerned that there will be a large number of genes that I will miss as they will have poor database matches.
The quotes that I have obtained are similar for PacBio and Illumina NextSeq (2 x 75 bp, 150 cycles).
Any advice or opinions would be welcome!
Thank you.
Karen
karenkvn : If you have only one shot at doing this, then you are likely to get more data with Illumina, and it will likely be useful (though you may not get information about alternative splicing etc.). If your organism has not been successfully sequenced using PacBio, then the hurdles of getting a good library there are going to be higher. Unless you do some selection against rRNAs, they would likely form a large part of the data.
Not having enough sequence is going to severely constrain what you can discover. Ideally you would want to do DNA-seq as well, but in the real world, non-research constraints always take center stage.
Excellent point that I hadn't given much thought to. Can you elaborate a bit? Do you mean if there's no genome or if there's no information about the rRNA?
I was thinking of practical experimental hurdles. We informaticians never consider the possibility that a particular organism may be difficult to extract DNA/RNA from. It may have tough cell walls, or it may produce carbohydrates/proteins that end up in the nucleic acid prep and cause problems with sequencing. PacBio will require special handling for Iso-Seq, and any additional problem like the ones above will add to the risk of failure. If the rRNA sequence is not known (likely), then trying to deplete it would be yet another challenge.
Right, making sure that you're able to obtain enough high-quality starting material is probably going to be a greater issue for PacBio because it generally needs more input to start with. On the other hand, Illumina sequencing will present a biased picture, too, if the starting material isn't of sufficiently high quality.