Hello everyone, I'm a Master's student from Spain and this is my very first time working with bioinformatics and RNA-Seq, so please excuse me if my questions are too basic or not very clearly explained.
I have two files containing 100 bp paired-end reads from Illumina RNA-Seq and I want to assemble them into a de novo transcriptome using Trinity. Up to here everything is OK, but I have some doubts about how the two files (containing the F and the R reads) are combined to obtain a "consensus" sequence for the k-mer dictionary construction and the downstream processes (actually, I don't really know whether such a "consensus" sequence is formed at all when the Inchworm step of the assembly runs).
My main doubt is whether the F and the R reads of each 100 bp pair need to be the same length. I ask because the first 9-10 bases of each read have poor per-base sequence quality, and I don't know whether trimming them is going to be a disaster (because I don't know whether Trinity aligns the F and R reads to each other, or whether it just converts the R reads to their reverse complement and obtains the k-mers from the F reads and the converted R reads independently).
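To make the second possibility concrete, here is a toy Python sketch of what I imagine. I don't know whether this is what Trinity actually does; the function names, the two short reads and k = 4 are made up purely for illustration.

```python
# Toy illustration only: NOT Trinity's code, just the idea described above.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))

def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical read pair with DIFFERENT lengths (as after uneven trimming).
forward_read = "ACGTTGCA"   # 8 bases
reverse_read = "TGCAAC"     # 6 bases

k = 4  # illustrative value, chosen small so the output is easy to read

forward_kmers = kmers(forward_read, k)
reverse_kmers = kmers(reverse_complement(reverse_read), k)

# Both reads feed the same k-mer pool, each on its own.
kmer_pool = set(forward_kmers) | set(reverse_kmers)
print(sorted(kmer_pool))
```

If it works like this, each read would contribute its k-mers on its own, and the only thing that would matter is that a read is still at least k bases long after trimming.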
I know it's a bit messy, but I would be very grateful if anyone could help me.
Thank you so much for answering, Seta! It's difficult to know what you are really doing and how it affects your results when working with these huge data sets.
I think I will trim the beginning of the reads in both files (F and R reads) with the FastQ trimmer per column tool implemented in Galaxy and then run Trinity. I hope it helps improve the results.
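For reference, this is roughly what I understand the trimming to do, written as a minimal Python sketch rather than the actual Galaxy tool. The function names, the toy reads and the choice of 10 bases are my own assumptions.

```python
# Minimal sketch: trim a fixed number of bases from the start of every read
# in two paired FASTQ inputs, keeping the F and R records in the same order.

def parse_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines."""
    cleaned = [line.rstrip("\n") for line in lines]
    for i in range(0, len(cleaned) - 3, 4):
        yield tuple(cleaned[i:i + 4])

def trim_record(record, n_trim):
    """Drop the first n_trim bases and their quality characters."""
    header, seq, separator, qual = record
    return (header, seq[n_trim:], separator, qual[n_trim:])

def trim_pairs(forward_lines, reverse_lines, n_trim):
    """Trim both mates of every pair; no read is dropped, so pairing is kept."""
    pairs = zip(parse_fastq(forward_lines), parse_fastq(reverse_lines))
    return [(trim_record(f, n_trim), trim_record(r, n_trim)) for f, r in pairs]

# Toy data standing in for the two real files (hypothetical reads).
forward_lines = ["@read1/1", "NNNNNNNNNNACGTACGTAC", "+", "##########IIIIIIIIII"]
reverse_lines = ["@read1/2", "NNNNNNNNNNTTGGCCAATT", "+", "##########IIIIIIIIII"]

trimmed = trim_pairs(forward_lines, reverse_lines, n_trim=10)
for forward_rec, reverse_rec in trimmed:
    print("\n".join(forward_rec))
    print("\n".join(reverse_rec))
```

Since the same number of bases is removed from every read and no read is discarded, the two files should stay in sync, with read N in the F file still matching read N in the R file.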
Thank you again.