Question

Paired-end sequence assembly with Trinity (data mining question)

9.6 years ago

guillermo.ponz.segrelles ▴ 30

Hello everyone, I'm a Master's student from Spain and this is my very first project involving bioinformatics and RNA-Seq, so please excuse me if my questions are too basic or not very clearly explained.

I have two files containing 100 bp paired-end reads from Illumina RNA-seq and I want to assemble them into a de novo transcriptome using Trinity. Up to here everything is OK, but I have some questions about how the two files (containing the F and the R reads) are combined to obtain a "consensus" sequence for the k-mer dictionary construction and the downstream steps (actually, I don't really know whether such a "consensus" sequence is formed at all when the Inchworm step of the assembly runs).

My main question is whether the F and the R reads of a 100 bp pair need to be the same length. I ask because the first 9-10 bases of each read have poor per-position quality, and if I trim them I don't know whether it will be a disaster (I don't know whether Trinity aligns the F and R reads, or whether it just converts the R reads to their reverse complements and obtains the k-mers from the F reads and the converted R reads independently).

I know it's a bit messy, but I would be very grateful if anyone could help me.
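To picture the second possibility raised above (k-mers taken from each read on its own, with no alignment between F and R), here is a toy sketch. It is not Trinity's code: the function names, the example reads and k = 4 are all made up for illustration. It only shows that this style of k-mer counting does not require the two reads of a pair to be the same length.

```python
from collections import Counter

# Hypothetical helper: reverse-complement a DNA string.
def reverse_complement(seq):
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))

# Hypothetical helper: count every k-mer in every read, one read at a time.
# Reads shorter than k simply contribute no k-mers.
def count_kmers(reads, k):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

# Made-up pair: the R read is shorter, as if it had been trimmed more.
forward_read = "ACGTACGTAC"
reverse_read = "GTACG"

kmer_counts = count_kmers(
    [forward_read, reverse_complement(reverse_read)], k=4
)
for kmer, count in sorted(kmer_counts.items()):
    print(kmer, count)
```

The two reads have different lengths (10 and 5 bases), yet both feed into the same k-mer table without any problem, because each read is processed independently.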

RNA-Seq next-gen Assembly • 2.0k views

updated 2.4 years ago by Ram 44k • written 9.6 years ago by guillermo.ponz.segrelles ▴ 30

Ram · Answer 1 · 2015-04-13

9.6 years ago

seta ★ 1.9k

Hi, if your two files contain the R and F reads separately, you can either combine them or keep them separate, depending on the Trinity command you use. You should evaluate the quality of your data and trim the poor-quality bases first. Poor-quality bases at the beginning of reads are normal, and trimming them can give a better assembly, so don't worry about it.
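As a rough illustration of the trimming step, the sketch below removes a fixed number of bases from the start of each read in two paired FASTQ streams while keeping the pairs in the same order. It is only a minimal example under some assumptions: plain four-line FASTQ records, both files listing the mates in the same order, and made-up read data and function names. A dedicated trimming tool would normally be used on real data.

```python
import io

# Hypothetical helper: yield (header, sequence, separator, quality) tuples
# from a four-line-per-record FASTQ stream.
def read_fastq(handle):
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        yield header, sequence, separator, quality

# Hypothetical helper: drop the first n bases and their quality scores.
def trim_start(record, n):
    header, sequence, separator, quality = record
    return header, sequence[n:], separator, quality[n:]

# Walk both files in lockstep so that mates stay paired after trimming.
def trim_pairs(fwd_handle, rev_handle, n_fwd, n_rev):
    pairs = zip(read_fastq(fwd_handle), read_fastq(rev_handle))
    for fwd_record, rev_record in pairs:
        yield trim_start(fwd_record, n_fwd), trim_start(rev_record, n_rev)

# Made-up in-memory data standing in for the F and R files.
fwd_file = io.StringIO("@read1/1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n")
rev_file = io.StringIO("@read1/2\nTTGCATGCAATG\n+\nIIIIIIIIIIII\n")

trimmed = list(trim_pairs(fwd_file, rev_file, n_fwd=3, n_rev=5))
for fwd_record, rev_record in trimmed:
    print("\n".join(fwd_record))
    print("\n".join(rev_record))
```

Here the F read is trimmed by 3 bases and the R read by 5, so the mates end up with different lengths but are still written out in matching order.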

updated 3.7 years ago by Ram 44k • written 9.6 years ago by seta ★ 1.9k

Thank you so much for answering, Seta! It's difficult to know what you are really doing and how it affects your results when working with these huge data sets.

I think I will trim the beginning of the reads in both files (F and R reads) with the FastQ trimmer per column tool implemented in Galaxy and then run Trinity. I hope it helps improve the results.

Thank you again.

updated 3.7 years ago by Ram 44k • written 9.6 years ago by guillermo.ponz.segrelles ▴ 30