Question: Assembly without trimming vs. with trimming

Asked 4.0 years ago by tiagobellintani

Hi friends. I ran a comparative test between two pipelines: Trimmomatic + Trinity + BUSCO, and Trinity + BUSCO without trimming. The pipeline that used the raw reads (RNA-seq transcriptome, Illumina HiSeq, paired-end) gave what looked like a substantial gain in complete BUSCO groups. Now I am undecided: should I proceed with the analyses without trimming? What do you suggest?

Pipeline with trimming (Trimmomatic + Trinity + BUSCO)

Trinity

Total assembled (bp) = 63677017
Number of contigs = 78868
Number of Trinity unigenes = 60510
Contigs longer than 1000 bp = 18429
Contigs longer than 2000 bp = 7059
Contigs longer than 10000 bp = 79
Longest contig (bp) = 28033
Median contig length (bp) = 403
Average contig length (bp) = 807.39
N50 (bp) = 1451

BUSCO

C:88.8%[S:58.6%,D:30.2%],F:3.6%,M:7.6%,n:2510
2229 Complete BUSCOs (C)
1471 Complete and single-copy BUSCOs (S)
758 Complete and duplicated BUSCOs (D)
90 Fragmented BUSCOs (F)
191 Missing BUSCOs (M)
2510 Total BUSCO groups searched

Pipeline without trimming (Trinity + BUSCO)

Trinity

Total assembled (bp) = 69407639
Number of contigs = 84709
Number of Trinity unigenes = 64655
Contigs longer than 1000 bp = 19883
Contigs longer than 2000 bp = 7917
Contigs longer than 10000 bp = 93
Longest contig (bp) = 33336
Median contig length (bp) = 400
Average contig length (bp) = 819.37
N50 (bp) = 1506
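Since N50 is one of the headline numbers in both tables, here is a minimal sketch of the standard definition computed from a list of contig lengths. The toy lengths are invented for illustration, and Trinity's own stats script may report N50 over all contigs or per gene, so this is not guaranteed to reproduce the numbers above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths, not taken from the assemblies above.
toy = [2000, 1500, 1000, 800, 400, 300]
print(n50(toy))
```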

BUSCO

C:89.8%[S:58.0%,D:31.8%],F:2.8%,M:7.4%,n:2510
2256 Complete BUSCOs (C)
1457 Complete and single-copy BUSCOs (S)
799 Complete and duplicated BUSCOs (D)
70 Fragmented BUSCOs (F)
184 Missing BUSCOs (M)
2510 Total BUSCO groups searched
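To see where the difference between the two BUSCO results actually comes from, the one-line summaries can be parsed and compared category by category. This is a sketch that assumes exactly the summary format printed above; other BUSCO versions may format the line differently.

```python
import re

# Matches the one-line summary format shown above, e.g.
# "C:88.8%[S:58.6%,D:30.2%],F:3.6%,M:7.6%,n:2510"
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Parse a BUSCO summary line into percentages plus the group count n."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    out = {key: float(m.group(key)) for key in "CSDFM"}
    out["n"] = int(m.group("n"))
    return out


trimmed = parse_busco_summary("C:88.8%[S:58.6%,D:30.2%],F:3.6%,M:7.6%,n:2510")
raw = parse_busco_summary("C:89.8%[S:58.0%,D:31.8%],F:2.8%,M:7.4%,n:2510")

for key in "CSDFM":
    delta = raw[key] - trimmed[key]
    print(f"{key}: trimmed {trimmed[key]:.1f}%, raw {raw[key]:.1f}%, "
          f"diff {delta:+.1f} points")
```

On these numbers the untrimmed assembly gains 1.0 percentage point of complete BUSCOs, but single-copy completeness drops by 0.6 points while duplicated completeness rises by 1.6, so the extra complete BUSCOs come from duplicated hits.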

Tags: assembly, busco, Trimmomatic, Trinity

Written 4.0 years ago by tiagobellintani; last updated 4.0 years ago by b.schiffthaler.

If there is any extraneous sequence (that does not belong to the genome you are working with) going into the assembly then that assembly is not correct. No matter what the stats say.

Reply by GenoMax, 4.0 years ago

I used an ortholog dataset at the order level for my species. Do you think there may still be redundancies even so?

Note: my raw data has a very good quality profile, so I decided to test assembling the raw reads.

Thanks.

Reply by tiagobellintani, 4.0 years ago

Trimmomatic is being used to remove adapter sequences, correct? Those have no place in your de novo assembly. Since you get different results than when you do not trim, there must be some extraneous sequence in your reads, and that should not be included in the assembly.

The difference in total assembled bases is below. To be fair, this only tells us what ended up in the assembly, not how much more sequence went in. Have you checked stats on the actual input?
69407639 (no trim) - 63677017 (trimmed) = 5730622
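The same subtraction can be applied to the other Trinity metrics to put the differences in proportion. This is just arithmetic on the assembly-level numbers posted above; as noted, it says nothing about how much sequence went into each assembly.

```python
def compare(trimmed, raw):
    """Return (raw - trimmed, difference as % of the untrimmed value)."""
    diff = raw - trimmed
    return diff, 100.0 * diff / raw


# Assembly-level Trinity stats from the two runs: (trimmed, untrimmed).
# These describe what was assembled, not the input reads.
stats = {
    "Total assembled": (63677017, 69407639),
    "Number of contigs": (78868, 84709),
    "Number of Trinity unigenes": (60510, 64655),
    "N50": (1451, 1506),
}

for name, (trimmed, raw) in stats.items():
    diff, pct = compare(trimmed, raw)
    print(f"{name}: {diff:+d} ({pct:.1f}% of the untrimmed value)")
```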

Reply by GenoMax, 4.0 years ago

I trimmed the read ends, indexes, and adapters, so I ended up with fewer reads after trimming. Isn't it possible that, by trimming, I removed not only these "leftovers" but also important transcript data that BUSCO recognized?

Thanks for the discussion.

Reply by tiagobellintani, 4.0 years ago

If you did extra trimming beyond normal scanning/trimming for adapters, then you could potentially have lost data. But that does not mean you can go back and do NO trimming. You still need to trim in a way that ensures no extraneous sequence (sorry to harp on it) gets into your assembly.

Reply by GenoMax, 4.0 years ago

Is there a way for me to assess whether this extra sequence is artifact or real transcript data? All quality analyses (FastQC) indicated no contamination from another organism, and the metrics were very good, with Phred scores above 35.

Reply by tiagobellintani, 4.0 years ago

Unless you specifically scan and trim, you are not going to remove adapter contamination, if any. FastQC does not look at your entire dataset for all metrics; it sub-samples the data for many of its tests. That is generally a good approximation of gross quality.

I suggest you take a look at bbduk.sh, an efficient scan/trim program, if you are inclined. A guide is available.
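To illustrate what "scan and trim" for adapters means, and how one could estimate what fraction of reads carry adapter sequence, here is a toy sketch. It is not how bbduk.sh works internally (bbduk uses k-mer matching and tolerates mismatches); it only does exact matching of a full adapter, or of an adapter prefix running off the 3' end. The adapter string is an assumption (the common Illumina TruSeq prefix), and `find_adapter`, `trim_adapter`, and `adapter_fraction` are hypothetical helper names.

```python
# Assumption: common prefix of Illumina TruSeq adapters; substitute the
# adapter actually used in your library prep.
ADAPTER = "AGATCGGAAGAGC"


def find_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Return the index where the adapter starts in `read`, or -1.

    Matches the full adapter anywhere in the read, or a prefix of the
    adapter (at least `min_overlap` bases) running off the 3' end.
    Exact matches only: real trimmers also tolerate sequencing errors.
    """
    full = read.find(adapter)
    if full != -1:
        return full
    # Partial adapter at the 3' end; try the longest overlap first.
    first = max(0, len(read) - len(adapter) + 1)
    for start in range(first, len(read) - min_overlap + 1):
        if adapter.startswith(read[start:]):
            return start
    return -1


def trim_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Cut the read where the adapter starts, if an adapter is found."""
    idx = find_adapter(read, adapter, min_overlap)
    return read if idx == -1 else read[:idx]


def adapter_fraction(reads, adapter=ADAPTER, min_overlap=5):
    """Fraction of reads in which a full or partial adapter is detected."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(find_adapter(r, adapter, min_overlap) != -1 for r in reads)
    return hits / len(reads)


reads = [
    "ACGTACGTAGATCGGAAGAGCACACGTCT",  # full adapter after 8 bases
    "ACGTACGTACGTAGATCGG",            # partial adapter at the 3' end
    "ACGTACGTACGTACGTACG",            # no adapter
]
print([trim_adapter(r) for r in reads])
print(adapter_fraction(reads))
```

Run over a whole FASTQ file rather than a sub-sample, a count like this gives a direct answer to whether adapter sequence is present, which FastQC's sub-sampled report may not.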

Reply by GenoMax, 4.0 years ago

Answer (2020-11-13)

You should always trim adapters and other foreign sequence (if you know possible sources of contaminants). Any trimming beyond that (quality trimming) is likely to degrade the quality of your assembly. You can trim _very_ mildly, but even there I'd be careful.

Here's a paper from 2014 that looked into this: sliding-window quality trimming with a threshold of up to PHRED 5 improved the assembly, while anything stricter degraded it.
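To make the threshold concrete: a sliding-window trimmer walks along the read and cuts where the mean quality inside a window drops below a threshold. The sketch below is a simplified version of that idea, not Trimmomatic's exact SLIDINGWINDOW implementation (which may place the cut within the failing window differently); the window size of 4 and the helper names are assumptions for illustration.

```python
def phred33(qual_string):
    """Convert a FASTQ quality string (Phred+33 encoding) to integer scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_trim(seq, quals, window=4, min_mean=5.0):
    """Cut the read at the start of the first window whose mean quality
    is below `min_mean`. Simplified sketch of sliding-window trimming.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    w = min(window, len(quals))  # short reads are treated as one window
    if w == 0:
        return seq, quals
    for start in range(len(quals) - w + 1):
        if sum(quals[start:start + w]) / w < min_mean:
            return seq[:start], quals[:start]
    return seq, quals


seq = "ACGTACGTNNNN"
quals = phred33("?" * 8 + "#" * 4)  # eight Q30 bases, then four Q2 bases
print(sliding_window_trim(seq, quals, min_mean=5)[0])
print(sliding_window_trim(seq, quals, min_mean=20)[0])
```

With a threshold of 5 only the Q2 tail is removed; with a threshold of 20 the cut moves earlier and also discards two Q30 bases, which is the kind of loss of real sequence that aggressive quality trimming causes.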