Question

Trimming RNAseq data for transcriptome assembly

2.7 years ago

al_capone13 • 0

Hey everyone,

I downloaded the RNA-seq data for my organism of interest, and my goal is to produce a de novo transcriptome assembly. When I run FastQC on the raw reads, I get a warning or failure on the "per base sequence content" module (the problem is detected in the first 15 bases).

[Image: FastQC "per base sequence content" plot]

I also get a checkmark on the adapter content module. I could use Trimmomatic with the HEADCROP parameter, but I don't think that's an efficient approach (too much information is lost). Can you suggest an efficient way of getting a checkmark on this module, without cropping all the reads?

Thank you

Rna-seq • 1.1k views

updated 2.6 years ago by Ram 44k • written 2.7 years ago by al_capone13 • 0

Also worth noting: the values for the first ten positions are not binned; after that, each plotted value is an average of 2 positions.

2.7 years ago by Istvan Albert 102k

This is a common case. I would trim the 5' region and map the reads to see whether the alignments improve after trimming.

2.7 years ago by binodregmi30 ▴ 10

score 3 · Answer 1 · 2022-05-13

This has been asked here multiple times before :)

In a nutshell: FastQC was historically not meant for checking RNA-seq data. It was intended for DNA-seq, so some of its checks are suboptimal when dealing with RNA-seq data, and this module is exactly such a case.

Bottom line: no need to worry about the 'variability' at the beginning of the reads, that's normal. (Moreover, the graph also suffers from binning: from base 10 onwards it's plotted in a binned manner, below 10 it's per base.)
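If you want to look at the numbers behind that plot yourself, here is a minimal Python sketch that computes the per-position base fractions from a list of read sequences and then averages later positions into bins. The function names and the `unbinned`/`bin_size` defaults are my own assumptions, based only on the description in this thread (first ten positions per base, then pairs); this is not FastQC's actual code.

```python
from collections import Counter

def per_base_content(reads):
    """Fraction of A/C/G/T at each read position (other symbols such as N
    count toward the total but are not reported). Expects a non-empty list."""
    max_len = max(len(r) for r in reads)
    counts = [Counter() for _ in range(max_len)]
    for read in reads:
        for pos, base in enumerate(read.upper()):
            counts[pos][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: c[b] / total for b in "ACGT"})
    return fractions

def bin_positions(fractions, unbinned=10, bin_size=2):
    """Keep the first `unbinned` positions as-is, then average every
    `bin_size` consecutive positions (assumed defaults, see above)."""
    out = list(fractions[:unbinned])
    for start in range(unbinned, len(fractions), bin_size):
        group = fractions[start:start + bin_size]
        out.append({b: sum(f[b] for f in group) / len(group) for b in "ACGT"})
    return out

# Toy example with four 4-base reads
reads = ["ACGT", "ACGA", "TCGA", "ACGT"]
frac = per_base_content(reads)
binned = bin_positions(frac, unbinned=2, bin_size=2)
```

On real data you would feed in the sequence lines of your FASTQ file. The averaging step is why the binned part of the plot looks smoother than the per-base part.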

Adapters, on the other hand, are best trimmed off: you know for sure they are not part of the transcriptome, so they should never be present in your final assembly.
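To illustrate what 3' adapter trimming does to a read, here is a toy Python sketch. It only handles exact matches, and the `trim_adapter` function and its `min_overlap` parameter are hypothetical names for illustration. The adapter string is just an example, so check which adapters your library prep actually used. For real data use a dedicated trimmer such as Trimmomatic, which also handles mismatches and quality strings.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter (exact match only) and everything after it.

    Also removes a partial adapter at the very end of the read, provided
    at least `min_overlap` adapter bases are present.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for the longest adapter prefix that the read ends with
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read

adapter = "AGATCGGAAGAGC"  # example sequence, an assumption for this sketch

full = trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", adapter)  # adapter fully inside the read
partial = trim_adapter("ACGTACGTAGATC", adapter)          # read ends in an adapter prefix
clean = trim_adapter("ACGTACGT", adapter)                 # no adapter present
```

Note that reads without adapter are left untouched, unlike head cropping, which shortens every read. In a FASTQ file the quality string would have to be cut to the same length as the sequence.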