Trimming RNAseq data for transcriptome assembly
2.6 years ago

Hey everyone,

I downloaded the RNA-seq data for my organism of interest, and my goal is to produce a de novo transcriptome assembly. When I ran FastQC on the raw reads, I got a warning or failure on the "Per base sequence content" module (the problem is detected in the first 15 bases).

I also get a checkmark on the adapter content module. I could use Trimmomatic with the HEADCROP parameter, but I don't think that's an efficient approach (too much information is lost). Can you suggest an efficient way of getting a checkmark on this module, without cropping all the reads?

Thank you

Rna-seq • 1.1k views

Also note that the values for the first ten positions are not binned; after that, each plotted value is an average of 2 positions.


This is a common case. I would trim the 5' region, map the reads, and check whether the alignments improve after trimming.
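To make the cost of cropping concrete, here is a minimal Python sketch of what removing the first bases does to a single read, similar in spirit to Trimmomatic's HEADCROP step. The function names, the example read, and the read length of 100 are made up for illustration; they are not taken from the poster's data.

```python
def headcrop(seq, qual, n):
    """Drop the first n bases and their quality scores from one read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[n:], qual[n:]


def fraction_lost(read_length, n):
    """Fraction of bases discarded when cropping n bases from each read."""
    return min(n, read_length) / read_length


# Hypothetical 20-base read, cropped by 5 bases.
seq, qual = headcrop("ACGTACGTACGTACGTACGT", "I" * 20, 5)
print(seq, qual)

# Cropping 15 bases from (assumed) 100-base reads discards 15% of the data.
print(fraction_lost(100, 15))
```

Comparing mapping rates with and without the crop would then show whether the lost bases buy anything.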

2.6 years ago

This has been asked here multiple times before :)

In a nutshell: FastQC was historically not meant to check RNA-seq data. It was intended for DNA-seq, so some of its checks are suboptimal when dealing with RNA-seq data, and this is exactly such a case.

Bottom line: no need to worry about the 'variability' at the beginning of the read; that's normal. (Moreover, the graph also suffers from binning: from base 10 onwards it's plotted in a binned manner, while below 10 it's per base.)
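For anyone curious what that plot actually summarises, here is a rough Python sketch (not FastQC's own code) that computes the per-position base fractions for the start of the reads and then bins the later positions. The function names, the toy reads, and the choice of leaving the first 9 positions unbinned with bins of width 2 are assumptions for illustration only.

```python
from collections import Counter


def per_base_content(reads, n_positions):
    """Return, for each of the first n_positions, the fraction of A/C/G/T."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for pos, base in enumerate(read[:n_positions]):
            counts[pos][base] += 1
    fractions = []
    for counter in counts:
        total = sum(counter.values())
        fractions.append(
            {b: (counter[b] / total if total else 0.0) for b in "ACGT"}
        )
    return fractions


def bin_values(values, unbinned=9, width=2):
    """Keep the first `unbinned` values as-is, then average groups of `width`."""
    binned = list(values[:unbinned])
    for start in range(unbinned, len(values), width):
        chunk = values[start:start + width]
        binned.append(sum(chunk) / len(chunk))
    return binned


# Toy reads (hypothetical): position 1 is mostly A, position 4 is always T.
reads = ["ACGT", "AGGT", "ACTT", "TCGT"]
content = per_base_content(reads, 4)
print(content[0])

# Binning smooths later positions, so early per-base values look noisier.
print(bin_values([1, 2, 3, 4, 5, 6], unbinned=2, width=2))
```

The unbinned positions show every single-base fluctuation, which is part of why the start of the plot looks worse than the rest.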

As for adapters, you are best off trimming those, as you know for sure they do not belong in your assembly and should thus never be present in your final assembly result.
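Just to illustrate the idea of adapter trimming, here is a simplified Python sketch that cuts a read at an exact adapter match, or at a partial adapter hanging off the 3' end. In practice you would use a dedicated tool (for example Trimmomatic's ILLUMINACLIP step or cutadapt), which also tolerates mismatches; this sketch does not. The function name, the `min_overlap` default, and the adapter sequence used here (a commonly seen Illumina adapter prefix) are assumptions, so check which adapters your library actually contains.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Cut a read at a full adapter match, or at a partial match at the 3' end."""
    # Look for the full adapter anywhere in the read.
    idx = seq.find(adapter)
    if idx == -1:
        # Otherwise look for an adapter prefix at the very end of the read,
        # trying the longest possible overlap first.
        longest = min(len(adapter) - 1, len(seq))
        for overlap in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:overlap]):
                idx = len(seq) - overlap
                break
    if idx == -1:
        return seq, qual  # no adapter found, read is left untouched
    return seq[:idx], qual[:idx]


ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix, for illustration only

# Full adapter inside the read, followed by extra bases.
full = trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", "I" * 24, ADAPTER)
# Only the first 5 adapter bases made it into the read.
partial = trim_adapter("ACGTACGTAGATC", "I" * 13, ADAPTER)
# No adapter at all.
clean = trim_adapter("ACGTACGTACGT", "I" * 12, ADAPTER)
print(full, partial, clean)
```

Unlike cropping a fixed number of bases from every read, this only removes sequence where adapter is actually found.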
