Good morning,
I know this topic has been discussed before, but I still have some doubts about it.
I've done RNA-seq with 2x150 bp paired-end reads. Before delivering the data, the company ran a QC check and reported a library size of ~300 bp. When I received the data I checked it with FastQC and the quality was quite good: there were no poor-quality reads and the read length was 150 bp, but adapters were present.
So I used BBMerge to detect the adapters, trimmed them, and then merged the reads. However, for one sample the average insert size was 148 bp, and for the other samples it was between 151 bp and 173 bp. The percentage of joined reads was also high (the lowest was 80%, but for most samples it was ~90%).
This is one example:
Adapters counted:    32566620
Total time:          710.930 seconds.

Pairs:               62324086
Joined:              56271221   90.288%
Ambiguous:            5211149    8.361%
No Solution:           841716    1.351%
Too Short:                  0    0.000%

Avg Insert:          148.1
Standard Deviation:  47.2
Mode:                122

Insert range:        35 - 291
90th percentile:     217
75th percentile:     176
50th percentile:     140
25th percentile:     113
10th percentile:     95
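For reference, the commands I ran were roughly along these lines (sample names are placeholders, and the flags should be double-checked against the BBTools help text for your version):

# 1) Let BBMerge estimate the adapter sequences from the read overlaps
bbmerge.sh in1=sample_R1.fastq.gz in2=sample_R2.fastq.gz outa=adapters.fa

# 2) Trim those adapters from the 3' ends of the reads with BBDuk
bbduk.sh in1=sample_R1.fastq.gz in2=sample_R2.fastq.gz \
    out1=trimmed_R1.fastq.gz out2=trimmed_R2.fastq.gz \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

# 3) Merge the trimmed pairs and write an insert-size histogram
bbmerge.sh in1=trimmed_R1.fastq.gz in2=trimmed_R2.fastq.gz \
    out=sample_merged.fastq.gz outu1=unmerged_R1.fastq.gz outu2=unmerged_R2.fastq.gz \
    ihist=insert_histogram.txt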
So, my questions are: should I be worried about the less-than-ideal insert size? At this stage there is nothing I can do about it, right? Also, should I trim the adapters and then merge the reads, or the other way around? I think the first option is best, since the quality at the ends of the reads is usually low, right?
Thank you for your help!
Yes, because if you don't trim the adapters, the reads won't merge anyway. But you should merge before quality trimming, because joining reads will improve quality.
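For example, something like this order (placeholder file names; qtrim= and trimq= are BBDuk's quality-trimming options):

# Merge the adapter-trimmed pairs first
bbmerge.sh in1=atrimmed_R1.fq.gz in2=atrimmed_R2.fq.gz \
    out=merged.fq.gz outu1=unmerged_R1.fq.gz outu2=unmerged_R2.fq.gz

# Then quality-trim the merged reads (and the unmerged pairs separately, if you keep them)
bbduk.sh in=merged.fq.gz out=merged_qtrimmed.fq.gz qtrim=rl trimq=10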
Are you performing transcriptome assembly? If you are just mapping and counting, do not merge.
AFAIK bbmerge can use the adapter information for merging, so not trimming before merging increases the merging rate.
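Something along these lines, for example (the adapter parameter names are from memory of the bbmerge.sh help text, so check them in your version; adapters.fa could come from an earlier outa= run):

# Hand BBMerge the adapter sequences instead of pre-trimming the reads;
# pairs whose overlap runs into adapter can then still be merged
bbmerge.sh in1=raw_R1.fq.gz in2=raw_R2.fq.gz \
    out=merged.fq.gz outu1=unmerged_R1.fq.gz outu2=unmerged_R2.fq.gz \
    adapters=adapters.fa ihist=ihist.txt

fin swimmer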
Are you doing assembly? It's otherwise not clear why you would merge the reads to begin with.