I have paired-end RNA-seq data (read length around 150 bp) for 321 samples, with around 30 million reads per sample, and I am interested in quantifying expression for a transcriptome-wide association study (TWAS). I have performed QC using FastQC. The per base sequence content and sequence duplication level modules are flagged in most of the samples.
For the per base sequence content, the MultiQC report shows that the base composition is biased at the start of the reads, especially in the first 10 bases. The FastQC tutorial says "Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream". Should I leave it as it is, given that it should not affect the downstream analysis, or is there a way to deal with it?
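To make clear what I mean by biased base composition at the start of the reads, here is a minimal Python sketch of the quantity I am looking at. It is not FastQC's implementation; the function name `per_base_content` and the toy reads are made up for illustration only.

```python
from collections import Counter

def per_base_content(reads, n_positions=10):
    """Return the fraction of A/C/G/T at each of the first n_positions.

    Hypothetical helper for illustration; not part of FastQC or MultiQC.
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        # Only look at the first n_positions bases of each read.
        for i, base in enumerate(read[:n_positions].upper()):
            counts[i][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions

# Toy reads (made up), just to show the shape of the output.
toy_reads = ["ACGTACGTACGT", "ACGTTTGACCAA", "AGGTACCTAGGA", "ACCTAGGTTTAC"]
for pos, frac in enumerate(per_base_content(toy_reads, n_positions=4), start=1):
    print(pos, frac)
```

With unbiased reads the four fractions would be roughly flat along the read; in my data they fluctuate in the first 10 positions and then level off.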
For the sequence duplication failure: we sequenced to a depth of around 30M reads per sample in order to capture lowly expressed transcripts, so a high sequence duplication level is plausible. Do I still need to do something to control or resolve it?
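For clarity, by duplication level I mean roughly the fraction of reads whose exact sequence has already been seen. A simplified Python sketch of that idea follows; it is not FastQC's exact method, and `duplication_rate` and the toy reads are hypothetical names and data used only for illustration.

```python
from collections import Counter

def duplication_rate(reads):
    """Fraction of reads that are exact copies of an earlier read.

    Simplified illustration: 1 - (distinct sequences / total reads).
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total

# Toy example (made up): a highly expressed transcript contributes many
# identical reads, which pushes the duplication rate up.
toy_reads = ["ACGT"] * 6 + ["TTGA", "CCAG", "GGTC", "AATC"]
print(duplication_rate(toy_reads))
```

This is why I suspect the duplication in my samples may partly reflect highly expressed transcripts at this depth rather than a purely technical problem.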
Am I good to proceed with the quantification using Salmon or do I need to perform some sort of action to improve the per-base sequence content and duplication level? I have checked
Any suggestions would be highly appreciated.