My question is how to improve read quality based on the input data. That is, what kind of filtering and trimming should be used for sequencing reads from RNA-seq, scRNA-seq, ChIP-seq, RIP-seq, CNV or SNP analyses? Or does it not matter, and the choice depends on other factors?
You are not really "improving read quality". By trimming you are simply throwing away data that is adapter sequence (or that falls below an average quality threshold you decided to use). This raises the average Q-scores of your data.
In most applications, as long as you are aligning to a good reference, aligners should be able to handle "bad" data (adapters or bases with poor quality). That said, the time spent scanning and trimming the data once ensures that you have clean data for any downstream application.
The only time you would want to be strict is if you were doing de novo work. There you will want to remove any extraneous sequence, and perhaps data with quality scores below Q20 or so.
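As an illustration of strict cleaning, here is a minimal sketch of 3'-end quality trimming followed by a length filter. It uses the running-sum rule behind BWA's `-q` read-trimming option (cutadapt's `-q` works the same way): walk in from the 3' end adding `cutoff - quality`, and cut where that sum peaks. The Q20 cutoff, the 30-base minimum length and the function names are illustrative assumptions, and qualities are assumed to be Phred+33 encoded.

```python
from typing import Optional


def quality_trim_index(quals: list[int], cutoff: int) -> int:
    """Return how many 5' bases to keep after 3' quality trimming.

    Walks from the 3' end accumulating (cutoff - quality). The cut point is
    where this running sum is largest; the walk stops once the sum drops
    below zero (i.e. once good-quality bases outweigh the bad tail).
    """
    running = 0
    best = 0
    keep = len(quals)  # default: keep the whole read
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def trim_and_filter(seq: str, qual: str, cutoff: int = 20,
                    min_len: int = 30) -> Optional[tuple[str, str]]:
    """Quality-trim one FASTQ record (Phred+33); return None if it gets too short."""
    quals = [ord(c) - 33 for c in qual]
    keep = quality_trim_index(quals, cutoff)
    if keep < min_len:
        return None  # hypothetical rule: discard reads shorter than min_len
    return seq[:keep], qual[:keep]


print(quality_trim_index([42, 40, 26, 27, 8, 7, 11, 4, 2, 3], cutoff=10))  # → 4
```

Scoring a running sum rather than cutting at the first low-quality base means a single poor base inside an otherwise good read does not truncate it.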
Thanks for the reply. What do you mean by "adapter sequence"?
Or "Q-score"?
For most of the methods you mentioned above, one needs to prepare a "library" (a collection of nucleic acid fragments derived from the original sample). This generally requires adding long oligos to the ends of these fragments, so that the library fragments can bind to the flowcell surface and be sequenced. The sequence of these oligos (adapters) is known from the kit used, or can be inferred by sampling the data.
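Because the adapter sequence is known, trimming amounts to finding where the adapter starts in the read and cutting there. A minimal exact-match sketch for a 3' adapter follows. As an assumption it uses `AGATCGGAAGAGC`, the 13-base prefix shared by Illumina TruSeq adapters (Trim Galore's default); substitute your own kit's sequence. Dedicated trimmers such as cutadapt also tolerate sequencing errors within the adapter, which this sketch does not. The function name and the 3-base minimum overlap are illustrative.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter (exact matches only) from a read sequence."""
    # Case 1: the whole adapter occurs in the read (insert much shorter than read).
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read runs only partway into the adapter, so the read ends
    # with a prefix of it. Try the longest possible overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    # No adapter found: return the read unchanged.
    return read


ADAPTER = "AGATCGGAAGAGC"  # assumed TruSeq prefix; check your kit
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # → ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGATCG", ADAPTER))            # → ACGTACGT
```

The minimum overlap is a trade-off: a read can end in a few adapter-like bases purely by chance, so very short matches risk trimming genuine sequence.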
Q-scores are Phred quality scores, which estimate the probability that a particular base call is incorrect (LINK). Each base has an associated Q-score.
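The Phred scale is logarithmic: Q = -10·log10(P), where P is the probability that the base call is wrong. Q20 therefore means a 1 in 100 chance of error and Q30 a 1 in 1,000 chance. In FASTQ files using the Sanger / Illumina 1.8+ encoding, each score is stored as one ASCII character with an offset of 33. A small sketch (function names are illustrative):

```python
def decode_phred33(qual: str) -> list[int]:
    """Convert a FASTQ quality string (Phred+33 encoding) to integer Q-scores."""
    return [ord(c) - 33 for c in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to get the probability the base call is wrong."""
    return 10 ** (-q / 10)


quals = decode_phred33("II5+!")
print(quals)  # → [40, 40, 20, 10, 0]
print([error_probability(q) for q in quals])
```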
So, generally speaking, I need to remove known adapters, but in some cases it is not necessary?
Many aligners exclude the parts of a read that do not align to the reference when they write alignments to SAM format (the bases stay in the read sequence but are marked as unaligned). This is termed "soft clipping". With Illumina sequencing, adapter sequence is expected only at the 3' end of a read (because of the way sequencing works), and only when the read length is longer than the insert length. So in theory you could skip trimming and let the aligner soft-clip the non-aligned part of the read.
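Soft clipping is recorded in the CIGAR string of each SAM record as an `S` operation at either end of the alignment, optionally outside `H` (hard clip) operations. For example, a 150-base read from a 100-base insert carries 50 bases of adapter at its 3' end, which an aligner might report as `100M50S`. The sketch below counts soft-clipped bases at each end. The CIGAR is written in reference orientation, so for a read mapped to the reverse strand the read's 3' end corresponds to the left-hand clip. The function name is hypothetical; in practice a library such as pysam exposes the parsed CIGAR via `AlignedSegment.cigartuples`.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar: str) -> tuple[int, int]:
    """Return (left, right) soft-clipped base counts for a CIGAR string.

    '*' (no CIGAR, e.g. unmapped reads) yields (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips may only appear at the very ends, outside any soft clips,
    # so dropping them leaves soft clips as the first/last operations.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


print(soft_clips("100M50S"))      # → (0, 50)
print(soft_clips("10H5S80M15S"))  # → (5, 15)
```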
Okay, thanks for the explanation