Hello everyone, I hope everything is going well.
I would like to know whether any of you have used fastp, or could recommend documentation (other than the tool's GitHub page) on using it for the analysis of genetic variants. I'd like to use fastp to pre-process my reads before moving on to alignment.
I'm a bit stuck on which options to use to get better pre-processing of my FASTQ files.
Thanks in advance!
Hey Aimar! Could you be more specific about what you mean by "pre-processing"? fastp can, among other things, remove adapters, which is usually a good idea before mapping.
Here's the command line I've used in my script to preprocess my data (paired-end DNA reads). By preprocessing I mean adapter removal and trimming of low-quality bases, low-quality reads and N bases. Also, if anyone can suggest articles or documents that would help me better understand the analysis and customize my fastp command, that would be welcome. I also added the --correction option to correct bases using the overlap between paired reads (--overlap_len_require = 30 bases by default).
Please use ADD REPLY/ADD COMMENT when responding to existing posts. SUBMIT ANSWER is for NEW answers for the original question.
Oh sorry, my bad, I would nex.
As others have indicated, don't overthink this part of the analysis. You don't need to change the default parameters unless you have a very specific reason to do so (e.g. your data is of poor quality going in).
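To make "stay close to the defaults" concrete, here is a minimal sketch of a paired-end fastp call assembled in Python. The function name, file names and output naming scheme are hypothetical; the flags themselves (--in1/--in2, --out1/--out2, --detect_adapter_for_pe, --correction, --json, --html, --thread) are fastp options as far as I know, but check `fastp --help` for your installed version. No quality or length thresholds are set, so fastp's defaults apply.

```python
import shlex


def build_fastp_cmd(r1, r2, out_prefix, correction=True, threads=4):
    """Assemble a fastp command for paired-end reads, leaving filters at defaults.

    All paths and the output naming scheme are illustrative assumptions.
    """
    cmd = [
        "fastp",
        "--in1", r1,
        "--in2", r2,
        "--out1", f"{out_prefix}_R1.trimmed.fastq.gz",
        "--out2", f"{out_prefix}_R2.trimmed.fastq.gz",
        # Adapter auto-detection for paired-end data is off unless requested.
        "--detect_adapter_for_pe",
        # Reports are handy for comparing quality before and after filtering.
        "--json", f"{out_prefix}.fastp.json",
        "--html", f"{out_prefix}.fastp.html",
        "--thread", str(threads),
    ]
    if correction:
        # Overlap-based base correction, only meaningful for paired-end reads.
        cmd.append("--correction")
    return cmd


if __name__ == "__main__":
    cmd = build_fastp_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
    # Print the command for inspection rather than running it.
    print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list avoids shell-quoting problems with unusual file names.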
Strictly speaking, you don't need to scan/trim (it is strictly needed only if you are going to do de novo work, to remove extraneous sequence). Aligners will generally handle extraneous sequence by "soft-clipping". But scanning/trimming is fine to do for peace of mind.
Understood. I think it's fine for now, since I've checked the quality and it looks better than before.
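On the soft-clipping point above: if you want to see how much sequence the aligner actually clipped, the CIGAR string of each alignment records it as `S` operations at the read ends. The following is a minimal sketch, assuming standard SAM CIGAR syntax; the function name is hypothetical and it works on a plain CIGAR string, so it needs no BAM library.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips at either end; ignore them.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        # Unmapped reads carry "*" instead of a CIGAR string.
        return (0, 0)
    leading = core[0][0] if core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (leading, trailing)


print(soft_clipped_bases("118M32S"))  # (0, 32)
print(soft_clipped_bases("100M"))     # (0, 0)
```

A large fraction of reads with long trailing clips in untrimmed data is a hint that adapter read-through is present, although soft clips can also come from real biology such as structural variants.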
One problem that I encounter if I don't trim reads is that taxonomic classification of untrimmed reads often produces incorrect results. The reason is that some genomes deposited in NCBI were assembled from untrimmed reads and thus contain adapter sequences. Therefore, if you align a read with an adapter to NCBI nt, the best match is often to some unrelated genome that contains an adapter. If you align by BLASTN the sequence of the Illumina universal adapter (GATCGGAAGAGCACACGTCTGAACTCCAGTCA) to NCBI nt, you'll see many genomes that contain adapters.
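A quick way to judge whether this matters for your own data is to count how many reads contain the start of that adapter. This is a rough sketch under several assumptions: the function name and the 12-base seed length are my own choices, the FASTQ is in the usual 4-lines-per-record layout, and only exact matches are counted, so it will underestimate adapter content compared with a real trimmer.

```python
from itertools import islice

# Adapter sequence as quoted in the post above.
ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def adapter_fraction(fastq_lines, adapter=ADAPTER, min_match=12):
    """Fraction of reads containing the first `min_match` adapter bases.

    `fastq_lines` is any iterable of lines from a 4-line-per-record FASTQ.
    """
    seed = adapter[:min_match]
    total = 0
    hits = 0
    # The sequence is the 2nd line of every 4-line record.
    for seq in islice(fastq_lines, 1, None, 4):
        total += 1
        if seed in seq.strip().upper():
            hits += 1
    return hits / total if total else 0.0
```

For a gzipped file, the lines can come from `gzip.open(path, "rt")`. Running it on the reads before and after fastp gives a simple sanity check that adapter removal did something.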