Question

About the Tool FASTP

0

3 days ago

AIMAR • 0

Hello everyone, I hope everything is going well.

I would like to know whether any of you have used fastp, or could recommend documentation (other than the tool's GitHub page) on using it for the analysis of genetic variants. I'd like to use fastp to pre-process my reads before moving on to alignment.

I'm a bit stuck on which options to use to get better pre-processing of my FASTQ files.

Thanks in advance!

Quality-control reads fastp • 445 views

ADD COMMENT • link updated 2 days ago by shelkmike ★ 1.5k • written 3 days ago by AIMAR • 0

1

Hey Aimar! Could you be more specific about what you mean by "pre-processing"? Among other things, fastp can remove adapters, which is usually a good idea before mapping.

ADD REPLY • link 3 days ago by Gabriel R. ★ 2.9k

0

Here's the command line I've used in my script to preprocess my data (DNA paired-end reads). By preprocessing I mean adapter removal, trimming of low-quality bases, and removal of low-quality reads and reads with too many N bases. If anyone can also suggest articles or documents that would help me better understand the analysis and customize my fastp command, that would be welcome. I also added the --correction option to correct bases in the region where the two reads of a pair overlap (--overlap_len_require is 30 bases by default).

 # Run fastp
        echo " Fastp preprocessing with $SAMPLE..."
        fastp \
            -i "$R1_FILE" -I "$R2_FILE" \
            -o "$trimmed_r1" -O "$trimmed_r2" \
            --html "$html_report" --json "$json_report" \
            --detect_adapter_for_pe \
            --qualified_quality_phred 20 \
            --average_qual 20 \
            --length_required 50 \
            --n_base_limit 5 \
            --cut_front \
            --cut_tail \
            --cut_window_size 4 \
            --cut_mean_quality 20 \
            --trim_poly_g \
            --poly_g_min_len 10 \
            --trim_poly_x \
            --poly_x_min_len 10 \
            --correction \
            --thread 8

ADD REPLY • link updated 2 days ago by Ram 45k • written 3 days ago by AIMAR • 0

1

Please use ADD REPLY/ADD COMMENT when responding to existing posts. SUBMIT ANSWER is for NEW answers for the original question.

ADD REPLY • link 3 days ago by GenoMax 150k

0

Oh sorry, my bad. I will next time.

ADD REPLY • link 3 days ago by AIMAR • 0

0

As others have indicated, don't overthink this part of the analysis. You don't need to change the default parameters unless you have a very specific reason to do so (e.g. your data is of poor quality going in).

Strictly speaking, you don't need to scan/trim before alignment. Trimming is strictly needed only if you are going to do de novo work, where extraneous sequence has to be removed. Aligners will generally handle extraneous sequence by "soft-clipping". But scanning/trimming is fine to do for peace of mind.
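
To get a feel for how much soft-clipping an aligner actually did, one option is to count the records whose CIGAR string (column 6 of a SAM record) contains an S operation. This is only a rough sketch: the function name count_soft_clipped is made up for illustration, and the tiny SAM file is fabricated test input, not real data.

```shell
#!/bin/sh
# count_soft_clipped [SAM]: print "<soft-clipped> <aligned>" record counts.
# Reads the given SAM file, or standard input when no file is given.
count_soft_clipped() {
    awk -F '\t' '
        /^@/      { next }        # skip header lines
        $6 == "*" { next }        # skip records without an alignment
                  { total++ }
        $6 ~ /S/  { clipped++ }   # CIGAR contains a soft-clip operation
        END       { print clipped + 0, total + 0 }
    ' "${1:--}"
}

# Toy input: one soft-clipped read, one fully matched read, one unmapped read.
tmp_sam=$(mktemp)
printf '@HD\tVN:1.6\n' > "$tmp_sam"
printf 'r1\t0\tchr1\t100\t60\t5S45M\t*\t0\t0\t*\t*\n' >> "$tmp_sam"
printf 'r2\t0\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*\n' >> "$tmp_sam"
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n' >> "$tmp_sam"
count_soft_clipped "$tmp_sam"   # prints: 1 2
rm -f "$tmp_sam"
```

With a real BAM file, and assuming samtools is installed, the same function can read from a pipe: samtools view aln.bam | count_soft_clipped (aln.bam being a placeholder name).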

ADD REPLY • link 3 days ago by GenoMax 150k

0

Understood. I think it's fine for now, because I've checked the quality and it looks better than before.

ADD REPLY • link 3 days ago by AIMAR • 0

0

One problem that I encounter if I don't trim reads is that taxonomic classification of untrimmed reads often produces incorrect results. The reason is that some genomes deposited in NCBI were assembled from untrimmed reads and thus contain adapter sequences. Therefore, if you align a read that still carries an adapter to NCBI nt, the best match is often to some unrelated genome that contains the same adapter. If you use BLASTN to align the sequence of the Illumina universal adapter (GATCGGAAGAGCACACGTCTGAACTCCAGTCA) against NCBI nt, you'll see many genomes that contain adapters.
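
A quick way to see whether your own reads still carry that adapter is to count the reads that contain it. The sketch below assumes an uncompressed, four-lines-per-record FASTQ file; the function name count_adapter_reads and the two toy reads are made up for illustration. It only finds exact, full-length matches, so partial adapters at read ends will be missed.

```shell
#!/bin/sh
# count_adapter_reads ADAPTER FASTQ: print how many reads contain ADAPTER.
# Only the sequence line of each 4-line record (line 2 of 4) is inspected.
count_adapter_reads() {
    awk -v adapter="$1" '
        NR % 4 == 2 && index($0, adapter) > 0 { n++ }
        END { print n + 0 }
    ' "$2"
}

# Toy input: the first read ends in the adapter, the second does not.
ADAPTER="GATCGGAAGAGCACACGTCTGAACTCCAGTCA"
tmp_fq=$(mktemp)
printf '@r1\nACGT%s\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n' "$ADAPTER" > "$tmp_fq"
printf '@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' >> "$tmp_fq"
count_adapter_reads "$ADAPTER" "$tmp_fq"   # prints: 1
rm -f "$tmp_fq"
```

For gzipped files, decompress to a temporary file first (for example with gzip -dc), or run it on a subsample of the reads.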

ADD REPLY • link 2 days ago by shelkmike ★ 1.5k

score 1 · Answer 1 · 2025-04-07

In my experience, fastp works great with default parameters for paired-end reads. For single-end reads, I suggest supplying the adapter sequence via the --adapter_sequence option, because otherwise fastp tries to guess the adapter sequence from the reads, and too often it fails to do so correctly.
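
As a rough illustration of the two cases, the sketch below prints one paired-end command that relies on the defaults and one single-end command that passes the adapter explicitly. All file names are placeholders, the run helper is made up for this example, and the adapter shown assumes a TruSeq-style library, so check which adapter your own kit uses. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default is a dry run).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Paired-end: fastp defaults, no adapter given (placeholder file names).
run fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz \
    -o trimmed_R1.fastq.gz -O trimmed_R2.fastq.gz

# Single-end: pass the adapter explicitly rather than relying on detection.
# Assumption: a TruSeq-style library; substitute the adapter of your own kit.
ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
run fastp -i sample.fastq.gz -o trimmed.fastq.gz --adapter_sequence "$ADAPTER"
```

Printing the command first makes it easy to check exactly what will be run before committing compute time to it.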