About the Tool FASTP
3 days ago
AIMAR • 0

Hello everyone, I hope everything is going well.

I would like to know whether any of you have used fastp, or could recommend documentation (other than the tool's GitHub page) on using it in genetic variant analysis. I'd like to use fastp to pre-process my reads before moving on to alignment.

I'm a bit stuck on which options to use to get the best pre-processing of my FASTQ files.

Thanks in advance!

Quality-control reads fastp • 445 views

Hey Aimar! Could you be more specific about what you mean by "pre-processing"? Among other things, fastp can remove adapters, which is usually a good idea before mapping.


Here's the command line I've used in my script to preprocess my data (DNA paired-end reads). By preprocessing I mean adapter removal, trimming of low-quality bases, and removal of low-quality reads and reads with too many N bases. If anyone can suggest articles or documents that would help me better understand the analysis and customize my fastp command, I'd appreciate it. I also added the --correction option to correct bases in the overlapping region of read pairs (--overlap_len_require is 30 bases by default).

        # Run fastp
        echo "Running fastp preprocessing on $SAMPLE..."
        fastp \
            -i "$R1_FILE" -I "$R2_FILE" \
            -o "$trimmed_r1" -O "$trimmed_r2" \
            --html "$html_report" --json "$json_report" \
            --detect_adapter_for_pe \
            --qualified_quality_phred 20 \
            --average_qual 20 \
            --length_required 50 \
            --n_base_limit 5 \
            --cut_front \
            --cut_tail \
            --cut_window_size 4 \
            --cut_mean_quality 20 \
            --trim_poly_g \
            --poly_g_min_len 10 \
            --trim_poly_x \
            --poly_x_min_len 10 \
            --correction \
            --thread 8

Please use ADD REPLY/ADD COMMENT when responding to existing posts. SUBMIT ANSWER is for NEW answers for the original question.


Oh, sorry, my bad. I will next time.


As others have indicated, don't overthink this part of the analysis. You don't need to change the default parameters unless you have a very specific reason to do so (e.g. your data is of poor quality going in).

Strictly speaking, you don't need to scan/trim before alignment (it is strictly needed only if you are going to do de novo work, to remove extraneous sequence). Aligners will generally handle extraneous sequence by "soft-clipping". But scanning/trimming is fine to do for peace of mind.
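
To get a feel for how much soft-clipping an aligner actually did, one option is to look at the CIGAR column of the SAM output. The following is only a rough sketch: `count_soft_clipped` is a hypothetical helper name, and the three demo records are made up and truncated to the first six SAM columns. In practice the input would come from something like `samtools view aln.bam`.

```shell
#!/bin/sh
# Sketch: count aligned reads whose CIGAR string contains a soft-clip (S) operation.
set -eu

count_soft_clipped() {
    # Reads SAM text on stdin; prints "<soft-clipped reads><TAB><reads with a CIGAR>".
    awk -F '\t' '
        /^@/          { next }       # skip SAM header lines
        $6 != "*"     { total++ }    # column 6 is the CIGAR; "*" means no alignment
        $6 ~ /[0-9]+S/ { clipped++ } # any soft-clip operation, at either end
        END           { printf "%d\t%d\n", clipped, total }
    '
}

# Made-up demo records (only the first six SAM columns are shown).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    read1 0 chr1 100 60 5S45M \
    read2 0 chr1 200 60 50M \
    read3 4 '*' 0 0 '*' | count_soft_clipped
```

A high fraction of soft-clipped reads can be a hint that adapters or other extraneous sequence are still present, although soft-clipping also has other causes (e.g. structural variants).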


Understood. I think it's fine for now, because I've checked the quality and it looks better than before.


One problem that I encounter if I don't trim reads is that taxonomic classification of untrimmed reads often produces incorrect results. The reason is that some genomes deposited in NCBI were assembled from untrimmed reads and thus contain adapter sequences. Therefore, if you align a read that contains an adapter to NCBI nt, the best match is often to some unrelated genome that also contains an adapter. If you use BLASTN to align the sequence of the Illumina universal adapter (GATCGGAAGAGCACACGTCTGAACTCCAGTCA) to NCBI nt, you'll see many genomes that contain adapters.
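
A quick way to see whether your own reads still carry that adapter is to count exact matches in the sequence lines of a FASTQ file. This is a minimal sketch, assuming standard 4-line FASTQ records; `count_adapter_reads` is a hypothetical helper and the two demo reads are made up. For a gzipped file the input could come from `gzip -dc reads.fastq.gz`.

```shell
#!/bin/sh
# Sketch: count reads that contain an exact copy of a given adapter sequence.
set -eu

count_adapter_reads() {
    # $1 = adapter sequence; FASTQ is read from stdin.
    # Prints "<reads containing the adapter><TAB><total reads>".
    awk -v adapter="$1" '
        NR % 4 == 2 {                          # line 2 of each 4-line record is the sequence
            total++
            if (index($0, adapter) > 0) hits++ # exact substring match only
        }
        END { printf "%d\t%d\n", hits, total }
    '
}

# Adapter sequence as quoted in the post above.
ADAPTER="GATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Made-up demo input: the first read ends in the adapter, the second does not.
count_adapter_reads "$ADAPTER" <<EOF
@read1
ACGTACGT${ADAPTER}
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2
ACGTACGTACGT
+
IIIIIIIIIIII
EOF
```

Exact matching misses adapters with sequencing errors or partial adapters at read ends, so this only gives a lower bound.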

3 days ago
shelkmike ★ 1.5k

In my experience, fastp works great with default parameters for paired-end reads. For single-end reads I suggest giving it the adapter sequence via the --adapter_sequence option, because otherwise fastp tries to guess the adapter sequence from the reads and too often fails to do so correctly.
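
The two cases could look roughly like this. It is a sketch, not a recommended pipeline: the function names, file names and output prefix are hypothetical, and the adapter shown is the Illumina TruSeq Read 1 adapter, used here only as an assumption; substitute the adapter of your own library prep kit. With `DRY_RUN=1` the commands are printed instead of executed, so the script can be tried without fastp installed.

```shell
#!/bin/sh
# Sketch only: file names, prefixes and the adapter are placeholders.
set -eu

run_or_print() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Paired-end: rely on fastp defaults; pass only inputs, outputs and reports.
fastp_pe_defaults() {
    # $1 = R1 FASTQ, $2 = R2 FASTQ, $3 = output prefix
    set -- fastp \
        -i "$1" -I "$2" \
        -o "${3}_R1.trimmed.fastq.gz" -O "${3}_R2.trimmed.fastq.gz" \
        --html "${3}.fastp.html" --json "${3}.fastp.json"
    run_or_print "$@"
}

# Single-end: same defaults, but the adapter is given explicitly.
fastp_se_with_adapter() {
    # $1 = FASTQ, $2 = output prefix, $3 = adapter sequence
    set -- fastp \
        -i "$1" \
        -o "${2}.trimmed.fastq.gz" \
        --adapter_sequence "$3" \
        --html "${2}.fastp.html" --json "${2}.fastp.json"
    run_or_print "$@"
}

DRY_RUN=1   # set to 0 to actually run fastp
ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"   # assumption: TruSeq Read 1 adapter

fastp_pe_defaults sample_R1.fastq.gz sample_R2.fastq.gz sample
fastp_se_with_adapter sample_SE.fastq.gz sample_SE "$ADAPTER"
```

Keeping the command in one small function per read layout makes it easy to add a single non-default option later if the reports show a specific problem.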
