How to loop over certain files within a directory that belongs to certain samples
15 months ago

EDIT by RamRS

Cross-posted on bioinfo SE: https://bioinformatics.stackexchange.com/questions/21464/how-to-loop-over-certain-files-within-a-directory-that-belongs-to-certain-sample


I have two samples, A44943_1 and A44944_1, each sequenced as paired-end reads across 3 lanes.

I would like to write a for loop that merges the R1 files of sample A44943_1 from the 3 lanes, then does the same for the R2 files of that sample, and then repeats both steps for sample A44944_1. My fastq files look like this:

R0480-S0001_182602S1P1_A44943_1_H77YFDRXY_CGCAACTA-GAATCCGA_L001_R1_trimmed.fastq
R0480-S0001_182602S1P1_A44943_1_H77YFDRXY_CGCAACTA-GAATCCGA_L001_R2_trimmed.fastq
R0480-S0001_182602S1P1_A44943_1_H77YFDRXY_CGCAACTA-GAATCCGA_L002_R1_trimmed.fastq
R0480-S0001_182602S1P1_A44943_1_H77YFDRXY_CGCAACTA-GAATCCGA_L002_R2_trimmed.fastq
R0480-S0001_182602S1P1_A44943_1_H7YKTDRXY_CGCAACTA-GAATCCGA_L003_R1_trimmed.fastq
R0480-S0001_182602S1P1_A44943_1_H7YKTDRXY_CGCAACTA-GAATCCGA_L003_R2_trimmed.fastq
R0480-S0002_182602S1P2_A44944_1_H77YFDRXY_CACAGACT-TGGTACAG_L001_R1_trimmed.fastq
R0480-S0002_182602S1P2_A44944_1_H77YFDRXY_CACAGACT-TGGTACAG_L001_R2_trimmed.fastq
R0480-S0002_182602S1P2_A44944_1_H77YFDRXY_CACAGACT-TGGTACAG_L002_R1_trimmed.fastq
R0480-S0002_182602S1P2_A44944_1_H77YFDRXY_CACAGACT-TGGTACAG_L002_R2_trimmed.fastq
R0480-S0002_182602S1P2_A44944_1_H7YKTDRXY_CACAGACT-TGGTACAG_L003_R1_trimmed.fastq
R0480-S0002_182602S1P2_A44944_1_H7YKTDRXY_CACAGACT-TGGTACAG_L003_R2_trimmed.fastq

Could you advise me on a for loop to do this task?

Thanks


See https://www.cyberciti.biz/faq/bash-for-loop/. What have you tried so far?


Here is an easier-to-understand option, from the thread "Concatenating fastq.gz files across lanes".

There are other answers you can use in the thread as well.


Do not post on multiple forums - that is just bad etiquette. What's worse, you did not link between the two so no one knows that you're asking two sets of online volunteers to spend their time on your problem without telling them that you're also asking the other group.

You've done this multiple times. If you repeat this behavior, you risk your account being suspended.

If this comment is any indicator, this is how much you annoy people in both communities with this sort of behavior.

15 months ago

I crammed a function into my .bashrc to do just this:

function concat_fastq {
    # Merge all lanes of one sample. Expects files named
    # <sample>_L00N_R1_001.fastq.gz (and R2, if paired) in the current directory.
    local sample=$1
    local read
    local reads=(R1)

    if [[ -z "$sample" ]]; then
        echo "A sample name must be provided as an argument." >&2
        return 1
    fi

    # Check if files for R2 exist
    if ls "${sample}"_L00*_R2_001.fastq.gz > /dev/null 2>&1; then
        # This is a paired-end sample
        echo "Processing paired-end sample: $sample"
        reads+=(R2)
    else
        # This is a single-end sample
        echo "Processing single-end sample: $sample"
    fi

    # Concatenate each read direction; gzip files can be concatenated as-is
    for read in "${reads[@]}"; do
        if ! cat "${sample}"_L00*_"${read}"_001.fastq.gz > "${sample}_merged_${read}_001.fastq.gz"; then
            echo "An error occurred while concatenating ${read} files." >&2
            return 1
        fi
    done

    echo "Concatenation complete for sample: $sample"
}

To use:

for sample_name in $(ls *_L00*_R1_001.fastq.gz | rev | cut -d'_' -f4- | rev | sort | uniq); do
    concat_fastq "$sample_name"
done

Your file suffixes are a bit different, so this will require some light edits.
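As a rough idea of what those edits might look like: the files in the question end in _trimmed.fastq, are not gzipped, and carry a flowcell ID that changes between lanes, so the pattern needs a wildcard between the sample ID and the lane. Below is a minimal sketch under those assumptions. The function name merge_trimmed_lanes and the "_merged_" output names are made up for illustration, and it has only been reasoned through against the file names shown in the question.

```shell
#!/usr/bin/env bash
# Sketch: merge per-lane FASTQ files for each sample ID passed as an argument.
# Assumes files in the current directory are named like
#   <prefix>_<sample>_<flowcell>_<index>_L00N_R1_trimmed.fastq  (and R2).
# The function name and the output names are hypothetical.
merge_trimmed_lanes() {
    local sample read
    local -a lanes
    for sample in "$@"; do
        for read in R1 R2; do
            # The flowcell ID differs between lanes, so match it with a wildcard.
            # Glob results are sorted by name, and the R1 and R2 patterns differ
            # only in the read tag, so both merged files list lanes in the same order.
            lanes=( *_"${sample}"_*_L00[0-9]_"${read}"_trimmed.fastq )
            # An unmatched pattern stays literal (or the array is empty with
            # nullglob); either way the first element is not an existing file.
            if [[ ! -e "${lanes[0]}" ]]; then
                echo "No ${read} files found for sample ${sample}" >&2
                continue
            fi
            cat -- "${lanes[@]}" > "${sample}_merged_${read}_trimmed.fastq" || return 1
        done
    done
}
# Usage: merge_trimmed_lanes A44943_1 A44944_1
```

The output names deliberately do not match the input pattern, so running the function twice should not feed a merged file back into itself.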

15 months ago
bk11 ★ 3.0k

Or you can use cat directly instead of a for loop and do something like this:

cat R0480-S0001*R1_trimmed.fastq > R0480-S0001_L123_R1_trimmed.fastq
cat R0480-S0001*R2_trimmed.fastq > R0480-S0001_L123_R2_trimmed.fastq

cat R0480-S0002*R1_trimmed.fastq > R0480-S0002_L123_R1_trimmed.fastq
cat R0480-S0002*R2_trimmed.fastq > R0480-S0002_L123_R2_trimmed.fastq
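
If there are more than a couple of samples, the same cat commands can be wrapped in a loop over the library prefixes. This is only a sketch: the function name merge_by_prefix is hypothetical, and the input pattern is narrowed to per-lane files (_L00N_), because the broader pattern R0480-S0001*R1_trimmed.fastq would also match the merged _L123_ output file if the commands were run a second time.

```shell
#!/usr/bin/env bash
# Sketch: cat-based merge, looping over library prefixes instead of writing
# one command per sample and read. The function name is hypothetical.
merge_by_prefix() {
    local prefix read out
    for prefix in "$@"; do
        for read in R1 R2; do
            out="${prefix}_L123_${read}_trimmed.fastq"
            # Restrict the glob to per-lane files (_L00N_) so that an existing
            # merged file is never picked up as input on a rerun.
            # If nothing matches, cat fails and the function returns 1,
            # leaving an empty output file behind.
            cat -- "${prefix}"_*_L00[0-9]_"${read}"_trimmed.fastq > "$out" || return 1
        done
    done
}
# Usage: merge_by_prefix R0480-S0001 R0480-S0002
```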