Hello everyone,
I had paired-end data, which I merged using fastp and then ran through several preprocessing steps. As a result, I ended up with merged, unmerged_paired_1, and unmerged_paired_2 files.
Now, I want to reformat these processed files back into paired-end format. How can I do this correctly?
So assuming for one sample:
merged_reads : 1,200,000
unmerged_reads_1 : 1,800,000
unmerged_reads_2 : 1,800,000
I want to get paired-end format again.
reformat.sh in=merged_reads out1=forward.reads out2=reverse.reads
cat forward.reads unmerged_reads_1 > final_paired_1.reads
cat reverse.reads unmerged_reads_2 > final_paired_2.reads
Is that technically correct? If not, do you have any suggestions for doing it correctly? Thank you.
Thank you, GenoMax, for your answer. I want to add some details about this puzzle and extend the topic. Maybe it is a bit off topic, but it is related to my question anyway.
Here, I followed the procedure I explained and mapped the two different formats to a reference using bowtie2 and salmon. Here are the results for the different formats on the real data.
And here are the same reformatted paired-end reads, which I used for the salmon mapper. I know bowtie2 and salmon are totally different aligners, but I am confused at this point. Here only one of each pair was considered: 48770307
Log File:
In the bowtie2 result, when I divide the total reads, 97540614/2, I get the same number that Salmon reports. So why does Salmon not give me 97540614 for the initial reads, when bowtie2 does? That's why I was afraid I had made a mistake in reformatting the reads, but I think it is not a reformatting issue. What do you think?
This is indeed off topic for the original question. Programs may account for input reads in different ways.
I suggest that you first verify that the reads are in sync in your files. If they are not, then all of the results above are null and void.
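One way to check this is to compare read names record by record. The following is a minimal Python sketch, assuming uncompressed four-line FASTQ files in which mates share a read name (optionally followed by a `/1` or `/2` suffix or a whitespace-separated comment). The function names are hypothetical, and dedicated tools may handle more header conventions than this does.

```python
from itertools import zip_longest


def read_names(path):
    """Yield the read name from each header line of an uncompressed FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each 4-line record
                parts = line[1:].split()  # drop '@' and any comment after whitespace
                name = parts[0] if parts else ""
                # Strip a trailing /1 or /2 mate suffix if present
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_out_of_sync(path_1, path_2):
    """Return the 1-based index of the first mismatching record, or None if in sync."""
    pairs = zip_longest(read_names(path_1), read_names(path_2))
    for index, (name_1, name_2) in enumerate(pairs, start=1):
        # A missing mate shows up as None on one side and counts as a mismatch.
        if name_1 != name_2:
            return index
    return None
```

Calling `first_out_of_sync("final_paired_1.reads", "final_paired_2.reads")` returns `None` when every record lines up with its mate, and otherwise returns the position of the first record where the names differ or where one file ends early.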