I trimmed adaptors from paired-end (PE) read data and ended up with uneven pairs
(i.e. the R1 and R2 files no longer contain the same number of reads).
Does anyone know how I can run these in bowtie2?
Is there a way to even things up?
I used cutadapt to cut the adaptors; is there maybe a way to even things up in cutadapt?
Thanks!
written 8.1 years ago by blur, updated 8.1 years ago by SES
I think you can use Trim Galore. It is a wrapper around cutadapt that handles this issue (if you are interested in using cutadapt):
Trim Galore! can remove sequences if they become too short during the
trimming process. For paired-end files Trim Galore! removes entire
sequence pairs if one (or both) of the two reads became shorter than
the set length cutoff. Reads of a read-pair that are longer than a
given threshold but for which the partner read has become too short
can optionally be written out to single-end files. This ensures that
the information of a read pair is not lost entirely if only one read
is of good quality
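
The pair-aware filtering described in that quote can be sketched in a few lines of Python. This is only an illustration of the idea, not Trim Galore's actual code: the function name `filter_pairs`, the `(name, sequence, quality)` tuple layout, and the `min_len=20` value are assumptions made for the example, and it expects the two input lists to still be in the same order.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality); hypothetical layout


def filter_pairs(
    r1: List[Read], r2: List[Read], min_len: int = 20
) -> Tuple[List[Tuple[Read, Read]], List[Read], List[Read]]:
    """Keep a pair only if BOTH mates are at least min_len long.

    Mates whose partner became too short are returned separately as
    singletons, so the paired output always stays in sync.
    """
    if len(r1) != len(r2):
        raise ValueError("inputs must contain the same number of reads")

    paired, single_r1, single_r2 = [], [], []
    for a, b in zip(r1, r2):
        ok_a = len(a[1]) >= min_len
        ok_b = len(b[1]) >= min_len
        if ok_a and ok_b:
            paired.append((a, b))   # both mates survive together
        elif ok_a:
            single_r1.append(a)     # R2 too short, keep R1 as singleton
        elif ok_b:
            single_r2.append(b)     # R1 too short, keep R2 as singleton
        # if both are too short, the whole pair is dropped
    return paired, single_r1, single_r2
```

Because the decision is made per pair rather than per file, the paired output always has the same number of reads in R1 and R2, which is what bowtie2 expects for paired-end input.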
Meanwhile, have a look at bbduk.sh. It also does a great job.
Any of the previous tools will give you properly paired reads.
Bonus:
There is a tool that could fix it directly (again, try the previous tools first).
As I wrote in my answer, you need to rerun the process using one of the tools I suggested. If you do not want to do that (I highly suggest that you do), then align your FASTQ files to the reference and use the other solution with the BAM file.
I need to use an option of cutadapt that does not seem to be in trim_galore or any other tool.
I want to remove 10 bp from the start of read 2 and the end of read 1, and couldn't find that option anywhere...
If you read the trim galore manual, you would see that this is possible with trim galore. If you think it is not possible, you either haven't read the manual, or you are too lazy to read it and hope someone will spell it out for you??
I did read the trim_galore manual. I saw the Clip options, but wasn't sure it suited my data, that's why I asked a question.
If you don't want to help others in need, you could just not answer them. Does calling someone lazy for not getting something make you feel good? What sort of person does that make you exactly?
I'm sure you asked silly questions starting out, did people call you lazy for it?
Instead of doing the trimming again (which can be a long process), the easiest solution is to repair the reads. Here's a very lightweight solution for pairing the reads (using Pairfq):
That command doesn't require any installation or hard coding paths to programs, so it's quite easy to work into a pipeline. The input can be compressed and you can compress the output if you want (if I remember correctly). The --stats option tells you the results. The other arguments are the trimmed files as input and the paired and unpaired, or singleton, reads as output.
The most efficient approach, of course, is to avoid unnecessary steps like this but it's not always ideal to go back and redo your work. There are other approaches also but hopefully this helps.
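
For readers wondering what "repairing" the reads means in practice, here is a minimal Python sketch of the idea. It is not how Pairfq or repair.sh is implemented; the helper names (`read_id`, `parse_fastq`, `repair_pairs`) are made up for this example, it assumes plain four-line FASTQ records, and it holds everything in memory, so it is only suitable for small files.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_id(header: str) -> str:
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header.split()[0].lstrip("@")
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def parse_fastq(text: str) -> List[Record]:
    """Parse four-line FASTQ records from a string (assumes no wrapped lines)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of four lines")
    return [(lines[i], lines[i + 1], lines[i + 3]) for i in range(0, len(lines), 4)]


def repair_pairs(
    r1: List[Record], r2: List[Record]
) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Match mates by read ID; reads without a mate become singletons."""
    r2_by_id: Dict[str, Record] = {read_id(rec[0]): rec for rec in r2}
    r1_ids = set()
    paired, singles = [], []
    for rec in r1:
        rid = read_id(rec[0])
        r1_ids.add(rid)
        mate = r2_by_id.get(rid)
        if mate is not None:
            paired.append((rec, mate))  # output keeps R1 order
        else:
            singles.append(rec)
    # R2 reads whose mate is missing from R1
    singles.extend(rec for rec in r2 if read_id(rec[0]) not in r1_ids)
    return paired, singles
```

Writing the first element of each pair to one file and the second to another, in the same order, gives two files that are back in sync; the singletons can be aligned separately as unpaired reads.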
If you have mismatched number of reads in R1 and R2 files then be careful using solutions below (you may have discordant alignments, if you have done them, that is not clear from your OP). Take a look at repair.sh from BBMap to bring your R1/R2 files back in sync. Something like
repair.sh in1=SRR1972739_1_trim.fastq in2=SRR1972739_2_broken.fastq out=stdout.fq outsingle=SRR1972739_broken_reads.fastq | reformat.sh in=stdin.fq out1=SRR1972739_1_fixed.fastq out2=SRR1972739_2_fixed.fastq interleaved addcolon
thanks, that seems like a good idea!
You don't need to unzip your files (even though the example above shows them that way). Both input and output can stay gzipped.
If you did want to use the singletons file with bowtie2 you should be able to do that.
You could avoid all this problem by using bbduk.sh from BBMap and trimming R1/R2 files in sync. AFAIK cutadapt is able to trim files in pairs. Perhaps you did not use it appropriately.