Sorry if this may appear like a really dumb question, but I am new to RNAseq and feeling lost about some of the terminologies and calculations used in literature.
I understand the principle behind paired-end versus single-end reads; however, I need to clarify how to calculate the number of reads in each case.
For single-end reads, I was told that I could calculate the number of reads with: wc -l <name of fastq file>, divided by 4.
For paired-end reads, since two files are generated (R1, R2), is it correct to calculate the number of reads by adding the read counts derived from R1 and R2?
Thank you very much for your clarification.
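The wc -l approach described in the question can be wrapped in a small helper. This is only a sketch: count_reads is a made-up name, and it assumes standard 4-line FASTQ records (no wrapped sequence lines) and that compressed files end in .gz.

```shell
# count_reads FILE
# Prints the number of records in a FASTQ file (plain or gzip-compressed).
# Assumption: every record is exactly 4 lines long.
count_reads() {
    case "$1" in
        *.gz) n_lines=$(gzip -dc "$1" | wc -l) ;;   # decompress to stdout, then count lines
        *)    n_lines=$(wc -l < "$1") ;;            # plain text: count lines directly
    esac
    echo $(( $n_lines / 4 ))
}
```

For example, count_reads sample_R1.fastq.gz (a hypothetical file name) prints the record count for R1. In an intact paired-end data set, running it on R1 and on R2 should print the same number.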
Hi Devon,
Thank you very much for your reply. Just for clarification, I calculated the number of fragments sequenced in the FASTQ files for R1 and R2 for one sample and got this: R1 - 499385 fragments. R2 - 487958 fragments. Since the number of fragments differs between R1 and R2, how should I correctly describe the number of fragments sequenced? Thank you very much again.
You apparently screwed up the read syncing between the files at some point, likely during trimming. You'll need to resync the files. The reformat.sh tool from BBMap is convenient for this.
Small correction. @Genosa: The tool from BBMap you would want is repair.sh. bbduk.sh is also a great option for trimming (and it is PE aware, so this sort of situation would not happen).
Thank you very much. I'll go and take a look at BBMap and try to fix it. What do you mean by error in 'read syncing' and what is the reason why it was introduced in the first place? I got the files back from the sequencing service and went through all the samples. So apparently, ALL the samples R1/R2 have very different fragment counts and I am not sure why this is happening. If I do not run reformat.sh, will this adversely affect the subsequent QC/mapping/Alignment/Differential Gene Expression analysis? Thank you!
The Nth read in each of the files should originate from the same cluster. This was the case when the files were originally created, and pretty much all aligners assume that this is true and will give nonsensical results if you, for example, delete a read in only one file. Having the Nth read in each file arising from the same cluster (btw, you can tell if this is the case because the read names are more or less the same) is referred to as "the reads being in sync."
The most common way of getting paired-end files out of sync is by using a trimmer that processes only one fastq file at a time (e.g., fastx tools). If you use something like "Trim Galore!" or "trimmomatic" then this shouldn't happen.
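The read-name comparison mentioned above can be sketched as a quick check. The function names read_names and in_sync are invented for this example. It assumes 4-line FASTQ records and that mates share the same name up to the first space, apart from an optional trailing /1 or /2. It compares checksums of the two name lists, so a match is strong evidence of sync rather than a formal proof.

```shell
# read_names FILE
# Prints the name field of every record header (lines 1, 5, 9, ...),
# with a trailing /1 or /2 mate suffix removed.
# gzip -dcf passes uncompressed input through unchanged.
read_names() {
    gzip -dcf "$1" | awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }'
}

# in_sync R1 R2
# Exit status 0 if both files list the same read names in the same order.
in_sync() {
    [ "$(read_names "$1" | cksum)" = "$(read_names "$2" | cksum)" ]
}
```

Usage might look like: in_sync sample_R1.fastq.gz sample_R2.fastq.gz && echo "in sync" (file names are placeholders).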
Hi Ryan, I am not sure if I've done it correctly, but I'm having a problem running the BBMap program. I have zero command-line knowledge and I'm teaching myself as I go along. I downloaded the BBMap package from SourceForge, followed the read-me.txt, and unzipped it as instructed. I am currently using Mac OS X. I opened Terminal and changed directory to where the BBMap *.sh files are kept. Then I entered "run repair.sh". A new Xcode window now pops up and something like this shows up:
Then at this stage I become a bit stuck. I tried entering the command (in both Terminal and Xcode): "repair.sh in1=<file name of R1>.fq in2=<file name of R2>.fq out1=fixed1.fq out2=fixed2.fq outsingle=singletons.fq". However, neither works.
My files are named in this format : SAMPLENAME_REPLICATE#_R1.fastq.gz.
I wonder what I have done wrong in these steps? Thank you very much.
Don't do "run repair.sh". Do you have Java installed on your Mac? BBMap is a Java program.
As shown in the example above, you just need to run repair.sh directly (provided the BBMap directory is in your $PATH; otherwise use the full path to repair.sh). There should be no spaces between the option names, the = sign, and the values.
If your sample files are not in the same directory as BBMap programs then modify your command to accommodate the file paths.
/path_to/repair.sh in1=/path_to/SAMPLENAME_REPLICATE#_R1.fastq.gz in2=/path_to/SAMPLENAME_REPLICATE#_R2.fastq.gz out1=/path_to/fixed_SAMPLENAME_REPLICATE#_R1.fastq.gz out2=/path_to/fixed_SAMPLENAME_REPLICATE#_R2.fastq.gz outsingle=/path_to/singletons_SAMPLENAME_REPLICATE#.fastq.gz
Spend a couple of hours here: http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part1 familiarizing yourself with a bit of command line.
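Since all samples are reported to be out of sync, the repair.sh command above could be generated once per R1/R2 pair. The following is a dry-run sketch: repair_cmd is a hypothetical helper, it only prints the commands rather than running them, and it assumes the SAMPLENAME_REPLICATE#_R1.fastq.gz naming scheme with no spaces in file names. Only the repair.sh options already shown in this thread are used.

```shell
# repair_cmd R1_FILE
# Prints the repair.sh command line for one sample; the R2 file name is
# derived by replacing _R1 with _R2, and outputs get a fixed_/singletons_ prefix.
repair_cmd() {
    r1="$1"
    dir=$(dirname "$r1")
    base=$(basename "$r1" _R1.fastq.gz)   # strip the _R1.fastq.gz suffix
    printf 'repair.sh in1=%s in2=%s out1=%s out2=%s outsingle=%s\n' \
        "$r1" \
        "$dir/${base}_R2.fastq.gz" \
        "$dir/fixed_${base}_R1.fastq.gz" \
        "$dir/fixed_${base}_R2.fastq.gz" \
        "$dir/singletons_${base}.fastq.gz"
}

# Dry run: print one command per R1 file in the current directory.
for r1 in ./*_R1.fastq.gz; do
    [ -e "$r1" ] || continue   # nothing matched the pattern
    repair_cmd "$r1"
done
```

After checking the printed commands by eye, they can be pasted into the terminal one at a time (with the full path to repair.sh if the BBMap directory is not in $PATH).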