Hi all!
I have a small RNA-seq dataset that was sequenced on an Illumina paired-end 50 bp (PE-50bp) flow cell instead of a single-read 50 bp (SR-50bp) flow cell.
Does small RNA-seq have to be single-end (e.g. to detect miRNAs), or can paired-end reads be used as well? If so, what tools do you suggest? Could I still use miRDeep2?
Thank you in advance!
Dario
One addition: simply concatenating PE1 and PE2 would cause a problem and break your miRDeep2 run, since PE2 is the reverse complement of the sequenced molecule. Make sure you use tools like BBMerge, FLASH, PEAR, COPE, or fastq-join for the merging step, since they ensure the merged read ends up on the correct strand. h.mon is saying exactly that... I just want to point out that merging the files with a Linux command like
cat PE1.fastq PE2.fastq
would fail. And one more thing: using paired-end sequencing for microRNA analysis does not make a lot of sense. It's a waste of time and money, since you sequence each molecule from both ends and then collapse the two reads into one by merging. It might increase your quality, but quality is normally not a problem at that read length.
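To make the strand problem concrete, here is a minimal Python sketch using a toy 22-nt insert and ignoring adapters (both simplifying assumptions; `simulate_pair` is a hypothetical helper). R1 matches the insert, while R2 only matches after reverse complementing, so a plain `cat` of the two files hands the aligner a mix of sense and antisense reads.

```python
# Toy illustration of why R2 is on the opposite strand to R1.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_pair(insert: str, read_len: int) -> tuple[str, str]:
    """Toy read pair from one insert (adapters ignored): R1 starts at the
    insert's 5' end, R2 starts at the 5' end of the opposite strand."""
    return insert[:read_len], reverse_complement(insert)[:read_len]

if __name__ == "__main__":
    insert = "TGAGGTAGTAGGTTGTATAGTT"  # toy 22-nt insert
    r1, r2 = simulate_pair(insert, read_len=22)
    print(r1)  # TGAGGTAGTAGGTTGTATAGTT (same strand as the RNA)
    print(r2)  # AACTATACAACCTACTACCTCA (opposite strand)
    print(reverse_complement(r2) == insert)  # True
```

Real mergers do this reverse complementing (plus error-aware overlap detection) for you, which is why they keep the strand consistent.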
What if I don't merge and only use PE1 for my analysis, for example?
Then you waste money and information. :)
Now that you have both, merge them: you will increase the overall quality of the sequences (probably not by much, but still). For these short sequences, merging is also a good way to find out where the adapters are, and if PE1 and PE2 do not merge, you know something is wrong.
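A rough sketch of why merging reveals the adapter position, assuming the insert is no longer than the read (typical for miRNAs in 50-bp reads) and allowing no mismatches, unlike real mergers such as BBMerge or FLASH. The insert is the longest prefix of R1 that equals the reverse complement of the same-length prefix of R2; whatever follows it in R1 is adapter, and a pair with no such prefix does not merge. The function name and the 15-nt minimum are hypothetical; the adapter strings are the NEB sequences quoted later in this thread, used only to build a toy pair.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def insert_from_pair(r1: str, r2: str, min_len: int = 15):
    """Return the insert shared by R1 and R2, or None if the pair does not merge.

    Assumes insert length <= read length, so both reads run into adapter.
    An insert of length L then occupies R1[:L] and equals the reverse
    complement of R2[:L]. Exact matching only (no sequencing errors).
    """
    max_len = min(len(r1), len(r2))
    # Longest candidate first, so a short chance match between the insert's
    # own prefix and suffix is not chosen over the true insert length.
    for length in range(max_len, min_len - 1, -1):
        if r1[:length] == reverse_complement(r2[:length]):
            return r1[:length]
    return None

if __name__ == "__main__":
    a1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
    a2 = "GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT"
    insert = "TGAGGTAGTAGGTTGTATAGTT"  # toy 22-nt insert
    r1 = (insert + a1)[:50]          # 50-bp R1 reading into its adapter
    r2 = (reverse_complement(insert) + a2)[:50]
    merged = insert_from_pair(r1, r2)
    print(merged)                    # TGAGGTAGTAGGTTGTATAGTT
    print(r1[len(merged):])          # adapter read-through in R1
    print(insert_from_pair("A" * 30, "A" * 30))  # None: pair does not merge
```

Note that this toy only covers inserts shorter than the read; longer inserts (many snoRNAs, for instance) overlap only partially and would return `None` here, so they need a regular overlap merge.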
I will quote something I've read somewhere: "Then you waste money and information." ;-)
Hello,
I have raw small RNA-seq data, and we want to do differential expression analysis focusing on snoRNAs. I followed the approach you described here.
But when I used fastq-join, the final file had only a few thousand reads out of millions of input reads. I used the standard fastq-join command. Am I missing something here?
mbansal: Paired-end sequencing should not be needed for small RNAs, which are, well, small. You can probably get away with using just R1 from your data. You will need to know which kit was used to make the libraries, since it comes with specific instructions for trimming the adapter so that you are left with the small RNA sequence.
Thank you so much for the prompt reply. We outsourced the sequencing, and the provider has given us the adapter sequences used.
Read 1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
Read 2: GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATT
Upon searching, I found they are NEB (NEBNext Small RNA) adapter sequences (https://www.neb.com/faqs/2017/07/17/how-should-my-nebnext-small-rna-library-be-trimmed). I am thinking of using SeqPrep (https://github.com/jstjohn/SeqPrep) to remove the adapters and merge the reads, followed by mapping with segemehl. Do you think this is the right approach?
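If you go the R1-only route, 3' adapter trimming boils down to finding where the Read 1 adapter starts. Below is a minimal exact-match sketch using the Read 1 adapter quoted above; it is not how SeqPrep or cutadapt work internally (they tolerate sequencing errors), and the function name and 3-nt minimum overlap are illustrative assumptions.

```python
# Hypothetical helper: exact-match 3' adapter trimming (no mismatches allowed).
READ1_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # Read 1 adapter from the provider

def trim_3p_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut the read at the leftmost position where the adapter (or a
    3'-truncated prefix of it at least min_overlap long) starts."""
    for start in range(len(read)):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            break  # remaining tail too short to call an adapter reliably
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read  # no adapter found: keep the read untouched

if __name__ == "__main__":
    insert = "TGAGGTAGTAGGTTGTATAGTT"            # toy 22-nt small RNA
    read = (insert + READ1_ADAPTER)[:50]         # 50-bp R1 reading into the adapter
    print(trim_3p_adapter(read, READ1_ADAPTER))  # TGAGGTAGTAGGTTGTATAGTT
```

The same function could be applied to R2 with the Read 2 adapter; in practice a dedicated trimmer that allows mismatches is the safer choice.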
Would merging take the paired-end nature of the reads into account? Would I lose information if I simply merged, or would merging give an output file as if the library had been sequenced single-end? I'm concerned about the integrity of the data, since I've never heard of paired-end sequencing being used for small RNAs...