Hi all,
I have a somewhat unusual computational problem: I need to compare a single-end RNA-seq sample to a paired-end RNA-seq sample. My approach is to take the forward reads from the paired-end sample, trim the read length (and the matching quality scores) in that FASTQ file, map the trimmed reads using BWA, and then do the usual RPKM estimation.
But how can I trim the read length and quality score?
Are there any one-liners in Perl or awk to do this, or is there something in Picard tools?
Please share your experience and knowledge.
Thank you
https://github.com/lh3/seqtk
http://www.usadellab.org/cms/?page=trimmomatic
If both of your paired-end reads are in the same file, first separate them and then trim the forward reads.
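For what it's worth, here is a rough Python sketch of the "separate them first" step. It assumes the file is interleaved (each forward read immediately followed by its mate) and that every record is exactly four lines; neither is stated in the question, so check your file first. The function name split_interleaved is just made up for illustration.

```python
def split_interleaved(lines):
    """Split interleaved FASTQ lines into (forward, reverse) line lists.

    Assumes 4-line records, ordered R1, R2, R1, R2, ...
    """
    forward, reverse = [], []
    for index, line in enumerate(lines):
        record_number = index // 4  # which FASTQ record this line belongs to
        target = forward if record_number % 2 == 0 else reverse
        target.append(line.rstrip("\n"))
    return forward, reverse


# Tiny in-memory example: one read pair.
interleaved = [
    "@pair1/1", "ACGTACGT", "+", "IIIIIIII",
    "@pair1/2", "TTGGCCAA", "+", "HHHHHHHH",
]
forward, reverse = split_interleaved(interleaved)
print("\n".join(forward))
```

For a real file you would pass an open file handle instead of the list and write each returned list to its own output file.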
Suggestion: If your goal is to compare the end results (RPKM) for these two sets of files, I don't think there is any point in considering only the forward reads. You can compare a fragment library with a paired-end library and compare their complexities, contamination, etc. And please use a splice-aware aligner like TopHat or STAR if you want to count the reads spanning exon junctions.
If the pairs are in separate files, you could just work on each file and use substr() in awk when NR%2==0, no?
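The same idea as a small Python sketch, in case awk is not your thing: in a FASTQ file with four lines per record, the even-numbered lines are the sequence and the quality string, so truncating those two keeps them the same length. This assumes unwrapped 4-line records; the function name trim_fastq_lines and the target length of 6 are only for illustration.

```python
def trim_fastq_lines(lines, length):
    """Yield FASTQ lines with sequence and quality lines cut to `length`.

    Assumes 4-line records: header, sequence, '+', quality.
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if number % 2 == 0:  # lines 2 and 4 of each record
            line = line[:length]
        yield line


# Tiny in-memory example: one 10-base read trimmed to 6 bases.
record = ["@read1", "ACGTACGTAC", "+", "IIIIIHHHGG"]
trimmed = list(trim_fastq_lines(record, 6))
print("\n".join(trimmed))
```

Reads that are already shorter than the target length pass through unchanged, so if you need every read to have exactly the same length you would have to filter those out separately.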