Hello everyone.
I have a small RNA library. In this library, I have filtered out reads of a certain length, but I need to know what percentage of these reads begin with a certain nucleotide (G). I know this has been done in some papers, but they don't say how they went about doing it. Any help or suggestions would be appreciated.
Thanks
Using a ready-made solution from the BBTools / BBMap package (note, FastQC should provide these results as well):
reformat.sh in=file.fastq bhist=file.bhist.txt
As I am waiting for a drive scan (possibly a catastrophic drive failure), why not roll my own solution? Save the following as countG.pl:
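#!/usr/bin/perl
use strict;
use warnings;

# Start the counters at 0 so a count of zero prints as "0", not as an empty string.
my ($lines, $G, $H) = (0, 0, 0);
while (<>) {
    $lines++;
    # In a 4-line FASTQ record, the sequence is the 2nd line.
    next unless $lines % 4 == 2;
    if (/^G/i) { $G++ }   # read starts with G (case-insensitive)
    else       { $H++ }   # read starts with anything else
}
print "Number of reads starting with G = $G\tNumber of reads starting with A/T/C/not-G = $H\n";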
Make it executable with chmod +x countG.pl, and run it with ./countG.pl file.fastq, or zcat file.fastq.gz | ./countG.pl. Note that it assumes standard 4-line FASTQ records (sequences not wrapped across lines). I only wrote this little script because I am waiting for the drive scan; please use reformat.sh for a more general and robust solution.
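Since the question asks for a percentage rather than raw counts, here is a minimal Python sketch of the same idea. It is not part of the answer above: the names first_base_fraction and open_fastq are made up for illustration, and, like countG.pl, it assumes strict 4-line FASTQ records.

```python
import gzip
import sys
from itertools import islice


def first_base_fraction(lines, base="G"):
    """Return (matching, total) for reads whose first base equals `base`.

    `lines` is any iterable of FASTQ lines. Records are assumed to be
    exactly 4 lines long, so sequences sit at lines 2, 6, 10, ...
    """
    matching = total = 0
    # islice(lines, 1, None, 4) yields every 4th line, starting at the 2nd.
    for seq in islice(lines, 1, None, 4):
        seq = seq.strip()
        if not seq:
            continue  # zero-length reads are skipped, not counted
        total += 1
        if seq[0].upper() == base.upper():
            matching += 1
    return matching, total


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_fastq(sys.argv[1]) as handle:
        g, n = first_base_fraction(handle)
    pct = 100.0 * g / n if n else 0.0
    print(f"{g} of {n} reads start with G ({pct:.2f}%)")
```

Saved under a name of your choice (say, percentG.py, a hypothetical file name), it would be run as python percentG.py file.fastq or python percentG.py file.fastq.gz.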
So is the output the total number of Gs or the total number of reads that begin with a G?
Yes. And my hard-drive didn't suffer a catastrophic failure, by the way.
Please read what I asked. Which part of my question does your answer address?
Please put a tiny little bit of effort into solving your problems, instead of chastising me for not reading your question carefully enough. You can craft a very small test FASTQ file to answer your question.
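For anyone who wants to try the test-file suggestion, here is one possible sketch in Python. The three reads and the helper name count_g are invented for illustration: only one read starts with G, but all three contain Gs, so the two interpretations of the output give different numbers.

```python
import os
import tempfile

# Three hand-made reads: only the first starts with G, but all contain Gs.
TEST_FASTQ = (
    "@read1\nGATTACA\n+\nIIIIIII\n"
    "@read2\nAGGGGGG\n+\nIIIIIII\n"
    "@read3\nTTGCATG\n+\nIIIIIII\n"
)


def count_g(path):
    """Return (reads_starting_with_G, total_G_bases) for a 4-line FASTQ."""
    starts_with_g = total_g = 0
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line_number % 4 == 2:  # sequence line, same test as countG.pl
                seq = line.strip().upper()
                starts_with_g += seq.startswith("G")
                total_g += seq.count("G")
    return starts_with_g, total_g


# Write the tiny file to a temporary directory and count both quantities.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.fastq")
    with open(path, "w") as out:
        out.write(TEST_FASTQ)
    starts, total = count_g(path)

print(f"reads starting with G: {starts}")  # prints 1
print(f"G bases overall: {total}")         # prints 9
```

Running countG.pl on the same three records should report 1 read starting with G and 2 that do not, which shows that the script counts reads, not individual G bases.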