Hi everyone,
I am working with RNA-seq data. I have raw fastq files with a read length of around 100 bp.
I quickly ran FastQC to check the quality of my fastq files and concluded that I need to remove the first 10 bases: their quality scores are high, but the per-base %GC content is elevated over the first 10 bp. Since my reads are 100 bp long and the quality decreases along the read, I would like to generate a fastq file whose reads run from base position 10 to base position 80.
How can I do that?
Any command would be useful.
Hope to hear back soon
Regards
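One way to do the trimming asked about above, as a minimal Python sketch rather than a definitive answer. It assumes plain (uncompressed) fastq with strict 4-line records, and it treats the positions as 1-based and inclusive. The function names `trim_fastq` and `trim_fastq_file` and the file names in the usage note are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], first: int, last: int) -> Iterator[str]:
    """Yield fastq lines with each read cut to positions first..last.

    Positions are 1-based and inclusive (an assumption of this sketch).
    Assumes strict 4-line records: header, sequence, '+', quality.
    """
    start = first - 1  # convert the 1-based position to a 0-based slice index
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # sequence line and quality line get the same cut
            line = line[start:last]
        yield line + "\n"


def trim_fastq_file(in_path: str, out_path: str, first: int = 10, last: int = 80) -> None:
    """Trim every read in in_path and write the result to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(trim_fastq(src, first, last))
```

For example, `trim_fastq_file("reads.fastq", "reads_trimmed.fastq", first=10, last=80)` keeps positions 10 through 80, which is 71 bases per read. To drop the first 10 bases completely, use `first=11` instead. Dedicated trimmers such as fastx_trimmer from the FASTX-Toolkit do the same job and are probably the better choice for large files.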
I have the same case here: high quality scores, but strangely biased GC content and base frequencies. Do you have any idea how or why this problem arises?
Hi, I think this problem arises from the random hexamer priming used during RNA-seq library preparation. I believe there is a paper that explains it. I will post it here as soon as I find it.
Varun
Thank you. Will be looking forward to it.
Hi, here's the link:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2896536/
Hope this clears up some of your doubts.