Hi!
I'm checking the output from geneBody_coverage.py (RSeQC) using a BED file with housekeeping genes.
We sequenced some samples using a poly(A) capture library prep (all with the same kit, in the same sequencing run), so I would expect a 3' bias. However, the output I got was mixed: some samples have a 3' bias as expected (purple), while others show a borderline 5' bias (green and blue). It is not extreme, but the profile plot seems to peak around bin 20 and decrease from there towards the 3' end...
I would understand that some samples would not have such an extreme bias (maybe RNA got degraded worse in some of them), but an opposite bias?? What does that even mean??
I am concerned, of course, that these samples are of bad quality, although the FastQC reports look good.
I am also concerned that this could in some way be related to how I ran RSeQC or did the mapping. (I don't see how it would matter, but I am confused! I highly doubt it comes from the dry-lab side, since I used the same mapping strategy for all samples and ran RSeQC on all the output BAM files together.)
How I mapped:
STAR --runThreadN 20 --readFilesCommand gunzip -c --outFilterMultimapNmax 1 --outFilterMismatchNoverLmax 0.03 --outSAMattributes All --outSAMtype BAM SortedByCoordinate --sjdbGTFfile /path/to/gencode.v36.annotation.gtf --genomeDir /path/to/index/star/ --outFileNamePrefix some_prefix_ --readFilesIn /path/to/fastq/sample1_R1_001.fastq.gz /path/to/fastq/sample1_R2_001.fastq.gz
How I ran RSeQC:
geneBody_coverage.py -r hg38.housekeeping.bed -i samples_bam.txt -o prefix
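To put numbers on what I see in the plot instead of eyeballing it, something like the sketch below could be used. It assumes the run above wrote a tab-separated table named prefix.geneBodyCoverage.txt, with a header row starting with "Percentile" and then one row per sample (sample name followed by the per-percentile coverage values, ordered 5' to 3'). The function names and the choice of comparing the first and last 20% of the gene body are my own, not anything RSeQC defines.

```python
import sys

def read_coverage(lines):
    """Parse a geneBodyCoverage.txt-style table into {sample: [values]}.

    Assumed layout: tab-separated, a header row starting with 'Percentile',
    then one row per sample with the coverage at each percentile (5' -> 3').
    """
    profiles = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] == "Percentile":
            continue  # skip the header and blank lines
        profiles[fields[0]] = [float(x) for x in fields[1:]]
    return profiles

def summarize(values):
    """Return (peak percentile, 3'/5' coverage ratio) for one profile.

    The ratio compares mean coverage in the last 20% of the gene body
    with the first 20%; values above 1 suggest a 3' bias, below 1 a 5' bias.
    """
    n = len(values)
    window = max(1, n // 5)
    five_prime = sum(values[:window]) / window
    three_prime = sum(values[-window:]) / window
    peak = values.index(max(values)) + 1  # percentiles are 1-based
    ratio = three_prime / five_prime if five_prime > 0 else float("inf")
    return peak, ratio

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python coverage_summary.py prefix.geneBodyCoverage.txt
    with open(sys.argv[1]) as handle:
        for sample, values in read_coverage(handle).items():
            peak, ratio = summarize(values)
            print(f"{sample}\tpeak_percentile={peak}\tratio_3p_over_5p={ratio:.2f}")
```

A sample with a peak percentile around 20 and a ratio below 1 would match the green and blue profiles I described; the purple ones should give a ratio above 1.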
Has anyone seen a similar output? Or can someone help me interpret it?
Thank you!