Hi,
I am using the plotPCA function of DESeq2 to get an overall view of my samples and to spot any bad ones. I first pass the BAM files to featureCounts and then import the resulting counts into DESeq2 for further analysis. However, because I load the original sample BAM files, the sample names are very long, so the labels on the plotPCA plot are long and messy. Here is where I load the BAMs -
counts <- featureCounts(nthreads=3, isGTFAnnotationFile=TRUE, annot.ext="/Volumes/bam/DRG/annotations/Homo_sapiens.GRCh38.95.gtf", files=c('40T4L.fastqAligned.sortedByCoord.out.bam', '41T7R.fastqAligned.sortedByCoord.out.bam', '42T7L.fastqAligned.sortedByCoord.out.bam', '42T7R.fastqAligned.sortedByCoord.out.bam', '44T10R.fastqAligned.sortedByCoord.out.bam', '44T11L.fastqAligned.sortedByCoord.out.bam', '44T11R.fastqAligned.sortedByCoord.out.bam', '45T10L.fastqAligned.sortedByCoord.out.bam', '45T11R.fastqAligned.sortedByCoord.out.bam', '46T8L.fastqAligned.sortedByCoord.out.bam', '46T8R.fastqAligned.sortedByCoord.out.bam', '47T7L.fastqAligned.sortedByCoord.out.bam', '47T7R.fastqAligned.sortedByCoord.out.bam'))$counts
Later I run plotPCA with sample labels (because I want to be able to identify individual samples on the plot) -
plotPCA(vsd, ntop=1000) + geom_text(aes(label=name),vjust=2,check_overlap = TRUE,size = 4)
The resulting plot looks like this - https://imgur.com/bcdu4J6 . As you can see, the long file names ruin the plot. Is there a way to rename the samples at some point in the workflow (instead of renaming the original BAM files) so that the final plot has shorter sample names?
My whole code is here -
library(Rsubread)  # featureCounts
library(DESeq2)    # DESeqDataSetFromMatrix, DESeq, vst, plotPCA
library(ggplot2)   # geom_text, aes
counts <- featureCounts(nthreads=3, isGTFAnnotationFile=TRUE, annot.ext="/Volumes/bam/DRG/annotations/Homo_sapiens.GRCh38.95.gtf", files=c('40T4L.fastqAligned.sortedByCoord.out.bam', '41T7R.fastqAligned.sortedByCoord.out.bam', '42T7L.fastqAligned.sortedByCoord.out.bam', '42T7R.fastqAligned.sortedByCoord.out.bam', '44T10R.fastqAligned.sortedByCoord.out.bam', '44T11L.fastqAligned.sortedByCoord.out.bam', '44T11R.fastqAligned.sortedByCoord.out.bam', '45T10L.fastqAligned.sortedByCoord.out.bam', '45T11R.fastqAligned.sortedByCoord.out.bam', '46T8L.fastqAligned.sortedByCoord.out.bam', '46T8R.fastqAligned.sortedByCoord.out.bam', '47T7L.fastqAligned.sortedByCoord.out.bam', '47T7R.fastqAligned.sortedByCoord.out.bam'))$counts
sampleTable <- data.frame(condition = factor(c('P', 'P', 'P', 'P', 'NP', 'NP', 'NP', 'NP', 'P', 'P', 'P', 'P', 'P')))
coldata <- sampleTable
deseqdata <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~condition)
dds <- DESeq(deseqdata)
vsd <- vst(dds)
plotPCA(vsd, ntop=1000) + geom_text(aes(label=name),vjust=2,check_overlap = TRUE,size = 4)
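One thing I have considered, as a rough sketch only: shortening the column names of the count matrix before building the DESeq2 object. This assumes that the plot labels come from those column names, and that the columns are in the same order as the files I passed in. The `bam_files` and `short_names` variables and the tiny two-sample matrix are made up for illustration; in my real code `counts` would be the featureCounts output.

```r
# Hypothetical stand-in for the matrix returned by featureCounts(...)$counts;
# only two of the real file names are used to keep the example small.
bam_files <- c('40T4L.fastqAligned.sortedByCoord.out.bam',
               '41T7R.fastqAligned.sortedByCoord.out.bam')
counts <- matrix(c(10L, 0L, 25L, 3L), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), bam_files))

# Derive short sample names by dropping any directory part and the shared suffix.
short_names <- sub("\\.fastqAligned\\.sortedByCoord\\.out\\.bam$", "",
                   basename(bam_files))

# Assumption: the columns of counts are in the same order as bam_files.
stopifnot(ncol(counts) == length(short_names))
colnames(counts) <- short_names

print(colnames(counts))
```

If that assumption holds, the renamed `counts` could go into DESeqDataSetFromMatrix exactly as before, and the BAM files themselves would stay untouched.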