7.8 years ago
irritable_phd_syndrome
I have a cuffcompare.stats file that contains:
# Cuffcompare v2.2.1 | Command line was:
#cuffcompare -o cuffmerge/cuffcompare -r /reference/mus_musculus/GRCm38/ensembl/release-86/Annotation/Genes/gtf/Mus_musculus.GRCm38.86.gtf -R -s /reference/mus_musculus/GRCm38/ensembl/release-86/Sequence/Chromosomes/ -C -V cuffmerge/merged.gtf
#
#= Summary for dataset: cuffmerge/merged.gtf :
# Query mRNAs : 151818 in 59836 loci (114796 multi-exon transcripts)
# (17978 multi-transcript loci, ~2.5 transcripts per locus)
# Reference mRNAs : 117527 in 47037 loci (95479 multi-exon)
# Super-loci w/ reference transcripts: 41604
#--------------------| Sn | Sp | fSn | fSp
Base level: 100.0 74.3 - -
Exon level: 113.0 110.3 100.0 100.0
Intron level: 99.6 96.4 100.0 98.7
Intron chain level: 80.0 66.5 100.0 100.0
Transcript level: 76.6 59.3 72.7 56.2
Locus level: 99.6 77.3 100.0 77.5
Matching intron chains: 76386
Matching loci: 46831
Missed exons: 2/396144 ( 0.0%)
Novel exons: 18667/405769 ( 4.6%)
Missed introns: 781/261696 ( 0.3%)
Novel introns: 1838/270629 ( 0.7%)
Missed loci: 0/47037 ( 0.0%)
Novel loci: 12834/59836 ( 21.4%)
Total union super-loci across all input datasets: 59834
where
wc -l cuffmerge/cuffcompare/cuffcompare.loci
> 59834 cuffmerge/cuffcompare/cuffcompare.loc
The last line says that there are 59834 "super-loci", but the line above it (Novel loci: 12834/59836) seems to imply 59836 loci. What is the difference?
Likewise, when I count the number of loci with novel transcripts (i.e. class_code = "j"), I get 12627 instead of 12834.
grep "class_code \"j\"" cuffmerge/merged.gtf | awk 'BEGIN{FS="\t"}{print $9}' | awk '{print $2}' | sort | uniq | wc -l
12627
What accounts for this difference as well?
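For cross-checking the count, here is a minimal sketch that tallies distinct gene_id values for every class_code at once, rather than only for "j". The function name count_loci_by_class_code is hypothetical, and the sketch assumes the ninth GTF column carries attributes of the form gene_id "..."; and class_code "...";, as in the grep command in the question. It makes no assumption about which class code, if any, cuffcompare uses for its "Novel loci" line.

```shell
# Print "<class_code><TAB><number of distinct gene_ids>" for a GTF file.
# Assumption: column 9 holds attributes such as
#   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; class_code "j";
count_loci_by_class_code() {
    awk -F'\t' '
        {
            gene = ""; code = ""
            # Pull the quoted values out of the attribute column.
            if (match($9, /gene_id "[^"]+"/))
                gene = substr($9, RSTART + 9, RLENGTH - 10)
            if (match($9, /class_code "[^"]+"/))
                code = substr($9, RSTART + 12, RLENGTH - 13)
            # Count each (class_code, gene_id) pair only once.
            if (gene != "" && code != "" && !seen[code SUBSEP gene]++)
                n[code]++
        }
        END { for (c in n) print c "\t" n[c] }
    ' "$1" | sort
}
```

Calling count_loci_by_class_code cuffmerge/merged.gtf prints one line per class code. A locus whose transcripts carry several different class codes is counted once under each of them, so the per-code totals can add up to more than the number of loci.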