Heya, I'm trying to stick RNA-seq data through TopHat and Cufflinks, but Cufflinks doesn't like the accepted_hits.bam (we're on v1.2.0 now, but it's still the same issue):
EDIT- I'm so sorry about the formatting
cufflinks -o cufflinksoases10 /home/sbica1/tophat/tophatoutput/oases10/acceptedhits.bam
Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v1.2.0 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu).
Warning: BAM header too large
File /home/sbica1/tophat/tophatoutput/oases10/accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
[10:42:43] Inspecting reads and determining fragment length distribution.
SAM error on line 40479: invalid CIGAR operation
SAM error on line 48713: CIGAR op has zero length
SAM error on line 48714: CIGAR op has zero length
SAM error on line 51328: CIGAR op has zero length
(goes on)
SAM error on line 21228840: CIGAR op has zero length
SAM error on line 21233743: CIGAR op has zero length
Processed 0 loci.
I've converted the accepted_hits.bam to a .sam and it goes through really nicely, but now when I try using cuffmerge, it has an issue with sorting even though it's come from the latest TopHat:
cuffmerge -p 30 manifest.txt
[Thu Dec 1 18:00:16 2011] Beginning transcriptome assembly merge
[Thu Dec 1 18:00:16 2011] Preparing output location ./mergedasm/
Warning: no reference GTF provided!
[Thu Dec 1 18:00:18 2011] Converting GTF files to SAM
[18:00:18] Loading reference annotation.
[18:01:52] Loading reference annotation.
[18:04:38] Loading reference annotation.
[Thu Dec 1 18:04:43 2011] Assembling transcripts
Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v1.2.1 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu).
[bamheaderread] EOF marker is absent.
[bamheaderread] invalid BAM binary header (this is not a BAM file).
File ./mergedasm/tmp/mergeSam_fileJeUyrw doesn't appear to be a valid BAM file, trying SAM...
[18:04:44] Inspecting reads and determining fragment length distribution.
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at Locus10004Transcript2/24Confidence1.000Length181:2, last one was at Locus10004Transcript21/24Confidence0.023Length669:44
Cufflinks requires that if your file has SQ records in the SAM header that they appear in the same order as the chromosomes names in the alignments. If there are no SQ records in the header, or if the header is missing, the alignments must be sorted lexicographically by chromsome name and by position.
[FAILED]
Error: could not execute cufflinks
Even when I sort the .sam as in the manual, it has issues:
cufflinks -p 10 -o cufflinkssortedoases10 /home/sbica1/tophat/tophatoutput/oases10/sortedhits.sam
Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v1.2.1 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu).
[bamheaderread] EOF marker is absent.
[bamheaderread] invalid BAM binary header (this is not a BAM file).
File /home/sbica1/tophat/tophatoutput/oases10/sorted_hits.sam doesn't appear to be a valid BAM file, trying SAM...
[20:09:25] Inspecting reads and determining fragment length distribution.
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at Locus100001Transcript1/1Confidence1.000Length131:9, last one was at Locus100000Transcript1/1Confidence1.000Length269:115
Cufflinks requires that if your file has SQ records in the SAM header that they appear in the same order as the chromosomes names in the alignments. If there are no SQ records in the header, or if the header is missing, the alignments must be sorted lexicographically by chromsome name and by position.
I've also tried converting back to a .bam as in the Bowtie manual, but no dice. I'd really appreciate it if anyone could advise a workaround or point out what I've done wrong (aside from pasting too much in one go). Cheers guys, Craig
TopHat/Bowtie have always had issues with how they sort BAM files. It could be down to how your reference sequences (chromosomes) are named. The error seems to suggest that Cufflinks thinks those two hits are on the same chromosome, since it is complaining that the coordinates are out of order.
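If it helps to narrow things down, here is a rough Python sketch that applies the rule quoted in the error message to a SAM file: if the header has @SQ records, the hits should follow that order; if not, they should be sorted by reference name and then position. The function name `sam_sort_problem` is made up for this post, and the plain string comparison it uses for "lexicographically" is my assumption about what Cufflinks does, so treat it as a diagnostic aid rather than a reimplementation of the Cufflinks check.

```python
def sam_sort_problem(lines):
    """Return None if the hits look correctly ordered, otherwise a short
    description of the first out-of-order record.

    `lines` is any iterable of SAM text lines (an open file works).
    Hypothetical helper: it follows the rule in the Cufflinks error text,
    not the Cufflinks source code.
    """
    sq_rank = {}      # reference name -> position in the @SQ header
    last_key = None   # sort key of the previous mapped hit
    last_hit = None   # (reference name, position) of the previous mapped hit

    for line_no, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: remember the order of the @SQ reference names.
            fields = line.split("\t")
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        sq_rank.setdefault(field[3:], len(sq_rank))
            continue

        cols = line.split("\t")
        rname, pos = cols[2], int(cols[3])
        if rname == "*":
            continue  # unmapped read, no position to compare

        if sq_rank:
            # With @SQ records, hits must follow the header order.
            if rname not in sq_rank:
                return f"line {line_no}: {rname} is not in the @SQ header"
            key = (sq_rank[rname], pos)
        else:
            # Without @SQ records, assume plain string order on the name.
            key = (rname, pos)

        if last_key is not None and key < last_key:
            return (f"line {line_no}: {rname}:{pos} comes after "
                    f"{last_hit[0]}:{last_hit[1]}")
        last_key, last_hit = key, (rname, pos)

    return None
```

Passing it an open file handle for the sorted .sam should tell you whether the file was sorted by name while the header still lists the references in a different order, which is one way to end up with this error after a plain text sort.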