Struggling With Cufflinks Into Cuffmerge
1
4
Entering edit mode
13.0 years ago

Heya, I'm trying to stick RNAseq data through tophat and cufflinks, however, cufflinks doesn't like the accepted_hits.bam (we're on v1.2.0 now, but its still the same issue):

EDIT- I'm so sorry about the formatting

cufflinks -o cufflinksoases10 /home/sbica1/tophat/tophatoutput/oases10/acceptedhits.bam Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v1.2.0 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu). Warning: BAM header too large File /home/sbica1/tophat/tophatoutput/oases10/accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM... [10:42:43] Inspecting reads and determining fragment length distribution. SAM error on line 40479: invalid CIGAR operation SAM error on line 48713: CIGAR op has zero length SAM error on line 48714: CIGAR op has zero length

SAM error on line 51328: CIGAR op has zero length

Goes on

SAM error on line 21228840: CIGAR op has zero length SAM error on line 21233743: CIGAR op has zero length Processed 0 loci.

I've converted the accepted_hits.bam to a .sam and it goes through really nicely, but now when I try using cuffmerge, it has an issue with sorting even though its come from the latest tophat:

cuffmerge -p 30 manifest.txt

[Thu Dec 1 18:00:16 2011] Beginning transcriptome assembly merge

[Thu Dec 1 18:00:16 2011] Preparing output location ./mergedasm/ Warning: no reference GTF provided! [Thu Dec 1 18:00:18 2011] Converting GTF files to SAM [18:00:18] Loading reference annotation. [18:01:52] Loading reference annotation. [18:04:38] Loading reference annotation. [Thu Dec 1 18:04:43 2011] Assembling transcripts Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v1.2.1 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu). [bamheaderread] EOF marker is absent. [bamheaderread] invalid BAM > binary header (this is not a BAM file). File ./mergedasm/tmp/mergeSam_fileJeUyrw doesn't appear to be a valid BAM file, trying SAM... [18:04:44] Inspecting reads and determining fragment length distribution.

Error: this SAM file doesn't appear to be correctly sorted! current hit is at Locus10004Transcript2/24Confidence1.000Length181:2, last one was at Locus10004Transcript21/24Confidence0.023Length669:44 Cufflinks requires that if your file has SQ records in the SAM header that they appear in the same order as the chromosomes names in the alignments. If there are no SQ records in the header, or if the header is missing, the alignments must be sorted lexicographically by chromsome name and by position. [FAILED] Error: could not execute cufflinks

Even when I sort the .sam as in the manual, it has issues:

cufflinks -p 10 -o cufflinkssortedoases10 /home/sbica1/tophat/tophatoutput/oases10/sortedhits.sam Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v1.2.1 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu). [bamheaderread] EOF marker is absent. [bamheaderread] invalid BAM binary header (this is not a BAM file). File /home/sbica1/tophat/tophatoutput/oases10/sorted_hits.sam doesn't appear to be a valid BAM file, trying SAM... [20:09:25] Inspecting reads and determining fragment length distribution.

Error: this SAM file doesn't appear to be correctly sorted! current hit is at Locus100001Transcript1/1Confidence1.000Length131:9, last one was at Locus100000Transcript1/1Confidence1.000Length269:115 Cufflinks requires that if your file has SQ records in the SAM header that they appear in the same order as the chromosomes names in the alignments. If there are no SQ records in the header, or if the header is missing, the alignments must be sorted lexicographically by chromsome name and by position.

I've also tried converting back to a .bam as in the bowtie manual, but no dice. I'd really appreciate if anyone can advise a work around or point out what I've done wrong (aside from pasting too much in one go). Cheers guys, Craig

cuffdiff cufflinks cuffmerge rna • 7.5k views
ADD COMMENT
0
Entering edit mode

Tophat/Bowtie always had issues with how they sort the bam files. It could be how you are naming your reference chromosome names. The error seems to suggest that cufflinks thinks those two chromosomes are the same since they said the coordinates aren't matching.

ADD REPLY
3
Entering edit mode
12.9 years ago
John St. John ★ 1.2k

I am having the same issue. Notice the very first warning, that your BAM header is too large. I get that too. Are you also doing this on a de-novo assembly with a few hundred thousand fragments?

I peeked into the cufflinks source, and on line 745 of v1.3.0 of hits.cpp

 if (header->l_text >= MAX_HEADER_LEN)
 {
        fprintf(stderr, "Warning: BAM header too large\n");
        return false;
 }

where MAX_HEADER_LEN is defined as:

 MAX_HEADER_LEN = 4 * 1024 * 1024; // 4 MB

I then looked at one of my offending files to see what the byte count of the header is (using wc -c)

samtools view -H accepted_hits.bam | wc -c
4815350

So indeed my header is 4,815,350 which is over 4,194,304 (4 * 1024 * 1024).

I then tried modifying the max variable in hits.cpp (line 731 in v1.3.0) to see if that fixes the problem:

static const unsigned MAX_HEADER_LEN = 6 * 1024 * 1024; // 6 MB

Everything made properly, and now things appear to be working. Everything appears to have finished successfully! Hopefully though this won't break something else downstream. I am not a cufflinks developer so I am not sure what other effects this could have on the program.

I just wrote to the cufflinks developers to let them know about the change, and to ask them if they could double check that it doesn't cause issues. Maybe it will work its way into a future release...

Good luck!

ADD COMMENT

Login before adding your answer.

Traffic: 1553 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6