Cuffdiff Not Seeming To Work With Ucsc Gtf
4
1
Entering edit mode
13.9 years ago
Sam ▴ 80

Hello,

I've been working with a public dataset from the SRA. I've gone through conventional RNAseq workflow of alignment and sorting the SAM files using samtools. I'm now trying to run CuffDiff on the sorted SAM files and submitting a RefSeq GTF file downloaded from UCSC (HG_18). However, my attempts are unsuccessful because I keep getting the following type of error:

[12:33:02] Inspecting maps and determining fragment length distributions.
Error: this SAM file doesn't appear to be correctly sorted!
    current hit is at chrX:2710105, last one was at chr22:49568431
You may be able to fix this by running:
    $ LC_ALL="C" sort -k 3,3 -k 4,4n input.sam > fixed.sam

My SAM files are sorted in the correct order upon inspect:

chr1
chr2
chr3
...
chrX
chrY
chrM

(however, i tried deleting the chrM aligned reads, hence the above error between chr22 and chrX)

Can anyone provide advice on overcoming this problem?

Thanks!

cufflinks cuffdiff ucsc gtf • 5.8k views
ADD COMMENT
1
Entering edit mode
13.9 years ago

The part of the error message reading 'You may be able to fix this by running: ...' only appears in older versions of cufflinks (0.9.0). The authors strongly recommend upgrading.

In newer versions, the message reads:

"Cufflinks requires that if your file has SQ records in the SAM header that they appear in the same order as the chromosomes names in the alignments. If there are no SQ records in the header, or if the header is missing, the alignments must be sorted lexicographically by chromsome name and by position."

The first part of that statement applies to all valid SAM/BAM files. The order of the reference sequences in the header dictates the order the alignments must appear in the body of the file. Each SAM file must be sorted according to its own header.

If you sorted the BAM files (I don't think samtools will sort SAM text) then the order you get will be determined by the BAM header. This may be not be lexicographic order. If you then converted to text SAM format, but without including the SAM header, cuffdiff will expect a lexicographic sort, but will get the BAM ordering instead.

ADD COMMENT
0
Entering edit mode
13.9 years ago

I may be wrong , but if your program expects a simple alphanumeric order , then your chromosomes should be ordered like this:

chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr21
chr22
chrM
chrX
chrY
ADD COMMENT
0
Entering edit mode

This is indeed the case.

ADD REPLY
0
Entering edit mode
13.1 years ago
Natasha • 0

I had the same error some time ago. What I did was re-align my files, convert the SAM to BAM files, sort the BAM files and provide them to cufflinks. It worked for me.

ADD COMMENT
0
Entering edit mode
12.4 years ago

Hey Sam, I had the same issue. Natasha's method may be more robust, but what worked for me was to determine the order in the sam file (cut the column | uniq), and then change the order of the @SQ lines in the header to match the order in the body of the sam file. Hope this helps you.

perl script //

use warnings;
use strict;

open (ORIG, "original)header.txt") or die $!;
open (NEW_ORDER, "new_header_order.txt") or die $!;

print '@HD',"\tVN:1.0\tSO:coordinate\n";

my %chr_hash;
while (<ORIG>) {
        chomp;
        my @tab_split = split /\t/, $_;
        my @colon_split = split /:/, $tab_split[1];
        my $chr = $colon_split[1];
        $chr_hash{$chr} = $tab_split[3];
}
my @new_order;
while (<NEW_ORDER>) {
        chomp;
        push (@new_order,$_)
}

for my $new_chr (@new_order) {
        if (exists $chr_hash{$new_chr}) {
                print '@SQ',"\tSN:$new_chr\tLN:\t$chr_hash{$new_chr}\n"
        }
        else {
                print "chr $new_chr does not have an entry\n"
        }
}

print '@PG',"\tID:cuffmerge\tVN:1.0.0\n";
ADD COMMENT

Login before adding your answer.

Traffic: 1628 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6