Question

Dexseq_Count And Bwa

1

Entering edit mode

13.0 years ago

K. Wyatt Mcmahon ▴ 50

Hello BioStar friends!

I've gotten so much help here previously and I'm looking forward to help here again!

I'm trying to use DEXSeq (a Bioconductor package) to identify and quantitate changes in splicing following expression of a transgene. I've performed the mapping of my Illumina paired-end data using BWA and now have SAM files. No problems up to this point.

The problem occurs when I run dexseq_count.py, a Python script associated with the DEXSeq package which is dependent on HTSeq in Python. I keep getting the following messages:

"UserWarning: Read HWUSI-EAS381R:1:10:12325:4848#CGATGT claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)"

After a couple of searches I've found that HTSeq seems not to like BWA-generated SAM files. What I have not found was a work-around for this (although some people write things like "I wrote a perl script to get around this problem" with no further details).

I can use Bowtie (which is recommended), but we have a BWA Convey system so the speed up would be phenomenal, and I have quite a few of these.

Can someone help me with this?

Thanks in advance,

Wyatt

bwa htseq • 3.8k views

ADD COMMENT • link 13.0 years ago by K. Wyatt Mcmahon ▴ 50

score 1 · Answer 1 · 2011-12-12

1

Entering edit mode

13.0 years ago

Damian Kao 16k

From this [?]old seqanswers post[?], the bottom two posts by Simon Anders who wrote HTSeq:

If you have paired-end data, htseq-count expects that the SAM file is sorted by read name. This is because the SAM format prescribes that paired reads have the same read ID, and hence, sorting causes the read pairs to appear in adjacent line.pairs to appear in adjacent line.

Your error suggest that HTSeq cannot find the aligned mate data. It might be because your SAM file is not sorted by read name.

ADD COMMENT • link 13.0 years ago by Damian Kao 16k

0

Entering edit mode

Thanks so much for your help, DK. I've tried this and have thus far not been able to get the script to work. I'm trying the sampe step again in hopes that something went wrong there and that's why I'm not getting the script to work.

ADD REPLY • link 13.0 years ago by K. Wyatt Mcmahon ▴ 50

score 1 · Answer 2 · 2011-12-14

1

Entering edit mode

13.0 years ago

Michael 55k

Did you try to run your output through SAMtools sort command?

ADD COMMENT • link 13.0 years ago by Michael 55k

0

Entering edit mode

Yes, I've tried SAMTools, and although the sorting worked, the script still found problems. I've even used a linux one-liner. Neither has worked. I'm redoing the sampe step now just hoping that something went wrong there. Thanks for your help!

ADD REPLY • link 13.0 years ago by K. Wyatt Mcmahon ▴ 50

score 0 · Answer 3 · 2011-12-21

I wanted everyone to know what ended up happening with this:

I had used SAMtools sort, and was still getting the same problem.

It seems that when I demultiplexed my Illumina run, the same number of reads did not end up in each paired fastq file. So I had around 15 million reads in my read1 file, but only 11 million in my read2 file.

I wrote a perl script to make sure the same names are in each file. If you'd like a copy of this perl script comment on my answer here and I'll send it to you. Or if there's a way to share easily I'd be happy to use that too.

Thanks for your help!

Wyatt