How to fix: Error: No module named HTSeq.scripts?
2.2 years ago
Chris ▴ 340

Hi all,

I ran the command below:

    samtools view MT1/Tophat_Out/accepted_hits.sorted.bam | python -m HTSeq.scripts.count -q -s no - ~/Indexes/Mus_musculus/UCSC/mm10/Genes/genes.gtf > MT1/MT1.count.txt

Then I got this error: /bin/python: No module named HTSeq.scripts

I reran the command and, although I am not sure what changed, I got a new error:

Error occurred when reading beginning of SAM/BAM file.
file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False
[Exception type: ValueError, raised in libcalignmentfile.pyx:1000]

I found a solution on the Internet:

https://www.seqanswers.com/forum/bioinformatics/bioinformatics-aa/9688-problem-using-htseq

which said: "the script was added to ./local/bin instead of /bin"

However, I don't know how to apply this in my case. Could you please help? Thank you so much!


What does pip show HTSeq return?

And is there a particular reason for using HTSeq outside a Python script in the first place? That is not to say it doesn't work, but salmon and featureCounts are far more commonly used to quantify RNA-seq data. It has also been a long time since I saw someone using TopHat, as faster and more accurate aligners have become available in the meantime.


Thank you for your help!

pip show HTSeq

Name: HTSeq
Version: 2.0.2
Summary: A framework to process and analyze data from high-throughput sequencing (HTS) assays
Home-page: https://github.com/htseq
Author: Simon Anders, Fabio Zanini
Author-email: fabio.zanini@unsw.edu.au
License: GPL3
Location: /gpfs2/home/user/.pyenv/versions/3.8.0/lib/python3.8/site-packages
Requires: numpy, pysam
Required-by:

I am trying to learn bulk RNA-seq analysis, so I am trying to reproduce this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373869/
The command above is from step 4 of protocol 1 in this paper. I could not find any better free bulk RNA-seq material with data and code to follow along, which is why I chose this paper, even though I know there are better tools, as you said.

I reran the command from the post and, although I am not sure what changed, I got a new error:

Error occurred when reading beginning of SAM/BAM file.
file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False
[Exception type: ValueError, raised in libcalignmentfile.pyx:1000]


With pip show HTSeq, you confirmed that the module is installed on your system. However, its install path is inside a pyenv-managed Python 3.8.0 installation, whereas your first error came from the system interpreter at /bin/python, which cannot see packages installed there. So you need to activate the pyenv Python 3.8 environment first, which you probably did without realizing it, given that the error message changed.

Did you enter something like this or put it into your .bash_profile or .zprofile file?

export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
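
To confirm which interpreter is actually being used and whether it can see HTSeq, something like the following rough sketch may help. It only uses the standard library; the function name module_visible is made up for illustration, and the script should be run with the same python command you use in the pipeline.

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if the running interpreter can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec imports parent packages first, so a missing parent
        # (e.g. HTSeq itself) surfaces as an ImportError subclass here.
        return False


if __name__ == "__main__":
    # /bin/python here would mean the pyenv interpreter is not first on PATH.
    print("interpreter:", sys.executable)
    for name in ("HTSeq", "HTSeq.scripts.count"):
        print(name, "found" if module_visible(name) else "NOT found")
```

If the interpreter printed is not the one under ~/.pyenv, the pyenv shell setup above is probably not active in that session.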

As far as the second error is concerned:

Error occurred when reading beginning of SAM/BAM file. file has no sequences defined (mode='r')

This indicates that the SAM stream reaching HTSeq has no header. By default, samtools view prints only the alignment records, so you need to specify the -h flag to include the header as well.

Therefore, try:

samtools view -h MT1/Tophat_Out/accepted_hits.sorted.bam | ...
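
If you want to see what the header check amounts to, here is a minimal sketch, assuming the error is triggered by missing @SQ (reference sequence) lines at the top of the SAM text. The function name and the sample records are invented for illustration; this is not how HTSeq or pysam implement the check.

```python
def has_sequence_header(sam_lines):
    """Return True if an @SQ header line appears before the first alignment."""
    for line in sam_lines:
        if not line.startswith("@"):
            # Header lines come first in SAM text; the first alignment
            # record means the header (if any) is over.
            return False
        if line.startswith("@SQ"):
            return True
    return False


if __name__ == "__main__":
    # Made-up records, roughly what `samtools view -h` vs. plain
    # `samtools view` would emit.
    with_header = [
        "@HD\tVN:1.0\tSO:coordinate",
        "@SQ\tSN:chr1\tLN:1000",
        "read1\t0\tchr1\t1\t50\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    without_header = with_header[2:]
    print(has_sequence_header(with_header))     # True
    print(has_sequence_header(without_header))  # False
```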

Thank you so much! It worked.
