htseq-count cannot locate gff
0
0
Entering edit mode
5.6 years ago
Melonhead ▴ 10

Hi there

I am struggling with the trivial task of running htseq-count. My command is as follows:

htseq-count -r pos \
-s no \
-t exonic_part \
-i transcripts \
--additional-attr=gene_id \
-m union \
--nonunique=all \
-f sam \
Analysis/Samples/sam/"${prefix}_Mapped.sortedByCoord.out.q10.R1.sam" \
Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff >  \
Analysis/Samples/htseq_output/"${prefix}_counts_dexseq_int.counts"

Every time it exits with an error message:

    usage: .htseq-count-real [options] alignment_file gff_file.htseq-count-real: error: the following arguments are required: featuresfilename

This is not the standard .gff gile. Yet, even if I use the standard Ensembl .gff and correct arguments for -i, -t options I am still receiving the same error.

What could be messed up in my command?

Will be grateful for any advice

RNA-Seq alignment • 1.6k views
ADD COMMENT
1
Entering edit mode

Can you show us what you get if you do cat -A HTseq_count_int.sh? Or, try not to loop over ${prefix} but run the command in shell directly using one mapping sam with Tab completion for the file path, and see what you get.

If in the script it can detect the text Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff but the path is wrong, it should show something like:

Error occured when processing GFF file (line 1 of file Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff):
  [Errno 2] No such file or directory: 'Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff'
  [Exception type: FileNotFoundError, raised in __init__.py:47]
ADD REPLY
0
Entering edit mode

The output of cat -A HTseq_count_int.sh :

#!/usr/bin/bash$
$
#Script to convert .BAM to .SAM and quantify gene expression with HTseq$
$
#Get WD$
currentdir=$(cd -P -- "$(dirname -- "$0")" && pwd -P)$
wd=$(dirname $currentdir)$
wdsamp="Analysis/Samples"$
$
#Declare an array of IDs$
declare -a arr=("SRR8919665" $
                "SRR8919666"$
                "SRR8919667"$
                "SRR8919668"$
                "SRR8919669"$
                "SRR8919670"$
                "SRR8919671"$
                "SRR8919672"$
                "SRR8919673"$
                "SRR8919674")$
 $
#Function to convert bams to sams $
convert_to_sam () {$
samtools sort -o "$wd"/"$wdsamp"/sam/"${prefix}_Mapped.sortedByCoord.out.q10.R1.sam" -O sam -T "${prefix}" -@ 2 "$wd"/"$wdsamp"/bam/"${prefix}_Mapped.sortedByCoord.out.q10.R1.bam"$
}$
$
#Function to compute counts with HTseq                               $
quantify_exp () {$
htseq-count \$
    -r pos \$
    -s no \$
    -t exonic_part \$
    -i transcripts \$
    --additional-attr=gene_id \$
    -m union --nonunique=all \$
    -f sam "$wd"/"$wdsamp"/sam/"${prefix}_Mapped.sortedByCoord.out.q10.R1.sam" \ $
    Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff > \$
    "$wd"/"$wdsamp"/htseq_output/"${prefix}_counts_dexseq_int.counts"$
}$
$
#Loop over$
for prefix in "${arr[@]}"; do convert_to_sam "$prefix" & done; wait && \$
for prefix in "${arr[@]}"; do htseq-count "$prefix" & done; wait$
$
ADD REPLY
0
Entering edit mode

Oh yes, now I see a mistake. I mislabeled the function in for loop

Thank you!

ADD REPLY
0
Entering edit mode

You're welcome, Melonhead. Glad that the issue is solved.

ADD REPLY
0
Entering edit mode

did you check whether the file (gff) is accessible from the location where you run the script? eg. can you do ls Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff ?

ADD REPLY
0
Entering edit mode

It should be:

-rw-r--r-- 1  1,4K May 29 14:49 HTseq_count_int.log
-rw-r--r-- 1  1,6K May 29 14:49 HTseq_count_int.sh
-rw-r--r-- 1   70M May 29 14:49 Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff

I also tried providing the absolute path to the .gff. Still of no avail.

ADD REPLY
0
Entering edit mode

Not to go too off-topic, but is there still any reason to use htseq-count over featureCounts? The last I've tried the former it was orders of magnitude slower than featureCounts.

ADD REPLY
1
Entering edit mode

To be honest, I've never used any of these before :^)

I will gladly try different approach. Thank you for the suggestion!

ADD REPLY
1
Entering edit mode

AFAIK they basically do the same thing (unlike the fancy EM-based methods, that also attempt to rescue multi-mappers). But featureCounts is a lot faster, like I said.

ADD REPLY

Login before adding your answer.

Traffic: 2647 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6