Entering edit mode
12 months ago
Bjorn
•
0
Hello,
I am running this htseq command
htseq-count -r pos -t gene -i gene -s yes -f bam \
/Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam \
/Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3 > \
/Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt
However I get this error:
Error processing GFF file (line 75 of file /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3):
Feature gene:ENSTGUG00000013637 does not contain a 'gene' attribute
[Exception type: ValueError, raised in features.py:387]
But in my gff3 file there is clearly a gene attribute:
1 ensembl gene 34955 72180 . + . ID=gene:ENSTGUG00000013637;Name=DCBLD2;biotype=protein_coding;description=discoidin%2C CUB and LCCL domain containing 2 [Source:NCBI gene%3BAcc:100218192];gene_id=ENSTGUG00000013637;logic_name=ensembl;version=2
1 ensembl mRNA 34955 72180 . + . ID=transcript:ENSTGUT00000014203;Parent=gene:ENSTGUG00000013637;Name=DCBLD2-201;biotype=protein_coding;tag=Ensembl_canonical;transcript_id=ENSTGUT00000014203;version=2
Does anyone know what is going on here?
Thanks! I was able to get around this error by using feature counts. However, I ended up getting zero reads. This is odd considering that my Bam file had an alignment of about 90%. I think that my .gff3 file and bam file are somehow incompatible.
Here is some of the .bam file. AS you can see it is region CM012081.2
here is the head of my gff3 file with the alias almost matching the bam file except for the .1 instead of .2
Here is my output. I do not believe that the gene counts are all zero. This has been a sustained issue for 2 weeks and I am not sure what to do.
your BAM file is mapped against a genome with chromosome called
CM012081.2
but your annotation file has the chromosome called1
the chromosomal names have to match
otherwise, it cannot match a gene to the the alignments
Thank you. I have been trying to build a genome for this GCF file but I keep running into the same error.
.fna genome >NC_044211.2 Taeniopygia guttata isolate Blue55 chromosome 1, bTaeGut1.4.pri, whole genome shotgun sequence AAGCCAGCCATATCAGTATCCCAGCCCGCAGGGATCTGACTGCCACCCAGCAATGGCCAGCCACTATGTGTGCCCCTGTA TCACT
however, I keep getting errors on during the hisat2-build command. That there are no A nucleotides, but when I open my .fna there clearly are. I haven't seen many examples of this on the internet. Any ideas?
Exited GFM loop fchr[A]: 0 fchr[C]: 306050481 fchr[G]: 526435265 fchr[T]: 746725375 fchr[$]: 1052636476 Exiting GFM::buildToDisk() Returning from initFromVector Wrote 355100291 bytes to primary GFM file: prof.1.ht2 Wrote 263159124 bytes to secondary GFM file: prof.2.ht2 Re-opening _in1 and _in2 as input streams Returning from GFM constructor assert_eq: expected (57130, 0xdf2a) got (57344, 0xe000) ./hgfm.h:2127 Assertion failed: (0), function HGFM, file ./hgfm.h, line 2127. Abort trap: 6
sounds like your genome file is not quite correct, you need to make sure it is in FASTA format all the way through, perhaps get it from a reliable source.
I tried to build the index on a valid data and it worked just fine: