Entering edit mode
9.2 years ago
Kristin Muench
▴
640
Hello,
I'm trying to extract raw count data from a .bam file (output by Tophat2) for use in DESeq2. I'm trying to use HTSeq. Although I can get HTSeq to run, nothing aligns uniquely. The reads themselves are 100 bp long, so I'm surprised that's the case.
I input something like:
htseq-count -f bam -r pos /path/on/cluster/mySampleBam/accepted_hits.bam /path/on/cluster/homoSapGRCh38.gtf
And the output file looks like:
...
ENSG00000282813 0
ENSG00000282814 0
ENSG00000282815 0
ENSG00000282816 0
ENSG00000282817 0
ENSG00000282818 0
ENSG00000282819 0
ENSG00000282820 0
ENSG00000282821 0
ENSG00000282822 0
__no_feature 16996078
__ambiguous 0
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 2579821
Has anyone had this issue? What might be going on?
I previously sorted the .bam file I'm trying to generate counts for with the command:
samtools sort accepted_hits.bam accepted_hits_sorted
Just use the
-o
option and have a look at some of the reads that should be counted. I hope you looked at these in IGV or similar already.I haven't - what's IGV? I'm new to this, so any suggestions on how to sanity check as I build my pipeline would be appreciated.
This is IGV, it's one of a number of viewers for BAM (and other) files. It's always good to have a brief visual look at things when you get weird results. You can then find some reads that should be counted (but obviously aren't), use the -o option and then grep for the read names and see why they weren't counted.
What's the output of:
A very long list of numbers that looks like:
EDIT: It has this many lines: 1566241
sorry. This command:
Thanks! Output is 4 lines:
Output?
For the .gtf file:
For the .bam file:
I think Goutham Atla wanted to know if the chromosome-naming in your bam file and the GTF are the same: In your sam-file, you have Chromosome One written as chr1. Now you have to look if it's matching with the gtf:
and what is the output of: