HT-seq - what is ideal?
6.3 years ago
jsl ▴ 50

Hi guys,

I ran an htseq-count command and would like to cross-check it with you. What output is ideal? Below is the command:

htseq-count -f bam --idattr=gene -r pos /home/user/scratch60/STARresults/SRR7059136Aligned.sortedByCoord.out.bam /home/user/scratch60/NCBI_files/GCF_000001405.26_GRCh38_genomic.gff >/home/user/scratch60/HTseq_annotation/annotated_SRR7059136.txt

and the output is this

12600000 SAM alignment record pairs processed.
Warning: Mate pairing was ambiguous for 22805 records; mate key for first such record: ('SRR7059136.1152992', 'first', 'NC_000001.11', 135867, 'NC_000001.11', 493007, 357290).
12621898 SAM alignment pairs processed.

My questions are:

  1. Should I be concerned about the "Mate pairing was ambiguous" warning? Is there an ideal number one should be aiming for?
  2. Am I right to run -r pos because my STAR command included --outSAMtype BAM SortedByCoordinate? I'm trying to understand the logic of this. If someone can explain it, it would be much appreciated!

Thanks guys!

RNA-Seq • 1.8k views
6.3 years ago
  1. In an ideal world it'd be 0, but losing <1% won't affect anything.
  2. Sure, though you can just have STAR quantify things for you and not have to wait as long.
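To put the warning in perspective against the "<1%" rule of thumb, here is a minimal Python sketch that pulls the two numbers out of the log lines quoted in the question and computes the ratio. The function name `ambiguous_fraction` is made up for illustration, and the regular expressions assume the message wording shown above. Note that the warning counts records while the total counts pairs, so treat the result as an approximation.

```python
import re

# Log lines copied verbatim from the question.
log = """\
12600000 SAM alignment record pairs processed.
Warning: Mate pairing was ambiguous for 22805 records; mate key for first such record: ('SRR7059136.1152992', 'first', 'NC_000001.11', 135867, 'NC_000001.11', 493007, 357290).
12621898 SAM alignment pairs processed.
"""


def ambiguous_fraction(text):
    """Return (ambiguous, total, ambiguous / total) parsed from htseq-count log text.

    Hypothetical helper: it assumes the message wording shown in the question.
    """
    ambiguous = int(re.search(r"ambiguous for (\d+) records", text).group(1))
    # Progress lines repeat during the run; the last one carries the final total.
    totals = re.findall(
        r"^(\d+) SAM alignment (?:record )?pairs processed", text, flags=re.M
    )
    total = int(totals[-1])
    return ambiguous, total, ambiguous / total


ambiguous, total, frac = ambiguous_fraction(log)
print(f"{ambiguous} of {total} pairs ({frac:.2%}) had ambiguous mate pairing")
```

For this run the ratio comes out at roughly 0.18%, comfortably under 1%.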

In general, htseq-count isn't used much these days because it's quite slow. Either have STAR do the counting for you or use featureCounts and you'll get the results quicker.
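If you go the STAR route (running it with `--quantMode GeneCounts`), the counts land in a tab-separated `ReadsPerGene.out.tab` file. Below is a rough sketch of reading one in Python. It assumes the layout described in the STAR manual: four leading `N_*` summary rows, then one row per gene with the gene ID followed by unstranded counts, counts matching `htseq-count -s yes`, and counts matching `htseq-count -s reverse`. The helper name `read_star_counts`, the gene names, and all numbers in the example are invented for illustration.

```python
import csv
import io

# Invented example content in the assumed ReadsPerGene.out.tab layout:
# gene ID, unstranded counts, "-s yes" counts, "-s reverse" counts.
example = (
    "N_unmapped\t1000\t1000\t1000\n"
    "N_multimapping\t500\t500\t500\n"
    "N_noFeature\t200\t900\t250\n"
    "N_ambiguous\t50\t10\t40\n"
    "GENE_A\t120\t5\t115\n"
    "GENE_B\t30\t1\t29\n"
)


def read_star_counts(handle, column=1):
    """Split a ReadsPerGene-style table into summary rows and per-gene counts.

    column: 1 = unstranded, 2 = like '-s yes', 3 = like '-s reverse'.
    """
    summary, counts = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, value = row[0], int(row[column])
        # Rows whose ID starts with "N_" are summary statistics, not genes.
        target = summary if name.startswith("N_") else counts
        target[name] = value
    return summary, counts


summary, counts = read_star_counts(io.StringIO(example), column=1)
print(summary)
print(counts)
```

With a real file you would pass `open(path, newline="")` instead of the `io.StringIO` wrapper. Which column to use depends on the library's strandedness; htseq-count itself defaults to `-s yes`, so that is worth checking either way.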


Thanks Devon! That makes sense.
