I have few questions regarding htseq count.
I run the htseq in virtual box using this command:
samtools view SMA1.bam | htseq-count -s no -i ID -t gene - Anopheles-gambiae-PEST_BASEFEATURES_AgamP4.2.gff3 > htseq2/SMA1.text
biouser@VirtualBox:~/Desktop/sf_farahdata$ samtools view SMA2.bam | htseq-count -s no -i ID -t gene - Anopheles-gambiae-PEST_BASEFEATURES_AgamP4.2.gff3 > htseq2/SMA2.text
100000 GFF lines processed.
175804 GFF lines processed.
Warning: Read HS31_15535:2:2109:18695:7195#14 claims to have an aligned mate which could not be found in an adjacent line.
Warning: Malformed SAM line: RNAME != '*' although flag bit &0x0004 set
Warning: Malformed SAM line: MRNM == '=' although read is not aligned.
Warning: Malformed SAM line: MRNM != '*' although flag bit &0x0008 set
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
300000 SAM alignment record pairs processed.
Q1: What does the warning means?
I use this command to pull out unmapped reads:
samtools view -f 4 -c SMA2.bam
I attached the results from the reads data that I obtained. It only shows the read information rather than read counts.
Q2: How can I extracting counts from RNAseq data that does not align to a feature.
Thank you.
Hi Craig,
Thank you for your reply.
Maybe my question is not very clear. I use htseq count using annotated reference GFF file and I got around 3million of 'no_feature' reads. I am interested to get the counts on these unannotated regions. I wonder if I could do this using htseq count or any other tools?
Thank you in advance. :)
Use the
-o
option, and then grep the resulting SAM file forno_feature
. You'll likely find that these are in introns, around UTRs, and in repeat regions.