Question

RNA_seq

8 months ago

daffodil ▴ 10

Hi,

I ran an RNA-seq analysis with the pipeline below on six samples I obtained from SRA. When I checked the end of the count table for the six samples, I got the following result. Could you please tell me why there are so many no_feature reads?

#hisat2 -q -x /proj/naiss2023-22-1174/index/mm10/genome /proj/-1174/Masomeh/RNAseq/SRR21230495/SRR21230495_trimmed.fastq -S SRR21230495.sam
#htseq-count  SRR21230495.sam /proj/index/Genmm10_GTF > pachyten_rep2.count



__no_feature            25272404  27648148  23773684  32014456  35770238  27957753
__ambiguous               438530    467429    373228    513380    563027    434333
__too_low_aQual                0         0         0         0         0         0
__not_aligned             216285    221717    293961    360088         0    250609
__alignment_not_unique   1831957   2081722   1556759   2101321   2127794   1683677

RNA-seq • 865 views

ADD COMMENT • link updated 8 months ago by swbarnes2 14k • written 8 months ago by daffodil ▴ 10

Please edit your post and use a better title.

ADD REPLY • link 8 months ago by Ram 44k

Ram · Answer 1 · 2024-03-13

8 months ago

swbarnes2 14k

The first question has to be: are you completely sure your gtf and fasta match?

ADD COMMENT • link 8 months ago by swbarnes2 14k

How can I be sure? I got the HISAT2 mm10 index (index/mm10) from the HISAT2 website, and Genmm10_GTF from UCSC.

ADD REPLY • link 8 months ago by daffodil ▴ 10

Compare the sequence headers.

ADD REPLY • link 8 months ago by Ram 44k

For the GTF file:

[ozatalab@rackham4 index]$ head -n 100 Genmm10_GTF
chr1    mm10_ncbiRefSeq stop_codon  134202951   134202953   0.000000    -   .   gene_id "NM_001291928.1"; transcript_id "NM_001291928.1"; 
chr1    mm10_ncbiRefSeq CDS 134202954   134203590   0.000000    -   1   gene_id "NM_001291928.1"; transcript_id "NM_001291928.1"; 
chr1    mm10_ncbiRefSeq exon    134199215   134203590   0.000000    -   .   gene_id "NM_001291928.1"; transcript_id "NM_001291928.1"; 
chr1    mm10_ncbiRefSeq CDS 134234663   134234733   0.000000    -   0   gene_id "NM_001291928.1"; transcript_id "NM_001291928.1"; 
chr1    mm10_ncbiRefSeq start_codon 134234731   134234733   0.000000    -   .   gene_id "NM_001291928.1"; transcript_id "NM_001291928.1"; 
chr1    mm10_ncbiRefSeq exon    134234663   134234856   0.000000    -   .   gene_id "NM_001291928.1"; transcript_id "NM_001291928.1"; 
chr1    mm10_ncbiRefSeq stop_codon  134202951   134202953   0.000000    -   .   gene_id "NM_001008533.3"; transcript_id "NM_001008533.3"; 
chr1    mm10_ncbiRefSeq CDS 134202954   134203590   0.000000    -   1   gene_id "NM_001008533.3"; transcript_id "NM_001008533.3"; 
chr1    mm10_ncbiRefSeq exon    134199215   134203590   0.000000    -   .   gene_id "NM_001008533.3"; transcript_id "NM_001008533.3"; 
chr1    mm10_ncbiRefSeq CDS 134234015   134234355   0.000000    -   0   gene_id "NM_001008533.3"; transcript_id "NM_001008533.3"; 
chr1    mm10_ncbiRefSeq start_codon 134234353   134234355   0.000000    -   .   gene_id "NM_001008533.3"; transcript_id "NM_001008533.3"; 
chr1    mm10_ncbiRefSeq exon    134234015   134235457   0.000000    -   .   gene_id "NM_001008533.3"; transcript_id "NM_001008533.3"; 
chr1    mm10_ncbiRefSeq stop_codon  134202951   134202953   0.000000    -   .   gene_id "NM_001282945.1"; transcript_id "NM_001282945.1"; 
chr1    mm10_ncbiRefSeq CDS 134202954   134203590   0.000000    -   1   gene_id "NM_001282945.1"; transcript_id "NM_001282945.1"; 
chr1    mm10_ncbiRefSeq exon    134199215   134203590   0.000000    -   .   gene_id "NM_001282945.1"; transcript_id "NM_001282945.1"; 
chr1    mm10_ncbiRefSeq CDS 134234015   134234355   0.000000    -   0   gene_id "NM_001282945.1"; transcript_id "NM_001282945.1"; 
chr1    mm10_ncbiRefSeq start_codon 134234353   134234355   0.000000    -   .   gene_id "NM_001282945.1"; transcript_id "NM_001282945.1"; 
chr1    mm10_ncbiRefSeq exon    134234015   134234446   0.000000    -   .   gene_id "NM_001282945.1"; transcript_id "NM_001282945.1"; 
chr1    mm10_ncbiRefSeq exon    134235228   134235457   0.000000    -   .   gene_id "NM_001282945.1"; transcript_id "NM_001282945.1"; 
chr1    mm10_ncbiRefSeq stop_codon  134202951   134202953   0.000000    -   .   gene_id "NM_001039510.2"; transcript_id "NM_001039510.2"; 
chr1    mm10_ncbiRefSeq CDS 134202954   134203590   0.000000    -   1   gene_id "NM_001039510.2"; transcript_id "NM_001039510.2"; 
chr1    mm10_ncbiRefSeq exon    134199215   134203590   0.000000    -   .   gene_id "NM_001039510.2"; transcript_id "NM_001039510.2"; 
chr1    mm10_ncbiRefSeq CDS 134234015   134234355   0.000000    -   0   gene_id "NM_001039510.2"; transcript_id "NM_001039510.2"; 
chr1    mm10_ncbiRefSeq start_codon 134234353   134234355   0.000000    -   .   gene_id "NM_001039510.2"; transcript_id "NM_001039510.2"; 
chr1    mm10_ncbiRefSeq exon    134234015   134234412   0.000000    -   .   gene_id "NM_001039510.2"; transcript_id "NM_001039510.2"; 
chr1    mm10_ncbiRefSeq exon    134235228   134235457   0.000000    -   .   gene_id "NM_001039510.2"; transcript_id "NM_001039510.2"; 
chr1    mm10_ncbiRefSeq stop_codon  134202951   134202953   0.000000    -   .   gene_id "NM_001291930.1"; transcript_id "NM_001291930.1"; 
chr1    mm10_ncbiRefSeq CDS 134202954   134203505   0.000000    -   0   gene_id "NM_001291930.1"; transcript_id "NM_001291930.1"; 
chr1    mm10_ncbiRefSeq start_codon 134203503   134203505   0.000000    -   .   gene_id "NM_001291930.1"; transcript_id "NM_001291930.1"; 
chr1    mm10_ncbiRefSeq exon    134199215   134203590   0.000000    -   .   gene_id "NM_001291930.1"; transcript_id "NM_001291930.1"; 
chr1    mm10_ncbiRefSeq exon    134235228   134235457   0.000000    -   .   gene_id "NM_001291930.1"; transcript_id "NM_001291930.1"; 
chr1    mm10_ncbiRefSeq stop_codon  134202951   134202953   0.000000    -   .   gene_id "XM_006529079.3"; transcript_id "XM_006529079.3"; 
chr1    mm10_ncbiRefSeq CDS 134202954   134203590   0.000000    -   1   gene_id "XM_006529079.3"; transcript_id "XM_006529079.3";
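
One thing visible in this excerpt is that gene_id and transcript_id carry the same value on every line. If that holds for the whole file, a counting tool that groups features by gene_id would treat each transcript as its own gene, and reads over exons shared by several transcripts may then be reported as ambiguous. The following is a minimal sketch for checking how widespread this is. It assumes the usual `gene_id "..."; transcript_id "...";` attribute layout, and the function name `count_same_ids` is made up for illustration.

```shell
# Count GTF lines whose gene_id value is identical to their transcript_id value.
# Usage: count_same_ids <gtf>   (hypothetical helper, not part of any tool)
count_same_ids() {
    awk '
        !/^#/ {
            gene = ""; tx = ""
            # "gene_id \"" is 9 characters long, "transcript_id \"" is 15.
            if (match($0, /gene_id "[^"]*"/))
                gene = substr($0, RSTART + 9, RLENGTH - 10)
            if (match($0, /transcript_id "[^"]*"/))
                tx = substr($0, RSTART + 15, RLENGTH - 16)
            total++
            if (gene != "" && gene == tx) same++
        }
        END { printf "%d of %d lines have gene_id == transcript_id\n", same, total }
    ' "$1"
}

# Example call with the file from this thread:
# count_same_ids Genmm10_GTF
```

If nearly all lines match, an annotation with real gene-level identifiers in gene_id is probably a better input for gene-level counting.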

For the index:

[ozatalab@rackham4 index]$ hisat2-inspect -n /proj/naiss2023-22-1174/index/mm10/genome
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr1_GL456210_random
chr1_GL456211_random
chr1_GL456212_random
chr1_GL456213_random
chr1_GL456221_random
chr2
chr3
chr4
chr4_GL456216_random
chr4_GL456350_random
chr4_JH584292_random
chr4_JH584293_random
chr4_JH584294_random
chr4_JH584295_random
chr5
chr5_GL456354_random
chr5_JH584296_random
chr5_JH584297_random
chr5_JH584298_random
chr5_JH584299_random
chr6
chr7
chr7_GL456219_random
chr8
chr9
chrM
chrUn_GL456239
chrUn_GL456359
chrUn_GL456360
chrUn_GL456366
chrUn_GL456367
chrUn_GL456368
chrUn_GL456370
chrUn_GL456372
chrUn_GL456378
chrUn_GL456379
chrUn_GL456381
chrUn_GL456382
chrUn_GL456383
chrUn_GL456385
chrUn_GL456387
chrUn_GL456389
chrUn_GL456390
chrUn_GL456392
chrUn_GL456393
chrUn_GL456394
chrUn_GL456396
chrUn_JH584304
chrX
chrX_GL456233_random
chrY
chrY_JH584300_random
chrY_JH584301_random
chrY_JH584302_random
chrY_JH584303_random

ADD REPLY • link updated 8 months ago by Ram 44k • written 8 months ago by daffodil ▴ 10

Why are you showing us this? Compare the sequence names; the output of head doesn't show you all the relevant seqnames.
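
A full comparison of sequence names could be sketched as follows. This is only an illustration: `compare_seqnames` is a hypothetical helper, and it assumes the index names have first been written to a text file with `hisat2-inspect -n`, as in the post above.

```shell
# List sequence names that appear in only one of the two inputs.
# Usage: compare_seqnames <gtf> <index_names.txt>   (hypothetical helper)
compare_seqnames() (
    gtf_names=$(mktemp)
    idx_names=$(mktemp)
    # Column 1 of every non-comment, non-empty GTF line, de-duplicated.
    awk '!/^#/ && NF { print $1 }' "$1" | LC_ALL=C sort -u > "$gtf_names"
    LC_ALL=C sort -u "$2" > "$idx_names"
    echo "Only in GTF:"
    LC_ALL=C comm -23 "$gtf_names" "$idx_names"
    echo "Only in index:"
    LC_ALL=C comm -13 "$gtf_names" "$idx_names"
    rm -f "$gtf_names" "$idx_names"
)

# Example call, using the paths from this thread:
# hisat2-inspect -n /proj/naiss2023-22-1174/index/mm10/genome > index_names.txt
# compare_seqnames Genmm10_GTF index_names.txt
```

Names listed under "Only in GTF:" are the ones to worry about, since features on those sequences can never receive reads. Matching names alone do not prove that the coordinates come from the same assembly.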

ADD REPLY • link 8 months ago by Ram 44k

At first glance the main chromosomes appear to match between your GTF and reference. Give featureCounts a try as well.

ADD REPLY • link 8 months ago by GenoMax 147k

Thank you very much. I was able to work it out by using featureCounts. However, when I used the GTF file that I got from the UCSC Table Browser, this result appeared:

Status  SRR21230494_sorted.bam
Assigned        6435019
Unassigned_Unmapped     0
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 5189487
Unassigned_Secondary    0
Unassigned_NonSplit     0
Unassigned_NoFeatures   2354494
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    17341339

However, when I used the GTF file from https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.refGene.gtf.gz, I got this result:

Status  SRR21230494_sorted.bam
Assigned        22514807
Unassigned_Unmapped     0
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 5189487
Unassigned_Secondary    0
Unassigned_NonSplit     0
Unassigned_NoFeatures   3178708
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    437337

I would be thankful if you could provide me with guidance.
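
To make the two summaries easier to compare, each category can be expressed as a share of all reads. The sketch below assumes a single-sample featureCounts `.summary` file with one header line, as shown above; `summary_percent` is a hypothetical helper name.

```shell
# Print every non-zero category of a featureCounts summary as a percentage.
# Usage: summary_percent <counts.txt.summary>   (hypothetical helper)
summary_percent() {
    awk '
        # Skip the header line and categories with a count of zero.
        NR > 1 && $2 > 0 { name[++n] = $1; count[n] = $2; total += $2 }
        END {
            for (i = 1; i <= n; i++)
                printf "%-26s %10d %6.2f%%\n", name[i], count[i], 100 * count[i] / total
        }
    ' "$1"
}
```

Both summaries above add up to the same 31,320,339 reads. With the Table Browser GTF about 20.5% of them are assigned and about 55.4% are ambiguous; with mm10.refGene.gtf.gz about 71.9% are assigned and about 1.4% are ambiguous.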

ADD REPLY • link updated 8 months ago by GenoMax 147k • written 8 months ago by daffodil ▴ 10

The fact that you are getting some hits suggests that your gtf has the same chromosome naming as your fasta, but there might still be coordinate mismatches between the two files. You need to get both files from the same source, to make sure both are using the same coordinate system. If the hisat website doesn't provide a matching gtf, I'd go somewhere that has both, and remake the index file yourself using a fasta you know matches your gtf.
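
Rebuilding the index from a matching FASTA could look roughly like this. It is a sketch under assumptions: `build_matching_index` is a hypothetical wrapper, and the file names in the example (`mm10.fa`, `mm10_ucsc`) are placeholders for whatever FASTA you download from the same source as your GTF.

```shell
# Build a HISAT2 index from a FASTA obtained from the same source as the GTF.
# Usage: build_matching_index <genome.fa> <index_prefix>   (hypothetical wrapper)
build_matching_index() {
    fasta=$1
    prefix=$2
    if [ ! -r "$fasta" ]; then
        echo "FASTA not readable: $fasta" >&2
        return 1
    fi
    if ! command -v hisat2-build >/dev/null 2>&1; then
        echo "hisat2-build is not on PATH" >&2
        return 1
    fi
    # hisat2-build takes the reference FASTA and the prefix for the index files.
    hisat2-build "$fasta" "$prefix"
}

# Example (file names are assumptions, not taken from this thread):
# build_matching_index mm10.fa mm10_ucsc
# hisat2 -q -x mm10_ucsc -U SRR21230495_trimmed.fastq -S SRR21230495.sam
```

Building a mouse genome index takes a while and a fair amount of memory, so it is worth running on a compute node rather than a login node.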

ADD REPLY • link 8 months ago by swbarnes2 14k