High number of unmapped reads in STAR

3.1 years ago

Ngrin • 0

Hello,

I have some COVID samples. I ran the command below to get per-gene read counts and BAM files.

STAR --runThreadN 20 --readFilesCommand zcat --genomeDir /star/genomeIndex --readFilesIn R1 R2 --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts

After running the above command with the genome files linked here, almost all samples have a very high number of unmapped reads.

N_unmapped      33570998    33570998    33570998
N_multimapping  1589        1589        1589
N_noFeature     8987        9775        9680
N_ambiguous     225         61          95

Is there anything I should change? Is such a high number normal?

read count alignement STAR genecount • 1.4k views

3.1 years ago by Ngrin • 0

No, this is not "normal". What is this dataset? Are you sure you are using the correct genome?
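One quick way to narrow this down is STAR's Log.final.out, which is written next to the BAM file and splits the unmapped reads into "too many mismatches", "too short" and "other". Below is a minimal sketch for pulling those lines out. It assumes the default output prefix (so the file is simply Log.final.out in the run directory), and the function names are made up for illustration.

```python
def parse_star_log(text):
    """Parse the 'key | value' lines of a STAR Log.final.out into a dict.

    Section headers such as 'UNMAPPED READS:' contain no '|' and are skipped.
    Values are kept as strings because some carry a '%' sign.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def unmapped_breakdown(stats):
    """Return only the entries that describe unmapped reads."""
    return {k: v for k, v in stats.items() if "unmapped" in k.lower()}


# Example usage (the path is an assumption, adjust to your output prefix):
# with open("Log.final.out") as fh:
#     for key, value in unmapped_breakdown(parse_star_log(fh.read())).items():
#         print(f"{key}: {value}")
```

If nearly everything falls under "too short", the reads aligned over too small a fraction of their length to pass STAR's filters. With paired-end data it is then worth checking that R1 and R2 really belong together and list the reads in the same order.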

3.1 years ago by ATpoint 90k

I have added a link in my original post to the genome files I am using. These are human samples (paired-end reads). I wonder why so many of my reads are unmapped. Do I need to take any further steps?

This is the command I use to create the genome index files:

STAR --runThreadN 20 --runMode genomeGenerate --genomeDir ./star/genomeIndex --genomeFastaFiles ./star/reference/GRCh38.p13.genome.fa --sjdbGTFfile ./star/reference/gencode.v41.chr_patch_hapl_scaff.basic.annotation.gtf --sjdbOverhang 59
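The STAR manual describes the ideal --sjdbOverhang as the maximum read length minus 1, so a value of 59 corresponds to 60-base reads. A rough sketch for checking that against the actual FASTQ files follows. The file name in the usage comment is hypothetical, the function names are illustrative, and only the first 100,000 reads are sampled to keep it fast.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_path, max_reads=100_000):
    """Count read lengths in the first max_reads records of a FASTQ file.

    Assumes the standard 4-line FASTQ layout; gzip is detected from the
    '.gz' suffix.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as fh:
        # The sequence is the 2nd line of every 4-line record.
        for seq in islice(fh, 1, max_reads * 4, 4):
            counts[len(seq.rstrip("\r\n"))] += 1
    return counts


def suggested_overhang(counts):
    """Return max read length minus 1, the value the STAR manual calls ideal."""
    if not counts:
        raise ValueError("no reads found")
    return max(counts) - 1


# Example usage (hypothetical file name):
# counts = read_length_counts("sample_R1.fastq.gz")
# print(counts.most_common(5), suggested_overhang(counts))
```

A wide spread of lengths, or many very short reads after trimming, would be worth knowing about before looking further.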

3.1 years ago by Ngrin • 0

Hello there, @Negara. Have you checked whether the chromosome names in the FASTA and the GTF match?
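One way to run that check is to compare the sequence names in the two files directly. The sketch below is only an illustration: the function names are made up, and it assumes a standard FASTA (names taken from the first token after ">") and a tab-separated GTF with the sequence name in column 1.

```python
import gzip


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def fasta_seq_names(path):
    """Collect sequence names from FASTA header lines."""
    names = set()
    with _open_text(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    # Keep only the first whitespace-delimited token.
                    names.add(fields[0])
    return names


def gtf_seq_names(path):
    """Collect sequence names from column 1 of a GTF, skipping comments."""
    names = set()
    with _open_text(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def compare_seq_names(fasta_path, gtf_path):
    """Return names shared by both files and names unique to each."""
    fa = fasta_seq_names(fasta_path)
    gtf = gtf_seq_names(gtf_path)
    return {"shared": fa & gtf, "gtf_only": gtf - fa, "fasta_only": fa - gtf}


# Example usage with the paths from the post above:
# result = compare_seq_names(
#     "./star/reference/GRCh38.p13.genome.fa",
#     "./star/reference/gencode.v41.chr_patch_hapl_scaff.basic.annotation.gtf",
# )
# print(len(result["shared"]), sorted(result["gtf_only"])[:10])
```

An empty or very small "shared" set, or a long "gtf_only" list, would mean the annotation and the genome do not use the same naming (for example "chr1" versus "1").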