High number of unmapped reads in STAR
2.2 years ago
Ngrin

Hello,

I have some COVID samples. I ran the command below to get per-gene read counts as well as BAM files:

STAR --runThreadN 20 --readFilesCommand zcat --genomeDir /star/genomeIndex --readFilesIn R1 R2 --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts

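After running the above command with the provided genome files, almost all samples have a very high number of unmapped reads. These are the summary rows of ReadsPerGene.out.tab (count columns: unstranded, 1st read strand, 2nd read strand):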

N_unmapped      33570998  33570998  33570998
N_multimapping      1589      1589      1589
N_noFeature         8987      9775      9680
N_ambiguous          225        61        95
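To see how small the assigned fraction is, the gene rows of ReadsPerGene.out.tab can be summed and compared with the N_ rows. This is a minimal sketch, assuming the standard layout written by `--quantMode GeneCounts` (four N_ rows, then one row per gene, with counts in columns 2 to 4); `summarize_reads_per_gene` is a hypothetical helper name, and the percentage is taken over the sum of all rows in each column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a STAR ReadsPerGene.out.tab file.
# Column 2 = unstranded counts, column 3 = 1st read strand (htseq-count -s yes),
# column 4 = 2nd read strand (htseq-count -s reverse).
summarize_reads_per_gene() {
    local tab="$1"
    awk '
        # The special rows start with N_; keep their counts per column
        $1 ~ /^N_/ { for (c = 2; c <= 4; c++) special[$1, c] = $c; next }
        # Every other row is a gene; add its counts to the assigned total
        { for (c = 2; c <= 4; c++) assigned[c] += $c }
        END {
            split("unstranded strand_yes strand_reverse", label, " ")
            for (c = 2; c <= 4; c++) {
                # Denominator: sum of all rows in this column
                total = assigned[c]
                total += special["N_unmapped", c] + special["N_multimapping", c]
                total += special["N_noFeature", c] + special["N_ambiguous", c]
                pct = (total > 0) ? 100 * assigned[c] / total : 0
                printf "%s\tassigned=%d\tunmapped=%d\tpct_assigned=%.2f\n", label[c - 1], assigned[c], special["N_unmapped", c], pct
            }
        }' "$tab"
}
```

For example, `summarize_reads_per_gene ReadsPerGene.out.tab`. STAR also writes Log.final.out in the same output directory, which breaks unmapped reads down into "too many mismatches", "too short" and "other", so `grep unmapped Log.final.out` shows which category dominates.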

Is there anything I should change? Is such a high number normal?

No, this is not "normal". What is this dataset? Are you sure you are using the correct genome?

I have added a link in my original post to the genome files I am using. These are human samples (paired-end reads). I don't understand why I have so many unmapped reads. Is there a further step I need to do?

STAR --runThreadN 20 --runMode genomeGenerate --genomeDir ./star/genomeIndex --genomeFastaFiles ./star/reference/GRCh38.p13.genome.fa --sjdbGTFfile ./star/reference/gencode.v41.chr_patch_hapl_scaff.basic.annotation.gtf --sjdbOverhang 59

This is the command I used to create the genome index files.
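The index was built with `--sjdbOverhang 59`, which matches 60 bp reads under the STAR manual's recommendation of read length minus 1. A quick sanity check that the gzipped FASTQ files are readable and have the expected read length is sketched here; `suggest_sjdb_overhang` is a hypothetical helper name, and scanning only the first 100,000 reads is an arbitrary choice for speed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (longest read length - 1) for a gzipped FASTQ,
# i.e. the --sjdbOverhang value the STAR manual recommends.
suggest_sjdb_overhang() {
    # FASTQ records are 4 lines; the sequence is line 2 of each record.
    # Only the first 100,000 reads (400,000 lines) are scanned.
    gzip -dc "$1" | head -n 400000 |
        awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max - 1 }'
}
```

For example, `suggest_sjdb_overhang sample_R1.fastq.gz` (hypothetical file name) should print 59 if the reads are 60 bp long.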

Hello there, @Negara. Have you checked whether the chromosome names in the FASTA and GTF files match?

Yes, in both files the chromosome names use the same format: chr1, chr2, etc.
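The chromosome-name check discussed in this exchange can be scripted rather than eyeballed. This is a minimal sketch (`compare_chrom_names` is a hypothetical helper name) that lists names found in only one of the FASTA headers and the GTF's first column. Names under "Only in GTF" would be annotations on sequences the index does not contain; names under "Only in FASTA" are not necessarily a problem, since not every contig carries genes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare sequence names between a FASTA and a GTF file.
compare_chrom_names() {
    local fasta="$1" gtf="$2"
    local fa_names gtf_names
    fa_names=$(mktemp) && gtf_names=$(mktemp)
    # FASTA headers: drop the leading '>' and anything after the first whitespace
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u > "$fa_names"
    # GTF column 1 holds the sequence name; skip '#' comment lines
    grep -v '^#' "$gtf" | cut -f1 | LC_ALL=C sort -u > "$gtf_names"
    echo "Only in FASTA:"
    LC_ALL=C comm -23 "$fa_names" "$gtf_names"
    echo "Only in GTF:"
    LC_ALL=C comm -13 "$fa_names" "$gtf_names"
    rm -f "$fa_names" "$gtf_names"
}
```

With the files from the index command, the call would be `compare_chrom_names ./star/reference/GRCh38.p13.genome.fa ./star/reference/gencode.v41.chr_patch_hapl_scaff.basic.annotation.gtf`.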