7.1 years ago
Sara
I have some RNA-seq data that I am trying to align to the genome (hg19) using STAR and BWA separately. Each file should take at least half an hour to align, but strangely these files take at most 6 minutes. When I then visualize the BAM files in the UCSC Genome Browser or IGV, many genes have very few reads, which is not normal in my experience. Do you know what the problem could be?
Hi Sara,
Please don't forget to follow up on your other thread: how to visualize the RNAseq data on IGV
Cheers, Wouter
Can you give details on the number of reads, the read length distribution, the read quality distribution, the type of sequencing (single-end or paired-end), and anything else you've got? Most of these can be generated quickly with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
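If you just want the basic numbers before running a full QC report, something like the following sketch would do. It assumes standard 4-line FASTQ records and Phred+33 quality encoding (both are assumptions about your files), and the function name `fastq_summary` is made up for illustration.

```python
import gzip
from collections import Counter


def fastq_summary(path):
    """Return (number of reads, Counter of read lengths, mean Phred quality).

    Assumes 4-line FASTQ records and Phred+33 encoded qualities.
    """
    # Transparently handle gzip-compressed files.
    opener = gzip.open if str(path).endswith(".gz") else open
    n_reads = 0
    lengths = Counter()
    qual_sum = 0
    base_count = 0
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file (or a trailing blank line)
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            n_reads += 1
            lengths[len(seq)] += 1
            qual_sum += sum(ord(c) - 33 for c in qual)
            base_count += len(qual)
    mean_q = qual_sum / base_count if base_count else 0.0
    return n_reads, lengths, mean_q
```

Calling `fastq_summary("sample_R1.fastq.gz")` (a placeholder file name) gives the read count to compare against what the aligner reports as input. A very small read count or very short reads would by itself explain a 6-minute alignment.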
It's not necessarily a major issue, but BWA is not what I would first think of for RNA-seq data. There are modern 'pseudo-aligners' like Kallisto, which can process large samples in just a few minutes each, but they map reads to a reference transcriptome (in FASTA) rather than to the genome.
Except perhaps if OP's organism (important information left out) is prokaryotic?
It's hg19, as mentioned in the question. In addition, try counting the number of reads mapped to the genome.
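One way to do that count, as a rough sketch: read the alignments as SAM text (for example the output of `samtools view file.bam`) and look at the FLAG column. The flag bits used below are from the SAM format (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function name `count_mapped` is just illustrative.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_mapped(sam_lines):
    """Return (primary records, mapped primary records) from SAM text lines.

    For paired-end data each mate is its own record, so both are counted.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t", 2)[1])
        # Skip secondary/supplementary records so each read is counted once.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return total, mapped
```

Feeding it `sys.stdin` in a small script lets you pipe `samtools view` output straight in. Note that, depending on aligner settings, unmapped reads may not be written to the BAM at all, so compare the mapped count against the number of reads in the FASTQ files rather than against the total in the BAM.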
Oh yeah, missed that. Then BWA is not appropriate: it is not splice-aware, so reads spanning exon junctions will not align properly to the genome.
There are many "possible problems", so it is better to troubleshoot the alignment step by step. For starters, what is your mapping rate? STAR writes a nice, informative log with this information and more. How many reads are in the FASTQ files?
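To pull the mapping rate out of that log programmatically, here is a minimal sketch. It assumes the summary file (usually named `Log.final.out`) uses `key | value` lines, and the key names in the usage note are what I recall from typical STAR output, so check them against your own file. The helper names `parse_star_log` and `percent` are hypothetical.

```python
def parse_star_log(text):
    """Parse 'key | value' lines from a STAR final log into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        # Section titles and blank lines have no '|' separator.
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(value):
    """Convert a string such as '85.20%' to the float 85.2."""
    return float(value.rstrip("%"))
```

For example, after `stats = parse_star_log(open("Log.final.out").read())`, the entries `stats["Number of input reads"]` and `percent(stats["Uniquely mapped reads %"])` (assumed key names) give the input read count and the unique mapping rate. A low input count points to the FASTQ files, while a normal count with a low mapping rate points to the alignment itself.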