Hello, I'm new to bioinformatics so this might be a dumb question. I'm trying to map a publicly available RNA-seq FASTQ file (from 2014) to hg19 using STAR. Only 0.2% of the reads map uniquely. Could this be happening because of the format of the FASTQ file? The SRA page for this dataset says the reads are paired, but there is only one FASTQ file to download, with 202 bp reads. Please see the example reads below. I'm also adding the output summary from the alignment. I've run this before on my own samples and they align perfectly. Is there a way to fix this?
Thank you SO much!!
Example of reads
@SRR111111.1 HWI-ST1220:65:C0MCWACXX:2:1101:1650:2407 length=202
GGAACACCTCCGCTNAATAGGCGTGGTTAGAGACGAAGAGGGACTCGCTGGCAGCAGCCCCAGCCTGACCGCTCGGAGTGTACTTTCCTTGACAGGCAAGGCCTCAATGCCATTAACAAGTGCCCCCTGCTGAAGCCCTGGGCCCTGACCTTCTCCTACGGCCGAGCCCTGCAGGCCTCTGCCCTGAAGGCCTGGGGCGGGA
+SRR111111.1 HWI-ST1220:65:C0MCWACXX:2:1101:1650:2407 length=202
@CCFDFFDHHFHGG#2AFBBHGIGHGHIGHBGE;G><GGHID;FFAABH=E?CEA@CCB?>;;ABC25>:8;3;>>008BC(:@>>C>CCA>CACB(<B##@@@FFFFFA>DHFHJJJJIFGIJGAHIEHGGIGI>HHHIJGEEFIEFFHIFHGGICHIGIIHB<ABDDDDDDCA?@>B>A@CDDA?A>>ABDBDBDBDDB5
@SRR111111.2 HWI-ST1220:65:C0MCWACXX:2:1101:1721:2430 length=202
ATCATCAGTAGGGTNAAACTAACCTGTCTCACGACGGTCTAACCCCAGCTCACGTTCCCTATTAGTGGGTGAACAATCCAACGCTTGGTGAATTCTGCTTCAAGCGTTCATAGCGACGTCGCTTTTTGATCCTTCGATGTCGGCTCTTCCTATCATTGTGAAGCAGAATTCACCAAGCGTTGGATTGTTCACCCACTAATAG
+SRR111111.2 HWI-ST1220:65:C0MCWACXX:2:1101:1721:2430 length=202
CCCFFFFFFHHHHG#2AFHIJIIJJJHGIIBGHGJGIEEHGIJJJJIIIJJGIIAEHHFFDFFDFEEDD>6;>>CACCC@CA@98?7?CBDDEECEEDACD@CCFFDDFHHHHGIIJJHIIJJJJIJIIIJJJJJIIJJJIIJJJIJJFHE=ACDDFFFDFFCEEEEDDDDDDDCDDDDBBDDDDCCDDCCCDDDBBDDDDD
@SRR111111.3 HWI-ST1220:65:C0MCWACXX:2:1101:1608:2458 length=202
GTTCTTAGTTGGTGNAGCGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTGGCATGCTAACTAGTTACGCGACCCCCGAGCGGTCGGCGAGATCGGCGCCGACCGCTCGGGGGTCGCGTAACTAGTTAGCATGCCAGAGTCTCGTTCGTTATCGGAATTAACCAGACAAATCGCTCCACCAACTAAGAACAGATCGG
OUTPUT FROM STAR
Number of input reads | 46256929
Average input read length | 202
UNIQUE READS:
Uniquely mapped reads number | 92715
Uniquely mapped reads % | 0.20%
Average mapped length | 193.31
Number of splices: Total | 114909
Number of splices: Annotated (sjdb) | 22204
Number of splices: GT/AG | 38543
Number of splices: GC/AG | 1475
Number of splices: AT/AC | 442
Number of splices: Non-canonical | 74449
Mismatch rate per base, % | 1.72%
Deletion rate per base | 0.05%
Deletion average length | 5.05
Insertion rate per base | 0.04%
Insertion average length | 4.63
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 66687
% of reads mapped to multiple loci | 0.14%
Number of reads mapped to too many loci | 3137
% of reads mapped to too many loci | 0.01%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 99.64%
% of reads unmapped: other | 0.01%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
It appears that you have obfuscated the actual SRR accession, so we can't tell you what the data for that accession should look like, but what lieven.sterck said below is likely what is happening. You will need to use the --split-files option when dumping reads from SRA.
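If dumping the reads again is not practical, the existing file can in principle be split after the fact. Below is a minimal Python sketch, assuming each 202 bp record is simply mate 1 followed by mate 2 (101 bp each, both in the orientation they were sequenced), which is what a dump without --split-files usually produces for paired data. That assumption has not been checked against the real accession, and the function names and file names here are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple

# One FASTQ record as (header, sequence, quality); the '+' line is dropped.
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a strict 4-line-per-record FASTQ stream.

    Records are read by position rather than by looking for '@',
    because a quality string may itself begin with '@'.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def split_record(record: Record, mate_len: int) -> Tuple[Record, Record]:
    """Cut one concatenated record into two mates of mate_len bases each."""
    header, seq, qual = record
    if len(seq) != 2 * mate_len or len(qual) != len(seq):
        raise ValueError(f"unexpected read length in record {header!r}")
    name = header.split()[0]  # keep only the read ID, e.g. '@SRR111111.1'
    mate1 = (name, seq[:mate_len], qual[:mate_len])
    mate2 = (name, seq[mate_len:], qual[mate_len:])
    return mate1, mate2


def split_fastq(src: TextIO, out1: TextIO, out2: TextIO, mate_len: int = 101) -> int:
    """Write mate 1 to out1 and mate 2 to out2; return the number of pairs."""
    pairs = 0
    for record in read_fastq(src):
        for mate, out in zip(split_record(record, mate_len), (out1, out2)):
            out.write("{}\n{}\n+\n{}\n".format(*mate))
        pairs += 1
    return pairs


if __name__ == "__main__":
    # Toy demonstration on an in-memory record (6 bp, split into 3 + 3).
    demo = io.StringIO("@read1 length=6\nACGTTT\n+read1 length=6\nIIIJJJ\n")
    first, second = io.StringIO(), io.StringIO()
    split_fastq(demo, first, second, mate_len=3)
    print(first.getvalue(), second.getvalue(), sep="")
```

For real data, open the input and the two output files with open() and pass them to split_fastq, then give both output files to STAR as a pair. Re-dumping with --split-files is still the safer route, since it uses the read lengths recorded in the SRA rather than assuming an even split.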