I am looking for a human reference genome. I downloaded the genome from
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/
and the file I downloaded is hg38.fa. The purpose of downloading the reference genome is to align RNA-seq reads
to it. When I looked into hg38.fa after the download, I found that it contains a separate header for each chromosome, starting with
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
and at the end of the file I found something like
>chrY_KI270740v1_random
TAATAAATTTTGAAGAAAATGAAGACTGTGTTCTCAGTTCCAGGTGCTTC
ATCAGGCTCATTGTGGATCCAGACTACCAGACACAAGACATTACACATTG
TAATGCATTAAATGCATAGTTTTAACAGTAATAATTTAAAAGAGATTTAG
AATTTTATAATGTTTGGAAAAATACATAGAGGCTTACTTTTTATTTTATT
TTTTTGAGATAGGAAGCCtttttttttgtttttgtttttgtttctgtttt
tgttttttgagacagagtctcaccatgtcacccagactggagtgcagtgg
tgcaatatcggcccattgcaagctccacatcccaggttcacaccattctc
ctgcctcagcctcccaagtagctgggactacaggtgcccgccaccacatc
cagctaatttttttttgtacttttagtagagacggggtatcaccatgtga
gccaagatggtctccatctcctgacctcgtgatctgcccaccttggcctc
ccaaagtgctgggattacaggggtgagccaccacgcccagGCATAGAGGC
ACTTTTAACCATAAATGAACACTGTTATGATTTGTATTACCACAGTATCA
TTATTCTGTCCTGTTTGCCTTACAttttatttatttattatactgtaagt
tctgggatacatgtgcagaatgtgcaggtttgttacagagatatatgctt
gtttgctgcacctgtcagtttttcatctacattaggtatttctcctaatg
ctattccctgttaggtccccaccctccaacagtctccagtgtttgatgtt
cccctccctatgtccatgtattctcattttacaactcccacctatgagtg
agaaattgcagtgtttgTGtgtttggaacttattccttccagtgggtttg
tggtctcgctcactgcaaaaatgaagctgtagaccgtttcggtgtgtgtt
acaactcttaaaggtggtgtgtctggagtttgctacttcacatgagctca
tggtcttgcttacttcaagaatgaagctgcagacatttacggtgagtgtt
I am not sure whether I can use this reference genome as it is, or whether any preprocessing is required before using it for read alignment. I would also like to know where I can get RNA-seq reads that can be aligned to this reference genome.
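For a first look at what the file contains, the per-sequence structure can be summarised with a short script. The following is a minimal sketch, not a required preprocessing step. It assumes a plain or gzip-compressed FASTA file, and the names summarize_fasta and FastaStats are made up for illustration. For each header it reports the sequence length, the number of N bases (unknown or gap positions) and the number of lowercase bases (soft-masked repeats in the UCSC files).

```python
import gzip
from collections import namedtuple

# Hypothetical record type used only in this sketch.
FastaStats = namedtuple("FastaStats", "name length n_count lower_count")


def open_text(path):
    """Open a FASTA file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_fasta(path):
    """Return one FastaStats per sequence, in file order."""
    stats = []
    name, length, n_count, lower = None, 0, 0, 0
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    stats.append(FastaStats(name, length, n_count, lower))
                # Keep only the first word of the header as the name.
                name = (line[1:].split() or [""])[0]
                length = n_count = lower = 0
            else:
                length += len(line)
                n_count += line.count("N") + line.count("n")
                lower += sum(1 for base in line if base.islower())
    # Flush the final record.
    if name is not None:
        stats.append(FastaStats(name, length, n_count, lower))
    return stats
```

For example, summarize_fasta("hg38.fa") would return one record per header, from chr1 through to scaffolds such as chrY_KI270740v1_random. The runs of N at the start of chr1 and the mixed-case sequence shown above are a normal part of the file. Most aligners treat uppercase and lowercase bases alike and build their own index from the FASTA file before aligning.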
RNA-seq reads can be downloaded from the public sequence archives: NCBI SRA or EBI ENA.
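As an illustration of fetching reads from ENA, the FASTQ locations for a run can be looked up through the ENA Portal API file report. This is a rough sketch under a few assumptions: the endpoint and the field names run_accession and fastq_ftp are used as I understand them from the ENA documentation and should be checked there, the accession SRR0000000 is a placeholder rather than a real dataset, and the function names are invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# ENA Portal API file report endpoint (verify against the ENA docs).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession):
    """Build a file report URL for a run, experiment or study accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_fastq_links(tsv_text):
    """Map each run accession to its list of FASTQ file locations."""
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return {}
    header = lines[0].split("\t")
    run_col = header.index("run_accession")
    ftp_col = header.index("fastq_ftp")
    links = {}
    for row in lines[1:]:
        cells = row.split("\t")
        # Paired-end runs list several files separated by semicolons.
        links[cells[run_col]] = [f for f in cells[ftp_col].split(";") if f]
    return links


def fetch_fastq_links(accession):
    """Query ENA (network access required) and return the parsed links."""
    with urlopen(build_filereport_url(accession)) as response:
        return parse_fastq_links(response.read().decode("utf-8"))
```

Calling fetch_fastq_links with a real run accession should return the FASTQ locations for that run, which can then be downloaded with any FTP or HTTP client. For NCBI SRA, the SRA Toolkit is the usual way to download the same data.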