Non-repeat human genome dataset
4.2 years ago
Ivan • 0

Could anyone please point me to a dataset of non-repeat sequences for the human reference genome? I'm not sure whether it's still regarded as true, but I have seen that possibly 2/3 of the human genome consists of repeats. Is there a place where I can download the other 1/3 of the human genome, without the repeats? Non-repeat sequences from any human reference genome version would be fine, but hg19 or hg38 would be best.

genome • 836 views
3.3 years ago
jseg • 0

Hello,

I believe you're looking for the human genome sequence without repeats. Such sequences are called repeat-masked (or simply masked) sequences.

If you use Bioconductor, you can use the BSgenome.Hsapiens.UCSC.hg38.masked package.

If you want a FASTA file directly, you can download the hard-masked sequences, in which repeats are replaced by Ns, here: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.masked.gz

There is more information about the file at this link: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips
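If you need the non-repeat stretches themselves rather than a masked genome, here is a minimal Python sketch of one way to get them. It assumes the hard-masked file, where repeats appear as runs of N, and it reports every stretch without N as a BED interval. The function names (read_fasta, unmasked_intervals, write_unmasked_bed) are made up for this example and are not part of any library. Note that Ns in the assembly also mark gaps, so the intervals are simply "everything that is not N". Each chromosome is held in memory as one string, which is simple but not frugal.

```python
import gzip
import re
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def unmasked_intervals(seq: str, min_len: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) spans that contain no N."""
    for match in re.finditer(r"[^Nn]+", seq):
        if match.end() - match.start() >= min_len:
            yield match.start(), match.end()


def write_unmasked_bed(fasta_path: str, out: TextIO, min_len: int = 1) -> None:
    """Write the unmasked spans of a gzipped, hard-masked FASTA as BED lines."""
    with gzip.open(fasta_path, "rt") as handle:
        for name, seq in read_fasta(handle):
            for start, end in unmasked_intervals(seq, min_len):
                out.write(f"{name}\t{start}\t{end}\n")
```

To use it, open an output file for writing and call write_unmasked_bed("hg38.fa.masked.gz", out) on the downloaded file. The resulting BED file can then be used to pull the sequences out of the genome. If you start from the soft-masked hg38.fa.gz instead, where repeats are in lowercase, the pattern would need to match uppercase letters only.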

I hope this helps!

Best wishes.
