Hi, everyone!
I'm trying to estimate the expression levels of ERVs in humans by mapping reads to their consensus sequences.
I found a paper that used consensus sequences from Dfam, but I couldn't find the exact file.
I found the files 'hg38_dfam.hits.gz' and 'hg38_dfam.nrph.hits.gz', but they contain only genomic coordinates, not the sequences themselves.
Should I extract the sequences using those coordinates, or is there another way?
Can someone give me some hints?
Thank you!
With the latest version of Dfam there is an API you can use to access the consensus sequences without downloading the HMMs. You would use the /families/{id}/sequence endpoint. See http://dfam.org/help/api and https://dfam.org/releases/Dfam_3.0/apidocs/
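Here is a minimal sketch of calling that endpoint from Python using only the standard library. It assumes the API is served under https://dfam.org/api and that the sequence endpoint accepts a format=fasta query parameter. Please check both against the apidocs before relying on them. The helper names (build_sequence_url, parse_fasta, format_fasta, fetch_consensus) are hypothetical and exist only for this example. The idea is to collect the consensus sequences of the ERV families you care about into one FASTA file, which you can then index with your aligner of choice.

```python
import urllib.parse
import urllib.request

# Assumption: base URL of the Dfam REST API; verify against the apidocs.
DFAM_API = "https://dfam.org/api"


def build_sequence_url(family_id: str, fmt: str = "fasta") -> str:
    """Build the URL for /families/{id}/sequence (format param is assumed)."""
    query = urllib.parse.urlencode({"format": fmt})
    return f"{DFAM_API}/families/{urllib.parse.quote(family_id)}/sequence?{query}"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {header: sequence} dict."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def format_fasta(records: dict[str, str], width: int = 60) -> str:
    """Render records as FASTA text, wrapping sequences at `width` columns."""
    lines: list[str] = []
    for header, seq in records.items():
        lines.append(f">{header}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n" if lines else ""


def fetch_consensus(family_id: str, timeout: float = 30.0) -> dict[str, str]:
    """Download one family's consensus sequence (performs a network request)."""
    url = build_sequence_url(family_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return parse_fasta(text)
```

For example, you could loop over a list of ERV family accessions (gathered from the Dfam website or the hits files), merge the dicts returned by fetch_consensus, and write format_fasta(merged) to a file such as erv_consensus.fa. The file name and the choice of accessions are placeholders. The parsing and formatting are kept separate from the network call so you can also reuse them on FASTA files you have already downloaded.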