Ensembl reference genomes and annotations by chromosome only.
4.3 years ago
devarts ▴ 40

Hi all. Ensembl has the reference genome and annotations that I need split into one file per chromosome. I'd like to combine these into a single file, with an additional column specifying which chromosome each record is found on, so that I can align reads and generate feature counts for the organism's whole genome, and from those do differential expression and gene network analysis.

Does anyone know of a way I can do this? Software, scripts?

ensembl alignment RNA-Seq feature counts • 1.0k views

What have you tried? Shell loops in conjunction with awk can do this if used well.
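
For anyone who would rather not use awk, the same concatenation can be sketched in Python. This is only a rough illustration, assuming the per-chromosome annotations are gzipped GFF3 files; the function name `merge_gff3` and the file names in the usage comment are made up for the example. Note that GFF3 already stores the sequence (chromosome) name in its first column, so no extra chromosome column should be needed after merging.

```python
import gzip


def merge_gff3(parts, out_path):
    """Concatenate gzipped per-chromosome GFF3 files into one gzipped file.

    The '##gff-version' line is written only once; every other line
    (features and remaining directives) is copied unchanged.
    """
    wrote_version = False
    with gzip.open(out_path, "wt") as out:
        for part in parts:
            with gzip.open(part, "rt") as handle:
                for line in handle:
                    if line.startswith("##gff-version"):
                        if wrote_version:
                            continue  # skip repeated version headers
                        wrote_version = True
                    out.write(line)


# Hypothetical usage (file names are placeholders, not real Ensembl paths):
# merge_gff3(["chr1.gff3.gz", "chr2.gff3.gz"], "merged.gff3.gz")
```

The files are written in the order they are passed in, so sort the input list however you want the chromosomes ordered in the merged file.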


I didn't have any initial ideas, other than running alignment and feature counting for each chromosome individually and then using dplyr to combine the data tables in R, since I was planning to do the differential expression with edgeR. But I'll try using awk and shell loops to do this prior to alignment. Thank you. Any additional hints would be greatly appreciated.

4.3 years ago
h.mon 35k

I have downloaded several genomes and annotations from Ensembl, each as a single genome (FASTA) file and a single annotation (GFF or GTF) file. It may be a bit confusing at first, with all the .dna.chromosome.1.fa.gz and .chromosome.X.gff3.gz files, but you just want the files with no chromosome in their names.

For example, for pig, you will want Sus_scrofa.Sscrofa11.1.dna.toplevel.fa.gz for the reference genome, and Sus_scrofa.Sscrofa11.1.101.gff3.gz for the GFF annotation (the gtf directory doesn't split the annotation by chromosome).
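
If you want to confirm that a toplevel file really contains every chromosome before aligning, one option is to list its sequence names. The snippet below is a minimal sketch, assuming a gzipped FASTA file; the helper name `fasta_sequence_names` is made up for illustration.

```python
import gzip


def fasta_sequence_names(path):
    """Return the sequence names (first word of each '>' header) in a gzipped FASTA."""
    names = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # ignore a bare '>' with no name
                    names.append(fields[0])
    return names


# Example call on the pig genome file mentioned above:
# print(fasta_sequence_names("Sus_scrofa.Sscrofa11.1.dna.toplevel.fa.gz"))
```

The names returned should match the values in the first column of the GFF3/GTF annotation, which is what aligners and feature counters rely on to pair reads with features.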


h.mon, I think that's the ticket. I was uncertain about using the toplevel files. I did see that those are available for my organism.
