Hi all. Ensembl provides the reference genome and annotations that I need as separate files, one per chromosome. I'd like to combine these into one file with an additional column specifying which chromosome each entry is found on, so that I can align and generate feature counts for the organism's whole genome, and then do differential expression and gene network analysis.
Does anyone know of a way I can do this? Software, scripts?
What have you tried? Shell loops in conjunction with awk can do this if used well.
I didn't have any initial ideas, other than running alignment and feature counting for each chromosome individually and then trying to use dplyr to combine the data tables in R, as I was planning on doing the differential expression with edgeR in R. But I'll try using awk and shell loops to do this prior to alignment. Thank you. Any additional hints would be greatly appreciated.
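As a starting point, the shell loop plus awk idea could look roughly like the sketch below. It is only an illustration under some assumptions: the per-chromosome files are tab-delimited, and they are named `chr<NAME>.tsv`, which is a made-up naming scheme that you would need to adapt to the actual file names. The two tiny demo files are there only so the script runs on its own.

```shell
#!/bin/sh
# Sketch: merge per-chromosome tab-delimited files into one file,
# prepending a column that holds the chromosome name.
# ASSUMPTION: files are named chr<NAME>.tsv (hypothetical naming scheme).
set -eu

workdir=$(mktemp -d)

# Two tiny demo inputs standing in for the real per-chromosome files.
printf 'geneA\t100\t200\n' > "$workdir/chr1.tsv"
printf 'geneB\t300\t400\n' > "$workdir/chr2.tsv"

out="$workdir/combined.tsv"
: > "$out"                       # start with an empty output file

for f in "$workdir"/chr*.tsv; do
    chrom=$(basename "$f" .tsv)  # e.g. chr1
    chrom=${chrom#chr}           # strip the "chr" prefix, e.g. 1
    # Skip comment/header lines starting with '#', then print the
    # chromosome name followed by the original line, tab-separated.
    awk -v chrom="$chrom" 'BEGIN { FS = OFS = "\t" } !/^#/ { print chrom, $0 }' "$f" >> "$out"
done

cat "$out"
```

Before doing this, it may be worth checking whether the extra column is needed at all. GTF and GFF annotation files already carry the sequence name in their first column, and FASTA records carry it in the header line, so for files in those formats plain concatenation of the per-chromosome files may be enough.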