Hi, I have performed a differential expression experiment using RNA-Seq and would like to try chromosome clustering. I found a program, CROC, which seems like it could help, but it requires a list of genes (which I have) and a reference genome in UCSC's refGene format.
Since I work on a plant with no reference refGene file available, I have tried to generate my own following advice on SeqAnswers. Essentially, I used UCSC's gtfToGenePred to create a refGene file from a GTF file.
However, while I can load it into CROC, when I run the clustering program it fails to identify the genes in the reference. I think there must be something wrong with my reference file, as I can see the relevant gene IDs in there. How can I adjust it so that the gene IDs will be picked up?
Here are the first few entries of my generated refGene file:
XLOC_000001 TCONS_00000001 chr1 + 6523 7366 7366 7366 2 6523,7097, 6620,7366,
XLOC_000002 TCONS_00000002 chr1 + 14513 15729 15729 15729 2 14513,15502, 14556,15729,
XLOC_000003 TCONS_00000003 chr1 + 16282 18382 18382 18382 3 16282,17060,18241, 16326,17304,18382,
XLOC_000004 TCONS_00000004 chr1 + 31972 32344 32344 32344 1 31972, 32344,
and the gene IDs I supply look like this:
XLOC_000007
XLOC_000287
XLOC_000320
XLOC_000381
XLOC_000394
XLOC_000645
XLOC_000754
I am not familiar with this file format, so I thought that switching the first two columns around might help. It did not.
Any help is greatly appreciated.
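One way to narrow this down before changing the file further is a quick sanity check of the reference against the gene list. The sketch below is only an illustration: the function name `summarize_reference` and the `chrom_col` parameter are made up for this example, and it assumes the chromosome sits in the third column, as in the rows shown above. It reports how many fields each row has, which chromosome names occur, and which of the first two columns actually contains the supplied IDs. UCSC's own refGene table has 16 columns starting with a `bin` column, so the field count is worth comparing, although whether CROC depends on that layout is an assumption, not something confirmed here.

```python
import io
from collections import Counter

# First rows of the generated file from the post (whitespace-separated here).
SAMPLE_REF = """\
XLOC_000001 TCONS_00000001 chr1 + 6523 7366 7366 7366 2 6523,7097, 6620,7366,
XLOC_000002 TCONS_00000002 chr1 + 14513 15729 15729 15729 2 14513,15502, 14556,15729,
XLOC_000003 TCONS_00000003 chr1 + 16282 18382 18382 18382 3 16282,17060,18241, 16326,17304,18382,
XLOC_000004 TCONS_00000004 chr1 + 31972 32344 32344 32344 1 31972, 32344,
"""


def summarize_reference(ref_lines, gene_ids, chrom_col=2):
    """Summarize a refGene-like file and check where the gene IDs match.

    chrom_col is an assumption: index 2 matches the rows shown in the post.
    """
    # Strip whitespace and stray line endings (e.g. '\r') from the ID list.
    wanted = {g.strip() for g in gene_ids if g.strip()}
    field_counts = Counter()    # number of fields per row
    chroms = Counter()          # chromosome names seen
    hits_by_column = Counter()  # which of the first two columns holds the IDs
    found = set()

    for raw in ref_lines:
        fields = raw.split()    # splits on tabs or spaces
        if not fields:
            continue
        field_counts[len(fields)] += 1
        if len(fields) > chrom_col:
            chroms[fields[chrom_col]] += 1
        for col, value in enumerate(fields[:2]):
            if value in wanted:
                hits_by_column[col] += 1
                found.add(value)

    return {
        "field_counts": dict(field_counts),
        "chromosomes": dict(chroms),
        "hits_by_column": dict(hits_by_column),
        "missing": sorted(wanted - found),
    }


# Example run on the sample rows; the second ID carries a Windows line ending.
report = summarize_reference(io.StringIO(SAMPLE_REF),
                             ["XLOC_000001", "XLOC_000007\r\n"])
for key, value in report.items():
    print(key, value)
```

On the real files, the same function can be called with `open("my_refGene.txt")` and the lines of the gene list (both file names are placeholders). IDs that only match after stripping, or a field count that differs from what the tool expects, would be the first things to look at.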
Do your other results have the 'chr' prefix, or are they e.g. '1' instead of 'chr1'?
They all have the 'chr' prefix.
These are all the values that are in that column