I have just spent quite a while trying to figure this out and finally solved the mystery: these odd lines refer to tRNAs and pseudo_tRNAs.
In the descriptions of analysis on the ENCODE website, there is no mention of any such features. I decided to look at the files that ENCODE's pipelines use as input to RSEM to figure out what they were. In the metadata table associated with the files, mine say they used annotation 'M4'. I went to ENCODE's 'Reference Sequences' page and took a look at this M4 annotation, but found that every feature in the file was of the format ENSMUSG...
It was only when I started digging through random annotation files on the ENCODE portal, such as this example, that I found the association between these values and the tRNAs.
For example, this is a snippet from the above-linked file:
10000 Pseudo_tRNA
10001 Pseudo_tRNA
10002 Pseudo_tRNA
10003 Pseudo_tRNA
10004 Pseudo_tRNA
10005 Pseudo_tRNA
10006 Ala_tRNA
10007 Pseudo_tRNA
10008 Lys_tRNA
10009 Pseudo_tRNA
10027 Ser_tRNA
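To check which of your numeric IDs are tRNAs, a listing like the one above can be loaded into a lookup table. This is only a rough sketch: it assumes the annotation lines are two whitespace-separated columns (numeric ID, then label), as in the snippet, and the function name `parse_trna_table` is made up for illustration.

```python
def parse_trna_table(text):
    """Map numeric feature ID -> tRNA label from 'ID LABEL' lines."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumption: exactly two columns, the first purely numeric.
        if len(parts) == 2 and parts[0].isdigit():
            mapping[parts[0]] = parts[1]
    return mapping


# Three lines copied from the snippet above.
snippet = """10006 Ala_tRNA
10007 Pseudo_tRNA
10008 Lys_tRNA"""

trna_labels = parse_trna_table(snippet)
print(trna_labels.get("10006", "not a tRNA entry"))
```

IDs are kept as strings so they can be compared directly with the gene_id column of a tsv without any type conversion.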
I'm not entirely sure why these features are included in the output files; I suspect it may be a mistake (if it's not, the analysis descriptions should be made clearer).
So for most analyses where you don't care about tRNAs, I reckon you can just delete the lines. Hope this answer saves some time for future explorers.
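If you would rather not delete the lines by hand, something like the following could do it. It is a minimal sketch, not what ENCODE does: it assumes the tsv has a header row with a `gene_id` column (as in RSEM gene-level output) and that the tRNA entries are exactly the rows whose gene_id is purely numeric. The function name `drop_numeric_gene_ids` and the example values are invented for illustration.

```python
import csv
import io


def drop_numeric_gene_ids(in_handle, out_handle, id_column="gene_id"):
    """Copy a tab-separated table, skipping rows whose ID is all digits."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(
        out_handle,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    kept = dropped = 0
    for row in reader:
        # Assumption: numeric-only IDs are the tRNA / pseudo_tRNA entries.
        if row[id_column].isdigit():
            dropped += 1
            continue
        writer.writerow(row)
        kept += 1
    return kept, dropped


# Tiny made-up table; the numbers are placeholders, not real ENCODE values.
example = (
    "gene_id\tlength\tTPM\n"
    "10000\t72.0\t0.0\n"
    "ENSMUSG00000000001.4\t3262.0\t12.5\n"
)
out = io.StringIO()
kept, dropped = drop_numeric_gene_ids(io.StringIO(example), out)
print(kept, dropped)
```

For a real file, pass handles from `open(path, newline="")` instead of the `io.StringIO` objects. Check the dropped count against what you expect before discarding anything.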
HGNC ID, perhaps? Do you know what kinds of genes you are looking at? For example: https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=21175
I have a list of genes. I will take a few genes whose names I know, look up their Gene IDs on the website you suggested, and check whether they match. Then I will use http://biodb.jp/ to convert. Thanks for the info.
The length and effective length numbers are too small for these to be full genes. It would be hard to say what those numeric gene IDs are. Where did you get the file from? Do you have a link?
For example, the tsv file here:
(click on file details)
This is what the explanation legend says:
Estimated expression levels from RSEM as a tsv file. The columns are as follows:
truncated for brevity.
Thanks a lot, I think alex already answered my question :) But to confirm, I should check the RSEM output to see which gene_id reference they use.
I don't think those are HGNC IDs. They are things which did not have a gene name.
Further down in the file you have normal gene identifiers.
Oh, thanks a lot. I scrolled through the file a bit but did not go down far enough to see the ENSEMBL gene annotation! Much appreciated, genomax!
See: How to add images to a Biostars post