Dear community,
After running my single-cell pipeline (Cell Ranger or STARsolo), I get a large number of odd feature names that are not documented in the Human Protein Atlas. They contain a dot and look like this:
[865] "AC007750.1" "AC016766.1" "AC019197.1" "AC009495.3" "AC009495.1" "AC009495.2" "AC073050.1" "AC016723.1"
[873] "AC007277.1" "AC007405.1" "AC007405.3" "AC010092.1" "AC104088.1" "AC104088.3" "AC104088.2" "AC078883.2"
[881] "AC078883.1" "AC016737.1" "AC016737.2" "AC010894.2" "AC010894.3" "AC093459.1" "AC017048.3" "AC092162.2"
[889] "AC073636.1" "AC074286.1" "AC079305.3" "AC079305.1" "AC019080.1" "AC019080.5" "AC019080.6" "AC012499.1"
[897] "AC009948.3" "AC009948.2" "AC010680.3" "AC010680.2" "AC010680.4" "AC010680.5" "AC092640.1" "AC009478.1"
[905] "AC068196.1" "AC009962.1" "AC064871.2" "AC021851.1" "AC021851.2" "AC093639.3" "AC096555.1" "AC096667.1"
[913] "AC009315.1" "AC097500.1" "AC017071.1" "AC017101.1" "AC007319.1" "AC092598.1" "AC133106.1" "AC013468.1"
[921] "AC093388.1" "AC108047.1" "AC006460.2" "AC006460.1" "AC005540.1" "AC067945.3" "AC067945.4" "AC092614.1"
[929] "AC098617.1" "AC096647.1" "AC010983.1" "AC104823.1" "AC064834.1" "AC114760.2" "AC013264.1" "AC011997.1"
[937] "AC011997.2" "AC019330.1" "AC020718.1" "AC097717.1" "AC012459.1" "AC007163.1" "AC005037.1" "AC007272.1"
[945] "AC007279.2" "AC069148.1" "AC079354.3" "AC064836.4" "AC064836.2" "AC064836.3" "AC007736.1" "AC007383.2"
[953] "AC007383.3" "AC008269.1" "AC007879.3" "AC009226.1" "AC096772.1" "AC007038.2" "AC007038.1" "AC006994.2" ...
In total I have 26000 features in my Seurat object after alignment and QC. Does anyone have an idea what these genes are and how to deal with them? When I run an enrichment analysis, I can't map these gene names to Ensembl IDs, for instance.
Best,
Tolga
Where did you get your reference from? 10x provides pre-made indexes for the human genome.
The IDs you have are for old BAC clones that were used in the initial stages of human genome sequencing, e.g. https://www.ncbi.nlm.nih.gov/nuccore/AC064836
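To see how many features carry this kind of clone-style name, they can be flagged with a simple pattern. This is only a minimal sketch: the regular expression (one or two capital letters, five or six digits, a dot, a number) is an assumption based on the names listed in the question, and `split_clone_based` is a hypothetical helper, not part of any pipeline.

```python
import re

# Assumed pattern, derived from the names in the question:
# one or two capital letters, five or six digits, a dot, then a number.
CLONE_NAME = re.compile(r"^[A-Z]{1,2}\d{5,6}\.\d+$")


def split_clone_based(names):
    """Split feature names into clone-style names and everything else."""
    clone_based = [n for n in names if CLONE_NAME.match(n)]
    other = [n for n in names if not CLONE_NAME.match(n)]
    return clone_based, other


# A few names from the question plus two ordinary gene symbols for contrast.
features = ["AC007750.1", "AC009495.3", "GAPDH", "MT-CO1", "AC064836.4"]
clone_based, other = split_clone_based(features)
print(len(clone_based), "clone-style names;", len(other), "other names")
```

The pattern may miss other naming schemes or catch unrelated names, so it is worth checking the flagged list by eye before filtering anything.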
I got my reference from the 10x website, where I downloaded the 2020 human reference:
and I ran the following command:
So I provided the folder with the pre-made indexes and the annotation files.
But I don't understand why I have those features, especially the BAC clones.
Best regards
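On the mapping problem raised in the question: the matrix output of Cell Ranger and STARsolo includes a features file (`features.tsv.gz` or `features.tsv`) whose first column is the Ensembl gene ID and whose second column is the gene name, so the names can be mapped back without an external lookup. The following is a rough sketch under that assumption; the helper names are hypothetical, and the IDs in the example rows are made-up placeholders, not real Ensembl IDs.

```python
import csv
import gzip


def parse_features(lines):
    """Build a gene name -> gene ID map from tab-separated feature rows.

    Assumes column 1 is the gene ID and column 2 is the gene name.
    If a name occurs more than once, the first ID seen is kept.
    """
    name_to_id = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        gene_id, gene_name = row[0], row[1]
        name_to_id.setdefault(gene_name, gene_id)
    return name_to_id


def load_features(path):
    """Read a gzipped features file from disk (path is supplied by the user)."""
    with gzip.open(path, "rt", newline="") as handle:
        return parse_features(handle)


# Placeholder rows for illustration only; the IDs are NOT real Ensembl IDs.
example_rows = [
    "ENSG_PLACEHOLDER_1\tAC007750.1\tGene Expression",
    "ENSG_PLACEHOLDER_2\tAC009495.3\tGene Expression",
]
name_to_id = parse_features(example_rows)
print(name_to_id["AC007750.1"])
```

With a real run, `load_features` would be pointed at the features file in the filtered matrix folder, and the resulting IDs could be passed to the enrichment tool instead of the names.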