Extracting protein-coding genes from the NCBI seq_gene.md file
8.6 years ago
avari ▴ 110

Hi all,

I want to extract a list of protein coding genes from the seq_gene.md file.

ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.1/mapview/seq_gene.md.gz

I have already selected all rows where the feature name is “GENE” and removed all rows with the ‘LOC’ genes (which I do not want).
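For reference, the filtering step described above can be sketched roughly as follows. This is a minimal sketch: it assumes the file is tab-delimited with a header line starting with `#`, and that the columns are called `feature_name` and `feature_type`. Those column names are assumptions and should be checked against the actual header. The sample rows are made up for illustration.

```python
import csv

def select_named_genes(lines):
    """Keep rows typed as GENE whose symbol does not start with 'LOC'.

    Column names ('feature_name', 'feature_type') are assumed, not confirmed.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = None
    kept = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if header is None:
            # First non-empty line is treated as the header; drop a leading '#'
            header = [name.lstrip("#") for name in row]
            continue
        rec = dict(zip(header, row))
        if rec.get("feature_type") != "GENE":
            continue
        if rec.get("feature_name", "").startswith("LOC"):
            continue
        kept.append(rec)
    return kept

# Made-up rows, not real data from seq_gene.md
sample = (
    "#chromosome\tchr_start\tchr_stop\tchr_orient\tfeature_name\tfeature_type\n"
    "1\t1000\t5000\t+\tGENEA\tGENE\n"
    "1\t1000\t5000\t+\tNM_000000.0\tRNA\n"
    "1\t7000\t9000\t-\tLOC000000\tGENE\n"
)

named = select_named_genes(sample.splitlines())
print([rec["feature_name"] for rec in named])
```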

I want to remove any genes without good evidence and any pseudogenes.

Therefore my questions are:

1: Which evidence codes should I keep? Should I drop genes with a partial evidence code, or keep them all?

2: Should I care about the chromosome orientation? Probably not, right?

I will later add a 20,000 base-pair extension to the chromosome ‘start’ and ‘stop’ positions of each gene in my final list and positionally map SNPs to these gene windows. If anyone knows a simpler way to do this, please let me know.
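To make the plan concrete, here is a minimal sketch of the windowing and SNP-mapping step in plain Python. The tuple layout `(chromosome, start, stop, name)`, the function names, and the example coordinates are all hypothetical. Because the flank is added symmetrically on both sides, strand does not change the resulting window in this sketch.

```python
from collections import defaultdict

FLANK = 20_000  # extension in base pairs on each side of the gene

def build_windows(genes, flank=FLANK):
    """Group extended gene intervals by chromosome.

    `genes` is an iterable of (chromosome, start, stop, name) tuples
    (a hypothetical layout chosen for this example).
    """
    windows = defaultdict(list)
    for chrom, start, stop, name in genes:
        lo, hi = min(start, stop), max(start, stop)
        # Clamp at 1 so the window never runs off the start of the chromosome
        windows[chrom].append((max(1, lo - flank), hi + flank, name))
    return windows

def genes_for_snp(windows, chrom, pos):
    """Return the sorted names of all genes whose window contains the SNP."""
    return sorted(
        name for lo, hi, name in windows.get(chrom, []) if lo <= pos <= hi
    )

# Made-up genes and SNP positions for illustration only
genes = [
    ("1", 100_000, 150_000, "GENEA"),
    ("1", 165_000, 180_000, "GENEB"),
    ("2", 5_000, 9_000, "GENEC"),
]
windows = build_windows(genes)

print(genes_for_snp(windows, "1", 160_000))
print(genes_for_snp(windows, "1", 75_000))
print(genes_for_snp(windows, "2", 100))
```

A SNP can fall into several overlapping windows, so the lookup returns a list rather than a single gene. The linear scan per chromosome is kept for clarity; for a large SNP set, a sorted interval lookup or a dedicated tool for interval intersection would likely be faster.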

Thanks for any comments!

genes seq_gene.md SNPs protein coding genome • 1.6k views