Hi,
I am entirely new to bioinformatics, so I apologize in advance for asking a very naive question. I am interested in counting the number of genes per chromosome in the human genome. For this purpose I have downloaded the latest release of the 'gff' file from ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/latest_assembly_versions/GCF_000001405.34_GRCh38.p8/ and have sorted it using IGVtools from the Integrative Genomics Viewer. IGV also allows the features to be exported as a 'bed' file, although there is a one-base difference between the start position of a gene in the 'gff' file and in the 'bed' file generated this way. While scanning through the 'bed' file I noticed many repeated and overlapping entries. Now my questions are:
1) Is there any available tool for extracting non-overlapping genes from the bed file?
2) Is there any way to automate the selection of only one of the several overlapping genes?
3) Is the 'bed' file converted using the 'Export Features' tool in IGV reliable enough for further processing? What are the preferred alternatives?
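To make questions 1) and 2) concrete, here is a minimal Python sketch of one possible selection rule: group genes on the same chromosome into clusters of overlapping intervals and keep only the longest gene from each cluster. The function name `one_gene_per_cluster`, the tuple layout, and the "keep the longest" rule are all assumptions made for illustration. Strand is ignored, and whether discarding overlapping genes is biologically sensible is a separate question.

```python
from itertools import groupby
from operator import itemgetter


def one_gene_per_cluster(genes):
    """Keep the longest gene from every cluster of overlapping genes.

    genes: iterable of (chrom, start, end, name) tuples in BED-style
    coordinates (0-based start, end not included). Hypothetical layout.
    """
    def longest(cluster):
        # On ties, max() returns the first (leftmost) gene in the cluster.
        return max(cluster, key=lambda g: g[2] - g[1])

    kept = []
    ordered = sorted(genes, key=itemgetter(0, 1, 2))
    for _chrom, group in groupby(ordered, key=itemgetter(0)):
        cluster, cluster_end = [], 0
        for gene in group:
            # A gene starting at or after the cluster end does not overlap it.
            if cluster and gene[1] >= cluster_end:
                kept.append(longest(cluster))
                cluster, cluster_end = [], 0
            cluster.append(gene)
            cluster_end = max(cluster_end, gene[2])
        if cluster:
            kept.append(longest(cluster))
    return kept


example = [
    ("chr1", 100, 200, "A"),
    ("chr1", 150, 400, "B"),  # overlaps A and is longer, so B is kept
    ("chr1", 400, 500, "C"),  # starts exactly where B ends: no overlap
    ("chr2", 10, 20, "D"),
]
result = one_gene_per_cluster(example)
print([g[3] for g in result])
```

This uses only the Python standard library, so it should run unchanged on Windows.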
I am sure such a basic topic has already been discussed many times on this forum, and I would appreciate it if you could direct me to some of those discussions. It would be wonderful if you could suggest a solution that can be carried out on Windows. I sincerely thank you and apologize once again.
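As a rough illustration of the counting task itself, the sketch below tallies features per sequence directly from GFF3 lines, without going through a 'bed' export. It assumes that only rows whose third column is exactly `gene` should be counted (pseudogenes and other feature types are skipped) and that the first column identifies the chromosome; in NCBI RefSeq files this is an accession such as NC_000001.11 rather than a name like chr1. The helper names `count_genes` and `gff_to_bed_interval` are made up for this example. The second helper shows the usual coordinate shift: GFF positions are 1-based and inclusive, while BED starts are 0-based with the end excluded, so the start drops by one and the end is unchanged.

```python
from collections import Counter


def count_genes(lines, feature_type="gene"):
    """Count GFF3 rows of the given type per sequence ID (column 1)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == feature_type:
            counts[fields[0]] += 1
    return counts


def gff_to_bed_interval(start, end):
    """Convert 1-based inclusive GFF coordinates to 0-based half-open BED."""
    return start - 1, end


sample = [
    "##gff-version 3",
    "NC_000001.11\tBestRefSeq\tgene\t11874\t14409\t.\t+\t.\tID=gene0",
    "NC_000001.11\tBestRefSeq\ttranscript\t11874\t14409\t.\t+\t.\tID=rna0",
    "NC_000001.11\tBestRefSeq\tgene\t14362\t29370\t.\t-\t.\tID=gene1",
    "NC_000002.12\tBestRefSeq\tgene\t38814\t46870\t.\t-\t.\tID=gene2",
]
gene_counts = count_genes(sample)
print(dict(gene_counts))
print(gff_to_bed_interval(11874, 14409))
```

For a real file, the same function can be fed an open file handle, for example `count_genes(gzip.open("genomic.gff.gz", "rt"))` after `import gzip`, where `genomic.gff.gz` is a placeholder for the downloaded file name. Only the standard library is needed, so this should also work on Windows.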
I do not have words to thank you for explaining the differences between the two kinds of files in such beautiful detail, and also for sharing your ideas about the concept of a gene and whether genes are 'overlapping'. I have installed Ubuntu in Oracle VirtualBox on my Windows laptop and am struggling to install the CGAT package in it. I hope I will succeed soon and will be able to try your suggestion. I also gather from your message that maybe I should reconsider whether to select non-overlapping genes at all. Thank you so much i.sudbery, I literally can't thank you enough.