Given locus coordinates, is it possible to identify the gene names from the hg19.2bit file?
8.4 years ago

Hi everyone.

I have a BED file with N specified loci. Is it possible to look up which gene corresponds to a given locus in the BED file?

If so, how can I query the UCSC server without downloading the whole genome?

Many thanks.

genome alignment • 2.4k views
8.4 years ago
anp375

I assume you want all genes in a given locus. First, download the latest release of the appropriate GTF. You don't need the binary genome (.2bit) file. This is what you can do in a bash shell:

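  • Download and install bedtools, then make it available in your shell (for example, load its module if you are on a cluster).
  • Use "grep" to extract the gene lines only. $'\tgene\t' is bash syntax for "gene" with a literal tab on each side, so it matches only the feature column. "grep -e $'\tgene\t' /path/to/hg19.gtf > /path/to/hg19_genes.gtf"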
  • Convert the BED file to the correct tab-separated format, chr\tstart\tstop\tname, using regex if necessary. Output this to genes.tsv.
  • Then run bedtools intersect. "-wao" keeps loci that have no overlapping gene and appends the number of overlapping bases. "bedtools intersect -a /path/to/genes.tsv -b /path/to/hg19_genes.gtf -wao -bed | tr -d '"' | tr ';' '\t' > intersect.bed"
  • Cut out the columns you need. To include strandedness, also keep column 11 (the GTF strand column when the input BED has exactly four columns). GTF coordinates are always given on the forward strand, so for genes on the minus strand you'll need to take the reverse complement of any sequence you extract. "cut -f 1-4,6,11,13,14,18 intersect.bed | tr ' ' '\t' > present_genes.tsv"
  • For other objectives, you may need to concatenate the regions in the files to get contiguous regions, which I'm unfortunately not allowed to help you with.
  • You will most likely have to customize these commands. 'gene', which is in the same column as 'CDS', 'start_codon', 'exon', etc., may be called something else in your GTF, and the column numbers in the cut step depend on the order of the attributes. You should also read the bedtools documentation to understand what each step does. Good luck.
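For a quick check on small files, the core of the steps above can be sketched without bedtools. This is a minimal sketch, not the answer's method: it swaps bedtools intersect for a plain awk loop that compares every locus with every gene, so it is only suitable for toy inputs; use bedtools on a real genome-wide GTF. The helper name genes_at_loci and the toy GTF/BED contents are made up for illustration. It also makes two details explicit: GTF is 1-based and inclusive, so start - 1 converts it to BED's 0-based half-open convention (bedtools does this for you when it reads a GTF), and gene_name is extracted by pattern rather than by a fixed column number.

```shell
#!/usr/bin/env bash
# Sketch only: an awk stand-in for "bedtools intersect" on tiny files.
# Every gene is held in memory and each locus is checked against all of them,
# so use bedtools for a real genome-wide GTF.
set -euo pipefail

# genes_at_loci GTF BED  (hypothetical helper name)
# Prints: chrom, start, end, locus name, overlapping gene_name, gene strand.
# Loci with no overlapping gene get "." in the last two columns.
genes_at_loci() {
  awk -F'\t' -v OFS='\t' '
    # First file (GTF): keep only "gene" features.
    FNR == NR {
      if ($3 != "gene") next
      name = "."
      if (match($9, /gene_name "[^"]*"/))
        name = substr($9, RSTART + 11, RLENGTH - 12)   # text inside the quotes
      n++
      chrom[n] = $1
      gstart[n] = $4 - 1        # GTF is 1-based inclusive; BED is 0-based half-open
      gend[n] = $5 + 0
      strand[n] = $7
      gname[n] = name
      next
    }
    # Second file (BED): report every gene whose interval overlaps the locus.
    {
      hit = 0
      for (i = 1; i <= n; i++) {
        if (chrom[i] == $1 && gstart[i] < $3 + 0 && $2 + 0 < gend[i]) {
          print $1, $2, $3, $4, gname[i], strand[i]
          hit = 1
        }
      }
      if (!hit) print $1, $2, $3, $4, ".", "."
    }
  ' "$1" "$2"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Toy inputs: made-up coordinates and gene names, NOT real hg19 annotation.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 toy gene 101 200 . + . 'gene_id "G1"; gene_name "GENE_A";' \
  chr1 toy exon 101 150 . + . 'gene_id "G1"; gene_name "GENE_A";' \
  chr1 toy gene 301 400 . - . 'gene_id "G2"; gene_name "GENE_B";' \
  > "$workdir/toy.gtf"
printf 'chr1\t150\t160\tregion1\nchr1\t200\t260\tregion2\nchr1\t390\t500\tregion3\n' \
  > "$workdir/loci.bed"

result=$(genes_at_loci "$workdir/toy.gtf" "$workdir/loci.bed")
printf '%s\n' "$result"
```

On the toy data, region1 is reported with GENE_A (+), region3 with GENE_B (-), and region2 gets "." because its BED start of 200 means it begins at 1-based base 201, just past GENE_A's 1-based end of 200. The exon line is ignored because only "gene" features are loaded.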

I'm pretty sure you can do this on Galaxy as well, using the same commands, but I don't know how. I find it a pain to use.


Thanks for your answer, anp375. I have the hg19.2bit file. How do I convert it to a GTF in the first place?


You can't convert it. A .2bit file holds only the genome sequence, and a GTF doesn't have the actual sequences in it; it's just an annotation file with gene names, coordinates, etc. You can download either the latest GRCh37 (hg19) release of the annotations:

ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/

or the earliest GRCh37 release:

ftp://ftp.ensembl.org/pub/release-55/gtf/homo_sapiens/

Note that these are from Ensembl, and the instructions above were written for Ensembl GTFs. You can also get a GTF from the UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables?command=start. Select the hg19 assembly, the "Genes and Gene Predictions" group, region "genome", whichever gene track you want, and GTF as the output format.
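One practical catch when pairing the Ensembl GTFs above with hg19 loci: Ensembl names chromosomes 1, 2, ..., X, Y, while UCSC hg19 (and most hg19 BED files) use chr1, chr2, ..., chrX, chrY, and bedtools intersect finds no overlaps when the names disagree. Below is a minimal sketch, assuming your BED uses UCSC-style names, that adds the prefix to the primary chromosomes and drops everything else. The helper name ensembl_to_ucsc is hypothetical, and the Homo_sapiens.GRCh37.75.gtf.gz file name in the final comment is an assumption about the release-75 directory, so check what you actually downloaded.

```shell
#!/usr/bin/env bash
# Sketch: give Ensembl-style GTF chromosome names the UCSC "chr" prefix
# so they match a UCSC hg19-style BED file.
set -euo pipefail

# ensembl_to_ucsc [GTF...]  (hypothetical helper name; reads stdin if no files)
ensembl_to_ucsc() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { print; next }                      # keep header/comment lines as-is
    $1 ~ /^([1-9]|1[0-9]|2[0-2]|X|Y)$/ {      # chromosomes 1-22, X and Y only
      $1 = "chr" $1
      print
    }
    # Anything else (MT, GL000xxx scaffolds, ...) is dropped, because a prefix
    # alone does not make those names (or, for MT, the sequence) match hg19.
  ' "$@"
}

# Toy input with made-up records.
converted=$(printf '1\ttoy\tgene\t101\t200\t.\t+\t.\tgene_id "G1";\nMT\ttoy\tgene\t1\t50\t.\t+\t.\tgene_id "G2";\nX\ttoy\tgene\t5\t9\t.\t-\t.\tgene_id "G3";\n' | ensembl_to_ucsc)
printf '%s\n' "$converted"

# Real use (file name assumed; adjust to your download):
#   zcat Homo_sapiens.GRCh37.75.gtf.gz | ensembl_to_ucsc > hg19_chr.gtf
```

The renamed GTF can then go into the grep and bedtools intersect steps unchanged. If your BED file uses Ensembl-style names instead, strip the "chr" prefix from the BED rather than adding it to the GTF.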
