Gene list from .bed file needed, please help!
7.5 years ago
o.hickman

I have bed files (from an ENCODE eCLIP experiment) in the format below.

I need to obtain a gene list from the chromosomal coordinates.

I have tried Galaxy, using the UCSC Table Browser knownGene and kgXref tables and join operations, but the gene list I get has clearly been duplicated in some way: some genes have the correct number of eCLIP tags, while others have thousands more than are evident when I view the .bed file in IGV.

Has anybody got a simple, up-to-date way of solving this? I do not code, so simple explanations would be appreciated. Previous workflows in Galaxy have not worked.

Thanks in advance!!

Oliver

chr7 155100450 155100506 rep02 1000 + 4.49254608837777 22.7294143201152 -1 -1
chr7 155100424 155100441 rep02 1000 + 3.74937915504325 15.3042207236355 -1 -1
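To see what each column of lines like these holds, here is a minimal bash/awk sketch. It assumes the ENCODE eCLIP peak layout (column 7 = log2 fold enrichment over the size-matched input, column 8 = -log10 p-value), which you should check against the ENCODE file documentation; `describe_peaks` is a hypothetical helper name. It also shows why positions in IGV look one base off: BED starts are 0-based and end-exclusive, while IGV displays 1-based positions.

```bash
#!/usr/bin/env bash
# describe_peaks: hypothetical helper that prints one summary line per peak.
# Assumed ENCODE eCLIP peak columns (verify for your files):
#   $1 chrom, $2 start (0-based), $3 end (exclusive), $6 strand,
#   $7 log2 fold enrichment, $8 -log10 p-value
describe_peaks() {
  awk 'BEGIN { OFS = "\t" }
       NF >= 8 {
         # Add 1 to the 0-based BED start to get the 1-based locus IGV shows
         print $1 ":" ($2 + 1) "-" $3, $6, $7, $8
       }' "$@"
}
```

Running `describe_peaks your_peaks.bed` on the first line above prints `chr7:155100451-155100506`, `+`, `4.49254608837777` and `22.7294143201152`, separated by tabs; blank lines are skipped.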

alignment RNA-Seq eCLIP iCLIP Galaxy • 2.7k views

For your next post, don't forget to specify that you don't use Linux. You are making things harder for yourself, because many bioinformatics tools are made for Linux. Some are available for Windows as well, but they don't work optimally there.


don't forget to specify that you don't use Linux



If you happen to use the right Windows 10 version, you can use the built-in Unix bash shell (Windows Subsystem for Linux). But I do concur with @Wouter.

7.5 years ago
# hg19 is assumed here; use hg38 instead if your BED file is on GRCh38
$ cat input |\
awk 'NF>=3 {printf("select K.chrom,MIN(K.txStart),MAX(K.txEnd),X.geneSymbol from knownGene as K,kgXref as X where K.chrom=\"%s\" and NOT(K.txEnd <= %s or K.txStart >= %s) and K.name=X.kgId group by K.chrom,X.geneSymbol;\n",$1,$2,$3);}' |\
mysql -N --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19  |\
sort | uniq
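To see what the awk step produces without contacting the UCSC server, you can run it on its own. This sketch wraps it in a hypothetical function `make_queries` and prints one SQL query per BED line. The overlap test is written in half-open form (`K.txEnd <= start or K.txStart >= end`), because both BED and knownGene use 0-based, end-exclusive coordinates, so a gene that merely ends where a peak starts is not counted.

```bash
#!/usr/bin/env bash
# make_queries: hypothetical wrapper around the awk step of the pipeline.
# Prints one MySQL query per BED line; nothing is sent to UCSC here.
make_queries() {
  awk 'NF >= 3 {
    # Half-open overlap: gene [txStart, txEnd) vs peak [$2, $3)
    printf("select K.chrom,MIN(K.txStart),MAX(K.txEnd),X.geneSymbol from knownGene as K,kgXref as X where K.chrom=\"%s\" and NOT(K.txEnd <= %s or K.txStart >= %s) and K.name=X.kgId group by K.chrom,X.geneSymbol;\n", $1, $2, $3)
  }' "$@"
}
```

`make_queries input` lets you inspect the SQL; piping its output into the same `mysql ... | sort | uniq` command gives the gene list.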

Hi Pierre, would you mind explaining that post? Thanks, Oliver

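  • 'input' is your BED file;
  • awk builds one MySQL query per BED line, fetching chrom/start/end/geneSymbol for every UCSC knownGene transcript that overlaps that line (transcripts are grouped by gene symbol);
  • those queries are piped into mysql, which runs them against UCSC's public MySQL server;
  • the duplicates are removed with sort | uniq.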
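Because knownGene holds several transcripts per gene, a join that returns one row per (peak, transcript) pair repeats each peak once per isoform, which is one plausible cause of the inflated tag counts seen in Galaxy. A minimal sketch, assuming a hypothetical tab-separated input with one `peak_id<TAB>gene` row per overlapping transcript: collapse to unique (peak, gene) pairs first, then count peaks per gene with `uniq -c`. The function name `count_tags_per_gene` is illustrative only.

```bash
#!/usr/bin/env bash
# count_tags_per_gene: hypothetical helper.
# Input: tab-separated "peak_id<TAB>gene" rows, possibly repeated once per
# transcript isoform. Output: "gene<TAB>number_of_distinct_peaks", largest first.
count_tags_per_gene() {
  sort -u "$@" |                 # keep each (peak, gene) pair only once
    cut -f2 |                    # keep just the gene symbol
    sort | uniq -c |             # count distinct peaks per gene
    awk 'BEGIN { OFS = "\t" } { print $2, $1 }' |   # reorder to gene<TAB>count
    sort -k2,2nr -k1,1           # most peaks first, ties broken by gene name
}
```

Deduplicating before counting is the key step: counting rows straight after the join would report a peak three times for a gene with three overlapping isoforms.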

Hi, thanks Pierre. Is this using R? Thanks, Oliver


No, it is not. It uses cat, awk, sort and uniq, which are standard UNIX tools, plus the mysql command-line client.

7.5 years ago
EagleEye

I assume you would like to associate your binding sites (from ENCODE eCLIP) with genes or other genomic locations. In that case you can use GREAT or HOMER's annotatePeaks.pl (HOMER provides more detailed results when you use your own GTF annotation).
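If neither GREAT nor HOMER is available, a rough alternative (not what either tool does internally) is to intersect the peaks with a gene BED file in plain awk. This sketch assumes a hypothetical `genes.bed` with four columns (chrom, start, end, gene symbol) on the same assembly as the peaks; `peaks_per_gene` is an illustrative name. It compares every peak with every gene, so it is slow on large files but easy to follow.

```bash
#!/usr/bin/env bash
# peaks_per_gene: hypothetical helper; brute force, compares every peak with every gene.
# $1 = gene BED (chrom, start, end, symbol); $2 = eCLIP peak BED (same assembly).
# Prints "symbol<TAB>number_of_overlapping_peaks" for genes hit by at least one peak.
peaks_per_gene() {
  awk 'BEGIN { OFS = "\t" }
       # First file: remember every gene interval
       FNR == NR { n++; gc[n] = $1; gs[n] = $2 + 0; ge[n] = $3 + 0; gn[n] = $4; next }
       # Second file: half-open overlap test of each peak against every gene
       NF >= 3 {
         for (i = 1; i <= n; i++)
           if (gc[i] == $1 && gs[i] < $3 + 0 && $2 + 0 < ge[i])
             seen[gn[i] SUBSEP FNR] = 1
       }
       END {
         # A peak counts once per gene symbol, however many records share it
         for (k in seen) { split(k, parts, SUBSEP); count[parts[1]]++ }
         for (g in count) print g, count[g]
       }' "$1" "$2" | sort
}
```

Usage would be `peaks_per_gene genes.bed your_peaks.bed`. Keying on (gene symbol, peak line) is what prevents the isoform-driven double counting described in the question.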


Thanks for that. I don't have a Unix OS, but if I get access to one I will download HOMER and give it a try. O.


UNIX/Linux is always bioinformatics-friendly.
