Getting standard 12-column format for a list of genes
1
0
3.7 years ago
zizigolu ★ 4.3k

Hi

I have a list of long non-coding RNAs.

I need them in the standard 12-column BED format.

How can I produce this?

The BED would look like:

chr1    895966  901099  NM_198317   0   +   896073  900571  0   12  214,260,122,222,117,214,145,168,89,74,182,757,  0,706,1042,1239,1768,2117,2522,2750,3333,3520,3762,4376,
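For readers unfamiliar with the layout, here is a minimal Python sketch that splits the example line above into the 12 standard BED fields and recovers the exon intervals from the block columns. The names `Bed12`, `parse_bed12` and `exon_intervals` are made up for illustration; only the column order comes from the BED format itself.

```python
from typing import List, NamedTuple, Tuple

# The example line from the post (whitespace-separated here for readability).
EXAMPLE = (
    "chr1 895966 901099 NM_198317 0 + 896073 900571 0 12 "
    "214,260,122,222,117,214,145,168,89,74,182,757, "
    "0,706,1042,1239,1768,2117,2522,2750,3333,3520,3762,4376,"
)


class Bed12(NamedTuple):
    chrom: str
    chrom_start: int   # 0-based start
    chrom_end: int     # end, exclusive
    name: str
    score: str
    strand: str
    thick_start: int   # start of the coding ("thick") part
    thick_end: int
    item_rgb: str
    block_count: int   # number of exons
    block_sizes: List[int]
    block_starts: List[int]  # relative to chrom_start


def parse_bed12(line: str) -> Bed12:
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, got {len(f)}")
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    return Bed12(f[0], int(f[1]), int(f[2]), f[3], f[4], f[5],
                 int(f[6]), int(f[7]), f[8], int(f[9]), sizes, starts)


def exon_intervals(rec: Bed12) -> List[Tuple[int, int]]:
    # Each block is an exon; its genomic start is chrom_start + block start.
    return [(rec.chrom_start + s, rec.chrom_start + s + z)
            for s, z in zip(rec.block_starts, rec.block_sizes)]


rec = parse_bed12(EXAMPLE)
print(rec.name, rec.block_count, exon_intervals(rec)[:2])
```

Note that the last block has to end exactly at column 3 (`chromEnd`), which is a quick sanity check for any BED12 file.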

Thank you for any help

genome • 1.9k views
ADD COMMENT
1

You can download the lncRNA db GTF from here and convert the GTF to BED using BEDOPS.

ADD REPLY
0

Hi, which format exactly do you want (does it have a name, and where did you find it)? And what format is your input data in?

ADD REPLY
3

I'm guessing he wants to convert lncRNAs to BED12 format, which is not a trivial task to automate (it certainly wasn't for circular RNAs).

Start with a BED6 file: at the very least, you must find the genomic coordinates of your lncRNAs. You will also need a reference GTF file; filter out the biotypes in it that are not associated with lncRNA biogenesis.

  1. Store the start and end coordinates of the lncRNA in a variable.
  2. Use bedtools intersect with your BED6 file and the filtered GTF file, forcing same-strand, perfect overlaps (-s -f 1.00). You will now have a GTF file with all overlapping biotypes in the lncRNA's region.
  3. Use gtfToGenePred to convert the GTF file to a genePred file.
  4. Use genePredToBed to convert the genePred file to a BED12 file.
  5. Using the start and end positions of your lncRNA (step 1), extract the lines in the BED file from step 4 whose start and end coordinates (columns 2 and 3) match. You may end up with multiple transcripts (I'm not sure about lncRNA biogenesis). Perhaps sort the output by number of exon cassettes (column 10) and take the transcript with the highest number as your lncRNA in BED12 format.
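Step 5 above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `pick_transcript` is a made-up helper, the BED12 lines are toy records rather than real annotations, and I also match on the chromosome, since start and end alone could collide across chromosomes.

```python
from typing import Iterable, Optional


def pick_transcript(bed12_lines: Iterable[str], chrom: str,
                    start: int, end: int) -> Optional[str]:
    """Return the BED12 line whose columns 1-3 match exactly and whose
    block count (column 10) is highest; None if nothing matches."""
    best = None
    for line in bed12_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a BED12 record
        if f[0] == chrom and int(f[1]) == start and int(f[2]) == end:
            # Keep the first record with the largest number of exons.
            if best is None or int(f[9]) > int(best[9]):
                best = f
    return "\t".join(best) if best is not None else None


# Toy records (hypothetical transcripts, for illustration only).
toy = [
    "chr1\t100\t500\ttxA\t0\t+\t100\t100\t0\t2\t50,100,\t0,300,",
    "chr1\t100\t500\ttxB\t0\t+\t100\t100\t0\t3\t50,50,100,\t0,150,300,",
    "chr1\t120\t500\ttxC\t0\t+\t120\t120\t0\t4\t20,20,20,100,\t0,50,100,280,",
]
print(pick_transcript(toy, "chr1", 100, 500))
```

Here `txC` has the most exons but is skipped because its start does not match, so `txB` is returned. Whether "most exons" is the right tie-breaker for your lncRNAs is a judgment call, as noted in step 5.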

Disregard this; OP only wants to find out whether the lncRNAs are coding :/

ADD REPLY
0

Thank you Kevin

This program can tell whether something is coding or not:

http://lilab.research.bcm.edu/cpat/index.php

It needs input in either FASTA or BED format.

I have a list of long non-coding RNAs, and I want to know which of them are coding.

For this software I need these genes in BED or FASTA format.

This is my lncRNA list:

AZIN1-1
FGG-1
TBC1D3P2
TBC1D3P1-DHX40P1
FAM19A5-8
ZNF208
LOC100128531
TECRL
PTDSS1-1
DKFZP434K028
FAM84B-8
NPAS2-1
FLJ16779
PLCB4
RFPL3S
TAF1B-3:copy2
SH2D4A-2:copy2
CECR3
SEPSECS-1
TOP1-3
C16orf78-3
ADCY1-4
ANKRD28
OR7E156P
PPP4R1-5
RHOJ
BRE-AS1
C17orf51-2
USP47-6

Thank you for any help

ADD REPLY
1

For BED you need chromosomal coordinates for your lncRNAs, in the order chromosome | start_pos | end_pos | name. Only the first three columns out of a possible 12 are required.
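To see quickly how many columns a BED line actually has, and whether a 12-column line is internally consistent, something like the following Python sketch may help. `check_bed_line` is a hypothetical helper; the checks are my reading of the BED column definitions, not an official validator.

```python
def check_bed_line(line: str) -> int:
    """Return the number of columns in a BED line, raising ValueError if
    the required columns or the BED12 block columns look inconsistent."""
    fields = line.split()
    n = len(fields)
    if n < 3:
        raise ValueError("BED needs at least chrom, start and end")
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError("start must be >= 0 and end must be >= start")
    if n >= 12:
        count = int(fields[9])
        sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        if len(sizes) != count or len(starts) != count:
            raise ValueError("blockCount does not match blockSizes/blockStarts")
        # First block starts at chromStart, last block ends at chromEnd.
        if starts[0] != 0 or start + starts[-1] + sizes[-1] != end:
            raise ValueError("blocks do not span chromStart..chromEnd")
    return n


# A 4-column line: valid BED, but not BED12.
print(check_bed_line("chr11 62619460 62623360 SNHG1"))
```

A tool that expects BED12 will generally not accept a 3- or 4-column file, even though it is valid BED, because the exon structure lives in columns 10 to 12.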

ADD REPLY
0

So you just need to know which ones are coding and which are non-coding?

ADD REPLY
0

I already have the coordinates of my lncRNAs, like:

chr11   62619460    62623360    SNHG1
chr12   46777823    46781934    linc-FAM113B-3:copy2
chr7    26097439    26101262    linc-NPVF-2
chr1    28905050    28908366    SNHG12
chr1    212719036   212729407   linc-ATF3-2
chr17   74553846    74561430    SNHG16
chr1    173833039   173837125   GAS5
chr1    223354486   223361496   linc-C1orf65-1
chr1    76251879    76260775    RABGGTB
chr13   75811889    75814517    CTAGE11P
chr4    156127681   156129583   linc-FGG-1

I gave this to the program, but it returned an error.

ADD REPLY
3
3.7 years ago

If we want the lincRNAs in BED12 format, CGAT has a tool, gff2bed, for converting GTF to BED12.

Starting from the Ensembl GTF, we can select the lincRNAs:

zcat Ensembl.gtf.gz | grep "lincRNA" | gzip > lincRNAs.gtf.gz
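
A plain grep matches the string anywhere on the line. If you want to be stricter, a rough Python equivalent can filter on the biotype attribute itself. This is a sketch under assumptions: `filter_gtf_by_biotype` is a made-up helper, it assumes the attribute key is `gene_biotype` (as in Ensembl GTFs; GENCODE files use `gene_type`), and the two records below are toy lines, not real annotations.

```python
import re
from typing import Iterable, Iterator, Sequence

# Assumption: Ensembl-style attribute, e.g. gene_biotype "lincRNA";
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def filter_gtf_by_biotype(lines: Iterable[str],
                          biotypes: Sequence[str] = ("lincRNA",)) -> Iterator[str]:
    """Yield GTF lines whose gene_biotype attribute is in `biotypes`."""
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF record
        m = BIOTYPE_RE.search(cols[8])
        if m and m.group(1) in biotypes:
            yield line


# Toy records (hypothetical genes, for illustration only).
toy_gtf = [
    '1\ttoy\texon\t101\t150\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX1"; gene_biotype "lincRNA";\n',
    '1\ttoy\texon\t901\t950\t.\t+\t.\tgene_id "GENE_B"; transcript_id "TX2"; gene_name "lincRNA-like"; gene_biotype "protein_coding";\n',
]
kept = list(filter_gtf_by_biotype(toy_gtf))
print(len(kept))
```

For the real file you would read it with `gzip.open("Ensembl.gtf.gz", "rt")` and write the kept lines out. If your annotation release uses a different label (for example `lncRNA`), pass that in `biotypes` instead.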

We can then convert that into a bed12 file:

cgat gff2bed -I lincRNAs.gtf.gz --bed12-from-transcripts > lincRNAs.bed
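
If you cannot install CGAT, the idea behind the conversion can be sketched in plain Python: group the exon lines by transcript and turn them into BED blocks. To be clear, this is not CGAT's implementation, just my own minimal illustration; `gtf_exons_to_bed12` is a made-up name, the input records are toy lines, and I set thickStart and thickEnd to the transcript start to mark "no coding part".

```python
import re
from collections import defaultdict
from typing import Iterable, List

TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_exons_to_bed12(lines: Iterable[str]) -> List[str]:
    """Build one BED12 line per transcript from GTF exon records."""
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start0, end)]
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        m = TRANSCRIPT_RE.search(cols[8])
        if not m:
            continue
        # GTF is 1-based inclusive; BED is 0-based, end-exclusive.
        exons[(cols[0], cols[6], m.group(1))].append((int(cols[3]) - 1, int(cols[4])))

    out = []
    for (chrom, strand, tid), blocks in exons.items():
        blocks.sort()
        start = blocks[0][0]
        end = max(e for _, e in blocks)
        sizes = ",".join(str(e - s) for s, e in blocks) + ","
        starts = ",".join(str(s - start) for s, _ in blocks) + ","
        out.append("\t".join([chrom, str(start), str(end), tid, "0", strand,
                              str(start), str(start), "0",
                              str(len(blocks)), sizes, starts]))
    return out


# Toy exon records for one hypothetical transcript.
toy_exons = [
    '1\ttoy\texon\t301\t400\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX1";\n',
    '1\ttoy\texon\t101\t150\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX1";\n',
]
for bed_line in gtf_exons_to_bed12(toy_exons):
    print(bed_line)
```

The sketch assumes exons of one transcript do not overlap and it does no chromosome renaming (Ensembl uses `1`, UCSC-style tools expect `chr1`), so check the chromosome names against whatever tool consumes the BED.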
ADD COMMENT
0

Sorry, it says:

fi1d18@RBGO-Server2:~/Downloads$ cgat gff2bed -I lincRNAs.gtf.gz --bed12-from-transcripts > lincRNAs.bed

Command 'cgat' not found, did you mean:

  command 'gcat' from deb onioncat
  command 'ccat' from deb ccrypt
  command 'cgpt' from deb cgpt
  command 'cat' from deb coreutils
  command 'chat' from deb ppp
ADD REPLY
0

You need to install cgat-apps first.

ADD REPLY
0

Sorry, where can I download Ensembl.gtf.gz?

At https://www.ensembl.org/info/data/ftp/index.html I don't know which file I should download to match your code.

ADD REPLY
