Getting the fasta format of a list of genes
3.8 years ago
zizigolu ★ 4.3k

Hello,

If I have a list of genes, how can I get their sequences in FASTA format, like below?

>BC200
GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCTCTCAGGGAGGCTAAGAGGCGGGAGGATAGCTTGAGCCCAGGAGTTCGAGACCTGCCTGGGCAATATAGCGAGACT
>NPPA-AS1
TGCTGGTCAGAGGTCCTGGGGGTGGTTTTGAACCATCAGAGCTTGGACTTTTCTGACTTCCCCAGCAAGGATCTTCCCACTTCCTGCTCCCTGTGTTCCCACCC

Have you tried bedtools getfasta -fi <genome_fasta> -bed <your_bed>?
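As a rough illustration of what `bedtools getfasta` does with a BED file, here is a minimal pure-Python sketch. It assumes the BED intervals are 0-based and half-open (the BED convention) and that the genome FASTA is small enough to hold in memory. The function names `read_fasta` and `get_fasta` are hypothetical. Like `getfasta` run without strand-awareness, the sketch returns forward-strand sequence, and it names each record by the BED name column.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A BED4 record: (chrom, start, end, name); start is 0-based, end is exclusive.
BedRecord = Tuple[str, int, int, str]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}."""
    seqs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first token of the header, e.g. ">chr1 some text" -> "chr1".
            current = line[1:].split()[0]
            seqs[current] = []
        elif current is not None:
            seqs[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in seqs.items()}


def get_fasta(genome: Dict[str, str], records: Iterable[BedRecord]) -> str:
    """Return FASTA text with one entry per BED interval, headed by the BED name."""
    out = []
    for chrom, start, end, name in records:
        # 0-based, half-open BED coordinates map directly onto a Python slice.
        seq = genome[chrom][start:end]
        out.append(f">{name}\n{seq}")
    return "\n".join(out) + "\n"
```

For a whole hg19 genome, `bedtools` itself (or another indexed FASTA reader) is the more practical choice, since this sketch loads every chromosome into memory.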

Where in the code are you defining the list of genes?

You have been on this forum long enough to know that you need to ask far more detailed questions than this.

What list of genes? What identifiers? Where do you want to get the sequences from? What organism? Do you know the genome?

From your comment, it seems you understand that you need a tool which can take the gene list - have you tried looking for such tools?

Yup, I definitely have tools that I can tell you about, but I would want to know the things Joe has listed above, plus:

  • sequence type (cDNA/genomic/CDS)

  • number of IDs

Thank you so much

I have a list of long non-coding RNAs (lncRNAs) and their possible target genes, for human (hg19).

There is a web tool, http://www.cuilab.cn/lnctar, which takes the sequences of lncRNAs and their targets in FASTA format and predicts which genes are most likely to be targets of a given lncRNA.

I have the list of lncRNAs and even their genomic coordinates. However, I am not sure whether they are coding or not, and I need their sequences in FASTA format for the prediction tool mentioned above.

What I have looks like this:

chr11   62619460    62623360    SNHG1
chr12   46777823    46781934    linc-FAM113B-3:copy2
chr7    26097439    26101262    linc-NPVF-2
chr1    28905050    28908366    SNHG12
chr1    212719036   212729407   linc-ATF3-2
chr17   74553846    74561430    SNHG16
chr1    173833039   173837125   GAS5
chr1    223354486   223361496   linc-C1orf65-1
chr1    76251879    76260775    RABGGTB
chr13   75811889    75814517    CTAGE11P
chr4    156127681   156129583   linc-FGG-1
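The table above is already most of the way to a BED file. A minimal sketch of turning it into BED4 text, assuming whitespace-separated columns in the order chrom, start, end, name. The thread does not say whether the starts are 0-based or 1-based; the hypothetical `one_based` switch shifts 1-based starts (for example, ones copied from a genome browser display) to BED's 0-based convention.

```python
from typing import List, Tuple

# A BED4 record: (chrom, start, end, name); start is 0-based, end is exclusive.
BedRecord = Tuple[str, int, int, str]


def parse_regions(text: str, one_based: bool = False) -> List[BedRecord]:
    """Parse 'chrom start end name' lines (any whitespace) into BED-style tuples.

    one_based is an ASSUMPTION switch: set it to True if the starts are 1-based,
    so that they are shifted down by one to match BED.
    """
    records: List[BedRecord] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 4:
            raise ValueError(f"line {lineno}: expected chrom, start, end, name")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if one_based:
            start -= 1
        if start < 0 or start >= end:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        records.append((chrom, start, end, name))
    return records


def to_bed(records: List[BedRecord]) -> str:
    """Serialise records as tab-separated BED4 text, one interval per line."""
    return "".join(f"{c}\t{s}\t{e}\t{n}\n" for c, s, e, n in records)
```

The output could be saved as, say, a hypothetical `lncrnas.bed` and passed to `bedtools getfasta -fi <genome_fasta> -bed lncrnas.bed` with an hg19 genome FASTA; adding `-name` should label each record with the name column (the exact header format differs between bedtools versions).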

Thank you for any help

At least this much information should have been in your opening post. Please don't make us tell you again.

How many lines in this file, approximately?

Thank you @Emily_Ensembl. In total I have 8000 lncRNAs, but only 200 of them were differentially expressed between two groups of patients, and those are the ones whose target genes I want to predict.

3.8 years ago
Emily 24k

Use BioMart: filter by your list of IDs and select the sequence as an attribute. There is a help video if you want to use the online tool, and instructions if you'd rather use the R package (biomaRt).
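For a scripted version of the same BioMart query, here is a sketch that only builds the XML query sent to the BioMart `martservice` endpoint, without touching the network. The dataset (`hsapiens_gene_ensembl`), filter (`external_gene_name`) and attribute names (`gene_exon_intron` for "Unspliced (Gene)") are assumptions to check against the mart, for example with biomaRt's `listFilters()` and `listAttributes()`; the XML export of a query built in the web interface is the safest reference for the exact layout. Since the coordinates in this thread are hg19, the GRCh37 archive mart (grch37.ensembl.org) would be the matching one rather than the current release.

```python
import xml.etree.ElementTree as ET
from typing import Iterable
from urllib.parse import urlencode

# ASSUMPTIONS: verify these names against the mart before relying on them.
DATASET = "hsapiens_gene_ensembl"
NAME_FILTER = "external_gene_name"
SEQUENCE_ATTRIBUTE = "gene_exon_intron"  # "Unspliced (Gene)" in the web interface


def build_biomart_query(gene_names: Iterable[str]) -> str:
    """Build a BioMart XML query requesting FASTA sequences for the given gene names."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset", {"name": DATASET, "interface": "default"})
    # Comma-separated list of names to filter on.
    ET.SubElement(dataset, "Filter", {"name": NAME_FILTER, "value": ",".join(gene_names)})
    # ID and name attributes label the FASTA headers; the sequence attribute supplies the sequence.
    ET.SubElement(dataset, "Attribute", {"name": "ensembl_gene_id"})
    ET.SubElement(dataset, "Attribute", {"name": "external_gene_name"})
    ET.SubElement(dataset, "Attribute", {"name": SEQUENCE_ATTRIBUTE})
    return ET.tostring(query, encoding="unicode")


def martservice_url(base: str, gene_names: Iterable[str]) -> str:
    """Encode the query as the 'query' parameter of a martservice GET request."""
    return base + "?" + urlencode({"query": build_biomart_query(gene_names)})
```

The resulting URL could then be fetched with `urllib.request.urlopen`. Names that are not Ensembl gene names (possibly several of the `linc-...` entries in the table above) may simply return nothing, so the identifier type matters as much as the tool.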

Thanks a million

Could you please check whether I am doing this right?

Attributes: Sequence > Unspliced (Gene), Gene name

Yes, if that's the sequence you want.

Thank you so much

Actually, I want the coding genes near each lncRNA, excluding any gene more than 100 kilobases upstream or downstream of the lncRNA.

Is there any filter to achieve this?
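Finding coding genes within 100 kb of each lncRNA is an interval-window problem rather than a single BioMart filter. A minimal sketch, assuming the lncRNA coordinates above plus a gene annotation with coordinates and a biotype column (for instance exported from BioMart or taken from a GENCODE GTF; both are assumptions here). Distance is measured edge to edge between intervals, so overlapping genes count as distance 0; a TSS-based distance would give different answers. Because the window is symmetric, upstream versus downstream does not need strand information. On the command line, `bedtools window -w 100000` performs a comparable search.

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


class Gene(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    biotype: str  # e.g. "protein_coding" in Ensembl/GENCODE annotation


def gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Edge-to-edge distance in bp between half-open intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def nearby_coding_genes(lncrnas: List[Region], genes: List[Gene],
                        window: int = 100_000) -> Dict[str, List[str]]:
    """Map each lncRNA name to protein-coding genes within `window` bp on either side."""
    # Keep only protein-coding genes, grouped by chromosome.
    coding_by_chrom: Dict[str, List[Gene]] = {}
    for g in genes:
        if g.biotype == "protein_coding":
            coding_by_chrom.setdefault(g.chrom, []).append(g)

    result: Dict[str, List[str]] = {}
    for lnc in lncrnas:
        candidates = coding_by_chrom.get(lnc.chrom, [])
        result[lnc.name] = [
            g.name for g in candidates
            if gap(lnc.start, lnc.end, g.start, g.end) <= window
        ]
    return result
```

The comparison is a simple scan per chromosome, which should be adequate for around 200 lncRNAs against a genome-wide gene list; for much larger inputs an interval tree or `bedtools` would scale better.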

So you've received five comments and one answer from three different people, and now you tell us that your question is completely different to the one you asked originally?

I'm done. I'm going to change my notifications to Not Following for this post.