How to get gene names or a BED file from FASTA sequences
4.0 years ago
Kai_Qi ▴ 130

Hi:

I generated a FASTA file from a BED file, then analyzed it and found many motifs in the sequences (about 400 bp each).

I then used grep -i motif fastafile >> new_fastafile to get all the sequences that contain the motif. The new_fastafile looks like this (only part of the head output is shown):

$ head new_fastafile.fasta 
GTAAGTGGCACCCTGCCAGAGATCCCTCTCTGCCCTGGGTCTCATGCCTTCCTTTCTGCACCTCCAGACAATTTCTGCTGCCCCTAGGTCCCAGATTTCAGCTGTCCAGATGTCCAGGCCTTTTAAAGGGTCTAGGCAGGGGGTCCTACTGCTCACACAGTCCTCCCACTGGCTGTTATGTTTAAAATCCTAACCTGGC
GTAGGTGTGGACGACAGACAGCTGGGTGGCATGAGAATGCAGGTGCCAGGCGAACTAGAGGGTGGTGCTGGGTGCGTCGTACCATCGGGAGAAGATCCCCTCCCCCTCAGCCTCTGCTGAAAGCAACAAGGGAACCCCTAAAAGAAGGGCTAAGAAGGTATGCACAAGATACTGGGTCTTCCCCAAGAATGGGGCTGGA
GTGGGTAGCCTGGGGACCCCTAGCACCCCAGCCTTCACCACCATCACCTTCATCGCCACCATTACTGCGCTCACCTCCGGCTTGATCACTCAGTGTCATCCTGTGCTGGACGCTGTGCTGGGCCACCATGCCATGTTAAGTCATCCTGCCTCTCATACCATCATCACCTTGTTCACCTGTCAGGGGAGATGTAGGGGAG

How can I get the coordinates or gene names so that I know which genes these sequences come from? Thanks.

RNA-Seq genome gene ChIP-Seq • 1.1k views
4.0 years ago
xiaoguang ▴ 160

You can use NCBI BLAST to realign this motif sequence to your reference.


Thanks for your reply. I did not describe the situation fully. First, my motif looks like this: GGTNNAAA, and I cannot BLAST it at NCBI. Second, I got the motif from filtered FASTA files.
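A degenerate motif such as GGTNNAAA can be searched directly if it is first turned into a regular expression. The sketch below also keeps each header together with its sequence, since a plain grep on the motif returns only the matching sequence lines. It assumes standard IUPAC codes, searches the forward strand only, and the function names (`motif_to_regex`, `read_fasta`, `records_with_motif`) are made up for this illustration.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif (e.g. GGTNNAAA) into a case-insensitive regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()), re.IGNORECASE)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def records_with_motif(lines, motif):
    """Return the (header, sequence) records whose sequence contains the motif."""
    pattern = motif_to_regex(motif)
    return [(h, s) for h, s in read_fasta(lines) if pattern.search(s)]


# Toy input (made-up sequences); an open file handle works the same way.
example = [
    ">7:127020805-127021004(-)", "ACGGTCCAAATT",
    ">1:100-112(+)", "ACGTACGTACGT",
]
for header, seq in records_with_motif(example, "GGTNNAAA"):
    print(header, seq)
```

Because the headers survive the filtering step, the coordinates stay attached to every sequence that contains the motif.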


I think you must have a complete reference genome; otherwise you could not have produced the filtered FASTA file. First download the local NCBI BLAST program and build a database from your reference. Then you can BLAST your motif against that database to get coordinates.


I see what you mean, and I will try your advice. Yesterday I regenerated the FASTA file and used grep to extract the FASTA headers into a csv or txt file.

The format in the output txt file is like this:

>7:127020805-127021004(-)

I am wondering how to convert the txt file into BED, so that I can try the BED file with a GTF.

I tried cut -f, but it did not work well.


You probably need Python or R. You should parse each coordinate string into chromosome, start, end, and strand, then write a six-column tab-delimited ("\t") file containing chromosome, start, end, name, score, and strand.
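The parsing step described above could be sketched in Python roughly as follows. This assumes every header has the form `>chrom:start-end(strand)` as in the example, and that the coordinates are already BED-style (0-based start, end exclusive); if they are 1-based, subtract 1 from the start. The function name `header_to_bed6`, the placeholder score of 0, and the use of the coordinate string as the name column are choices made only for this illustration.

```python
import re

# Matches headers such as ">7:127020805-127021004(-)"; the leading ">" and
# the strand in parentheses are both optional.
HEADER_RE = re.compile(
    r"^>?(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?:\((?P<strand>[-+.])\))?$"
)


def header_to_bed6(header, score="0"):
    """Convert one FASTA header into a tab-delimited six-column BED line."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    chrom = match.group("chrom")
    start = match.group("start")
    end = match.group("end")
    strand = match.group("strand") or "."   # "." when no strand is given
    name = f"{chrom}:{start}-{end}"         # placeholder name column
    return "\t".join([chrom, start, end, name, score, strand])


def headers_to_bed(lines):
    """Yield one BED line per non-empty header line."""
    for line in lines:
        if line.strip():
            yield header_to_bed6(line)


for bed_line in headers_to_bed([">7:127020805-127021004(-)"]):
    print(bed_line)
```

To convert a whole file, pass an open handle of the header txt file to `headers_to_bed` and write each returned line to the output BED file.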


Also, Excel can help you.
