4.0 years ago
Kai_Qi
Hi,
I extracted a FASTA file from a BED file, then analyzed it and found many motifs in the FASTA sequences (about 400 bp each).
I then used grep -i motif fastafile >> new_fastafile
to get all the sequences that contain the motif. The structure of new_fastafile looks like this (only part of the head output is shown):
$ head new_fastafile.fasta
GTAAGTGGCACCCTGCCAGAGATCCCTCTCTGCCCTGGGTCTCATGCCTTCCTTTCTGCACCTCCAGACAATTTCTGCTGCCCCTAGGTCCCAGATTTCAGCTGTCCAGATGTCCAGGCCTTTTAAAGGGTCTAGGCAGGGGGTCCTACTGCTCACACAGTCCTCCCACTGGCTGTTATGTTTAAAATCCTAACCTGGC
GTAGGTGTGGACGACAGACAGCTGGGTGGCATGAGAATGCAGGTGCCAGGCGAACTAGAGGGTGGTGCTGGGTGCGTCGTACCATCGGGAGAAGATCCCCTCCCCCTCAGCCTCTGCTGAAAGCAACAAGGGAACCCCTAAAAGAAGGGCTAAGAAGGTATGCACAAGATACTGGGTCTTCCCCAAGAATGGGGCTGGA
GTGGGTAGCCTGGGGACCCCTAGCACCCCAGCCTTCACCACCATCACCTTCATCGCCACCATTACTGCGCTCACCTCCGGCTTGATCACTCAGTGTCATCCTGTGCTGGACGCTGTGCTGGGCCACCATGCCATGTTAAGTCATCCTGCCTCTCATACCATCATCACCTTGTTCACCTGTCAGGGGAGATGTAGGGGAG
How can I get the coordinates or gene names so that I know which genes these sequences come from? Thanks,
Thanks for your reply. I did not describe the situation fully. First, the motif looks like this: GGTNNAAA, so I cannot BLAST it at NCBI. Second, I got the motif from filtered FASTA files.
I think you must have a complete reference genome; otherwise you could not have produced the filtered FASTA file. First download the local NCBI BLAST program and use your reference to build a database. Then you can BLAST your motif against that reference database to get coordinates.
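For a short degenerate motif such as GGTNNAAA, a regular-expression scan that keeps the FASTA headers may be a simpler route than BLAST, since a plain grep on the sequence lines drops the headers and does not treat N as a wildcard. The following is a minimal sketch, not a definitive implementation. The function names (`motif_to_regex`, `read_fasta`, `find_motif`) and the example headers and sequences are hypothetical, and it assumes the FASTA file still has `>` header lines. It scans the forward strand only and reports non-overlapping matches.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as GGTNNAAA into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()), re.IGNORECASE)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif(lines, motif):
    """Return (header, 0-based offset, matched text) for each forward-strand hit."""
    pattern = motif_to_regex(motif)
    hits = []
    for header, seq in read_fasta(lines):
        for match in pattern.finditer(seq):
            hits.append((header, match.start(), match.group().upper()))
    return hits


# Hypothetical records, for illustration only.
example = [">chr1:100-115(+)", "ACGTGGTCCAAATTT", ">chr2:5-17(-)", "ACGTACGTACGT"]
print(find_motif(example, "GGTNNAAA"))
```

With a real file, the same call can be made as `find_motif(open("fastafile"), "GGTNNAAA")`. If the header encodes the region's coordinates, adding the offset to the region start gives the genomic position of each hit on the forward strand.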
I see what you mean. I will try that advice. Yesterday I regenerated the FASTA file, then used grep to extract the FASTA headers into a csv or txt file.
The format in the output txt file is like this:
I am wondering how to convert the txt file into BED, so that I can try the BED file with a GTF.
I have used
cut -f
but it does not work well.
Probably you need Python or R? Maybe you should parse each coordinate into chromosome, start, end, and strand, then make a six-column file delimited by "\t" that contains chromosome, start, end, name, score, and strand.
Also, Excel can help you.
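The parsing step suggested above could be sketched in Python roughly as follows. The header layout is not shown in the thread, so this assumes headers of the form `>chr1:100-500(+)` (the style written by tools such as bedtools getfasta, with the strand part optional) and that the start is already 0-based as in BED. The function name `header_to_bed`, the placeholder score of 0, and the choice of the region string as the name column are all assumptions made for illustration.

```python
import re

# Assumed header layout: optional ">", then chrom:start-end, then optional "(+)" or "(-)".
HEADER_RE = re.compile(
    r"^>?(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)(?:\((?P<strand>[+-])\))?"
)


def header_to_bed(header, score="0"):
    """Convert one header line to a tab-delimited six-column BED line.

    Returns None when the header does not match the assumed layout.
    """
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    chrom = match.group("chrom")
    start = match.group("start")
    end = match.group("end")
    strand = match.group("strand") or "."  # "." when no strand is given
    name = f"{chrom}:{start}-{end}"
    return "\t".join([chrom, start, end, name, score, strand])


# Hypothetical headers, for illustration only.
for line in [">chr1:100-500(+)", "chr2:5-20", "not_a_coordinate"]:
    print(header_to_bed(line))
```

Applied to every line of the header txt file, with the non-`None` results written out one per line, this yields a BED file that can then be intersected with a GTF. If the headers use a different layout, only `HEADER_RE` should need to change.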