Hi, I need help writing a script that takes a multi-FASTA file as input and outputs, in two columns, each gene name and the number of times that gene occurs in the file.
The simplest approach, assuming FASTA headers of the form ">gene_name description...":
grep "^>" multi_fasta.txt | sed 's/>//' | awk '{print $1}' | sort | uniq -c | awk '{print $2 "\t" $1}' >gene_count.txt
Edit: using the link provided
curl http://dl.dropbox.com/u/43445136/examplefasta.fa | grep "^>" | sed 's/>//' | awk -F"|" '{print $1}' | sort | uniq -c | awk '{print $2 "\t" $1}'
Output
ENSTGUG00000000002 1
ENSTGUG00000000010 1
ENSTGUG00000000018 1
ENSTGUG00000000021 1
ENSTGUG00000000026 1
ENSTGUG00000000027 1
ENSTGUG00000000029 1
ENSTGUG00000000037 1
ENSTGUG00000000043 1
....
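For what it's worth, the same count can probably be done in a single awk pass instead of the grep/sed/awk/sort/uniq chain. This is only a sketch under assumptions: the function name count_genes is made up for illustration, and it assumes the gene name is whatever sits between ">" and the first "|", space, or tab in the header, which covers both header styles used above.

```shell
# count_genes: hypothetical helper name, not something from this thread.
# Reads FASTA from the files given as arguments (or from stdin when none
# are given) and prints "gene<TAB>count", one line per gene, sorted by name.
# Assumption: the gene name is the header text between ">" and the first
# "|", space, or tab.
count_genes() {
    awk '
        /^>/ {
            name = substr($0, 2)          # drop the leading ">"
            sub(/[| \t].*/, "", name)     # cut at the first "|" or whitespace
            count[name]++                 # tally this gene
        }
        END {
            for (name in count) print name "\t" count[name]
        }
    ' "$@" | sort
}
```

Usage would be something like `count_genes multi_fasta.txt > gene_count.txt`, or `curl ... | count_genes` to read from a pipe. The trailing sort is there only because awk does not guarantee any order when looping over an array.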
ENSTGUG00000012287|ENSTGUT00000012814|1475 ACCGGTGCCAGGGGCCGCGGTTGGCTGCGAAGCGGCGGCTCCCGCCCCCTGCGGAATCAGCCCCAGGTCCGGGGCGGCTCTACCTGCCGGCACGATGAACCTCACCGCCGAGAGCCACCGCATTCCGCTGAGCGACGGCAACAGCATCCCGCTCTTGGGGCTGGGCACCTACGCCGACCCGCAGAAAACTCCCAAAGGTTCCTGTCTGGAGGCGGTGAAGATTGCCATCGATGCTGGTTACCGCCACATCGACGGTGCCTTTGTCTACTTCAATGAGCATGAAGTGGGACAAGCCATCCGGGAGAAGATTGCTGAAGGGAAGATCAAGAGAGAAGACATATTTTACTGTGGCAAGCTGTGGAATACCTGCCACCCCCCAGAGCTGGTGCGTCCCACACTGGAGAAAACCCTGAAGATCCTGCAGCTGGACTACGTTGACCTCTACATTATTGAGCTGCCAATGGCTTTCAAGCCTGGAGATGCACTCTACCCAAAAGATGAAAATGGAAAATTTATCTACCATGAGACAGACTTATGTGCCACTTGGGAGGCTCTG
http://www.4shared.com/folder/GgfoEnew/_online.html Here is the file, please check it out. Thanks.
Hi, I think there is a problem downloading the file. Please go to this link: http://uploading.com/files/7bbe159c/examplefasta.fa/ Thank you.
http://www.fileserve.com/file/kts2Z3h/examplefasta.fa This should work, I think.
This can be done using Biopieces (www.biopieces.org) like this:
read_fasta -i test_big.fna -n 10 |
count_vals -k SEQ_NAME |
uniq_vals -k SEQ_NAME |
write_tab -ck SEQ_NAME_COUNT,SEQ_NAME -x
Cheers,
Martin
It would be useful if you indicated whether the answers are helpful. When we get this type of "please write my code for me" question, I often wonder whether the answer even means anything to the questioner. If your problem is that you know nothing about scripting, my advice is to go away and learn some.