7.9 years ago
bhanratt
I am using UCSC's multiz 100-species vertebrate multiple alignment FASTA for hg19. The file is refGene.exonAA.fa, available here: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/multiz46way/alignments/
The sequences seem to be broken up into fragments. For example the first sequence is:
>NM_152486.2_hg19_1_13 24 0 0 chr1:861322-861393+
MSKGILQVHPPICDCPGCRISSPV
In this example, the ID NM_152486.2_hg19_1_13 marks fragment 1 of 13. Further down come 2_13, 3_13, and so on.
I would like to concatenate all 13 fragments into 1 sequence for each refseq ID.
Is there existing software or a script that can perform this task?
Thanks for your response. I guess I didn't explain it very well. I need to do this for all IDs and all species, and am just asking if anyone knows of an existing method. Otherwise I can write one myself.
Thanks though!
A bash loop combined with grep and a sed one-liner can deal with this problem. I give the sed part.
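An alternative that avoids shell tools altogether is a short Python script. The following is only a minimal sketch, not an existing tool: the function name `concat_exon_fragments` is made up for illustration, and it assumes every header follows the pattern shown in the question (`>TRANSCRIPT_SPECIES_INDEX_TOTAL` followed by other fields) and that the species label itself contains no underscore. The second record in the demo is invented just to show the joining.

```python
from collections import defaultdict


def concat_exon_fragments(lines):
    """Join fragments into one sequence per (transcript, species).

    Assumes headers look like '>NM_152486.2_hg19_1_13 24 0 0 chr1:...',
    i.e. the first field ends in _SPECIES_INDEX_TOTAL.
    """
    # (transcript, species) -> {fragment index: sequence}
    fragments = defaultdict(dict)
    key = index = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            # Split from the right so underscores in the RefSeq ID are kept.
            transcript, species, idx, _total = name.rsplit("_", 3)
            key, index = (transcript, species), int(idx)
            fragments[key][index] = ""
        elif key is not None:
            fragments[key][index] += line
    # Concatenate fragments in numeric order (1, 2, ..., 13).
    return {
        k: "".join(seq for _, seq in sorted(parts.items()))
        for k, parts in fragments.items()
    }


# Demo: the first record is from the question, the second is invented.
demo = [
    ">NM_152486.2_hg19_1_13 24 0 0 chr1:861322-861393+",
    "MSKGILQVHPPICDCPGCRISSPV",
    ">NM_152486.2_hg19_2_13 4 0 0 chr1:0-0+",
    "AAAA",
]
for (transcript, species), seq in concat_exon_fragments(demo).items():
    print(f">{transcript}_{species}")
    print(seq)
```

For the real file, passing an open file handle (for example `open("refGene.exonAA.fa")`) in place of `demo` should work the same way, since the function only iterates over lines. It keeps everything in memory, which may matter for a very large alignment.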