FASTA sequence modification
8.1 years ago
itsanju87 ▴ 10

Hi, my sequences look like the ones below. I want to remove the blank lines between FASTA records, and also remove the newline characters within each sequence.

I have used tr -d '\n' < filename

but the file is not modified or saved, and the output no longer has any line break between the sequences.

>gi|339714733|gb|JK209933.1|JK209933 RIRL_009841 Ralstonia solanacearum challenged root cDNA library of peanut (Arachis hypogaea) Arachis hypogaea cDNA clone ocprra0_0120_C01.ab1, mRNA sequence
GGGGAAGCAGAGGACGGCCATTGAGCAGCCAAACCACCAGCAGCGACAGCGATGGTCCTCAAGACGGAAC
TTTGCCGCTTTAGTGGCGCGAAGATCTACCCTGGGAAGGGCATCAGATTCGTTCGTGGTGATTCTCAGGT
CTTTCTGTTTTCAAACTCGAAATGCAAAAGGTATTTCCACAATCGTTTGAAGCCATCAAAGCTCACATGG
ACTGCCATGTATAGGAAGCAACACAAAAAGGACATTGCTCAAGAAGCTGTGAAGAAGAAGCGTCGTGCTA
CCAAGAAGCCATACTCTAGGTCAATTGTTGGTGCTACCTTGGAAGTTATCCAGAAGAGAAGAACCGAGAA
GCCTGAAGTTAGAGATGCAGCAAGGGAAGCTGCTCTTCGTGAAATTAAAGAGAGGATCAAGAAGACAAAA
GATGAGAAAAAGGCTATGAAGGCAGAGGTATCGGCTAAGCAACAAAAGGCACAGGGCAAAGGCCATGTTA
CAAAGGTTGCTGCACCGAAAGGTCCCAAACTTGGTGGAGGAGGTGGCAAA

>gi|339714732|gb|JK209932.1|JK209932 RIRL_009840 Ralstonia solanacearum challenged root cDNA library of peanut (Arachis hypogaea) Arachis hypogaea cDNA clone ocprra0_0120_B12.ab1, mRNA sequence
GGGGGACCAAGGAGTTCAACTGNGAGTGTGAGTCGTAGCAAAAAAAAAAAAAATTTAAAAAACAAATTTT
AAAAATAAATGCATACGGCGTAACGAAAGTAACTTAACAAAAAAAAAGGTAAGACGTCTTTTTTTTTTTA
AAAAGAGACTCCATTACCGGCCAGCTAGGAACACTGTTGTTTTGACTTAGTGCAATGACCAGTGCTTCTT
TTTTTTTTTTTTTTTTAAAAAAAAAAGAAAGAGTAGTTGAGCTAATGAACATTTTTGTTGAGACCCGAAA
GATGGTGAACTATTCTCGGACATGGTGAAGTCAGGTGAAAACTTGATGGAAGCTTGCGTTGTGAGGAACT
GACGTGCAAATCGTTGCCTCCTGAACTGAGTATAGGGGCGAAAGACTAATCGAACCATCGAGTAGCTGGT
TCTCTCCGAAGTTTCCCTCAGGATAGCTCGAGTCTATTTTTATAGAGTANCATGG
sequence • 1.8k views

You could check this post for the second part of your question: multiline fasta to one line fasta. For the first part, I found a way with sed on StackOverflow: sed '/^\s*$/d' file.
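A minimal sketch of the blank-line step, in case it helps. The function name and file names are placeholders, not something from the linked posts. It uses `[[:space:]]` instead of `\s`, because `\s` is a GNU sed extension, and it redirects the result into a new file: like tr, sed writes to standard output and leaves the input file as it was.

```shell
# Hypothetical helper: delete empty or whitespace-only lines from a file.
# [[:space:]] is the POSIX spelling of \s and also matches a stray
# carriage return left over from Windows line endings.
strip_blank_lines() {
    sed '/^[[:space:]]*$/d' "$1"
}

# Usage (placeholder file names): the input is never edited in place,
# so save the output under a new name.
# strip_blank_lines input.fasta > no_blanks.fasta
```

This only removes the blank lines between records. The line breaks inside each sequence are still there afterwards.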

8.1 years ago

Linearize the file (see http://stackoverflow.com/documentation/bioinformatics/4194/), then turn the tabs back into newlines with "tr '\t' '\n'".
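One way the linearize-then-tr idea could look, as a sketch rather than the exact command from the link. It assumes that headers contain no tab characters (otherwise the final tr would split them), and `linearize_fasta` is a made-up name. Each record is first printed as one line in the form header, tab, sequence, and tr then restores the header/sequence break.

```shell
# Hypothetical helper: print each FASTA record as "header<TAB>sequence".
# Assumes headers contain no tab characters.
linearize_fasta() {
    awk '
        { sub(/\r$/, "") }                  # drop a trailing carriage return, if any
        /^>/ {
            if (started) printf "\n"        # close the previous record
            printf "%s\t", $0               # header, then a tab
            started = 1
            next
        }
        { gsub(/[ \t\r]/, ""); printf "%s", $0 }   # append sequence, skip blanks
        END { if (started) printf "\n" }
    ' "$1"
}

# Usage (placeholder file names):
# linearize_fasta input.fasta | tr '\t' '\n' > output.fasta
```

The tab-separated intermediate form is handy on its own, since each record then fits on one line for tools such as sort or grep.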

8.1 years ago

You have another option if you are on Windows: install Notepad++, a versatile text editor.

Its search-and-replace can match escape sequences, and it also supports regular expressions.

Set the search mode to Extended and replace \n\n (a double line break; \r\n\r\n if the file uses Windows line endings) with a single \n (or \r\n). This removes the blank lines between the records.

I am not a Mac user, but I am sure you can find similar programs there.

8.1 years ago

A quick cross-platform Perl one-liner:

perl -pe 's/[\r\n]//g; s/^(>.+)/$n$1\n/; $n="\n"' test.fasta

It first removes every carriage return and line feed (so it works regardless of the platform's line endings), and then adds a line break after every sequence name and before every sequence name except the first.
