convert unique tags to fasta
5.4 years ago

I have a large FASTA file of microRNA in which each unique tag carries its copy number in the header (the number after _x), in this form:

>sequence1_x5
AGCTAGCTAGCTAGCT
>sequence2_x15
ATCTATCTATCT

I want to expand it into a FASTA file with one record per copy, numbered consecutively:

>1
AGCTAGCTAGCTAGCT
>2
AGCTAGCTAGCTAGCT
>3
AGCTAGCTAGCTAGCT
>4
AGCTAGCTAGCTAGCT
>5
AGCTAGCTAGCTAGCT
>6
ATCTATCTATCT
>7
ATCTATCTATCT

and so on.

I am new to bioinformatics, so any help would be appreciated.

RNA-Seq • 920 views

The post is not displaying in the right format. I wrote the question in FASTA format, but something went wrong while posting.


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!


See the first answer in this thread: Renaming Entries In A Fasta File

 awk '/^>/{print ">" ++i; next}{print}' < file.fasta > new.fasta

This returns:

>1
AGCTAGCTAGCTAGCT
>2
ATCTATCTATCT

The copy numbers (x5 and x15) are not taken into account.


Indeed. I am leaving it here as inspiration; it may be useful for anyone else who finds this thread by search.

5.4 years ago
AK ★ 2.2k

Hi manishbiotechie,

You can try:

$ cat input.fasta
>sequence1_x5
AGCTAGCTAGCTAGCT
>sequence2_x15
ATCTATCTATCT

$ cat input.fasta | awk 'BEGIN{RS=">"; i=1} NR>1 {gsub(".+_x", "", $1); {while ($1--) {print ">" i "\n" $2; i++}}}'
>1
AGCTAGCTAGCTAGCT
>2
AGCTAGCTAGCTAGCT
>3
AGCTAGCTAGCTAGCT
>4
AGCTAGCTAGCTAGCT
>5
AGCTAGCTAGCTAGCT
>6
ATCTATCTATCT
>7
ATCTATCTATCT
>8
ATCTATCTATCT
>9
ATCTATCTATCT
>10
ATCTATCTATCT
>11
ATCTATCTATCT
>12
ATCTATCTATCT
>13
ATCTATCTATCT
>14
ATCTATCTATCT
>15
ATCTATCTATCT
>16
ATCTATCTATCT
>17
ATCTATCTATCT
>18
ATCTATCTATCT
>19
ATCTATCTATCT
>20
ATCTATCTATCT
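
One caveat: with RS=">", the one-liner above prints only $2, which is the first sequence line of each record, so a sequence wrapped over several lines would be truncated. If your file might contain wrapped sequences, a variant along these lines may help. It is only a sketch: the function name expand_tags is made up for illustration, and it assumes every header ends in _x followed by the count (records without such a count are skipped).

```shell
# expand_tags: hypothetical helper name, not part of the original answer.
# Assumes each header ends in _x<count>; sequence lines may be wrapped.
expand_tags() {
  awk '
    # Print the buffered sequence n times, numbering the copies consecutively.
    function flush_record(   k) {
      for (k = 0; k < n; k++) print ">" ++i "\n" seq
    }
    /^>/ {
      flush_record()        # emit the previous record, if there is one
      n = $0
      sub(/.*_x/, "", n)    # keep only what follows the last "_x"
      n += 0                # force numeric; a header without a count gives 0
      seq = ""
      next
    }
    { seq = seq $0 }        # join wrapped sequence lines into one string
    END { flush_record() }  # emit the final record
  ' "$@"
}
```

Usage would be something like `expand_tags input.fasta > expanded.fasta`. Each sequence is written on a single line in the output, regardless of how it was wrapped in the input.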