How to print two lines from each of several files to a new file in a specific order?
7.3 years ago
ThulasiS ▴ 90

I am doing a multiple sequence analysis of some genes. I have several files, each containing sequences in the same order. I would like to write the first sequence of each file to a new file, then the second sequence of each file, and so on up to the last sequence. So far I only know how to extract the first (or any one specific) line from each file with awk:

awk 'FNR == 2 {print; nextfile}' *.txt > newfile
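
The awk one-liner above prints a single line per file. A minimal sketch of the natural extension to a whole record (name line plus sequence line), assuming every record occupies exactly two lines as in the files below; the function name nth_record is hypothetical:

```shell
# Hypothetical helper: print the k-th two-line record (name line + sequence line)
# from every file given, in the order the files are listed.
# Assumes each record occupies exactly two lines.
nth_record() {
    local k=$1
    shift
    # Record k spans lines 2k-1 (name) and 2k (sequence) of each file
    awk -v k="$k" 'FNR == 2 * k - 1 || FNR == 2 * k' "$@"
}
```

Usage: nth_record 1 *.txt > newfile. Calling it for k = 1, 2, 3, ... in turn and appending to one file gives the interleaved order, although each file is re-read once per k.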

For two files, I have since learned this approach:

paste File1 File2 | awk '{ p=$2;$2="" }NR%2{ k=p; print }!(NR%2){ v=p; print $1 RS k RS v }'
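
The paste/awk pipeline above pairs exactly two files. As a sketch of the same interleaving for any number of files in a single awk pass (assuming two-line records; interleave_records is a hypothetical name), every line can be stored under its (file, line) position and printed record by record at the end:

```shell
# Sketch: interleave two-line records (name line, sequence line) from any number of files.
# Assumes every record is exactly two lines; files may hold different numbers of records.
interleave_records() {
    awk '
        FNR == 1 { nf++ }                                   # a new input file starts
        { line[nf, FNR] = $0; if (FNR > max) max = FNR }    # remember every line
        END {
            for (r = 1; r <= max; r += 2)                    # r = first line of a record
                for (f = 1; f <= nf; f++)
                    if ((f, r) in line) { print line[f, r]; print line[f, r + 1] }
        }' "$@"
}
```

Usage: interleave_records File1 File2 File3 > newfile. All files are held in memory, which should be fine for a dozen small gene files; a file with fewer records simply drops out of the later rounds.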

My input looks like this:

File 1:
Saureus081.1
ATCGGCCCTTAA
Saureus081.2
ATGCCTTAAGCTATA
Saureus081.3
ATCCTAAAGGTAAGG

File 2:
SaureusRF1.1
ATCGGCCCTTAC
SauruesRF1.2
ATGCCTTAAGCTAGG
SaureusRF1.3
ATCCTAAAGGTAAGC

File 3:
SaureusN305.1
ATCGGCCCTTACT
SauruesN305.2
ATGCCTTAAGCTAGA
SaureusN305.3
ATCCTAAAGGTAATG

There are 12 such files in total (File 1, File 2, ..., File 12).

Required output:
Saureus081.1
ATCGGCCCTTAA
SaureusRF1.1
ATCGGCCCTTAC
SaureusN305.1
ATCGGCCCTTACT
Saureus081.2
ATGCCTTAAGCTATA
SaureusRF1.2
ATGCCTTAAGCTAGG
SauruesN305.2
ATGCCTTAAGCTAGA
Saureus081.3
ATCCTAAAGGTAAGG
SaureusRF1.3
ATCCTAAAGGTAAGC
SaureusN305.3
ATCCTAAAGGTAATG

Thank you

Why does the output contain

Seq1
Seq1
Seq2

Can you state precisely what the output should be? Do you want a separate file for each sequence position, i.e. seq1 from all files in one file, seq2 from all files in another file, and so on?

I typed those just as an example, sorry for the laziness. It is simply the name and sequence from the next file.

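This works with the bash shell. Install seqkit from here. Keep your FASTA files in a separate folder and run the script from there, since it processes every *.fa file in the current directory. The output is written to out.fasta (the extension can be customized).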

Code that works (on Ubuntu/Mint with the bash shell):

# Collect the .fa files in the current directory and count the FASTA records in the first one
files=( *.fa )
n=$(grep -c '^>' "${files[0]}")

# Start from an empty output file so that reruns do not append to old results
rm -f out.fasta

# Two loops: the outer loop runs over the record index (1..n),
# the inner loop over every .fa file in the current directory
for j in $(seq 1 "$n"); do
    for i in "${files[@]}"; do
        # fx2tab puts one record per line; keep line j and convert it back to FASTA
        seqkit fx2tab "$i" | awk -v j="$j" 'NR == j' | seqkit tab2fx >> out.fasta
    done
done

Input files (copied from the question above, with ">" added before each name):

$ ls 
test1.fa  test2.fa  test3.fa

Input:

$ cat test1.fa 
>Saureus081.1
ATCGGCCCTTAA
>Saureus081.2
ATGCCTTAAGCTATA
>Saureus081.3
ATCCTAAAGGTAAGG

$ cat test2.fa 
>SaureusRF1.1
ATCGGCCCTTAC
>SauruesRF1.2
ATGCCTTAAGCTAGG
>SaureusRF1.3
ATCCTAAAGGTAAGC

$ cat test3.fa 
>SaureusN305.1 
ATCGGCCCTTACT 
>SauruesN305.2 
ATGCCTTAAGCTAGA 
>SaureusN305.3 
ATCCTAAAGGTAATG

After running the script:

$ ls
out.fasta  test1.fa  test2.fa   test3.fa  script.sh

Output (from the above commands):

$ cat out.fasta 
>Saureus081.1
ATCGGCCCTTAA
>SaureusRF1.1
ATCGGCCCTTAC
>SaureusN305.1 
ATCGGCCCTTACT 
>Saureus081.2
ATGCCTTAAGCTATA
>SauruesRF1.2
ATGCCTTAAGCTAGG
>SauruesN305.2 
ATGCCTTAAGCTAGA 
>Saureus081.3
ATCCTAAAGGTAAGG
>SaureusRF1.3
ATCCTAAAGGTAAGC
>SaureusN305.3 
ATCCTAAAGGTAATG
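
The nested loop above starts seqkit once per file per record index, so every file is re-read n times. A single-pass alternative is sketched below under the same assumption (all files list their records in the same order). It swaps seqkit for plain awk and sort, and the function name interleave_fasta is hypothetical. Each record is flattened to an "index, header, sequence" line, a stable numeric sort on the index groups equal indices while keeping the original file order, and the index is then dropped:

```shell
# Sketch: interleave FASTA records from several files by record index.
# Assumption: every file lists its records in the same order; headers start with '>'.
interleave_fasta() {
    local f
    for f in "$@"; do
        # Flatten each record to "index<TAB>header<TAB>sequence"; wrapped sequences are joined
        awk -v OFS='\t' '
            /^>/ { if (h != "") print n, h, s; n++; h = $0; s = ""; next }
            { s = s $0 }
            END  { if (h != "") print n, h, s }' "$f"
    done |
    # -s (stable sort, GNU and BSD sort, not POSIX) keeps file order within each index
    sort -s -t $'\t' -k1,1n |
    # Drop the index and write each record back as two lines
    awk -F '\t' '{ print $2; print $3 }'
}
```

Usage: interleave_fasta test1.fa test2.fa test3.fa > out.fasta. Note that wrapped sequences come out on a single line.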
7.3 years ago
microfuge ★ 1.9k

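A paste- and awk-based approach, assuming that (a) each sequence is on a single line (not wrapped) and (b) the input FASTA files contain no tabs or spaces. paste joins corresponding lines of all files with tabs, so each header line holds all names and the next line holds all sequences. On each header line, awk's getline reads that following sequence line into y, so it is consumed and not processed again as a separate record.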

paste -d "\t"  *.fa |awk '{getline y;split(y,z,"\t");for (i=1;i<=NF;i++){print $i "\n" z[i]}  }'
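
Neither paste nor the awk step checks that the files line up. If one file is shorter, paste just leaves its fields empty, and with awk's default whitespace splitting an empty field in the middle shifts the later names one column to the left, pairing them with the wrong sequences. A small pre-check, sketched here with a hypothetical function name:

```shell
# Hypothetical pre-check: succeed only if every file has the same number of lines,
# because paste pads shorter files with empty fields instead of reporting an error.
same_line_counts() {
    local f count first
    first=$(( $(wc -l < "$1") ))    # $(( )) strips the padding some wc versions add
    for f in "$@"; do
        count=$(( $(wc -l < "$f") ))
        if [ "$count" -ne "$first" ]; then
            echo "line count mismatch: $f has $count lines, expected $first" >&2
            return 1
        fi
    done
    return 0
}
```

For example, run same_line_counts *.fa first and only run the paste/awk pipeline if it succeeds.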

Thank you so much! It produced exactly the output I needed.

7.3 years ago

Using sqlite3: load your sequences into a database and pull them out in the order you need.

v=1
rm -f db.sqlite3
sqlite3 db.sqlite3 'create table S(name, sequence, num, file);'
for F in input1.txt input2.txt input3.txt
do
    # Odd lines are names, even lines are sequences: emit one INSERT per record.
    # num = record index within the file, fidx = file index; \047 is a single quote.
    awk -v fidx="$v" '
        NR % 2 == 1 { printf("insert into S(name,sequence,num,file) values(\047%s\047,\047", $0) }
        NR % 2 == 0 { num++; printf("%s\047,%d,%d);\n", $0, num, fidx) }' "$F" |
    sqlite3 db.sqlite3
    ((v++))
done
# One output file per record index, with the records ordered by input file
v=$(sqlite3 db.sqlite3 'select max(num) from S;')
while [ "$v" -gt 0 ]
do
    sqlite3 db.sqlite3 "select name || x'0A' || sequence from S where num = $v order by file;" > "out${v}.txt"
    ((v--))
done
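
The script above writes one file per record index (out1.txt, out2.txt, ...). Because table S already stores the record index (num) and the file index (file), the single interleaved file asked for in the question could, as a sketch, come from one query ordered by both columns. dump_interleaved is a hypothetical helper name, and it assumes the table S(name, sequence, num, file) created above:

```shell
# Hypothetical helper: write every record in table S to one file,
# ordered by record index (num) first and input-file index (file) second.
dump_interleaved() {
    local db=$1 out=$2
    sqlite3 "$db" "select name || x'0A' || sequence from S order by num, file;" > "$out"
}
```

Usage: dump_interleaved db.sqlite3 out.txt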

Hi, thanks for the script, but when I run it I get these errors:

Error: near line 1: near "v": syntax error
Error: near line 5: near "do": syntax error
Error: incomplete SQL: ((v++))
done && \
v=$(sqlite3 db.sqlite3 'select max(num) from S;') && \
while [ $v -gt 0 ]
do
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--));
done

I don't know why, because I have never used SQLite before. Thank you.

Ah yes, I reformatted it on the fly. I've just removed a semicolon; can you please try again?

I tried, but I get the same errors again. I will edit my question once more. Please take a look, and thank you for your help.

Error: near line 1: near "v": syntax error
Error: incomplete SQL: ((v++))
done && \
v=$(sqlite3 db.sqlite3 'select max(num) from S;') && \
while [ $v -gt 0 ]
do
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--));
done

Okay, here is the gist that worked on my machine:

Sorry, again I only get syntax errors:

Error: near line 1: near "v": syntax error
Error: near line 5: near "do": syntax error
Error: near line 8: near "(": syntax error
Error: near line 9: near "done": syntax error
Error: incomplete SQL: do 
sqlite3 db.sqlite3 "select (name||x'0A'||sequence) from S where num=$v order by file;" > out${v}.txt ;((v--)); 
done

Maybe the fault is with my data?

Shell is bin/zsh

Sorry, I can't add more replies because I am a new user, so I am editing my previous comment. Thank you.

What is your shell?

Shell is bin/zsh

From post above.

Please use /bin/bash.

Okay, sure, I'll try with bash. But I found the solution here, the one given by microfuge.
