I am doing multiple sequence analysis for some genes. I have several files, each containing sequences in the same order. I would like to extract the first sequence from each file into a new file, then the second sequence from each file, and so on until the last sequence. With awk I only know how to do this for the first sequence or for one specific line: awk 'FNR == 2 {print; nextfile}' *.txt > newfile
I have since learned how to do it for two files with this:
paste File1 File2 | awk '{ p=$2;$2="" }NR%2{ k=p; print }!(NR%2){ v=p; print $1 RS k RS v }'
My input looks like this:
File 1
Saureus081.1
ATCGGCCCTTAA
Saureus081.2
ATGCCTTAAGCTATA
Saureus081.3
ATCCTAAAGGTAAGG
File 2
SaureusRF1.1
ATCGGCCCTTAC
SaureusRF1.2
ATGCCTTAAGCTAGG
SaureusRF1.3
ATCCTAAAGGTAAGC
File 3
SaureusN305.1
ATCGGCCCTTACT
SauruesN305.2
ATGCCTTAAGCTAGA
SaureusN305.3
ATCCTAAAGGTAATG
Files 4 to 12 are similar, so there are 12 files in total.
Required output:
Saureus081.1
ATCGGCCCTTAA
SaureusRF1.1
ATCGGCCCTTAC
SaureusN305.1
ATCGGCCCTTACT
Saureus081.2
ATGCCTTAAGCTATA
SaureusRF1.2
ATGCCTTAAGCTAGG
SauruesN305.2
ATGCCTTAAGCTAGA
Saureus081.3
ATCCTAAAGGTAAGG
SaureusRF1.3
ATCCTAAAGGTAAGC
SaureusN305.3
ATCCTAAAGGTAATG
Thank you
Why the output has
Can you say precisely what the output should be? Do you want to create a separate file for each sequence and write that sequence from all the files into it? That is, write seq1 from all files to a single file, seq2 from all files to another file, and so on?
I typed those just as an example. Sorry for the laziness. It is simply each sequence followed by the corresponding sequence from the next file.
This works with the bash shell. Install seqkit from here. Keep your FASTA files in a separate folder. The output will be out.fasta, and the extension can be customized.
Code that works (on ubuntu/mint with bash shell):
input files (input files are copy/pasted from above):
input:
Post output:
output (from the above command):
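If seqkit is not available, the same interleaving can be sketched with plain awk. This is only a minimal sketch, not the seqkit command referred to above: it assumes every record is exactly two lines (a name line followed by a sequence line) as in the example input, and the function name interleave_records is hypothetical. It holds all input in memory, which should be fine for a handful of small files.

```shell
#!/usr/bin/env bash
# Sketch: print record 1 of every file, then record 2 of every file, and so on.
# Assumption: each record is exactly two lines (name line + sequence line).
# interleave_records is a hypothetical helper name, not part of any tool.
interleave_records() {
  awk '
    {
      rec = int((FNR + 1) / 2)      # lines 1-2 -> record 1, lines 3-4 -> record 2, ...
      out[rec] = out[rec] $0 ORS    # append the line to the bucket for that record
      if (rec > max) max = rec      # remember the highest record number seen
    }
    END {
      for (i = 1; i <= max; i++)    # emit the buckets in record order
        printf "%s", out[i]
    }
  ' "$@"
}

# Example call (file names are taken from the question):
#   interleave_records File1 File2 File3 > newfile
```

Files are read in the order given on the command line, so within each record the output follows that order. With a glob such as *.txt the shell sorts names lexically (File10 before File2), so list the files explicitly if the order matters.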