I have a list.csv file that contains a series of coordinates. The coordinates belong to different FASTA files, namely ocean.fasta, lake.fasta and river.fasta, which are present in a target folder. The list.csv, ocean.fasta, lake.fasta and river.fasta files are shown below.
/target/
list.csv
Contig3,15,7
Contig2,5,10
Xantho1,12,3
ocean.fasta
>Contig2 contig1 Bacillus 985, ocean [298]
ACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCC
>Contig85 Bacillus 956, ocean [895]
ATGCNNNGCTAT
lake.fasta
>Xantho1 [Pseudomonas] cissicola strain CCUG 18839 contig_0000001
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAG
>Contig8 [Pseudomonas] cissicola strain CCUG 18839 contig_0000008
ATGCTTAGCTGATGC
>Contig20 [Pseudomonas] cissicola strain CCUG 18839 contig_0000020
ATGCTTAGCTGATGCTAGTA
river.fasta
>Contig3 8954 e.coli [856]
GCTGCGGCGCTGATCCTGGCGGCCCGCGCCGAG
>Contig8 8954 e.coli [859]
TAGTGCGTATAT
Contig3,15,7 and Xantho1,12,3 are not in the right order: $2>$3. I therefore need to reorder those coordinates so that $2<$3. I also need to reverse complement the sequences extracted for Xantho1,3,12 and Contig3,7,15. In addition, I need to save the extracted sequences as new_ocean.fasta, new_lake.fasta and new_river.fasta in a fresh folder named target_sequences. The expected output is as follows:
./target/target_sequences
new_river.fasta
>Contig3
GATCAGCGC
new_ocean.fasta
>Contig2
AGCGCT
new_lake.fasta
>Xantho1
CTTGATGGCC
I used the following script, but it ends with an error:
for file in *.fasta
do
    fastaexplode "$file" &&
    awk -F[:-] '{
        if($2>$3){
            start=$3-1;
            len=$2-start" -"
        }
        else{
            start=$2-1;
            len=$3-start
        }
        print $1,start,len}' list.csv &&
    tmpFile=$(mktemp);
    > subseqs.fa
    awk -F[:-] '{
        if($2>$3){
            start=$3-1;
            len=$2-start" -"
        }
        else{
            start=$2-1;
            len=$3-start
        }
        print $1,start,len}' list.csv |
    while read cont start len rev; do
        fastasubseq "$cont".fa $start $len > $tmpFile;
        if [[ -n $rev ]]; then
            fastarevcomp "$tmpFile" >> subseqs.fa;
        else
            cat "$tmpFile" >> subseqs.fa;
        fi && cp subseqs.fa target_sequences/"new_${file}"
    done
Please help me achieve this.
Did you write the awk script from scratch? At what stage of writing the script did it stop working and give you the error?
@RamRS It shows the error at the last stage. It extracts the sequences and reverse complements them wherever required, but fails to create the new files. I know that I messed up the script while adapting it to my purpose. If possible, could you please help me correct it?
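A few things stand out in the script as posted, and they may explain why no new_*.fasta file appears. list.csv is comma-separated, but both awk calls use -F[:-], so $2 and $3 are never filled from it. The outer for loop has no closing done. Nothing creates target_sequences. The cp sits inside the while loop, so it runs once per coordinate line.

Below is a minimal sketch of one way to get the expected output. It is not a repair of the exonerate-based pipeline: it swaps fastaexplode, fastasubseq and fastarevcomp for plain awk, and the function names extract_regions and extract_all are made up for illustration. It assumes 1-based inclusive coordinates (which reproduces the three expected sequences in the question), that the sequence ID is the first word of the FASTA header, and that list.csv is not empty.

```shell
#!/bin/sh
# Sketch only: plain awk instead of fastaexplode/fastasubseq/fastarevcomp.
# Assumes 1-based, inclusive coordinates and "ID,pos1,pos2" lines in list.csv.

# extract_regions LIST FASTA  (hypothetical helper; writes FASTA to stdout)
extract_regions() {
    awk -F',' '
        # Reverse complement; characters other than A/C/G/T (e.g. N) are kept.
        function revcomp(s,    i, c, p, out) {
            out = ""
            for (i = length(s); i > 0; i--) {
                c = substr(s, i, 1)
                p = index("ACGTacgt", c)
                if (p) c = substr("TGCAtgca", p, 1)
                out = out c
            }
            return out
        }
        # Print every requested region of the record currently held in id/seq.
        function emit(    k, piece) {
            for (k = 1; k <= n[id]; k++) {
                piece = substr(seq, lo[id, k], hi[id, k] - lo[id, k] + 1)
                if (rc[id, k]) piece = revcomp(piece)
                print ">" id
                print piece
            }
        }
        # First file (list.csv): store each region with start <= end and
        # remember whether the pair had to be swapped.
        NR == FNR {
            sub(/\r$/, "")
            if (NF < 3) next
            k = ++n[$1]
            if ($2 + 0 > $3 + 0) { lo[$1, k] = $3 + 0; hi[$1, k] = $2 + 0; rc[$1, k] = 1 }
            else                 { lo[$1, k] = $2 + 0; hi[$1, k] = $3 + 0; rc[$1, k] = 0 }
            next
        }
        # Second file (FASTA): the ID is the header up to the first blank.
        /^>/ {
            emit()
            id = substr($0, 2)
            sub(/[ \t\r].*$/, "", id)
            seq = ""
            next
        }
        { sub(/\r$/, ""); seq = seq $0 }
        END { emit() }
    ' "$1" "$2"
}

# extract_all DIR  (hypothetical helper): DIR holds list.csv and the *.fasta files.
extract_all() {
    dir=$1
    mkdir -p "$dir/target_sequences"
    for fasta in "$dir"/*.fasta; do
        [ -e "$fasta" ] || continue          # no FASTA files: nothing to do
        out="$dir/target_sequences/new_$(basename "$fasta")"
        extract_regions "$dir/list.csv" "$fasta" > "$out"
    done
}

# Run only when a folder is given on the command line.
[ "$#" -eq 0 ] || extract_all "$1"
```

Saved as, say, extract.sh (a hypothetical name), it would be run from the directory that contains target as: sh extract.sh target. Every ID in list.csv is looked up in every FASTA file, so an ID that occurs in more than one file would be extracted from each of them (Contig8 does occur in both lake.fasta and river.fasta, although it is not in list.csv).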