Hi everyone, I have been trying to edit the sequence titles within a fasta file and have come up short. Please can someone help? Sorry if this is a duplication. I've tried searching around. There are lots of similar questions, but I don't know how to adapt them to my specific situation.
I am doing Illumina full genome sequencing of influenza viruses. The script that processes the data outputs a consensus sequence fasta file. Within that file there are 8 sequences. The file has a name based on the sample number submitted to the sequencing guys, and the sequences inside the file have titles that are actually the reference used to map the data against. I want to rename the file, and all 8 sequences, with the same name, but the 8 segments also need to say which gene they are.
I have started writing a script but am now stuck. Please can anyone point me in the right direction?
This is the script I started. Not sure how this will display.
#!/bin/bash
set -e
#script to copy, rename, and edit fasta files
#Needs an input file
#echo information if script not used properly (check this before using $1)
if [ $# -ne 1 ]; then
echo "Use: FluSeqOutputRename.sh <a fasta file you want renamed>
" >&2
exit 1
fi
fastafile=$1
# Get path from file
DIR=$(dirname "$fastafile")
echo "$DIR"
echo "
This script will edit $fastafile
Re-naming the file and editing the fasta headers
"
#check how many sequences are in the fasta file
#count header lines only; "|| true" stops set -e aborting when grep finds none
NoSeqs=$(grep -c '^>' "$fastafile" || true)
echo "
There are $NoSeqs gene segments
"
if [ "$NoSeqs" -ne 8 ]; then
echo "
There are NOT 8 gene segments
Please Start Again
"
exit 1
fi
#i might be able to just do it with 1 input IF i can replace the / with - later
read -p "Please enter the virus name: " virname
NewFileName=$(echo "$virname" | tr '-' '_' | tr '/' '-')
echo "$virname"
echo "$NewFileName"
read -p "Your file will be named $NewFileName is this correct y or n?: " correct
if [ "$correct" = n ]; then
echo "Please Start again"
exit 1
fi
echo "I am continuing to copy and edit your file"
#Now need to copy the file and rename it
cp -i -v "$fastafile" "$DIR"/"$NewFileName".fasta
#fastacopy="$DIR"/"$flutype-"$species-"$country"-"$identifier"-"$year".fasta
fastacopy="$DIR"/"$NewFileName".fasta
echo "The new file is
$fastacopy"
#I can make a new text file which is the header lines i want but does this even help?
echo "$virname" PB2 > "$DIR"/SequenceNames.txt
echo "$virname" PB1 >> "$DIR"/SequenceNames.txt
echo "$virname" PA >> "$DIR"/SequenceNames.txt
echo "$virname" HA >> "$DIR"/SequenceNames.txt
echo "$virname" NP >> "$DIR"/SequenceNames.txt
echo "$virname" NA >> "$DIR"/SequenceNames.txt
echo "$virname" MP >> "$DIR"/SequenceNames.txt
echo "$virname" NS >> "$DIR"/SequenceNames.txt
But that's as far as I can get. Any help gratefully received! Thank you, James
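One possible way to do the header edit itself is to let awk replace each `>` line in turn. This is only a sketch: the function name `rename_headers` is made up for illustration, and it assumes the 8 records always appear in the order PB2, PB1, PA, HA, NP, NA, MP, NS (the same order as the SequenceNames.txt lines above). If the pipeline writes the segments in a different order, the list would need changing.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: replace the i-th header with ">virus name SEGMENT".
# ASSUMPTION: records are ordered PB2, PB1, PA, HA, NP, NA, MP, NS.
# Note: awk -v interprets backslash escapes, so names containing "\" need care.
rename_headers() {
    local virname=$1 infile=$2
    awk -v name="$virname" '
        BEGIN { n = split("PB2 PB1 PA HA NP NA MP NS", seg, " ") }
        /^>/  {
            i++
            if (i > n) { print "more than " n " sequences" > "/dev/stderr"; exit 1 }
            print ">" name " " seg[i]
            next
        }
        { print }   # sequence lines pass through unchanged
    ' "$infile"
}

# Demo on a throwaway file with 8 dummy records
demo_in=$(mktemp)
for i in 1 2 3 4 5 6 7 8; do
    printf '>reference_%s\nACGT\n' "$i"
done > "$demo_in"

demo_out=$(rename_headers "A/turkey/Scotland/123-456/2015" "$demo_in")
printf '%s\n' "$demo_out" | grep '^>'
rm -f "$demo_in"
```

In the script above, this could be run as `rename_headers "$virname" "$fastafile" > "$fastacopy"` in place of the `cp` step, so the copy is written with the new headers already in place.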
Hello James
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Thank you!
Many thanks genomax!
Hi James,
More useful than your code at this stage would be examples of the input files and the desired output format, since it's not super clear to me so far.
Hi, my input file looks something like this: Sample12-top_matches-iter4_consensus.fa
And my desired output, because I end up in Windows after this, would be
A-turkey-Scotland-123_456-2015.fasta
I'd recommend quoting variables so:
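As an illustration of why this matters (not taken from the script itself; the file name and the `count_args` helper are invented for the demo), closing and reopening the quotes around a variable leaves it unquoted, so a path containing a space is split into separate words:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: report how many arguments it received
count_args() { echo "$#"; }

# Hypothetical path with a space in it
fastafile="my runs/Sample12.fa"

# Quotes closed around the variable: the expansion is unquoted and gets split
unquoted=$(count_args "file: "$fastafile" done")

# Variable kept inside one pair of quotes: stays a single word
quoted=$(count_args "file: $fastafile done")

echo "unquoted -> $unquoted argument(s), quoted -> $quoted argument(s)"
```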
"everytime you ask for an interaction or everytime you print your messages to stdout, god kills a kitten"
Minimum Standards For Bioinformatics Command Line Tools
Maybe so, but maybe there are too many kittens in the world anyway. Can't think of any other way to tell the script what the name should be, it's not in another file. It needs to be made up by the operator. And I will remove most of the echo lines once I understand it's working. Was just using them to show me it was working how I expected :)
You do it as a flag to the command to start with :)
My answer shows a complicated/robust way to do it, but at its most basic it can be done using the 'special' variables:
$1, $2, $3... etc
e.g:
So you could do:
and
virusname=$1
... etc.
Thanks Ram, I'll make those edits.
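For anyone following along, the positional-parameter suggestion could look roughly like the sketch below. The two-argument layout (fasta file first, virus name second) and the `main` wrapper are assumptions made for this example, not something fixed by the thread; the `tr` calls are the same ones used in the original script.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for the top of FluSeqOutputRename.sh:
# take the virus name as a second argument instead of prompting with read.
main() {
    if [ "$#" -ne 2 ]; then
        echo "Use: FluSeqOutputRename.sh <fasta file> <virus name>" >&2
        return 1
    fi
    local fastafile=$1
    local virname=$2
    local newname
    # Same substitution as the original: "-" becomes "_", then "/" becomes "-"
    newname=$(printf '%s' "$virname" | tr '-' '_' | tr '/' '-')
    printf '%s/%s.fasta\n' "$(dirname "$fastafile")" "$newname"
}

result=$(main "runs/Sample12-top_matches-iter4_consensus.fa" "A/turkey/Scotland/123-456/2015")
echo "$result"
```

Called this way the script needs no interaction, so it can be run in a loop over many samples.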