I am currently working on a shell script which uses minimap2 to map sequences to simulated references that have been shuffled by a recombinase enzyme. This all works fine, and I am able to filter for the best matches and create new SAM files containing only hits with >95% similarity. The problem I'm having is creating text file output for downstream analysis in R. Each text file is named after a read ID taken directly from the SAM file, and I intend each file to contain a unique string which corresponds to the order and direction of modules in the shuffled array.
For example:
Read_001.txt
M3; M2; M5r; M4; M1
The unique string is actually used as the reference sequence ID so I don't have to recalculate module positions. I figured I'd use 'cut -f' to isolate the sequence IDs and Reference IDs and use those as inputs for the text file name and contents using an associative array in the following code:
for FILE in $MAPPED
do
    declare -A seqs
    SEQID=$(samtools view $FILE | cut -f1)
    STRING=$(samtools view $FILE | cut -f3)
    for key in ${SEQID[@]}
    do
        seqs["$key"]="${STRING[$key]}"
    done
done
for key in "${!seqs[@]}"; do echo "$value" > "$key".txt; done
However, this leaves me with a bunch of empty text files, though they do have the desired filenames. When inspecting the seqs array, it is evident that no values are being assigned to the SEQIDs, hence the empty fields. So my question is: how do I populate an associative array in a loop such that the relative positions of keys and values from their original arrays are preserved? Or am I barking up the wrong tree, and is there a much easier way to do this?
I'm fairly new to shell scripts so forgive me.
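Two things in the posted loop are worth noting. SEQID and STRING are plain strings rather than arrays, so there is nothing positional to index into, and $value is never assigned in the final loop, so echo writes only an empty line to each file. Below is a minimal sketch of one way to fill the associative array instead, assuming bash 4 or newer and tab-separated SAM records with the read ID in column 1 and the reference ID in column 3. The function names load_seqs and write_seqs are hypothetical helpers, not part of samtools or minimap2.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash >= 4 for associative arrays.
declare -A seqs   # read ID (SAM column 1) -> reference ID (SAM column 3)

# Hypothetical helper: read SAM records from stdin and fill seqs.
# Reading both columns from the same line keeps each key paired with its value.
load_seqs() {
    local qname flag rname rest
    while IFS=$'\t' read -r qname flag rname rest; do
        # Skip blank lines, header lines (if present) and unmapped reads.
        [[ -z $qname || $qname == @* || $rname == '*' ]] && continue
        seqs["$qname"]=$rname
    done
}

# Hypothetical helper: write each value to <outdir>/<key>.txt.
write_seqs() {
    local outdir=$1 key
    for key in "${!seqs[@]}"; do
        printf '%s\n' "${seqs[$key]}" > "$outdir/$key.txt"
    done
}
```

One possible way to call it is `for FILE in $MAPPED; do load_seqs < <(samtools view "$FILE"); done; write_seqs .`. The process substitution matters here: piping samtools into load_seqs would run the loop in a subshell, and the array would be empty afterwards. If the array is not needed for anything else, writing the file directly inside the while loop would probably be simpler.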
What does
Read_001.txt
represent and how is it related to the code you posted?
M3; M2; M5r; M4; M1
Those entries are what is inside that file on a single line?
"Read_001" is an example read ID from the raw sequencing data. I don't expect it to be in as nice a format but I have been working with mock data thus far so have tried to make things easier on myself. Read_001.txt is the resultant filename that I am assigning to the new text files I'm trying to write. Essentially, they will all be uniquely named corresponding to their read ID.
And yes, those entries are what I'm trying to get into the text file, but the exact string isn't relevant right now. These strings of M1; M2; M3 etc. are what I've used as the identifiers for each unique reference sequence, and each one corresponds to the position and orientation of modules in the sequence. As I'm using these as the Reference IDs, I supposed I could pull them from the SAM file using cut -f3. This works superficially, but when I try to write new text files in a loop it gives me no output.
As previously stated, the text files are being written with the correct filename, but are all empty. I want the Reference ID string to be present in each text file.
Thanks.