The challenge is to append UMIs, which are stored in a separate file, to the read names.
The layout of the read files:
@name 2:N:0:indx #or 1:N:0
READ
+
QUALITY
The layout of the UMI files:
@name 2:N:0:indx
UMI
+
QUALITY
And what I expect to get:
@name_UMI 2:N:0:indx #or 1:N:0
READ
+
QUALITY
I wrote the code below, and it worked on small data. But on the real sample it got stuck and did not process even the first reads. Can you suggest what went wrong?
The code:
DIR="/rename"
i=1
s=2
len=$(wc -l R1.fastq)
len=$(echo $len)
len=$(echo ${len//R1.fastq/})
c=1
while [ ${c} -lt ${len} ]; do
k=$( echo ${s}p)
m=$( echo ${i}p)
#extracting the read name from UMI-file
name=$(echo $(sed -n ${m} UMI.fastq | cut -d" " -f1))
#extracting UMI sequence from UMI-file
umi=$(echo $(sed -n ${k} UMI.fastq))
#creating a variable with appended name
a=$name"_"$umi
#substituting name in original read-file with appended name and rewriting the file to save changes for next cycles
file=$(<$DIR/R1.fastq)
echo "${file//$name/$a}" > $DIR/R1.fastq
i=$(($i+4))
s=$(($s+4))
c=$(($c+4))
done
The small test input file:
@NB501229:643:HVY7VAFX2:1:11101:20852:1041 2:N:0:TCCTGAGC
GTCTCGTGGTCTTTTCTCACATAAGCTACATGGCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAA
+
EEAEEEEEEEEEEEEAEEEEEEEEEEEEEAEAAEEE/EEEEEEEE<EEE/EAEEAEEEAEEEEEA/EAAEEEEEA//AEE<<EE/EEAEA<E<EEEEEEEEEEAE<AAEEEE<A<<EEAEEA6EAE<E
@NB501229:643:HVY7VAFX2:1:11101:12863:1042 2:N:0:TCCTGAGC
TGGATGCTCGTGGTGAAGAAGAATCAGCTTCCCCAGGATCAGCACCAGGCCTGGATGTTTGGACATTTCGGCATCATTGCCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTGACGCTGC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEAEAEAE<AEEAEAEEEEEEEAE/A<AA
@NB501229:643:HVY7VAFX2:1:11101:18188:1042 2:N:0:TCCTGAGC
GATTCTGTTGCTGAAATGCTGTAACTGTAGTAATGTAAACCATTGTCTCCATGATCATGTTTCCTGTGTTGTAGATTATGTAACTGCATGGCTTACATGAGGGGTCCTCATGTAAGTGCAGCAAGTCT
+
AE<E<EEEEEEEEEE//EEEEEEEEEEEEEEE/EEEE//AEEEEEEAEE<<EEEEEEAEE/EE/EEEAEEEE<EAEAAEEEEEEEEEEEEA<<EEAAE<E<EAEEE/AAEEEEEEEAEEEEEEEAA<6
@NB501229:643:HVY7VAFX2:1:11101:9570:1042 2:N:0:TCCTGAGC
GGGCCTCCCGCGCACTGCTTGGCATATTAATTAAGAATATCCTCGCTGAGGCCTGACACTGTAGTCTGGGAACTATACTCCGAGTCGCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTG
+
A//E/<//AEEEE/A/6/<///</<///////////////////////A//<EEAEA<AAEE/A<EEEEAEEEEEEE////<E/E6/6A/EE/A<EEE/AA<///A/AA<<A6<A/EE<AAA<A////
@NB501229:643:HVY7VAFX2:1:11101:20006:1042 2:N:0:TCCTGAGC
GAAGGGCCTGACCTCACCCTTGAGGACGTGCTATGGTGGCCCGCAGCGAGGGTCCCTGCCACCCAGCCATGGCCAGAGCACCTGCCACGTGCCAGGCACTGTCTGAGTCCTGAGTCTAGTCACGCGGG
+
E<EEEEAEEE/<A<EE///////E/EEEAEEA//A6EEE<AEEE<EEAE//E<EEEEEEE/EEEAEE/<EAEEEEEEEEE<AEEEEEEAAEA66AAE/<AEEEAAEEAA<AAEE/AAEE/</AEAA<<
@NB501229:643:HVY7VAFX2:1:11101:26099:1044 2:N:0:TCCTGAGC
GCCGAGTTGAAGCCCCGCTTCCTGTAGGACATCGTGATCGACGCCATTGGCGGTAGCAGGCCCCCTTGGCCGCCCCTGGAGTACCAGCCCTACCAGAGCATCTACGTCGGGGGCTTGATGGAAGGGGG
+
AAEEEEEEAE///A/EAAEE//EE//E/////A/EEEEAE/EE<E//<<AAEEEEEA/A/E//E/AAEEEEE<///EEE/AA/<A//6<<E//<E//AAEEA///<A<A/E/A/<AAE/EE//EEEE/
@NB501229:643:HVY7VAFX2:1:11101:5399:1044 2:N:0:TCCTGAGC
AGGACCAGCCCCCCCCCCCCCCGCCCCCACCGCGCCCACCCACCCAGGGGGCCCGGCCAAACGCGCAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGCCCCGGGGGGGGGGGGAACGGGCCCCCG
+
/////////E/EEAEA//6EEE///6AE///////AE//////A//////E//E///////////////EE<E///</<EA/A<///<EAAAA//<////6//////////6//////A/////AE//
Test output file:
@NB501229:643:HVY7VAFX2:1:11101:20852:1041_GAATCGGGACGA 2:N:0:TCCTGAGC
GTCTCGTGGTCTTTTCTCACATAAGCTACATGGCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAA
+
EEAEEEEEEEEEEEEAEEEEEEEEEEEEEAEAAEEE/EEEEEEEE<EEE/EAEEAEEEAEEEEEA/EAAEEEEEA//AEE<<EE/EEAEA<E<EEEEEEEEEEAE<AAEEEE<A<<EEAEEA6EAE<E
@NB501229:643:HVY7VAFX2:1:11101:12863:1042_ACGTCCGAGGAG 2:N:0:TCCTGAGC
TGGATGCTCGTGGTGAAGAAGAATCAGCTTCCCCAGGATCAGCACCAGGCCTGGATGTTTGGACATTTCGGCATCATTGCCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTGACGCTGC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEAEAEAE<AEEAEAEEEEEEEAE/A<AA
@NB501229:643:HVY7VAFX2:1:11101:18188:1042_ATTTCTAGTTCC 2:N:0:TCCTGAGC
GATTCTGTTGCTGAAATGCTGTAACTGTAGTAATGTAAACCATTGTCTCCATGATCATGTTTCCTGTGTTGTAGATTATGTAACTGCATGGCTTACATGAGGGGTCCTCATGTAAGTGCAGCAAGTCT
+
AE<E<EEEEEEEEEE//EEEEEEEEEEEEEEE/EEEE//AEEEEEEAEE<<EEEEEEAEE/EE/EEEAEEEE<EAEAAEEEEEEEEEEEEA<<EEAAE<E<EAEEE/AAEEEEEEEAEEEEEEEAA<6
@NB501229:643:HVY7VAFX2:1:11101:9570:1042_ATAGCCGCGAAA 2:N:0:TCCTGAGC
GGGCCTCCCGCGCACTGCTTGGCATATTAATTAAGAATATCCTCGCTGAGGCCTGACACTGTAGTCTGGGAACTATACTCCGAGTCGCAAAACGCAATACTGTACATTCTGTCTCTTATACACATCTG
+
A//E/<//AEEEE/A/6/<///</<///////////////////////A//<EEAEA<AAEE/A<EEEEAEEEEEEE////<E/E6/6A/EE/A<EEE/AA<///A/AA<<A6<A/EE<AAA<A////
@NB501229:643:HVY7VAFX2:1:11101:20006:1042_AGGGGGTATTAC 2:N:0:TCCTGAGC
GAAGGGCCTGACCTCACCCTTGAGGACGTGCTATGGTGGCCCGCAGCGAGGGTCCCTGCCACCCAGCCATGGCCAGAGCACCTGCCACGTGCCAGGCACTGTCTGAGTCCTGAGTCTAGTCACGCGGG
+
E<EEEEAEEE/<A<EE///////E/EEEAEEA//A6EEE<AEEE<EEAE//E<EEEEEEE/EEEAEE/<EAEEEEEEEEE<AEEEEEEAAEA66AAE/<AEEEAAEEAA<AAEE/AAEE/</AEAA<<
@NB501229:643:HVY7VAFX2:1:11101:26099:1044_CGTTTCGGGGTA 2:N:0:TCCTGAGC
GCCGAGTTGAAGCCCCGCTTCCTGTAGGACATCGTGATCGACGCCATTGGCGGTAGCAGGCCCCCTTGGCCGCCCCTGGAGTACCAGCCCTACCAGAGCATCTACGTCGGGGGCTTGATGGAAGGGGG
+
AAEEEEEEAE///A/EAAEE//EE//E/////A/EEEEAE/EE<E//<<AAEEEEEA/A/E//E/AAEEEEE<///EEE/AA/<A//6<<E//<E//AAEEA///<A<A/E/A/<AAE/EE//EEEE/
@NB501229:643:HVY7VAFX2:1:11101:5399:1044_GTTAACGCGTAT 2:N:0:TCCTGAGC
AGGACCAGCCCCCCCCCCCCCCGCCCCCACCGCGCCCACCCACCCAGGGGGCCCGGCCAAACGCGCAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGCCCCGGGGGGGGGGGGAACGGGCCCCCG
+
/////////E/EEAEA//6EEE///6AE///////AE//////A//////E//E///////////////EE<E///</<EA/A<///<EAAAA//<////6//////////6//////A/////AE//
seqkit rename with an external file and a defined kv will help.
Thank you, I learned about new software.
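For comparison, here is a minimal single-pass sketch in plain shell, in case installing another tool is not an option. The loop in the question calls sed twice and then reloads and rewrites the whole R1.fastq for every single read, so the work grows roughly with the square of the number of reads, which is the likely reason it looks stuck on a full sample. The function name append_umi is made up for illustration, and the sketch assumes that both files contain the same reads in the same order, four lines per record, with no tab characters in any line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post):
#   append_umi UMI_FASTQ READS_FASTQ > OUTPUT_FASTQ
# Assumption: both files hold the same reads in the same order,
# four lines per record, and no line contains a tab.
append_umi() {
    # paste joins line N of the UMI file and line N of the read file with a tab,
    # so awk sees both records side by side and reads each file only once.
    paste "$1" "$2" | awk -F'\t' '
        # Header line: keep both headers until the UMI sequence is known.
        NR % 4 == 1 {
            umi_hdr = $1
            read_hdr = $2
            next
        }
        # Sequence line: $1 is the UMI, $2 is the read.
        NR % 4 == 2 {
            split(umi_hdr, u, " ")
            split(read_hdr, r, " ")
            # Compare only the name part; the 1:N:0 / 2:N:0 suffix may differ.
            if (u[1] != r[1]) {
                print ("name mismatch at record " (NR + 2) / 4) > "/dev/stderr"
                exit 1
            }
            # Insert "_UMI" right after the name, before the first space.
            sp = index(read_hdr, " ")
            if (sp > 0)
                read_hdr = substr(read_hdr, 1, sp - 1) "_" $1 substr(read_hdr, sp)
            else
                read_hdr = read_hdr "_" $1
            print read_hdr
            print $2
            next
        }
        # "+" and quality lines are copied from the read file unchanged.
        { print $2 }
    '
}
```

It could be called as `append_umi UMI.fastq R1.fastq > R1.umi.fastq`. Writing to a new file, rather than overwriting R1.fastq in place, keeps the input intact if something fails halfway.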