13 months ago
noodle
Dear Biostars,
I'd like to add a substring of the UMI to the name of each FASTQ record. Below is an example of what I'd like to do (it doesn't work): I want to cut the first 7 bp of R1 but append only the first 6 bp to the read name. Is this possible? Can someone tell me how to correct my code to make this work? TIA!
cutadapt -u 7 --rename='{id} {comment} $(echo {cut_prefix} | cut -c1-6)'
The output of which is:
1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT $(echo NTTTATT | cut -c1-6)
But what I want is:
1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTAT
Please use a proper tool like umi-tools for this: https://umi-tools.readthedocs.io/en/latest/QUICK_START.html#step-3-extract-the-umis
A side note to GenoMax's response, but the reason that doesn't work as-is: the prefix placeholder is filled in within cutadapt as it runs, but your shell's command substitution would need to happen before cutadapt runs, while the shell prepares the cutadapt call. (The single quotes around the rename pattern prevent your shell from interpreting the command you're trying to use, but if you switched to double quotes, it would just see a literal "{cut_prefix}" and turn it into "{cut_p", which clearly wouldn't work either.)
The question made me curious whether you can put arbitrary Python into cutadapt's format strings to get what you want, but apparently not. I like making use of cutadapt's impressive flexibility, but this task is beyond it, I think.
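If you would rather stay with cutadapt, one possible workaround is to let it write the full 7 bp prefix into the name (for example with --rename='{id} {comment} {cut_prefix}') and then shorten that last field in a separate pass. The sketch below is only an illustration under those assumptions: the function names shorten_last_field and rewrite_fastq are made up for this example, and it assumes plain four-line FASTQ records whose header ends with the space-separated prefix.

```python
from typing import Iterable, Iterator


def shorten_last_field(header: str, keep: int = 6) -> str:
    """Truncate the last space-separated field of a FASTQ header to `keep` characters.

    Assumes (hypothetically) that cutadapt already appended the full cut
    prefix as the final field of the read name.
    """
    header = header.rstrip("\n")
    head, sep, last = header.rpartition(" ")
    if not sep:
        # No space in the header, so there is no appended field to shorten.
        return header
    return f"{head} {last[:keep]}"


def rewrite_fastq(lines: Iterable[str], keep: int = 6) -> Iterator[str]:
    """Yield FASTQ lines, shortening the last header field of every record.

    Assumes strict four-line records: header, sequence, '+', qualities.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        yield shorten_last_field(line, keep) if index % 4 == 0 else line


# Small in-memory demonstration using the read name from the question.
record = [
    "@1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTATT",
    "ACGTACGT",
    "+",
    "IIIIIIII",
]
result = list(rewrite_fastq(record))
print(result[0])  # @1:2101:13928:1000 1:N:0:GTCGCCTT+AAA/ACTAATT NTTTAT
```

For real data you would feed the function the lines of cutadapt's output file (or a decompressed stream) instead of the in-memory list. The sequence and quality lines pass through unchanged, since cutadapt has already removed the 7 bp.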
thanks, feature request incoming ;)