How to remove [ACTG] from file names in bash
2.4 years ago

Hello, I have several files whose names follow this pattern:

VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz

VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz

VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz

I want to remove the substring made up of A, C, T and G characters (e.g. _CCGCGGTT-CTAGCGCT) from each name. This is the desired output:

VIR3A_L00M_R1_001.fastq.gz

VIR3Q_L00M_R2_001.fastq.gz

VIR4J_L00M_R1_001.fastq.gz

How can I do that?

Thanks for your time :)

bash substring • 1.1k views

Why is it an issue?


I'm trying to rename the files to the desired names shown above. There are about 40 files, all with the same pattern of ACTG elements.

2.4 years ago
JC 13k
$ for F in *fastq.gz; do mv "$F" "$(echo "$F" | perl -pe 's/_[ACGT]+?-[ACGT]+?_/_/')"; done
mv VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz VIR3A_L00M_R1_001.fastq.gz
mv VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz VIR3Q_L00M_R2_001.fastq.gz
mv VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz VIR4J_L00M_R1_001.fastq.gz
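
Before running a bulk rename like this, it may be worth checking that no two files would end up with the same new name (for example, the same sample sequenced with two different index pairs), since mv would then silently overwrite one of them. The following is only a sketch: find_collisions is a hypothetical helper name, and it assumes the index pair always sits between two underscores, as in the file names above.

```sh
# Hypothetical helper (not from the thread): read file names on stdin and
# print every target name that more than one input would be renamed to.
find_collisions() {
    # Same substitution as the rename one-liner, then keep only duplicates.
    sed -E 's/_[ACGT]+-[ACGT]+_/_/' | sort | uniq -d
}

# Feed it the current file names; no output means no collisions.
printf '%s\n' *.fastq.gz | find_collisions
```

If this prints anything, those target names would be claimed by more than one file and should be handled by hand before renaming.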
2.4 years ago

with rename:

$ rename -n 's/_[ATGC]*-[ATGC]*//' *.gz

'VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz' would be renamed to 'VIR3A_L00M_R1_001.fastq.gz'
'VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz' would be renamed to 'VIR3Q_L00M_R2_001.fastq.gz'
'VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz' would be renamed to 'VIR4J_L00M_R1_001.fastq.gz'

Remove -n once you are satisfied with the dry run.

with parallel:

$ parallel --plus --dry-run cp {} {=s/_\[ATGC\]+-\[ATGC\]+//=} ::: *.gz

cp VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz VIR3A_L00M_R1_001.fastq.gz
cp VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz VIR3Q_L00M_R2_001.fastq.gz
cp VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz VIR4J_L00M_R1_001.fastq.gz

Remove --dry-run once you are satisfied with the dry-run output.

in bash shell (and sed):

$ for i in *.gz; do output=$(echo "$i" | sed 's/_[ATGC]*-[ATGC]*//'); echo cp "$i" "$output"; done

cp VIR3A_CCGCGGTT-CTAGCGCT_L00M_R1_001.fastq.gz VIR3A_L00M_R1_001.fastq.gz
cp VIR3Q_TAATACAG-GTGAATAT_L00M_R2_001.fastq.gz VIR3Q_L00M_R2_001.fastq.gz
cp VIR4J_CGTTAGAA-GACCTGAA_L00M_R1_001.fastq.gz VIR4J_L00M_R1_001.fastq.gz

Remove the second echo (the one in front of cp) once you are satisfied with the dry run.
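
The same renaming can also be sketched with shell parameter expansion alone, without perl, sed or rename. This is only an illustration, not one of the methods posted above: strip_barcodes is a hypothetical helper name, and it assumes the index pair is always the second underscore-separated field and contains exactly one hyphen, as in the example file names.

```sh
# Hypothetical helper (not from the thread): drop the second "_"-separated
# field of a file name when it looks like INDEX1-INDEX2 (only A, C, G, T).
# Note: plain sh has no local variables, so these names are global.
strip_barcodes() {
    name=$1
    prefix=${name%%_*}    # text before the first "_"
    rest=${name#*_}       # text after the first "_"
    field=${rest%%_*}     # candidate index-pair field
    suffix=${rest#*_}     # text after the candidate field
    case $field in
        *[!ACGT-]* | *-*-* | -* | *-)
            printf '%s\n' "$name" ;;                 # not an index pair: keep name
        *-*)
            printf '%s\n' "${prefix}_${suffix}" ;;   # index pair: drop the field
        *)
            printf '%s\n' "$name" ;;                 # no hyphen: keep name
    esac
}

# Dry run: print the mv commands instead of executing them.
for f in *.fastq.gz; do
    [ -e "$f" ] || continue        # skip the literal pattern if nothing matches
    new=$(strip_barcodes "$f")
    if [ "$new" != "$f" ]; then
        echo mv -- "$f" "$new"     # drop "echo" to actually rename
    fi
done
```

As with the other dry runs, remove the echo in front of mv once the printed commands look right. Files whose second field is not an index pair are left untouched.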


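FYI, the rename command is not a standard tool on every Linux system; it may need to be installed with apt, yum, pacman, ... Note also that the s/// syntax used above requires the Perl-based rename; the rename shipped with util-linux takes different arguments.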
