getting base-name for file in shell
8.0 years ago
mary99 ▴ 80

Dear all,

I have many fastq files named like this:

XCGD2_R2.FB-P7-Rd2-LC.10_trimLCwithMCF.fastq, where the 10 is a different number in each file. I also have this file: XCGD2_R2.FB-P7-Rd2_trimLCwithFlexbar.fastq

I want to concatenate the first set of files with the second one. Somehow I need to get the part of the name that they have in common, but I don't know how. They are mixed with other files, which is why I need the common part.

Any help is appreciated, thanks.

shell basename • 2.5k views
8.0 years ago

Of course we are assuming that there is a pattern in the file names that you can exploit to group files. If not, either you have to do it manually or you need a look up table to link files to groups.

These are some useful string manipulations and expansions for bash, besides the * wildcard:

myfiles[135].fastq.gz will match myfiles1.fastq.gz, myfiles3.fastq.gz, and myfiles5.fastq.gz (but not, say, myfiles2.fastq.gz or myfiles11.fastq.gz).

myfiles{1..10}.fastq.gz will expand to myfiles1.fastq.gz myfiles2.fastq.gz myfiles3.fastq.gz ... myfiles10.fastq.gz, whether or not those files exist.

${fq%%.*} will remove everything from the first . onward in string $fq, while ${fq%.*} will remove everything from the last . onward.
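
To make the three constructs above concrete, here is a small sketch. The scratch directory and the myfiles* names are only there so the patterns have something to match; the last part uses a file name from the question.

```bash
# Scratch directory with hypothetical files, so the glob has something to match
tmp=$(mktemp -d)
touch "$tmp"/myfiles1.fastq.gz "$tmp"/myfiles2.fastq.gz "$tmp"/myfiles3.fastq.gz \
      "$tmp"/myfiles5.fastq.gz "$tmp"/myfiles11.fastq.gz

# Character class: this is a glob, so only existing files are returned
glob_matches=$(cd "$tmp" && echo myfiles[135].fastq.gz)
echo "$glob_matches"   # myfiles1.fastq.gz myfiles3.fastq.gz myfiles5.fastq.gz

# Brace expansion (bash): names are generated even if no such file exists
brace_names=$(echo myfiles{4..5}.fastq.gz)
echo "$brace_names"    # myfiles4.fastq.gz myfiles5.fastq.gz

# Suffix removal on a file name from the question
fq="XCGD2_R2.FB-P7-Rd2-LC.10_trimLCwithMCF.fastq"
first_dot="${fq%%.*}"                    # longest ".*" match removed from the end
last_dot="${fq%.*}"                      # shortest ".*" match removed from the end
no_suffix="${fq%_trimLCwithMCF.fastq}"   # a known literal suffix removed

echo "$first_dot"   # XCGD2_R2
echo "$last_dot"    # XCGD2_R2.FB-P7-Rd2-LC.10_trimLCwithMCF
echo "$no_suffix"   # XCGD2_R2.FB-P7-Rd2-LC.10
```

Note that myfiles4.fastq.gz shows up in the brace expansion although it was never created, which is the practical difference from a glob.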

There is lots more here, for example: http://www.tldp.org/LDP/abs/html/string-manipulation.html

But for more sophisticated manipulations it might be better to use python or similar.

dariober, thank you. Could you please help me to understand this code?

for i in R2.*_trimLCCwithMCF.fastq; do cat "$i" basename $i _trimLCCwithMCF.fastq_trimLCwithFlexbar.fastq > basename $i _trimLCCwithMCF.fastq_trimLCC_both.fastq; done

(Please format it properly with Code Sample option.)

As it looks to me unformatted, it doesn't make much sense. Among other things, you are redirecting the output of cat to the basename function. Maybe it works but if so it's certainly very cryptic!

To me it doesn't look like correct bash code.
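
For what it's worth, the loop may simply have lost the backticks around the basename calls when it was pasted. A minimal sketch of that reading, assuming each *_trimLCCwithMCF.fastq file has a partner ending in _trimLCwithFlexbar.fastq; the R2.sample names and the scratch directory are made up so the example runs on its own:

```bash
tmp=$(mktemp -d)
# Hypothetical input files following the naming used in the loop
printf 'mcf\n'     > "$tmp/R2.sample_trimLCCwithMCF.fastq"
printf 'flexbar\n' > "$tmp/R2.sample_trimLCwithFlexbar.fastq"

for i in "$tmp"/R2.*_trimLCCwithMCF.fastq; do
    # basename drops the directory and the given suffix, leaving "R2.sample"
    base=$(basename "$i" _trimLCCwithMCF.fastq)
    # Concatenate the MCF file and its Flexbar partner into a new file
    cat "$i" "$tmp/${base}_trimLCwithFlexbar.fastq" > "$tmp/${base}_trimLCC_both.fastq"
done

cat "$tmp/R2.sample_trimLCC_both.fastq"
```

Here $(...) does the same job as backticks but survives formatting and nests more easily.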

8.0 years ago

XCGD2_R2.FB-P7-Rd2-LC.*_trimLCwithMCF.fastq should match all those files with different numbers. You can check this in that directory by running ls XCGD2_R2.FB-P7-Rd2-LC.*_trimLCwithMCF.fastq and seeing whether all files are listed.

The * is a wildcard meaning "anything of any size" and is expanded by your shell.

Or did I misunderstand what you want to get?

Thanks Wouter for your answer, but in addition I need to concatenate the *_trimLCwithMCF.fastq files with the *_trimLCwithFlexbar.fastq file, and they only have XCGD2_R2.FB-P7-Rd2 as a common part.

Please use ADD COMMENT to reply to earlier answers, so that this thread remains logically structured and easy to follow.

In that case, what about XCGD2_R2.FB-P7-Rd2*.fastq?
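
If the goal is one combined file per sample, a narrower pair of patterns avoids picking up unrelated .fastq files. A sketch, assuming the -LC.<number> files and the Flexbar file share everything before -LC or _trimLCwithFlexbar; the output name ending in _trimLC_both.fastq and the scratch files are hypothetical:

```bash
tmp=$(mktemp -d)
# Hypothetical stand-ins for the files described in the question
printf 'mcf10\n'   > "$tmp/XCGD2_R2.FB-P7-Rd2-LC.10_trimLCwithMCF.fastq"
printf 'mcf11\n'   > "$tmp/XCGD2_R2.FB-P7-Rd2-LC.11_trimLCwithMCF.fastq"
printf 'flexbar\n' > "$tmp/XCGD2_R2.FB-P7-Rd2_trimLCwithFlexbar.fastq"

for flex in "$tmp"/*_trimLCwithFlexbar.fastq; do
    # Common part of the name (still including the directory)
    prefix="${flex%_trimLCwithFlexbar.fastq}"
    # All numbered MCF files for this prefix, followed by the Flexbar file
    cat "$prefix"-LC.*_trimLCwithMCF.fastq "$flex" > "${prefix}_trimLC_both.fastq"
done

cat "$tmp/XCGD2_R2.FB-P7-Rd2_trimLC_both.fastq"
```

Writing the result under a name that neither input pattern matches keeps a rerun from concatenating the output into itself.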
