How to extract filename and change text in the same file
5.8 years ago

Hello,

I have about 30 VCF files named like ID_001.new.vcf. For each file, I want to extract only the "ID_001" part of the file name and use it to replace "Sample1" in the header line of that VCF file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1

so that the result looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ID_001

How can I do this? I tried using echo in bash to extract the IDs from the file names, but I could not work out how to loop over the files and change the text inside each one. Thanks for your help.

sequence VCF Script • 3.9k views
  1. Extract the sample names from the VCF using bcftools (query -l).
  2. Prepare a new file with the new sample names, one per line, in the same order as the sample names from step 1.
  3. Use bcftools reheader to replace the sample names with those from step 2.

Back up the original files before proceeding.
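
The three steps above could be scripted roughly as follows. This is only a sketch: it assumes bcftools is on the PATH, that every file contains exactly one sample, and that writing a renamed copy next to the original is acceptable. The function name and the .samples.txt and .renamed.vcf file names are made up for illustration.

```shell
# Sketch: rename the single sample in each VCF to the ID taken from its file name.
# Assumption: one sample per file; output names are hypothetical.
rename_with_bcftools() {
    local vcf id
    for vcf in "$@"; do
        id=$(basename "$vcf" .new.vcf)          # ID_001.new.vcf -> ID_001
        # Step 1: list the current sample names (printed for inspection)
        bcftools query -l "$vcf"
        # Step 2: new names, one per line, in the same order as step 1
        printf '%s\n' "$id" > "$id.samples.txt"
        # Step 3: write a copy with the new header; the input is left untouched
        bcftools reheader -s "$id.samples.txt" -o "$id.renamed.vcf" "$vcf"
    done
}
# Usage: rename_with_bcftools *.new.vcf
```

Because the result goes to a new file, the original stays available as a backup.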

5.8 years ago
Jeffin Rockey ★ 1.3k

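In bash, this should do it.

for i in *.new.vcf
do
        # Strip the .new.vcf suffix: ID_001.new.vcf -> ID_001
        ID_NAME=$(basename "$i" .new.vcf)
        # Quote "$i" so file names with spaces are handled correctly
        sed -i "1s|Sample1|$ID_NAME|g" "$i"
done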

Caution: I have used -i with sed, so the files themselves are edited in place.

Edit: I have now added 1s to limit the replacement to the first line only.
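
Since -i overwrites the files, a variant that keeps a backup may be safer. The sketch below uses the same loop as above, wrapped in a hypothetical function; it drops the 1s address on the assumption that the #CHROM line is not the first line of the file (a VCF starts with ##fileformat), and it relies on sed accepting a backup suffix attached to -i.

```shell
# Sketch: same replacement as above, but keep a backup copy of every file.
rename_header_with_backup() {
    local f id
    for f in "$@"; do
        id=$(basename "$f" .new.vcf)      # ID_001.new.vcf -> ID_001
        # -i.bak edits in place and saves the original as "$f.bak"
        sed -i.bak "s|Sample1|$id|" "$f"
    done
}
# Usage: rename_header_with_backup *.new.vcf
```

Once the results have been checked, the .bak files can be deleted.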


I think it would be better to use 'bcftools view --samples-file' than sed.


Hi Pierre, I did not understand. Would bcftools view do any replacement?


The --samples-file option can be used to rename the samples. From https://samtools.github.io/bcftools/bcftools.html:

This file can also be used to rename samples by giving the new sample name as a second white-space-separated column, like this: "old_name new_name".
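
As a hedged illustration of the "old_name new_name" file format, the sketch below writes one such pair per VCF and passes it to bcftools reheader -s, which also accepts rename pairs. This is an assumption about how the idea could be applied here, not the commenter's exact command; the function name and the .rename.txt and .renamed.vcf file names are hypothetical.

```shell
# Sketch: build an "old_name new_name" file per VCF and rename via bcftools reheader.
# Assumption: the sample to rename is called Sample1 in every file.
rename_with_pairs() {
    local vcf id
    for vcf in "$@"; do
        id=$(basename "$vcf" .new.vcf)
        # One whitespace-separated pair per line: old name, then new name
        printf 'Sample1 %s\n' "$id" > "$id.rename.txt"
        bcftools reheader -s "$id.rename.txt" -o "$id.renamed.vcf" "$vcf"
    done
}
# Usage: rename_with_pairs *.new.vcf
```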


This works only if every file has Sample1 in its header line. Will that be the case?


Yes, all files have Sample1.


@Jeffin, thanks for your response. This line is not the first line in the file. How can I change the sed command so that it finds the line containing Sample1 and replaces it with $ID_NAME?


Changing 1s| to simply s| will replace Sample1 on every line where it occurs.
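
If Sample1 might also appear elsewhere in the file (for example in a ## metadata line), the replacement can be restricted to the column-header line with a sed address. A minimal sketch, assuming the header line starts with #CHROM and ends with Sample1; the function name is made up, and a .bak backup is kept as a precaution.

```shell
# Sketch: replace Sample1 only on the #CHROM line, and only at the end of that line.
rename_chrom_line() {
    local f id
    for f in "$@"; do
        id=$(basename "$f" .new.vcf)
        # /^#CHROM/ selects the column-header line; \$ anchors Sample1 to the line end
        sed -i.bak "/^#CHROM/s|Sample1\$|$id|" "$f"
    done
}
# Usage: rename_chrom_line *.new.vcf
```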


Thanks a lot. This worked!

5.8 years ago
Malcolm.Cook ★ 1.5k

If you have GNU parallel installed, you can use it instead of a bash for loop:

parallel 'sed -i "s|Sample1$|{=s/.new.vcf$//=}|"' {} ::: *.new.vcf

Hi Malcolm, the suggested command appears to be super efficient, even though I did not understand much of the syntax. Can you please explain {=s/.new.vc$f//=}, {}, ::: etc.?


Sure.

In general, in your command line:

  • {} gets replaced with the file being processed.
  • {=perl expression=} gets replaced with the value of a perl expression being evaluated in the context of the perl variable $_ being set to the name of the file being processed.

So, in my example, we are using sed to replace the word "Sample1" appearing at the end of a line with the result of removing the trailing .new.vcf from each file name.

Documentation for this can be found in parallel's manpage by searching for "{=perl expression=}", where you can also read:

::: arguments
Use arguments from the command line as input source instead of stdin (standard input). 

Fix: vc$f -> vcf\$.

Also try: parallel --plus 'sed -i "s|Sample1$|{%.new.vcf}|"' {} ::: *.new.vcf
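
For comparison, {%.new.vcf} strips a trailing string in much the same way as the ${var%suffix} expansion in bash. A minimal sketch, with a hypothetical helper name:

```shell
# Sketch: remove a trailing .new.vcf using shell parameter expansion.
strip_new_vcf() {
    local f=$1
    # ${f%.new.vcf} removes the shortest match of ".new.vcf" from the end of $f
    printf '%s\n' "${f%.new.vcf}"
}
# Usage: strip_new_vcf ID_001.new.vcf
```

The same expansion could replace the basename call in a plain for loop, as long as the file names carry no directory part.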


Hi Ole,

Could you please point me to a link or other resource that would help me understand {}, ::: etc.?


It is covered in GNU Parallel 2018 chapter 5 (Online https://doi.org/10.5281/zenodo.1146014, printed www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html)


Thanks for the fix and the alternative!


deleted my comment since it is solved
