Hi,
Here is a recurring problem I face from time to time, especially because I prefer object names that resemble my file names rather than creating a list with different names for those files. It just makes it easier for me to track down a file when troubleshooting. So, my question is: how can I extract a pattern from my object rather than a fixed string of characters? For example:
```r
vcfs <- list.files()
vcfs
[1] "OV-TCGA-05-1456-01.vcf" "OV-TCGA-05-4578-01.vcf" "OV-TCGA-08-5666-01.vcf" "LUSC-TCGA-10-5684-01.vcf" "LUAD-TCGA-02-6574-01.vcf"
```
So, as you can see, the first part of each file name defines the cohort type (OV, LUSC, LUAD), and the part after "TCGA" is unique to each file. I would like to (1) remove the hyphens, (2) keep the cohort name, and (3) keep the 6 digits that come after "TCGA". So it should look like this:
"OV051456", "OV054578", "OV085666", "LUSC105684", "LUAD026574"
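One way to do this in base R is a single `sub()` call with capture groups: describe the whole file name, wrap the parts you want to keep in parentheses, and paste them back together with `\\1`, `\\2`, `\\3`. This is a minimal sketch that assumes every name follows the `COHORT-TCGA-XX-XXXX-XX.vcf` layout shown above; the variable names `pattern` and `ids` are just illustrative.

```r
# File names copied from the question; on real data something like
# vcfs <- list.files(pattern = "\\.vcf$") would list only the VCFs
vcfs <- c("OV-TCGA-05-1456-01.vcf", "OV-TCGA-05-4578-01.vcf",
          "OV-TCGA-08-5666-01.vcf", "LUSC-TCGA-10-5684-01.vcf",
          "LUAD-TCGA-02-6574-01.vcf")

# ^([A-Z]+)    group 1: the cohort code at the start (OV, LUSC, LUAD)
# -TCGA-       the literal text "-TCGA-" (matched, but not kept)
# ([0-9]{2})   group 2: the two digits right after TCGA
# -([0-9]{4})  group 3: the next four digits
# -.*$         the rest of the name ("-01.vcf"), discarded
pattern <- "^([A-Z]+)-TCGA-([0-9]{2})-([0-9]{4})-.*$"

# Replace each whole name with the three groups glued together;
# the hyphens disappear because they sit outside the groups
ids <- sub(pattern, "\\1\\2\\3", vcfs)
ids
# c("OV051456", "OV054578", "OV085666", "LUSC105684", "LUAD026574")
```

Note that `sub()` returns names that do not match the pattern unchanged, so `grepl(pattern, vcfs)` is a quick way to check that every file was actually converted. The backslashes are doubled because `\1` has to be written as `"\\1"` inside an R string.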
Now, I always struggle with those symbols (*, ., ?, \", "") when extracting a string from a character object. So, if any of you could also recommend a good tutorial on them, I would truly appreciate it. And sorry for the simple question; I am not a hardcore bioinformatician. I love how engaging and helpful this community always is.
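As a quick reference for the symbols mentioned above, here is a small sketch using base R's `grepl()` (TRUE if the pattern is found) and `sub()`/`gsub()`. The toy strings are made up purely for illustration; the rules themselves are standard extended regular expressions as R uses them.

```r
# "." matches any single character
grepl("a.c", c("abc", "a.c", "ac"))          # TRUE TRUE FALSE

# "\\." in an R string is the regex \. , i.e. a literal dot
grepl("a\\.c", c("abc", "a.c"))              # FALSE TRUE

# "*" means zero or more of the preceding item; ^ and $ anchor start/end
grepl("^ab*c$", c("ac", "abc", "abbc"))      # TRUE TRUE TRUE

# "?" means zero or one of the preceding item
grepl("^ab?c$", c("ac", "abc", "abbc"))      # TRUE TRUE FALSE

# "" as the replacement deletes whatever matched
gsub("-", "", "OV-05-1456")                  # "OV051456"

# Strip a file extension: a literal dot followed by "vcf" at the end
sub("\\.vcf$", "", "OV-TCGA-05-1456-01.vcf") # "OV-TCGA-05-1456-01"
```

Inside an R string, `\"` is simply an escaped double quote (a `"` character), not a regex operator.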
So, thanks a lot in advance!
Cheers,
Douglas
I also find regex confusing, and often refer to this site to help me:
https://regexr.com/
It has good information, cheatsheets, guides, and a live editor in which you can play around with your expressions.
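If you want to experiment from inside R as well, `regexec()` plus `regmatches()` shows exactly what each parenthesised group captured, which is handy when building a pattern step by step. A minimal sketch, assuming the same TCGA-style names as in the question (the extra `notes.txt` is a made-up non-matching file):

```r
vcfs <- c("OV-TCGA-05-1456-01.vcf", "LUSC-TCGA-10-5684-01.vcf", "notes.txt")
pattern <- "^([A-Z]+)-TCGA-([0-9]{2})-([0-9]{4})-.*$"

# regexec() records where the whole match and each (...) group start;
# regmatches() turns those positions into the actual substrings
parts <- regmatches(vcfs, regexec(pattern, vcfs))

parts[[1]]     # whole match, then groups: "OV-TCGA-05-1456-01.vcf" "OV" "05" "1456"
parts[[3]]     # character(0): "notes.txt" does not match the pattern
lengths(parts) # 4 4 0, a quick way to spot names the pattern misses
```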