2.8 years ago
pramach1
I have 108 files of BLAST output. I read them into a list of data frames and filter each one based on % identity (pident) and query coverage (qcovs).
fnames <- list.files()
data <- lapply(fnames, function(x) {
  res <- read.table(x, header = TRUE, sep = "\t", quote = "", fill = FALSE)
  res$sample <- x  # record which file each row came from
  res
})
colnames <- c("qseqid", "sseqid", "stitle", "pident", "qcovs", "Sample")
out <- lapply(data, setNames, colnames)
data <- lapply(out, "[", 3:6)
data1 <- lapply(data, function (x) x[(x$qcovs > 90),])
data2 <- lapply(data1, function (x) x[(x$pident > 90),])
After this, I want to split the stitle column based on the parentheses and on "|". How do I do that for every data frame in the list?
Here is the example of the stitle column.
gb|AM260957.1|+|4186-5086|ARO:3003071|mphF [uncultured bacterium]
gb|NC_008618.1|-|1667063-1670624|ARO:3004480|Bifidobacterium adolescentis rpoB conferring resistance to rifampicin [Bifidobacterium adolescentis]
gb|AP006618.1|+|4835199-4838688|ARO:3000501|Nocardia rifampin resistant beta-subunit of RNA polymerase (rpoB2) [Nocardia farcinica IFM 10152]
gb|AY043299.1|-|3984-5175|ARO:3000167|tet(C) [Aeromonas salmonicida]
gb|AB571865.1|-|144312-145536|ARO:3003745|mefC [Photobacterium damselae subsp. damselae]
gb|AE004091.2|+|2810008-2813197|ARO:3000804|MexF [Pseudomonas aeruginosa PAO1]
gb|AB219524.1|+|1176-4338|ARO:3003699|mexQ [Pseudomonas aeruginosa]
I want the column split based on "|" and tab. Thank you for the help.
I have different numbers of rows but the same number of columns in the 108 files. The number of rows ranges from 4000 to 12000. If I have to use the above code, that would mean I have the same number of rows and exactly the same information in all 108 files, which I don't. So how would I split the first column (stitle) in all 108 files? Thank you. I apologize for not being clear previously.
I think your confusion might be coming from into=LETTERS[1:7], since there also happen to be 7 rows. You're splitting the stitle column into 7 separate columns, so that argument was just telling the function to name the 7 new columns A-G. This function works for any number of rows.
I think I am doing something wrong. The first thing I did was
I ended up with a single data frame containing only this column split into 7 columns. It is not separating the stitle column in each of the 108 data frames in the list; it creates a single data frame with just this column split into 7 columns.
If you want to get into data analysis in R, I would suggest reading R for Data Science by Hadley Wickham. It's going to be difficult to write R code without investing the time in learning it.
With that being said, the code I provided was an example and was not meant to be copied and pasted directly into your code. In your code it should look something like this.
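The original snippet is not preserved here, so the following is only a minimal sketch of how the split could be applied to every data frame in the list rather than to one. It assumes the filtered list is called data2 as in the question, and that tidyr::separate() with into = LETTERS[1:7] is the function being discussed. The small data2 built below, the helper name split_stitle, and the result name data3 are hypothetical stand-ins for illustration.

```r
library(tidyr)

# Hypothetical stand-in for the filtered list `data2` from the question:
# two data frames with different numbers of rows but the same columns.
data2 <- list(
  data.frame(
    stitle = c(
      "gb|AM260957.1|+|4186-5086|ARO:3003071|mphF [uncultured bacterium]",
      "gb|AY043299.1|-|3984-5175|ARO:3000167|tet(C) [Aeromonas salmonicida]"
    ),
    pident = c(99.1, 95.4),
    qcovs = c(100, 97),
    Sample = "sample1.txt",
    stringsAsFactors = FALSE
  ),
  data.frame(
    stitle = "gb|AB219524.1|+|1176-4338|ARO:3003699|mexQ [Pseudomonas aeruginosa]",
    pident = 98.2,
    qcovs = 99,
    Sample = "sample2.txt",
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: split stitle on "|" or on " [" into 7 columns named
# A-G, then drop the trailing "]" left on the organism name in column G.
split_stitle <- function(x) {
  x <- separate(x, stitle, into = LETTERS[1:7], sep = "\\|| \\[")
  x$G <- sub("\\]$", "", x$G)
  x
}

# lapply() applies the helper to each data frame, so the result is still a
# list with one data frame per input file.
data3 <- lapply(data2, split_stitle)
```

Because the split happens inside lapply(), the number of rows in each file does not matter; into only fixes the number and names of the new columns. If some stitle values contain more or fewer than seven pieces, separate() will warn, and its extra and fill arguments control what happens in that case.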