R "For I In" Plot In Sorted Order
4
0
12.6 years ago
PoGibas 5.1k

My files (>100) are:

AT.BP.50.txt
AT.BP.200.txt
AT.BP.500.txt 
SP.BP.50.txt
SP.BP.200.txt 
SP.BP.500.txt 
....

I want to plot them with R.

Usually I do it like this:

files <- list.files()
par(mfrow=c(3,3))
for (i in 1:length(files)) {
    b <- read.table(files[i])
    barplot(table(b), main=files[i])
....

But R plots them in this order:

"AT.BP.200.txt" "AT.BP.500.txt" "AT.BP.50.txt"

"SP.BP.200.txt" "SP.BP.500.txt" "SP.BP.50.txt"

........

And I want them to be plotted in sorted order:

"AT.BP.50.txt" "AT.BP.200.txt" "AT.BP.500.txt"

"SP.BP.50.txt" "SP.BP.200.txt" "SP.BP.500.txt"

........

How can I do that?

plot • 3.3k views
3

What is the point of closing a question if it already has an answer? I sort files and stuff all the time when I'm doing bioinformatics, and I'm rarely confident I'm doing it the best way. I depend on serendipitous information like this to stumble on better methods. Given that there's already an answer, closing the question simply precludes the possibility of a better answer. How does that make life better?

0

I think the original poster would actually get the best answer on Stack Overflow; there are a lot of R gurus over there. I recommend that anyone here interested in R questions like this follow the RSS feed for the R tag on Stack Overflow. Biostars is supposed to be limited to bioinformatics-specific questions. Just trying to keep the internet organized.

3

"Biostars is supposed to be limited to bioinformatics specific questions." I guess that explains why a question about decreasing sequencing costs remains open and gets 29 votes. :) I agree that this question could have been geared more explicitly towards bioinformatics, perhaps with just a few extra words: "My sequence score files..." But consider that a programmer doing bioinformatics might see this as simply a sorting question to be asked in another expert forum, whereas a biologist trying to do bioinformatics thinks that programming in a biological context is doing bioinformatics, and would ask it where they see people doing bioinformatics. People like me find it related, useful, and welcome, even though the word "bioinformatics" was not used in the question. Just trying to keep the internet open and collegial.

1

Okay, go for it.

0

I would post this on stackoverflow.com; it's not exactly bioinformatics-centric.

3
12.6 years ago

Here's my somewhat hideous attempt, in case you have too many files to want to use grep...

files <- c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt", "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt", "SP.BP.80.txt")

# Split each name on "." and keep the prefix (part 1) and the number (part 3)
filename_parts <- data.frame(t(sapply(strsplit(files, "\\."), function(x) x[c(1, 3)])), stringsAsFactors = FALSE)
filename_parts[, 2] <- as.numeric(filename_parts[, 2])
colnames(filename_parts) <- c("a", "b")

# Order by prefix first, then numerically by the number
ord.iv <- with(filename_parts, order(a, b))

files.reorder <- files[ord.iv]

for (i in seq_along(files.reorder))
{
    b <- read.table(files.reorder[i])
    ...
}
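
The same prefix-then-number ordering can probably be written more compactly with sub(). This is only a sketch, and it assumes every name follows the <prefix>.<tag>.<number>.txt pattern seen in the question; the variable names prefix, size and files_sorted are made up for illustration.

```r
# Example names from the thread; in practice these would come from list.files()
files <- c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt",
           "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt", "SP.BP.80.txt")

# Assumption: every name looks like <prefix>.<tag>.<number>.txt
prefix <- sub("\\..*$", "", files)                              # text before the first dot
size   <- as.numeric(sub("^.*\\.(\\d+)\\.txt$", "\\1", files))  # number just before ".txt"

# Order by prefix first, then numerically by size
files_sorted <- files[order(prefix, size)]
print(files_sorted)
```

Names that do not match the assumed pattern would give NA for size (with a warning) and be ordered last within their prefix, so it is worth checking anyNA(size) before plotting.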
0

In practice, I usually create a file that lists file names and aliases in whatever order I want and then read that in.
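
A minimal sketch of that manifest idea, for anyone who wants to try it. The file layout (tab-separated, with columns named file and alias) and the alias texts are assumptions made up for this example, and the manifest is written to a temporary file only so the snippet runs on its own.

```r
# Hypothetical manifest: one row per file, listed in the desired plotting order
manifest_path <- tempfile(fileext = ".tsv")
writeLines(c("file\talias",
             "AT.BP.50.txt\tAT 50",
             "AT.BP.200.txt\tAT 200",
             "AT.BP.500.txt\tAT 500"),
           manifest_path)

# Rows are returned in the order they appear in the manifest
manifest <- read.table(manifest_path, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)

for (i in seq_len(nrow(manifest))) {
    # In real use: read manifest$file[i] and plot it with main = manifest$alias[i]
    message(manifest$file[i], " -> ", manifest$alias[i])
}
```

The order is then controlled entirely by the manifest, so no sorting logic is needed in the script.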

1
12.6 years ago

There may be a better way, but what about something like this?

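# Anchor the patterns so that, e.g., "50.txt" cannot also match "150.txt"
ind.50 = grep("\\.50\\.txt$", files)
ind.200 = grep("\\.200\\.txt$", files)
ind.500 = grep("\\.500\\.txt$", files)

# Parentheses matter: 1:length(files)/3 would evaluate as (1:length(files))/3
for(i in 1:(length(files)/3)) {
  for(j in c(ind.50[i], ind.200[i], ind.500[i])) {
    b = read.table(files[j])
    ...
  }
}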
1
12.6 years ago
bdemarest ▴ 460

Your file names are not sorted in the order you want because list.files() returns them in alphabetical order. The names are compared character by character as strings, so "200" sorts before "50".

# Reproducible code showing the problem.

# files = list.files() will return file names in alphabetical order:
files = c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt",
          "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt")

par(mfrow=c(3, 3))

for (fname in files) {
    plot(1:10, main=fname)
}

# Proposed solution: Rename files so that alphabetical sorting works.

sorted_files = sort(gsub("\\.(\\d{2})\\.", ".0\\1.", files))
sorted_files
# [1] "AT.BP.050.txt" "AT.BP.200.txt" "AT.BP.500.txt"
# [4] "SP.BP.050.txt" "SP.BP.200.txt" "SP.BP.500.txt"

par(mfrow=c(3, 3))

for (fname in sorted_files) {
    plot(1:10, main=fname)
}
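
If renaming the files on disk is not an option, the zero-padded names can serve purely as sort keys. The sketch below is one way to do that, under the same assumption as the gsub() call above (numbers have two or three digits); keys and files_in_order are illustrative names.

```r
files <- c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt",
           "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt")

# Zero-padded copies used only as sort keys; nothing is renamed on disk
keys <- gsub("\\.(\\d{2})\\.", ".0\\1.", files)

# order() returns the permutation, which is then applied to the original names
files_in_order <- files[order(keys)]
print(files_in_order)
```

Because files_in_order still holds the real file names, it can be passed straight to read.table() inside the plotting loop.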
1
12.6 years ago

Let me tell you about a very clean tool called mixedsort, part of the gtools package, which does alphanumeric sorting. To make your file names work with it, we first have to remove the periods and then get the ordering. That way, a single command gives you what you need.

library(gtools)
files <- c("AT.BP.200.txt", "AT.BP.50.txt", "AT.BP.500.txt", "SP.BP.200.txt", "SP.BP.50.txt", "SP.BP.500.txt", "SP.BP.80.txt")
wanted <- files[mixedorder(gsub('[.]', '', files))]

So, it's done. I removed the dots using gsub, passed the result to mixedorder (the companion of mixedsort that returns the ordering rather than the sorted values), and indexed files with that ordering. Go ahead and plot it now.

Cheers
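
For completeness, here is a rough, self-contained sketch of feeding an already ordered vector into the plotting loop from the question. The vector wanted is hard-coded so the snippet does not depend on any package, and the temporary directory, the dummy file contents and the plotted bookkeeping vector are all invented for the demo.

```r
# Demo setup: write three tiny one-column files to a temporary directory
demo_dir <- tempfile("plots_")
dir.create(demo_dir)
wanted <- c("AT.BP.50.txt", "AT.BP.200.txt", "AT.BP.500.txt")  # already in the desired order
for (f in wanted) writeLines(c("1", "2", "2", "3"), file.path(demo_dir, f))

pdf(NULL)                 # null graphics device; drop this line to see the plots
par(mfrow = c(1, 3))
plotted <- character(0)   # records the order in which files were plotted
for (f in wanted) {
    b <- read.table(file.path(demo_dir, f))
    barplot(table(b), main = f)
    plotted <- c(plotted, f)
}
dev.off()
```

Iterating over the file names directly, instead of over 1:length(files), keeps the loop correct even when the vector is empty.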
