I have two files: 1) a leafdata file with read count values and 2) a metadata file with sample information. Both are tab-delimited. They look like this:
Data file:
genus sample1 sample2 sample3 sample4 ........ sample206
Massilistercora 26 419 16 2974 159
Aminipila 104 59 183 2594 209
Mogibacterium 502 971 591 218 2974
Flintibacter 418 0 981 2397 264
.
.
Metadata file:
samplename group timepoint gender
sample1 case A M
sample2 control B F
sample3 control A F
.
.
.
sample206 case E M
I loaded the data into R as below:
testdata <- read.table("leafdata.txt", sep = "\t", header = TRUE, check.names = FALSE)
Then I checked the dimensions as below:
dim(testdata)
2874 207
I then loaded the metadata as below:
leafmetadata <- read.table("metadata.txt", sep = "\t", header = TRUE, check.names = FALSE)
Its dimensions are as below:
dim(leafmetadata)
206 4
My question is: why do I get 206 for the metadata but 207 for the leafdata, even though the number of samples is the same in both files? This is what is causing errors in my further analysis. Am I reading the file incorrectly in R?
I would really appreciate it if someone could help me solve this issue. Many thanks in advance!
There is no error. Please go through your data before posting this kind of query. Hint: "genus"
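To spell out the hint: the 206 in the metadata is a row count (one row per sample), while the 207 in the leafdata is a column count, which includes the "genus" column alongside the 206 sample columns. A minimal sketch, assuming a small made-up table in place of leafdata.txt (the values and the three-sample size are hypothetical), shows how `row.names = 1` in `read.table` moves the genus names out of the columns:

```r
# Toy stand-in for leafdata.txt (hypothetical values, 3 samples instead of 206)
counts_txt <- "genus\tsample1\tsample2\tsample3
Aminipila\t104\t59\t183
Mogibacterium\t502\t971\t591"

# Read as in the question: "genus" is counted as a regular column
with_genus <- read.table(text = counts_txt, sep = "\t", header = TRUE,
                         check.names = FALSE)
dim(with_genus)  # 2 rows, 4 columns (genus + 3 samples)

# row.names = 1 uses the first column as row names instead
counts <- read.table(text = counts_txt, sep = "\t", header = TRUE,
                     check.names = FALSE, row.names = 1)
dim(counts)      # 2 rows, 3 columns (samples only)
```

With the real file, the same call on "leafdata.txt" would be expected to give 206 columns, matching the 206 metadata rows, provided the genus names are unique.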
Thanks for pointing that out. I also tried removing "genus", but I still get the same error.
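If the error persists once the counts agree, one thing worth checking is whether the sample names themselves match between the two tables. The sketch below is only an illustration on made-up data: the trailing space after "sample2" in the toy metadata is a hypothetical problem planted for the example, not something known about the real files.

```r
# Toy inputs (hypothetical); note the planted trailing space in "sample2 "
counts_txt <- "genus\tsample1\tsample2\tsample3\nAminipila\t104\t59\t183"
meta_txt <- paste("samplename\tgroup\ttimepoint\tgender",
                  "sample1\tcase\tA\tM",
                  "sample2 \tcontrol\tB\tF",
                  "sample3\tcontrol\tA\tF", sep = "\n")

counts <- read.table(text = counts_txt, sep = "\t", header = TRUE,
                     check.names = FALSE, row.names = 1)
meta <- read.table(text = meta_txt, sep = "\t", header = TRUE,
                   check.names = FALSE, stringsAsFactors = FALSE)

# Names present in one table but not the other
only_in_counts <- setdiff(colnames(counts), meta$samplename)
only_in_meta <- setdiff(meta$samplename, colnames(counts))

# Strip leading/trailing whitespace and compare again
meta$samplename <- trimws(meta$samplename)
names_match <- identical(colnames(counts), meta$samplename)
```

Here `only_in_counts` and `only_in_meta` list the names that fail to match, and `names_match` reports whether the two tables line up in the same order after trimming.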
You need dos2unix or unix2dos, depending on which system you are using and the system on which the file was created.
Can you try the following code in R (one or both of them) and print the output here?
Assuming that the following data exists: