I have 210 genomic DNA sequences of different strains of amoeba and I am trying to run a DESeq analysis on them (I am looking for the lowest log fold changes). I have mapped them, counted the number of reads per gene and loaded the dataset into R. Unfortunately, I can't create the data frame that holds the sequence ID, the level (control or other) and the type of sequencing (paired or single end). Since R shows a + prompt after the command, I was pretty sure it was a punctuation problem, but I have looked it over again and again and can't find anything wrong. I used the command below:
colData <- data.frame("strain"=c("10NC87.1","11NC96.1","12NC99.1","13NC34.2","14NC39.1","15NC52.3","16NC54.2","17NC58.1","18NC60.1","19NC60.2","1NC105.1","20NC63.2","2NC28.1","3NC67.2","3P51S75","4NC69.1","5NC71.1","6NC73.1","7NC76.1","8NC80.1","9NC85.2","A01.311S1merged.bam","A02.486S8merged.bam","A03.488S16merged.bam","A04.571S24merged.bam","A05.582S32merged.bam","A06.593S40merged.bam","A07.670S1merged.bam","A08.700S8merged.bam","A09.728S15merged.bam","A10.734S23merged.bam","A11.363S31merged.bam","A12.667S38merged.bam","AC9S2","B01.579S2merged.bam","B02.532S9merged.bam","B03.655S17merged.bam","B04.672S25merged.bam","B05.505S33merged.bam","B06.786S41merged.bam","B07.487S2merged.bam","B08.530S9merged.bam","B09.544S16merged.bam","B10.576S24merged.bam","B11.577S32merged.bam","B12.578S39merged.bam","B1AS67","B25CS96","B34AS78","B41AS84","BM5AS25","BS3","C01.580S3merged.bam","C02.600S10merged.bam","C03.763S18merged.bam","C04.732S26merged.bam","C05.398S34merged.bam","C06.118S42merged.bam","C09.586S17merged.bam","C10.531S25merged.bam","C11.ws2162S33merged.bam","C12.815S40merged.bam","CF2ddS82","CH14AS81","CT6AS54","CT9AS51","D01.608S4merged.bam","D02.777S11merged.bam","D03.401S19merged.bam","D04.735S27merged.bam","D05.606S35merged.bam","D06.738S43merged.bam","D07.616S3merged.bam","D08.561S10merged.bam","D09.602S18merged.bam","D10.758S26merged.bam","D11.180S34merged.bam","D12.18S41merged.bam","DCB5AS23","DD10C2S22","DD20B2bS49","DD20BS59","DD44S14","DD7S7","E01.642S5merged.bam","E02.744S12merged.bam","E03.448S47merged.bam","E04.375S46merged.bam","E05.805S36merged.bam","E06.ws655S44merged.bam","E07.317S4merged.bam","E08.ws380S1L001","E09.782S19merged.bam","E10.c5aS27merged.bam","E11.413S35merged.bam","E12.417S42merged.bam","E2C2S74","EI10AS57","F01.524S6merged.bam","F02.433S13merged.bam","F03.749S21merged.bam","F04.648S29merged.bam","F05.336S37merged.bam","F06.572S45merged.bam","F07.756S5merged.bam","F08.587S12merged.bam","F09.PJ11S20merged.bam","F10.583S28merged.bam","F11.
483S36merged.bam","F12.438S43merged.bam","FC4CS19","G01.427S7merged.bam","G02.750S14merged.bam","G03.442S22merged.bam","G04.419S30merged.bam","G05.307S38merged.bam","G06.949S46merged.bam","G07.826S6merged.bam","G08.434S13merged.bam","G09.366S21merged.bam","G10.421S29merged.bam","H02.537S15merged.bam","H03.568S23merged.bam","H04.824S31merged.bam","H05.181S39merged.bam","H05.181S45merged.bam","H06.mfdS47merged.bam","H07.ws582S7merged.bam","H08.v12S14merged.bam","H09.21S22merged.bam","H10.9S30merged.bam","H11.1071S2L001","H11.1071S37merged.bam","H11A3S94","H12.304S44merged.bam","H15A1S66","H15B1S85","H20B2S80","H4A1S90","HD45B1S46","HD48D1S83","HD54C1S30","LB10CS70","LL20DS92","M1AS4","M4BS1","MA2A1S27","MA2F1S12","MA4B1S50","NC1011S73","NC21B1S87","NC26C1S34","NC26L1S79","NC26V1S55","NC282S89","NC412S68","NC431S58","NC672S21","NC741S8","NC752S10","NC942S42","OH594S18","OHIOS15","OZK11AS48","PL11AS88","S118S24","S220S53","S25S39","S2AS9","S53S40","SM12AS13","TN34A1S35","TN39C2S6","TN40J3S31","TN45T3AS33","TN50J1S95","TN52E1S65","TN52F1S5","TN52G1S64","TNSC14S28","V301B2S60","V319B3S29","V323C1BS47","V324B3S16","V328A1S32","V329B1S43","V330B1S62","V330D1S72","V330D2S91","V331C1S61","V331D1S71","V331D2S38","V336B1S36","V341A2S26","V341C1S17","V342B2S76","V343D2S44","V348C1S63","V4F4S37","V54C2S52","V55A3S41","V56D1S86","V64D2S56","V72B3S20","WS1956S69","WS2162S93","WS472S11","WS7S45","ZA3AS77","Ax4"),"strain"=c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109","110","111","112","113",
"114","115","116","117","118","119","120","121","122","123","124","125","126","127","128","129","130","131","132","133","134","135","136","137","138","139","140","141","142","143","144","145","146","147","148","149","150","151","152","153","154","155","156","157","158","159","160","161","162","163","164","165","166","167","168","169","170","171","172","173","174","175","176","177","178","179","180","181","182","183","184","185","186","187","188","189","190","191","192","193","194","195","196","197","198","199","200","201","202","203","204","205","206","207","208","209","control"),"type"="paired.end")
I have tried to see where R thinks the problem is, and up to the 171st strain it gives no error. Beyond that, even if I add different strain names, R still treats the command as incomplete. I have read online that R has a limit on the number of rows and columns you can create, but I am far from that threshold. Others mentioned RAM, but I have 7.7 GiB of RAM. It is a really dumb question, but can somebody please explain what is going on and why I always get that +? If I can't make it work, will splitting the data into 2 sets of 105 sequences affect the log fold change results?
Thanks!
You want to run DESeq on genomic DNA? Are you sure that's meaningful?
Good catch. But hopefully that is an error, since the next sentence says this: "I have mapped them, counted the number of reads per gene and loaded the dataset into R."
As strain and type are the names of the arguments, it should be strain= and type=, not "strain"= and "type"=.
edit: in addition, you have two strain= arguments; you should rename one of them.
edit2: indeed, the command works fine either with or without double quotes.
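To illustrate the two points in this reply, here is a minimal sketch on a two-row toy data frame. The sample values "a" and "b" and the column name id are made up for the example; they are not from the original command.

```r
# Quoted and unquoted argument names are equivalent in a function call.
quoted   <- data.frame("strain" = c("a", "b"), "type" = "paired.end")
unquoted <- data.frame(strain = c("a", "b"), type = "paired.end")
identical(quoted, unquoted)

# With the default check.names = TRUE, a repeated column name is
# silently made unique rather than raising an error.
dup <- data.frame(strain = c("a", "b"), strain = c("1", "2"))
names(dup)

# Clearer: give the second column its own name ("id" is hypothetical).
fixed <- data.frame(strain = c("a", "b"), id = c("1", "2"), type = "paired.end")
```

So neither the quotes nor the duplicated name should, on their own, leave R waiting at the + prompt.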
I agree. But even with "" the command works fine when there are fewer than 171 samples. I still have the same problem with these changes:
I can run it on my laptop (same specs as yours):
Thank you for your replies! I managed to run it when I deleted the sorted.bam and merged.bam endings from the strain names.
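A likely explanation for why shortening the names helped: "An Introduction to R" states that command lines entered at the console are limited to about 4095 bytes. A command this long pasted as a single line may be cut off partway through, leaving an open quote or parenthesis, so R keeps showing +. This is an inference from the fix above, not something confirmed in the thread. Under that assumption, one way to avoid the problem is to build the table from a vector instead of typing every value. The sketch below uses a four-name subset of the samples, and the column name level is an assumption for the second column.

```r
# Small subset of the sample names for illustration; in practice they could
# come from the count table (e.g. its column names) rather than being typed.
samples <- c("10NC87.1", "A01.311S1merged.bam", "B02.532S9merged.bam", "Ax4")

# Strip a trailing "sorted.bam" or "merged.bam" from each name.
strain <- sub("(sorted|merged)\\.bam$", "", samples)

# Number the samples and mark Ax4 as the control, mirroring the original
# command ("level" is an assumed column name).
level <- as.character(seq_along(strain))
level[strain == "Ax4"] <- "control"

# Each line stays short, so the console line limit is never approached.
colData <- data.frame(
  strain = strain,
  level  = level,
  type   = "paired.end"
)
```

Breaking a long command across several lines, or saving it in a script and running it with source(), should also keep each input line well under the limit.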