4.6 years ago
shrinka.genetics
I have a file like the one below, containing 50,000,000 rows. I want to split it by chromosome, so that output file 1 contains chr1, output file 2 contains chr2, and so on.
V1 V2 V3 V4 V5 V6
1 chr1 10469 + 3 3 TCGC
2 chr1 10470 - 25 30 GCGA
3 chr1 10471 + 1 5 GCGG
4 chr1 10472 - 13 39 CCGC
5 chr1 10484 + 0 6 CCGG
I am using Ubuntu and the csplit command, but I could not figure it out. Could you please tell me the correct syntax?
Thanks Shrinka
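csplit cuts a file at lines matching a pattern, so it only gives one file per chromosome if the rows are already sorted by chromosome. As an alternative (this swaps in awk rather than csplit), a single awk pass can write every row to a file named after its chromosome while holding only the current line in memory. This is a minimal sketch: the helper name `split_by_chrom` is hypothetical, and it assumes line 1 is a header and the chromosome name is in column 2, as in the sample above.

```shell
# Hypothetical helper: split a whitespace-delimited table into one file per
# chromosome in a single streaming pass (awk keeps only the current line,
# not the whole file, in memory).
# Assumptions: line 1 is a header; the chromosome name is in column 2 as in
# the sample above (pass another column number as the 2nd argument if not).
split_by_chrom() {
    in_file=$1
    chrom_col=${2:-2}
    awk -v col="$chrom_col" -v prefix="$in_file" '
        NR == 1 { header = $0; next }      # remember the header line
        {
            out = prefix "." $col          # e.g. myfile.chr1
            if (!(out in seen)) {          # first row for this chromosome:
                print header > out         #   start its file with the header
                seen[out] = 1
            }
            print > out                    # file stays open, so rows append
        }
    ' "$in_file"
}
```

Usage: `split_by_chrom yourfile` writes `yourfile.chr1`, `yourfile.chr2`, and so on. If the file also contains many unplaced contigs, some awk implementations may hit a limit on simultaneously open files; GNU awk handles this transparently.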
It can be as simple as
grep -w chr1 yourfile > chr1file
grep -w chr2 yourfile > chr2file
etc. (-w makes grep match chr1 only as a whole word, so it does not also pick up chr10 to chr19.) Add the header at the top if you need it.
Thanks for your reply. I have tried that, but it produces a 0 KB output file. Maybe it is a memory-related issue: my RAM is not big enough for such a large file, so I thought the csplit command could be useful in this memory-constrained situation.
Thanks Shrinka
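The per-chromosome grep can be run for all chromosomes in one loop. grep streams through the file rather than loading it into memory, so RAM size should not matter here. This is a hedged sketch: the helper name `split_with_grep` is hypothetical, and it assumes the chromosomes are named chr1 to chr22, chrX and chrY, and that line 1 is a header.

```shell
# Hypothetical helper: run the per-chromosome grep for every chromosome.
# Assumptions: names are chr1..chr22, chrX, chrY (edit the list if your
# file uses other names), and line 1 of the input is a header.
split_with_grep() {
    in_file=$1
    for n in $(seq 1 22) X Y; do
        chr="chr$n"
        {
            head -n 1 "$in_file"                # copy the header line
            grep -w "$chr" "$in_file" || true   # -w: chr1 skips chr10; no match is OK
        } > "$in_file.$chr"                     # e.g. myfile.chr1
    done
}
```

This reads the whole file once per chromosome, so it is slower than a single pass, but each grep still only streams the file.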
If your example file above is correct, then the above command should work. Did you copy the file over to Unix from a Windows machine, by any chance?
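Text files written on Windows usually end each line with a carriage return plus a newline (CRLF) instead of a bare newline. It is not certain that line endings are the problem here, but they are quick to rule out. A minimal sketch, using hypothetical helper names `check_input` and `to_unix`:

```shell
# Hypothetical helpers for checking a file that may come from Windows.
check_input() {
    in_file=$1
    # 'file' (if installed) reports e.g. "ASCII text, with CRLF line terminators"
    if command -v file >/dev/null 2>&1; then
        file "$in_file"
    fi
    cr=$(printf '\r')                           # a literal carriage return
    crlf=$(grep -c "$cr" "$in_file" || true)    # count lines containing one
    echo "lines with a carriage return: $crlf"
}

# Write a copy of $1 to $2 with all carriage returns removed.
to_unix() {
    tr -d '\r' < "$1" > "$2"
}
```

Usage: `check_input yourfile`; if the count is not 0, run `to_unix yourfile yourfile.unix` and work on the new copy.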
Thank you for your response.
I have Windows 10 on my laptop; I set up Ubuntu on it using this guide, and that is what I am working in: https://crashcourse.housegordon.org/split-fasta-files.html
I used exactly this command:
grep "chr1" B19818.CEMT_178.Bisulfite-Seq.hg38.B19818_2_lanes_dupsFlagged.q5.5mC.CpG
It is generating 0 KB files.
If needed, I can send you one of the files.
Regards
Shrinka
grep needs very little memory, because it streams through the file instead of loading it all at once. Please tell us the exact command you are typing and which OS you are using.
You need to use output redirection (>). Without it, grep prints the matching lines to the terminal and no file is written:
grep -w chr1 B19818.CEMT_178.Bisulfite-Seq.hg38.B19818_2_lanes_dupsFlagged.q5.5mC.CpG > B19818.CEMT_178.Bisulfite-Seq.hg38.B19818_2_lanes_dupsFlagged.q5.5mC.CpG.chr1
The new file is created as B19818.CEMT_178.Bisulfite-Seq.hg38.B19818_2_lanes_dupsFlagged.q5.5mC.CpG.chr1
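To tell "grep found nothing" apart from "grep failed", its exit status helps: 0 means at least one line matched, 1 means no line matched (so the redirected file is empty), and a value greater than 1 means an error such as a missing input file. A hedged sketch wrapping the redirected command in a hypothetical helper `extract_chrom`:

```shell
# Hypothetical helper: extract one chromosome with output redirection and
# report the outcome based on grep's exit status.
extract_chrom() {
    in_file=$1
    chr=$2
    out_file="$in_file.$chr"
    if grep -w "$chr" "$in_file" > "$out_file"; then
        status=0
    else
        status=$?                 # 1 = no match, >1 = error
    fi
    case $status in
        0) echo "$out_file: $(wc -l < "$out_file" | tr -d ' ') matching lines" ;;
        1) echo "no line matched $chr; $out_file is empty" >&2 ;;
        *) echo "grep failed with exit status $status" >&2 ;;
    esac
    return "$status"
}
```

Usage: `extract_chrom B19818.CEMT_178.Bisulfite-Seq.hg38.B19818_2_lanes_dupsFlagged.q5.5mC.CpG chr1` writes the same `.chr1` file as the command above and says whether anything matched.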
Nope, the same problem remains.
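If the redirected file is still empty, grep found no line containing the pattern. One possible cause (not confirmed in this thread) is that the chromosome column uses different labels, e.g. `1` rather than `chr1`. A minimal sketch that lists every distinct value in the chromosome column with its row count; the helper name `count_chroms` is hypothetical, and it assumes a header on line 1 and the chromosome in column 2, as in the sample above.

```shell
# Hypothetical helper: count rows per value of the chromosome column.
# awk keeps one counter per distinct name, not the file, in memory;
# only the short summary is sorted.
# Assumptions: header on line 1, chromosome in column 2 (or pass a column).
count_chroms() {
    in_file=$1
    chrom_col=${2:-2}
    awk -v col="$chrom_col" '
        NR > 1 { n[$col]++ }                     # skip the header line
        END    { for (c in n) print c, n[c] }
    ' "$in_file" | sort
}
```

If the output shows names that are not `chr1`, `chr2`, ..., the grep pattern needs to match those names instead.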
Put an example file (it does not need to be the complete file) up at pastebin.com.
B19818.CEMT_178.Bisulfite-Seq.hg38.B19818_2_lan... https://drive.google.com/file/d/1gNpHAqmeFRCB0lFAvd4KiJSDE--rU7A7/view?usp=drive_web