Hi everyone!
I have an issue when trying to concatenate several .gmt files downloaded from the MSigDB site. I want to build a custom gmt file from the pathways that I'm interested in from KEGG and REACTOME (and eventually GO BP). However, when I use the following line to store the result in a new gmt or txt file:
cat KEGG_APOPTOSIS.gmt KEGG_ABC_TRANSPORTERS.gmt > my_gmt.gmt
I get the following output:
KEGG_APOPTOSIS > Apoptosis AIFM1 AKT1 AKT2 AKT3 APAF1 ATM BAD BAX BCL2 BCL2L1 BID BIRC2 BIRC3 CAPN1 CAPN2 CASP10 CASP3 CASP6 CASP7 CASP8 CASP9 CFLAR CHP1 CHP2 CHUK CSF2RB CYCS DFFA DFFB ENDOD1 ENDOG EXOG FADD FAS FASLG IKBKB IKBKG IL1A IL1B IL1R1 IL1RAP IL3 IL3RA IRAK1 IRAK2 IRAK3 IRAK4 MAP3K14 MYD88 NFKB1 NFKBIA NGF NTRK1 PIK3CA PIK3CB PIK3CD PIK3CG PIK3R1 PIK3R2 PIK3R3 PIK3R5 PPP3CA PPP3CB PPP3CC PPP3R1 PPP3R2 PRKACA PRKACB PRKACG PRKAR1A PRKAR1B PRKAR2A PRKAR2B PRKX RELA RIPK1 TNF TNFRSF10A TNFRSF10B TNFRSF10C TNFRSF10D TNFRSF1A TNFSF10 TP53 TRADD TRAF2 XIAPKEGG_ABC_TRANSPORTERS > ABC transporters ABCA1 ABCA10 ABCA12 ABCA13 ABCA2 ABCA3 ABCA4 ABCA5 ABCA6 ABCA7 ABCA8 ABCA9 ABCB1 ABCB10 ABCB11 ABCB4 ABCB5 ABCB6 ABCB7 ABCB8 ABCB9 ABCC1 ABCC10 ABCC11 ABCC12 ABCC2 ABCC3 ABCC4 ABCC5 ABCC6 ABCC8 ABCC9 ABCD1 ABCD2 ABCD3 ABCD4 ABCG1 ABCG2 ABCG4 ABCG5 ABCG8 CFTR TAP1 TAP2
What I need is for each pathway to be on its own row in the new .gmt file, like this:
KEGG_APOPTOSIS > Apoptosis AIFM1 AKT1 AKT2 AKT3 APAF1 ATM BAD BAX BCL2 BCL2L1 BID BIRC2 BIRC3 CAPN1 CAPN2 CASP10 CASP3 CASP6 CASP7 CASP8 CASP9 CFLAR CHP1 CHP2 CHUK CSF2RB CYCS DFFA DFFB ENDOD1 ENDOG EXOG FADD FAS FASLG IKBKB IKBKG IL1A IL1B IL1R1 IL1RAP IL3 IL3RA IRAK1 IRAK2 IRAK3 IRAK4 MAP3K14 MYD88 NFKB1 NFKBIA NGF NTRK1 PIK3CA PIK3CB PIK3CD PIK3CG PIK3R1 PIK3R2 PIK3R3 PIK3R5 PPP3CA PPP3CB PPP3CC PPP3R1 PPP3R2 PRKACA PRKACB PRKACG PRKAR1A PRKAR1B PRKAR2A PRKAR2B PRKX RELA RIPK1 TNF TNFRSF10A TNFRSF10B TNFRSF10C TNFRSF10D TNFRSF1A TNFSF10 TP53 TRADD TRAF2 XIAP
KEGG_ABC_TRANSPORTERS > ABC transporters ABCA1 ABCA10 ABCA12 ABCA13 ABCA2 ABCA3 ABCA4 ABCA5 ABCA6 ABCA7 ABCA8 ABCA9 ABCB1 ABCB10 ABCB11 ABCB4 ABCB5 ABCB6 ABCB7 ABCB8 ABCB9 ABCC1 ABCC10 ABCC11 ABCC12 ABCC2 ABCC3 ABCC4 ABCC5 ABCC6 ABCC8 ABCC9 ABCD1 ABCD2 ABCD3 ABCD4 ABCG1 ABCG2 ABCG4 ABCG5 ABCG8 CFTR TAP1 TAP2
Thanks in advance!
Hi Kevin!
Thanks for your answer. First of all, I retrieved these files on a Mac. Second, after viewing the files with vi, I can't tell whether there is an end-of-line or carriage return at the end of each row in these files. Your suggested code worked well when concatenating just a few files; I'm trying to figure out how to do it with the rest of them. Thanks, Kevin
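On the end-of-line question: a file whose last line has no trailing newline would explain why cat glues XIAP and KEGG_ABC_TRANSPORTERS together. A minimal sketch for checking this from the shell follows. The function name ends_with_newline is made up for illustration and is not from this thread.

```shell
# Hypothetical helper (name invented for this example): succeeds if the
# last byte of the file given as $1 is a newline.
ends_with_newline() {
    # Command substitution strips a trailing newline, so the result is
    # empty only when the final byte is a newline (or the file is empty).
    [ -z "$(tail -c 1 "$1")" ]
}

# To look at the raw bytes yourself, something like
#   tail -c 20 KEGG_APOPTOSIS.gmt | od -c
# prints a newline as \n and a carriage return as \r.
```

Usage would look like: ends_with_newline KEGG_APOPTOSIS.gmt || echo "no final newline"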
I am unsure why all of your files are separated in this way, and also unsure about the end-line issue. In any case, if you have many gmt files in the current working directory, then concatenating all of them could be done with:
...the same code, but better structured:
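The snippets mentioned above are not preserved in this copy of the thread. As a stand-in, here is a minimal sketch of one way to do it, not necessarily the code Kevin posted. It assumes the problem is a missing newline at the end of each file (plus possibly Windows-style carriage returns). The function name concat_gmt is made up for illustration.

```shell
# Hypothetical helper (not from the original thread): concatenate GMT files
# so that every gene set ends up on its own line.
# Usage: concat_gmt OUTPUT INPUT [INPUT ...]
concat_gmt() {
    out=$1
    shift
    : > "$out"    # create or truncate the output file
    for f in "$@"; do
        # tr removes carriage returns (assumes \r\n endings, not bare \r);
        # awk '1' prints every line followed by a newline, which also adds
        # the final newline if the input file lacks one.
        tr -d '\r' < "$f" | awk '1' >> "$out"
    done
}
```

For the two files in the question this would be called as: concat_gmt my_gmt.gmt KEGG_APOPTOSIS.gmt KEGG_ABC_TRANSPORTERS.gmt. When passing a glob such as *.gmt, write the output under a name the glob does not match (for example my_gmt.txt) and rename it afterwards, so the output file is not read back in as an input.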
Once again, Kevin, thanks a lot! Your code worked very well with all the files. Last night, I was trying to use for loops to concatenate only the KEGG gmt files. However, my approach did not work with the REACTOME and GOBP files... Perhaps it is an issue with the source files.
Rodo
You're welcome! No problem, Rodo. See you.