Delete Some Rows In Map File And Corresponding Columns In Ped File In Linux
12.1 years ago
mary ▴ 210

Hello all,

How can I delete some rows in a map file and the corresponding columns in a ped file in Linux? The columns are separated by spaces. Can anyone help me?

map ped • 3.9k views

This is easy with R or Linux tools, depending on whether these are individual entries or entire columns. Please tell us what the data look like.


The ped format is as below (without family ID, parent IDs, sex and phenotype):

plate07 1 2 3 1 4 3 2 1
plate08 2 3 1 4 2 3 1 4
plate09 3 2 4 1 2 3 1 3
....

As you know, each column has its annotation in the map file.


What does the map file look like? If the ordering is the same in both datasets, so that entry 8 in the ped file corresponds to row 8 in the map file, you can simply remove the entries using sed. If the entries are uneven or jumbled, you would need a common identifier in both datasets, then subset on it and delete, for example in R.
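To make the positional matching concrete, here is a minimal Python sketch, not a tested pipeline. It assumes the usual map layout (chromosome, SNP ID, genetic distance, bp position) and that each map row corresponds to a pair of allele columns in the ped file. The function name `drop_snps` and the `n_leading` parameter are hypothetical: `n_leading=1` matches the sample-ID-only layout described above, while a standard ped file has 6 leading columns.

```python
def drop_snps(map_lines, ped_lines, snps_to_drop, n_leading=1):
    """Remove SNPs from map rows and the matching allele columns from ped rows.

    map_lines / ped_lines: lists of whitespace-separated text lines.
    n_leading: number of non-genotype columns at the start of each ped row
    (assumption: 1 for a sample-ID-only file, 6 for a standard ped file).
    """
    drop = set(snps_to_drop)

    # Map columns: chromosome, SNP ID, genetic distance, bp position.
    keep_idx = [i for i, line in enumerate(map_lines)
                if line.split()[1] not in drop]
    new_map = [map_lines[i] for i in keep_idx]

    expected = n_leading + 2 * len(map_lines)
    new_ped = []
    for line_no, line in enumerate(ped_lines, start=1):
        fields = line.split()
        if len(fields) != expected:
            raise ValueError(
                f"ped line {line_no}: expected {expected} fields, "
                f"found {len(fields)}"
            )
        kept = fields[:n_leading]
        # SNP number i (0-based) owns the two allele columns
        # starting at n_leading + 2*i.
        for i in keep_idx:
            start = n_leading + 2 * i
            kept.extend(fields[start:start + 2])
        new_ped.append(" ".join(kept))
    return new_map, new_ped
```

With a four-SNP map and the row `plate07 1 2 3 1 4 3 2 1`, dropping the second SNP removes the allele pair `3 1` from that row. The field-count check is there because a silent mismatch between map rows and ped columns would shift every genotype after it.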

12.1 years ago
kumar.vinod81 ▴ 340

But if you want to delete them, why did you include them when exporting from GenomeStudio or whichever software you used for SNP data processing? You can also open your files in Excel and delete the corresponding rows and columns from the map and ped files. If your data are tab-delimited, use the Text to Columns option in Excel to split the data into columns. I hope this helps. Thanks.

12.1 years ago
mary ▴ 210

Thanks a lot, Kumar. It is a long story: I have genotype data from the bovine 50k BeadChip, based on Btau4.0, chromosome by chromosome. I got these data from my professor, and he lost the raw data. Now I want to build per-chromosome genotypes based on UMD3.1, so I separated the annotation by chromosome according to the UMD3.1 assembly. Then, with a script, I found which annotations in the map file do not belong to, for example, chr12 in UMD3.1 (I mean they belong to it only in Btau4.0). Now I want to delete the corresponding columns in the ped file. I tried to open it in Excel 2012, but the file has too many columns and I received an error. Now I don't know what to do. I would appreciate any suggestions.

12.1 years ago

If I understood correctly, you are selecting the SNPs you want from the map file and want to create a new ped file containing only those. Or the inverse: you want to remove from the .ped file the SNPs selected in the .map file.

The safest way is to use a program that understands ped files and has slicing functions. PLINK is the industry standard, used in many of the big GWAS projects. Look in the data management section of its documentation for the options to extract or remove SNPs (--extract and --exclude).
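PLINK's SNP filters read a plain text file with one SNP ID per line. As a rough sketch of how such a list could be prepared, the Python below collects the SNP IDs from a map file that are not assigned to the wanted chromosome in the new assembly. The function names and the `new_assembly_chrom` dictionary (SNP ID to chromosome, which would be built from your own UMD3.1 annotation) are hypothetical, and treating SNPs missing from the annotation as unwanted is an assumption you may want to change.

```python
def snps_to_exclude(map_lines, new_assembly_chrom, wanted_chrom):
    """Return map SNP IDs not placed on wanted_chrom in the new assembly.

    new_assembly_chrom: dict mapping SNP ID -> chromosome (hypothetical
    input built from your own annotation). SNPs absent from the dict are
    also returned (assumption: unplaced SNPs should be excluded).
    """
    wanted = str(wanted_chrom)
    excluded = []
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        snp_id = fields[1]  # map columns: chrom, SNP ID, cM, bp
        if new_assembly_chrom.get(snp_id) != wanted:
            excluded.append(snp_id)
    return excluded


def write_snp_list(snp_ids, path):
    """Write one SNP ID per line, the format PLINK reads for SNP lists."""
    with open(path, "w") as handle:
        for snp_id in snp_ids:
            handle.write(snp_id + "\n")
```

The resulting file can then be given to PLINK's exclude option. Keeping the list as a separate file also leaves a record of exactly which SNPs were dropped.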

12.1 years ago
mary ▴ 210

Thank you, Pablo. I did it: I made a SNP list file and then ran PLINK as below:

./plink --file my-file --exclude my-snp-list-file

It wrote this report:

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Web-based version check ( --noweb to skip )
Recent cached web-check found... OK, v1.07 is current

Writing this text to log file [ plink.log ]
Analysis started: Tue Nov 13 03:55:52 2012

Options in effect:
    --file mari
    --exclude mysnplist

1722 (of 1722) markers to be included from [ mari.map ]
410 individuals read from [ mari.ped ]
410 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
0 cases, 410 controls and 0 missing
410 males, 0 females, and 0 of unspecified sex
Reading list of SNPs to exclude [ mysnplist ] ... 404 read
Before frequency and genotyping pruning, there are 1318 SNPs
410 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 0.986356
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 1318 SNPs
After filtering, 0 cases, 410 controls and 0 missing
After filtering, 410 males, 0 females, and 0 of unspecified sex

Analysis finished: Tue Nov 13 03:55:53 2012

But now, when I check the number of rows in the map file, it is still 1722, whereas I think it should be 1318.

I don't know whether the 404 SNPs were deleted or not; it didn't give me an output file. I think maybe it put a (-) or something like that in the map file, which would make it unreadable in future analyses. What do you think?


This is one of the typical mistakes newcomers make in PLINK (don't worry, it happened to all of us the first time). PLINK NEVER modifies your files or creates new ones when you apply a filter or transformation UNLESS you tell it to. You need to add --recode to write a new ped/map pair, together with --out myfiltereddata to name the output. You may prefer --make-bed to write binary bed/bim/fam files, or --recode --transpose to write tped/tfam files instead. The rationale is that you can apply a filter and then run an analysis in the same command line, without creating a new file with gigabytes of data you don't need. Think of a large GWAS with 10,000 samples and 1 million SNPs where you want to try a filter only once.
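Once PLINK has written a filtered file set, a quick sanity check is to count the map rows and confirm that every ped row has the matching number of columns. The Python below is only a sketch of that check; the function name `check_ped_map` is hypothetical, and it assumes a standard ped file with 6 leading columns followed by two allele columns per SNP.

```python
def check_ped_map(map_path, ped_path, n_leading=6):
    """Return (n_snps, n_samples) if the ped and map files are consistent.

    Assumes each ped row holds n_leading non-genotype columns followed by
    two allele columns per SNP listed in the map file.
    """
    with open(map_path) as handle:
        n_snps = sum(1 for line in handle if line.strip())

    expected = n_leading + 2 * n_snps
    n_samples = 0
    with open(ped_path) as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            n_fields = len(line.split())
            if n_fields != expected:
                raise ValueError(
                    f"{ped_path} line {line_no}: expected {expected} "
                    f"fields for {n_snps} SNPs, found {n_fields}"
                )
            n_samples += 1
    return n_snps, n_samples
```

For the run in this thread, a correctly filtered output should report 1318 SNPs and 410 samples (1722 markers minus the 404 excluded).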


PS: remember to +1 the answers or comments you like, and don't forget to 'accept' an answer if it fulfills your needs.
