Question

Problem reading a PED file in PLINK

6.9 years ago

mary ▴ 210

Hi all

I have two text files (file1: genotype.txt and file2: the first six columns of the PED file). I combined them with paste file1.txt file2.txt to make a PED file. The problem is that when I run PLINK with:

Options in effect:

--ped scottsheep.ped
--map scottisheep.map
--noweb

it gives me this error:

52857 (of 52857) markers to be included from [ scottisheep.map ]

ERROR: 
A problem with line 1 in [ scottsheep.ped ]
Expecting 6 + 2 * 52857 = 105720 columns, but found 52864

When I checked the PED file, I found that I had successfully added all of the required information columns and just need to split each of my SNP columns, which are currently in the format "AA", into two separate columns per SNP ("A" "A"). I searched and found that this can be done in R, but I am new to R. Is there a bash command for a text file that can split one column into two?

I am tired of searching for this and have not found a solution.

Does anyone have any suggestions for me?
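The arithmetic in the error message can be reproduced before touching the file. Below is a minimal Python sketch, assuming a whitespace-delimited PED file; the helper names are hypothetical and not part of PLINK. Note that 52864 is 52857 + 7, which would be consistent with one token per SNP plus seven other fields (an assumption, since the file itself is not shown).

```python
def expected_ped_columns(n_markers):
    """Columns PLINK expects per PED line: 6 leading fields plus 2 alleles per marker."""
    return 6 + 2 * n_markers


def count_columns(line):
    """Number of whitespace-separated fields in one line of text."""
    return len(line.split())


n_markers = 52857  # marker count reported in the PLINK log above
print(expected_ped_columns(n_markers))  # 105720
print(count_columns("AA GG TG CC"))     # 4 tokens, but 8 alleles are needed
```

Running count_columns on the first line of the PED file before and after the conversion shows whether the split produced the expected number of fields.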

plink • 4.8k views

ADD COMMENT • link updated 6.9 years ago by zx8754 12k • written 6.9 years ago by mary ▴ 210

Can you confirm exactly what you want to get?

Your data is currently in this format:

AA GG TG CC ...

You need to get it to this format for input to PLINK:

A A G G T G C C

From where did you obtain your data?

Many thanks.

ADD REPLY • link 6.9 years ago by Kevin Blighe 89k

Hi Kevin, I downloaded the data from https://datadryad.org, and yes, I need to do what you wrote. I can do it in R, but I want to know whether I can do it in bash.

ADD REPLY • link 6.9 years ago by mary ▴ 210

Are you using Mac or Linux?

This works on Linux:

cat test.txt
AA GG TG CC
AA GG TG CC

sed 's/ \+//g' test.txt | awk '{for (i=1; i<=NF; i++) printf "%s ", $i; printf "\n"}' FS=''
A A G G T G C C 
A A G G T G C C

I cannot see your exact input, though.

ADD REPLY • link 6.9 years ago by Kevin Blighe 89k

Hi, I used the above command but I get this error:

awk: program limit exceeded: maximum number of fields size=32767 FILENAME="-" FNR=1 NR=1

I tried to install and use gawk, but I use Ubuntu 12.04 and I think the package I am looking for doesn't exist. So I am looking for a Python script to do this. Does anybody have a solution?

ADD REPLY • link 6.9 years ago by mary ▴ 210

Thanks a lot, everybody. All of them worked.

ADD REPLY • link 6.9 years ago by mary ▴ 210

Which solution worked?

Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

I have moved comments which provide a (potential) solution to your issue so you can mark them as accepted if they solve your issue.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLY • link 6.9 years ago by WouterDeCoster 48k

score 2 · Answer 1 · 2018-07-18

6.9 years ago

WouterDeCoster 48k

A python solution:

python -c "for line in open('test.txt'): print(' '.join(list(line.strip().replace(' ', ''))))"

Adapt test.txt to suit your input file.

This all requires that there are only normal SNPs in there, no funky alleles etc.
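To make that requirement explicit, here is a small sketch of the same idea as a function that refuses any token that is not exactly two characters long. The function name is hypothetical, and treating "exactly two characters" as the definition of a normal SNP genotype is an assumption.

```python
def split_genotypes(line):
    """Turn 'AA GG TG' into 'A A G G T G'; reject tokens that are not two characters."""
    alleles = []
    for token in line.split():
        if len(token) != 2:
            raise ValueError(f"unexpected genotype token: {token!r}")
        alleles.extend(token)  # a two-character string contributes two alleles
    return " ".join(alleles)


print(split_genotypes("AA GG TG CC"))  # A A G G T G C C
```

Applied line by line to the genotype file, this stops at the first odd token instead of silently writing a malformed row.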

ADD COMMENT • link 6.9 years ago by WouterDeCoster 48k

score 1 · Answer 2 · 2018-07-18

6.9 years ago

Kevin Blighe 89k

I think that Python may have the same issue - not sure.

Could you take a look here to see about installing gawk on Ubuntu 12.04? - https://askubuntu.com/questions/244268/installing-gawk-4-0-on-ubuntu-12-04

Edit: Wouter has helpfully added a Python solution for you to test. Here is a sed only solution, too:

sed 's/ \+//g' test.txt | sed 's/\(.\{1\}\)/\1 /g'
A A G G T G C C 
A A G G T G C C

ADD COMMENT • link 6.9 years ago by Kevin Blighe 89k

Thanks a lot, it works, but I have a plate number at the start of each row and I want to keep it separate. I mean I have:

R921B02 GG TT AA GG ...

R921E06 TT AA GG CC...

I want

R921B02 G G T T A A G G ...

R921E06 T T A A G G C C...

ADD REPLY • link 6.9 years ago by mary ▴ 210

You would have saved us (and yourself) some time if you had provided a small example of your data from the start. Try this:

perl -ne '($id, $tmp) = split( / /, $_, 2 ); $tmp =~ s/ //g; print "$id "; print join(" ", split( //, $tmp ) );' test.txt > out.txt
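For anyone who prefers Python, a rough equivalent might look like the sketch below. The function name and the n_leading parameter are hypothetical; n_leading=1 matches the plate-number example above, and a larger value could be used if the file already carries other leading columns.

```python
def split_keep_leading(line, n_leading=1):
    """Leave the first n_leading fields untouched and split every later field into characters."""
    fields = line.split()
    leading, genotypes = fields[:n_leading], fields[n_leading:]
    alleles = [allele for genotype in genotypes for allele in genotype]
    return " ".join(leading + alleles)


print(split_keep_leading("R921B02 GG TT AA GG"))  # R921B02 G G T T A A G G
```

Like the Perl one-liner, this does not validate the genotype tokens; it splits whatever follows the leading fields.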

ADD REPLY • link 6.9 years ago by h.mon 35k

score 1 · Answer 3 · 2018-07-18

6.9 years ago

h.mon 35k

A Perl solution:

perl -ne 's/ //g; print join(" ", split( // ) );' test.txt > out.txt

ADD COMMENT • link 6.9 years ago by h.mon 35k

Good work!

ADD REPLY • link 6.9 years ago by Kevin Blighe 89k

score 1 · Answer 4 · 2018-07-18

6.9 years ago

Pierre Lindenbaum 166k

$ echo "AA GG TG CC" | sed 's/\([^ ]\)\([^ ]\)/\1 \2/g'
A A G G T G C C

?
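The same pairwise substitution can be sketched in Python with the re module. This variant only touches whitespace-delimited tokens made of exactly two allele characters, so a longer ID at the start of the row is left alone. The allele alphabet (A, C, G, T, and 0 for missing) and the function name are assumptions.

```python
import re

# A token of exactly two allele characters, bounded by whitespace or line ends.
GENOTYPE = re.compile(r"(?<!\S)([ACGT0])([ACGT0])(?!\S)")


def split_two_letter_tokens(line):
    """Insert a space inside every two-character genotype token."""
    return GENOTYPE.sub(r"\1 \2", line)


print(split_two_letter_tokens("AA GG TG CC"))          # A A G G T G C C
print(split_two_letter_tokens("R921B02 GG TT AA GG"))  # R921B02 G G T T A A G G
```

A two-character ID made only of these letters would also be split, so this is only safe when the non-genotype columns cannot look like genotypes.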

ADD COMMENT • link 6.9 years ago by Pierre Lindenbaum 166k

score 1 · Answer 5 · 2018-07-18

6.9 years ago

cpad0112 21k

echo "AA GG TG CC" | sed 's/\s//g;s/./& /g'

or

echo "AA GG TG CC" | sed 's/./& /g' | tr -s " "

A A G G T G C C

ADD COMMENT • link 6.9 years ago by cpad0112 21k