PennCNV: Finding Overlapping Genes
12.6 years ago
romsen ▴ 70

Hello,

I'm trying to find the genes that overlap my CNV calls. I downloaded the gene annotations for hg18 (Mar. 2006, NCBI build 36) from UCSC:

  • [knownGene.txt.gz]

  • [kgXref.txt.gz]

and did the same for the refGene annotation, as explained on the PennCNV website.

But when I run the 'scan_region.pl' command an error occurs:

    C:\penncnv>scan_region.pl sample.rawcnv hg18_refGene.txt -refgene -reflink hg18_refLink.txt > sample.cnv.rg18
    Error: invalid record in template-location-file hg18_refGene.txt (expecting 16 or 10 tab-delimited fields in refGene file): <1410,2804,5917067  N525,1506,525,15824069132,140691R_02,,  873     7974,   215506,5254,2,1,,       218281,,
    238422,,        23-1,6,525,05784525,1392,       6913282406918345,,      87372251586LIS995,      37974586,1,-    85544155       8,   0 CEP68,2,30,15,2061314048,88390,33,0,21066480,21066480,21066480488390,33,,,291384439717  -8,,2106335,883909781,,2913XR1   9717    4695,210664805392OC1924750493576081593549121593> 
at C:\penncnv\scan_region.pl line 540 main::scanUCSCGene('sample.rawcnv', 'hg18_refGene.txt', 0, 'refgene', undef, undef) called at C:\penncnv\scan_region.pl line 108

Something seems to be broken in the annotation file. How can I avoid or fix this? I'm a biologist, not a computer scientist, so please be kind. ;)

Thank you

Can you show what hg18_refGene.txt looks like?

It's a tab-delimited text file. When I open it in Excel there are 16 columns, but from about line 900 onward the format appears to be corrupted. So I think I have found the problem: I suspect the extraction of the .gz archive went wrong.

Update: Yes, it was an extraction problem with PowerArchiver. After extracting the archive with WinRAR instead, it works!

You should post that as an answer and then accept it, in case someone else has the same problem.

12.6 years ago
romsen ▴ 70

It's a tab-delimited text file. When I open it in Excel there are 16 columns, but from about line 900 onward the format appears to be corrupted. So I think I have found the problem: I suspect the extraction of the .gz archive went wrong.

Update: Yes, it was an extraction problem with PowerArchiver. After extracting the archive with WinRAR instead, it works!
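
For anyone who wants to rule out the archive tool entirely, here is a minimal Python sketch (not part of PennCNV) that decompresses the .gz file byte-for-byte and then checks that every line has 10 or 16 tab-delimited fields, the counts named in the scan_region.pl error message. The function names `gunzip` and `bad_records` are made up for this example, and the script file name in the usage note is an assumption.

```python
import gzip
import shutil
import sys


def gunzip(src_path, dst_path):
    """Decompress a .gz file in binary mode so no bytes are altered."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def bad_records(path, allowed=(10, 16)):
    """Return (line_number, field_count) for every line whose number of
    tab-delimited fields is not in `allowed`."""
    bad = []
    with open(path, "r", encoding="utf-8", errors="replace", newline="") as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) not in allowed:
                bad.append((line_number, len(fields)))
    return bad


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_refgene.py hg18_refGene.txt.gz hg18_refGene.txt
    gunzip(sys.argv[1], sys.argv[2])
    problems = bad_records(sys.argv[2])
    if problems:
        for line_number, count in problems[:10]:
            print(f"line {line_number}: {count} fields")
    else:
        print("all lines have 10 or 16 fields")
```

If the check reports bad lines right after a fresh decompression, the downloaded archive itself is probably damaged and downloading it again would be the next thing to try.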
