Question

Using ReadDepth for non-human data

2

Entering edit mode

9.1 years ago

sagi.polani ▴ 100

Hi,

Does anyone know where I can find a manual / walkthrough that I can use for analyzing non-human data with ReadDepth for CNV detection (i.e. step-by-step info as to how to create the input files (mapability, .dat files etc...)? The info on the user manual is a bit lacking in that sense.

Thanks!

Sagi

ReadDepth non-human • 4.5k views

ADD COMMENT • link updated 8.2 years ago by misbahabas ▴ 70 • written 9.1 years ago by sagi.polani ▴ 100

score 1 · Answer 1 · 2016-06-11

1

Entering edit mode

9.1 years ago

Chris Miller 22k

You'll want to start with the "createCustomAnnotations" scripts from here: https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/readDepth/index.html

There's a readme in that folder that describes the basic steps. If you have specific questions after looking through those scripts, happy to answer.

ADD COMMENT • link 9.1 years ago by Chris Miller 22k

0

Entering edit mode

Hi Chris,

thanks for the info.

So I've worked according to the Readme file you've mentioned, and everything started working OK (i.e. the fastq files were generated) but then I got this error:

[bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9 [bwa_aln] fail to locate the index [bwa_sai2sam_se] fail to locate the index Can't call method "getline" on an undefined value at mapAndGc.pl line 24. gzip: /root/Desktop/ReadDepth/output//1.gc: No such file or directory gzip: /root/Desktop/ReadDepth/output//1.map: No such file or directory

It seems that the program is missing the .gc and .map files for chromosome 1 or something else with mapAndGc.pl, but I'm not sure. I'd appreciate your help please.

Thanks!

Sagi

ADD REPLY • link 9.1 years ago by sagi.polani ▴ 100

0

Entering edit mode

Sorry, but I can't tell you much based on the information that you've provided. You're going to have to go line by line through that script, making sure the appropriate files are in the right place. Best of luck!

ADD REPLY • link 9.1 years ago by Chris Miller 22k

0

Entering edit mode

Hi Chris,

So I've managed to debug most of the errors, but now I'm left with one: "Can't call method "getline" on an undefined value at mapAndGc.pl line 24."

It seems that I'm missing the .genome file. I went through the README file like you said, but there's no info there as how to create that.genome file (which, according to the perl script should include the "list of chrs and lengths").

I'd appreciate any help on this.

Thanks!

Sagi

ADD REPLY • link 9.1 years ago by sagi.polani ▴ 100

0

Entering edit mode

For human hg19, it would be just the first two columns from this file: https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/readDepth/entrypoints.hg19.male

Make something similar for your organism and you're set.

ADD REPLY • link 9.1 years ago by Chris Miller 22k

0

Entering edit mode

OK, did that, passed the previous error (line 24), now another one addressing line 39: "Can't call method "getline" on an undefined value at mapAndGc.pl line 39."

Any ideas?

Sorry for the hassle....

Thanks

Sagi

ADD REPLY • link 9.1 years ago by sagi.polani ▴ 100

0

Entering edit mode

Looks like that line is trying to read in your sam file, check the paths going in from the bash script.

ADD REPLY • link 9.1 years ago by Chris Miller 22k

0

Entering edit mode

Alright! got passed everything in the mean time - thanks!

I'll continue on and let you know how things go.

Thanks for all the help - I really appreciate it.

Best

Sagi

ADD REPLY • link 9.1 years ago by sagi.polani ▴ 100

0

Entering edit mode

Hi Chris,

OK, so I now want to generate the chr*.bed files for the /read directory. Sadly, I only have the .bam files in hand.

I should state that my genome has 31 chromosomes and a single X chromosome. The chromosomes are defined with numbers without the "chr" prefix, i.e: 1, 2, 3, ..., 31, X.

I went through your post, and used the command that you've recommended: for $chr in $(seq 1 31) X;do samtools view myfile.bam $chr:1-999999999 | cut -f 3,4 >$chr.bed;done

but I kept getting the "bash: '$chr': not a valid identifier" error.

So I went though your instruction here, and used the command: samtools view -F 4 myfile.bam | awk 'OFS="\t"{print $3,$4-1,$4}' >myfile.bed

But then I get only a single file for the entire genome, and not one file per chromosome like it should be. I'm also wondering why in the first command you're using the -f command instead of -F as in the second command?

I'd appreciate your help.

Thanks!

Sagi

ADD REPLY • link 9.1 years ago by sagi.polani ▴ 100

0

Entering edit mode

That first $chr should be chr.

Bamwindow may also be useful https://github.com/genome-vendor/bam-window

ADD REPLY • link 9.1 years ago by Chris Miller 22k

0

Entering edit mode

Hi Chris,

Thanks. Got past that issue.

I prepared all the relevant files accoridng to the data here, installed ReadDepth via Rstudio, and started putting in the commands listed in the above link. Everything went smoothly until I get to an error:

$ library("readDepth")

Loading required package: foreach

foreach: simple, scalable parallel programming from Revolution Analytics

Use Revolution R for scalability, fault tolerance and more.

http://www.revolutionanalytics.com

Loading required package: doMC

Loading required package: iterators

Loading required package: parallel

Loading required package: IRanges

Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table, tapply, union, unique, unlist, unsplit

Loading required package: S4Vectors

Loading required package: stats4

Loading required package: DNAcopy

$ rdo = new("rdObject")

1173273408 total reads

genome size: 2367053447

genome mapability percentage: 0.9781022

effectiveGenomeSize: 2315220289

expect 30.17975 % haploid, 44.82025 % diploid, 25 triploid

approx. 50.67653 X coverage of mappable genome

expected mean: 405.4123

adjusted mean: 416.1911

$ rdo = readDepth(rdo)

Binning Start: Tue Jul 5 10:25:10 2016

cut: write error: Broken pipe

I'm wondering if this has to do with different versions of R. I'm using R version 3.1.2 (2014-10-31). On what version was the package designed to work on?

I'd appreciate your help.

Thanks

Sagi

ADD REPLY • link 9.1 years ago by sagi.polani ▴ 100

0

Entering edit mode

If you're doing this on windows, you may encounter problems. For efficiency, the script uses built in unix commands like "cut" in several places. Since the package is old and not being maintained anymore, I'm not going to put in a fix, but would welcome pull requests if you hack into it and add a windows-compliant way to do it.

ADD REPLY • link 9.1 years ago by Chris Miller 22k

0

Entering edit mode

I'm working in linux red-hat.

I'll see if I can perform some tweaks and let you know.

Thanks again!

Sagi

ADD REPLY • link 9.1 years ago by sagi.polani ▴ 100

0

Entering edit mode

In that case, I'm guessing that the files that are supposed to be cut aren't actually in the expected paths.

ADD REPLY • link 9.1 years ago by Chris Miller 22k

0

Entering edit mode

hello

I used "createCustomAnnotations" scripts from here for non human data:

https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/readDepth/index.html

I have bam files and reference genome, so i convert bam file into sam file and run only last command of "runEachchr.sh" !/gsc/bin/bash arguments

chr=chromosome37 readLength=100 fastaDir=$3 outdir=outdir windowSize=100 generate all possible reads of the given size from that chromsome perl allSeq.pl $fastaDir/$chr.fa $readLength > $outdir/$chr.fastq align those reads using bwa bwa aln -t 4 $fastaDir/all_sequences.fa $outdir/$chr.fastq >$outdir/$chr.aln.sai bwa samse -n 1 $fastaDir/all_sequences.fa $outdir/$chr.aln.sai $outdir/$chr.fastq >$outdir/$chr.sam clean up rm -f $outdir/$chr.fastq $outdir/$chr.aln.sai convert the sam file to a mapability and gc-content tracks

perl mapAndGc.pl /home/misbah/InstalledUtilites/Dogsd_bamfiles/outdir/chr37.sam canfam3.1.txt $windowSize $chr $readLength $outdir gzip $outdir/$chr.gc gzip $outdir/$chr.map rm -f $outdir/$chr.sam

but it give error

Use of uninitialized value $len in division (/) at mapAndGc.pl line 83, <gen1> line 2097423.

i am in trouble, please tell me about this issue

thankx

ADD REPLY • link 8.2 years ago by misbahabas ▴ 70

0

Entering edit mode

if i changed in chr list file

like chr1 122678785 2

chr2 85426708 2

chr3 91889043 2

chr4 88276631 2

it give no error but gc and map file have 0 and NA values . it not give proper output please tell me about it .. i am so worried about it :( thankx in advance

ADD REPLY • link 8.2 years ago by misbahabas ▴ 70

0

Entering edit mode

hello

I am using the same tool, to generate annotation files for readDepth . I also have errors like above but i sloved them. Now still one error occure. Please tell me to solve it thanks in advance

Use of uninitialized value $ARGV[5] in concatenation (.) or string at mapAndGc.pl line 78, <gen1> line 8438174.

Use of uninitialized value in concatenation (.) or string at mapAndGc.pl line 78, <gen1> line 8438174. Can't open output file.

gzip: /.gc: No such file or directory

gzip: /.map: No such file or directory

ADD REPLY • link 8.2 years ago by misbahabas ▴ 70

0

Entering edit mode

I can't tell from what you've provided what the issue is (and since the code is so old, I don't remember the details either). You'll probably have to step through the script step by step to figure out what's going on. I'd start by making sure that the outputs created by the previous steps in runEachChr.sh are present and sane.

ADD REPLY • link 8.2 years ago by Chris Miller 22k

score 0 · Answer 2 · 2016-06-10

0

Entering edit mode

9.1 years ago

QVINTVS_FABIVS_MAXIMVS ★ 2.6k

If you have BAM files you can use samtools depth to get the per base pair depth of coverage or count reads with samtools view <region> and then covert the read counts into coverage.

For CNV calling I'm not familiar with non-human species.

Lumpy should be able to call CNVs with your reference FASTA file so it should work for your samples.

Also Manta should work for you but I think it's for diploid organism (or optimized for diploids)

If you're interested in a CNV calling pipeline check out a recent publication from my lab were we characterized SVs in an ASD cohort. We implemented 2 CNV callers and describe how we filtered the raw calls.

ADD COMMENT • link 9.1 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k

0

Entering edit mode

I already have bam file of non human genome and reference genome Now i only want to generate mappbility and gcWinds file thats why i only run this command "perl mapAndGc.pl" but it give error

Use of uninitialized value $ARGV[5] in concatenation (.) or string at mapAndGc.pl line 78, <gen1> line 8438174.

Use of uninitialized value in concatenation (.) or string at mapAndGc.pl line 78, <gen1> line 8438174. Can't open output file.

gzip: /.gc: No such file or directory

gzip: /.map: No such file or directory

Please tell me how resolved it thanks

ADD REPLY • link 8.2 years ago by misbahabas ▴ 70

score 0 · Answer 3 · 2017-06-07

hello

I used "createCustomAnnotations" scripts from here for non human data:

https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/readDepth/index.html

I have bam files and reference genome, so i convert bam file into sam file and run only last command of "runEachchr.sh" !/gsc/bin/bash arguments

chr=chromosome37 readLength=100 fastaDir=$3 outdir=outdir windowSize=100 generate all possible reads of the given size from that chromsome perl allSeq.pl $fastaDir/$chr.fa $readLength > $outdir/$chr.fastq align those reads using bwa bwa aln -t 4 $fastaDir/all_sequences.fa $outdir/$chr.fastq >$outdir/$chr.aln.sai bwa samse -n 1 $fastaDir/all_sequences.fa $outdir/$chr.aln.sai $outdir/$chr.fastq >$outdir/$chr.sam clean up rm -f $outdir/$chr.fastq $outdir/$chr.aln.sai convert the sam file to a mapability and gc-content tracks

perl mapAndGc.pl /home/misbah/InstalledUtilites/Dogsd_bamfiles/outdir/chr37.sam canfam3.1.txt $windowSize $chr $readLength $outdir gzip $outdir/$chr.gc gzip $outdir/$chr.map rm -f $outdir/$chr.sam

but it give error

Use of uninitialized value $len in division (/) at mapAndGc.pl line 83, <gen1> line 2097423.

i am in trouble, please tell me about this issue

if i changed in chr list file

like chr1 122678785 2

chr2 85426708 2

chr3 91889043 2

chr4 88276631 2

it give no error but gc and map file have 0 and NA values . it not give proper output please tell me about it .. i am so worried about it :(

thankx in advance