hello
I used "createCustomAnnotations" scripts from here for non human data:
https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/readDepth/index.html
I have bam files and reference genome, so i convert bam file into sam file and run only last command of "runEachchr.sh" !/gsc/bin/bash arguments
chr=chromosome37 readLength=100 fastaDir=$3 outdir=outdir windowSize=100 generate all possible reads of the given size from that chromsome perl allSeq.pl $fastaDir/$chr.fa $readLength > $outdir/$chr.fastq align those reads using bwa bwa aln -t 4 $fastaDir/all_sequences.fa $outdir/$chr.fastq >$outdir/$chr.aln.sai bwa samse -n 1 $fastaDir/all_sequences.fa $outdir/$chr.aln.sai $outdir/$chr.fastq >$outdir/$chr.sam clean up rm -f $outdir/$chr.fastq $outdir/$chr.aln.sai convert the sam file to a mapability and gc-content tracks
perl mapAndGc.pl /home/misbah/InstalledUtilites/Dogsd_bamfiles/outdir/chr37.sam canfam3.1.txt $windowSize $chr $readLength $outdir gzip $outdir/$chr.gc gzip $outdir/$chr.map rm -f $outdir/$chr.sam
but it give error
Use of uninitialized value $len in division (/) at mapAndGc.pl line 83, <gen1> line 2097423.
i am in trouble, please tell me about this issue
if i changed in chr list file
like chr1 122678785 2
chr2 85426708 2
chr3 91889043 2
chr4 88276631 2
it give no error but gc and map file have 0 and NA values . it not give proper output please tell me about it .. i am so worried about it :(
thankx in advance
Hi Chris,
thanks for the info.
So I've worked according to the Readme file you've mentioned, and everything started working OK (i.e. the fastq files were generated) but then I got this error:
[bwa_aln] 17bp reads: max_diff = 2 [bwa_aln] 38bp reads: max_diff = 3 [bwa_aln] 64bp reads: max_diff = 4 [bwa_aln] 93bp reads: max_diff = 5 [bwa_aln] 124bp reads: max_diff = 6 [bwa_aln] 157bp reads: max_diff = 7 [bwa_aln] 190bp reads: max_diff = 8 [bwa_aln] 225bp reads: max_diff = 9 [bwa_aln] fail to locate the index [bwa_sai2sam_se] fail to locate the index Can't call method "getline" on an undefined value at mapAndGc.pl line 24. gzip: /root/Desktop/ReadDepth/output//1.gc: No such file or directory gzip: /root/Desktop/ReadDepth/output//1.map: No such file or directory
It seems that the program is missing the .gc and .map files for chromosome 1 or something else with mapAndGc.pl, but I'm not sure. I'd appreciate your help please.
Thanks!
Sagi
Sorry, but I can't tell you much based on the information that you've provided. You're going to have to go line by line through that script, making sure the appropriate files are in the right place. Best of luck!
Hi Chris,
So I've managed to debug most of the errors, but now I'm left with one: "Can't call method "getline" on an undefined value at mapAndGc.pl line 24."
It seems that I'm missing the .genome file. I went through the README file like you said, but there's no info there as how to create that.genome file (which, according to the perl script should include the "list of chrs and lengths").
I'd appreciate any help on this.
Thanks!
Sagi
For human hg19, it would be just the first two columns from this file: https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/readDepth/entrypoints.hg19.male
Make something similar for your organism and you're set.
OK, did that, passed the previous error (line 24), now another one addressing line 39: "Can't call method "getline" on an undefined value at mapAndGc.pl line 39."
Any ideas?
Sorry for the hassle....
Thanks
Sagi
Looks like that line is trying to read in your sam file, check the paths going in from the bash script.
Alright! got passed everything in the mean time - thanks!
I'll continue on and let you know how things go.
Thanks for all the help - I really appreciate it.
Best
Sagi
Hi Chris,
OK, so I now want to generate the chr*.bed files for the /read directory. Sadly, I only have the .bam files in hand.
I should state that my genome has 31 chromosomes and a single X chromosome. The chromosomes are defined with numbers without the "chr" prefix, i.e: 1, 2, 3, ..., 31, X.
I went through your post, and used the command that you've recommended: for $chr in $(seq 1 31) X;do samtools view myfile.bam $chr:1-999999999 | cut -f 3,4 >$chr.bed;done
but I kept getting the "bash: '$chr': not a valid identifier" error.
So I went though your instruction here, and used the command: samtools view -F 4 myfile.bam | awk 'OFS="\t"{print $3,$4-1,$4}' >myfile.bed
But then I get only a single file for the entire genome, and not one file per chromosome like it should be. I'm also wondering why in the first command you're using the -f command instead of -F as in the second command?
I'd appreciate your help.
Thanks!
Sagi
That first $chr should be chr.
Bamwindow may also be useful https://github.com/genome-vendor/bam-window
Hi Chris,
Thanks. Got past that issue.
I prepared all the relevant files accoridng to the data here, installed ReadDepth via Rstudio, and started putting in the commands listed in the above link. Everything went smoothly until I get to an error:
$ library("readDepth")
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: doMC
Loading required package: iterators
Loading required package: parallel
Loading required package: IRanges
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
The following object is masked from ‘package:stats’:
xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table, tapply, union, unique, unlist, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: DNAcopy
$ rdo = new("rdObject")
1173273408 total reads
genome size: 2367053447
genome mapability percentage: 0.9781022
effectiveGenomeSize: 2315220289
expect 30.17975 % haploid, 44.82025 % diploid, 25 triploid
approx. 50.67653 X coverage of mappable genome
expected mean: 405.4123
adjusted mean: 416.1911
$ rdo = readDepth(rdo)
Binning Start: Tue Jul 5 10:25:10 2016
cut: write error: Broken pipe
cut: write error: Broken pipe
cut: write error: Broken pipe
cut: write error: Broken pipe
cut: write error: Broken pipe
cut: write error: Broken pipe
I'm wondering if this has to do with different versions of R. I'm using R version 3.1.2 (2014-10-31). On what version was the package designed to work on?
I'd appreciate your help.
Thanks
Sagi
If you're doing this on windows, you may encounter problems. For efficiency, the script uses built in unix commands like "cut" in several places. Since the package is old and not being maintained anymore, I'm not going to put in a fix, but would welcome pull requests if you hack into it and add a windows-compliant way to do it.
I'm working in linux red-hat.
I'll see if I can perform some tweaks and let you know.
Thanks again!
Sagi
In that case, I'm guessing that the files that are supposed to be cut aren't actually in the expected paths.
hello
I used "createCustomAnnotations" scripts from here for non human data:
https://xfer.genome.wustl.edu/gxfer1/project/cancer-genomics/readDepth/index.html
I have bam files and reference genome, so i convert bam file into sam file and run only last command of "runEachchr.sh" !/gsc/bin/bash arguments
chr=chromosome37 readLength=100 fastaDir=$3 outdir=outdir windowSize=100 generate all possible reads of the given size from that chromsome perl allSeq.pl $fastaDir/$chr.fa $readLength > $outdir/$chr.fastq align those reads using bwa bwa aln -t 4 $fastaDir/all_sequences.fa $outdir/$chr.fastq >$outdir/$chr.aln.sai bwa samse -n 1 $fastaDir/all_sequences.fa $outdir/$chr.aln.sai $outdir/$chr.fastq >$outdir/$chr.sam clean up rm -f $outdir/$chr.fastq $outdir/$chr.aln.sai convert the sam file to a mapability and gc-content tracks
perl mapAndGc.pl /home/misbah/InstalledUtilites/Dogsd_bamfiles/outdir/chr37.sam canfam3.1.txt $windowSize $chr $readLength $outdir gzip $outdir/$chr.gc gzip $outdir/$chr.map rm -f $outdir/$chr.sam
but it give error
Use of uninitialized value $len in division (/) at mapAndGc.pl line 83, <gen1> line 2097423.
i am in trouble, please tell me about this issue
thankx
if i changed in chr list file
like chr1 122678785 2
chr2 85426708 2
chr3 91889043 2
chr4 88276631 2
it give no error but gc and map file have 0 and NA values . it not give proper output please tell me about it .. i am so worried about it :( thankx in advance
hello
I am using the same tool, to generate annotation files for readDepth . I also have errors like above but i sloved them. Now still one error occure. Please tell me to solve it thanks in advance
Use of uninitialized value $ARGV[5] in concatenation (.) or string at mapAndGc.pl line 78, <gen1> line 8438174.
Use of uninitialized value in concatenation (.) or string at mapAndGc.pl line 78, <gen1> line 8438174. Can't open output file.
gzip: /.gc: No such file or directory
gzip: /.map: No such file or directory
I can't tell from what you've provided what the issue is (and since the code is so old, I don't remember the details either). You'll probably have to step through the script step by step to figure out what's going on. I'd start by making sure that the outputs created by the previous steps in runEachChr.sh are present and sane.