Hello all! I am fairly new to bioinformatics and I have been struggling to process some NGS files I have just received. Here is the situation: I have 35 samples sequenced using RAD-seq, and I have been trying to replicate the pipeline that the sequencing company used to process them (we had requested bioinformatics support). I am doing this because i) I want to learn how to use the software properly and ii) not all samples in the project belong to my thesis, so I want to process my files separately (the bioinformatics support ran all 95 samples through the same pipeline).
Briefly, their pipeline uses Velvet to assemble a genome (in my case, a set of contigs generated from the reads of the sample with the highest number of reads) and then aligns the reads of all individuals against the assembled genome (the reference) using Bowtie. I succeeded in assembling the genome and aligning the samples against the reference, but problems arose when running SAMtools (e.g. producing indexed .bam files). The goal is to process the samples correctly so that I can generate .bcf files and call the SNP variants for each individual.
Here are the commands I have used, up to the point where it goes wrong:
Velvet 1.2.10:
velveth SampleOne_ref 19 -fastq.gz -short SampleOne.fastq.gz (hash length set to 19 because I ran out of memory with lower values)
velvetg SampleOne_ref -cov_cutoff 5 -max_coverage 500 -min_contig_lgth 31 -exp_cov auto
Bowtie 2-2.3.4.1:
bowtie2-build SampleOne_ref.fa SampleOne_ref (building reference using SampleOne)
bowtie2-align-s --no-unal --sensitive-local -x SampleOne_ref -U SampleTwo.fastq -S SampleOne_ref_QuerySampleTwo.sam
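`-x` takes the index basename given to `bowtie2-build` (here `SampleOne_ref`), not a wildcard such as `SampleOne_ref.*`, which the shell would expand into several file names. A minimal Python sketch that checks whether a basename points at a complete index, assuming the six `.bt2` files that bowtie2-build writes for a small index (large indexes use `.bt2l` instead, which this sketch does not handle):

```python
import sys
from pathlib import Path

# Files bowtie2-build writes for a small index; large indexes end in .bt2l instead.
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_bt2_files(basename: str) -> list[str]:
    """Return the index files that do not exist for the given -x basename."""
    return [basename + s for s in BT2_SUFFIXES if not Path(basename + s).exists()]


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "SampleOne_ref"
    missing = missing_bt2_files(base)
    if missing:
        print("Missing index files:", ", ".join(missing))
    else:
        print(f"Index complete: pass -x {base} to bowtie2")
```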
SAMtools-1.8:
samtools dict SampleOne_ref.fa
samtools faidx SampleOne_ref.fa
samtools view -b SampleOne_ref_QuerySampleTwo.sam -o SampleOne_ref_QuerySampleTwo.bam
samtools sort -l 0 -n SampleOne_ref_QuerySampleTwo.bam -o SampleOne_ref_QuerySampleTwo.sorted.bam
samtools index SampleOne_ref_QuerySampleTwo.sorted.bam
which fails with:
[E::hts_idx_push] Chromosome blocks not continuous
samtools index: failed to create index for "SampleOne_ref_QuerySampleTwo.sorted.bam"
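`samtools faidx` reads the FASTA itself (`samtools faidx SampleOne_ref.fa`) and writes `SampleOne_ref.fa.fai` next to it. To illustrate what that index holds, here is a minimal Python sketch that computes the five .fai columns (name, length, byte offset of the first base, bases per line, bytes per line). It assumes a well-formed FASTA with a constant line length within each record and skips the validation samtools performs, so treat it as a way to understand the file, not a replacement for samtools. The script name in the comment is hypothetical.

```python
import sys


def build_fai(fasta_path: str) -> list[tuple[str, int, int, int, int]]:
    """Compute (name, length, offset, linebases, linewidth) for each record."""
    entries = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte position of the start of the current line
    with open(fasta_path, "rb") as fh:  # binary mode so offsets are in bytes
        for raw in fh:
            if raw.startswith(b">"):
                if name is not None:
                    entries.append((name, length, offset, linebases, linewidth))
                # The sequence name is the first word of the header line.
                name = raw[1:].split()[0].decode()
                length = linebases = linewidth = 0
                offset = pos + len(raw)  # first base starts right after the header
            else:
                bases = raw.rstrip(b"\r\n")
                if linebases == 0:
                    # The first sequence line defines bases/bytes per line.
                    linebases, linewidth = len(bases), len(raw)
                length += len(bases)
            pos += len(raw)
    if name is not None:
        entries.append((name, length, offset, linebases, linewidth))
    return entries


def write_fai(fasta_path: str) -> None:
    """Write <fasta_path>.fai as tab-separated columns."""
    with open(fasta_path + ".fai", "w", newline="\n") as out:
        for entry in build_fai(fasta_path):
            out.write("\t".join(map(str, entry)) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    write_fai(sys.argv[1])  # e.g. python fai_sketch.py SampleOne_ref.fa
```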
And this is where I'm stuck. I did check the files to see if they were ok, and apparently, they are.
Any clues about where I went wrong?
Thank you so much in advance! Any help will be greatly appreciated!
Why are you using -l 0 in sort? And yeah, like h.mon says, -n sorts by read name, which precludes indexing (samtools index needs a coordinate-sorted BAM). Are you sure you want to sort by name?
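To see why name sorting breaks indexing, here is a minimal Python sketch of the ordering check. It is modelled on, not copied from, the conditions behind htslib's "Chromosome blocks not continuous" and "Unsorted positions" errors: all records for one reference must form a single contiguous block, with non-decreasing positions inside it. A name-sorted file scatters each contig's reads, so a contig reappears after its block has ended. The sketch reads plain SAM text, skips unplaced reads (RNAME `*`) for simplicity, and the script name `check_sort.py` is hypothetical.

```python
import sys
from typing import Iterable, Optional


def first_sort_violation(sam_lines: Iterable[str]) -> Optional[str]:
    """Describe the first record that breaks coordinate order, or return None."""
    finished = set()   # references whose block of records has already ended
    current = None     # reference of the block currently being read
    last_pos = 0
    for n, line in enumerate(sam_lines, start=1):
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])  # SAM columns 3 (RNAME) and 4 (POS)
        if rname == "*":
            continue  # unplaced reads are skipped in this sketch
        if rname != current:
            if rname in finished:
                return f"line {n}: {rname} reappears after its block ended"
            if current is not None:
                finished.add(current)
            current, last_pos = rname, 0
        if pos < last_pos:
            return f"line {n}: {rname}:{pos} comes after {rname}:{last_pos}"
        last_pos = pos
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: samtools view -h file.bam | python check_sort.py -
    src = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with src:
        print(first_sort_violation(src) or "records are in coordinate order")
```

Run on a name-sorted BAM, this should report a contig reappearing; on the output of `samtools sort` without `-n`, it should report no violation.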
I used "-l 0" because I thought I had to specify that the output should be uncompressed. And no, I am not sure I want to sort by name. Now I think I don't need to: as h.mon said, the files have to be sorted by position, not by name, before they can be indexed.
Thank you very much for answering!
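Since the goal is to process your samples separately, the following Python sketch prints (and optionally runs) the per-sample commands with the sort step fixed: `samtools sort` without `-n` sorts by coordinate, which is what `samtools index` expects. It assumes the file names used in the question (`SampleOne_ref.fa`, `<sample>.fastq`) and that `bowtie2` and `samtools` are on your PATH; the sample list in the example is hypothetical. It passes the SAM straight to `samtools sort`, which accepts SAM input, so the separate `samtools view -b` step is dropped.

```python
import shlex
import subprocess


def sample_commands(ref: str, sample: str) -> list[list[str]]:
    """Alignment, coordinate sort and indexing for one sample against ref."""
    prefix = f"{ref}_Query{sample}"
    return [
        ["bowtie2", "--no-unal", "--sensitive-local", "-x", ref,
         "-U", f"{sample}.fastq", "-S", f"{prefix}.sam"],
        # No -n: samtools sort then orders by coordinate, as samtools index needs.
        ["samtools", "sort", "-o", f"{prefix}.sorted.bam", f"{prefix}.sam"],
        ["samtools", "index", f"{prefix}.sorted.bam"],
    ]


def run_pipeline(ref: str, samples: list[str], dry_run: bool = True) -> list[list[str]]:
    """Print every command; execute them in order only when dry_run is False."""
    steps = [
        ["bowtie2-build", f"{ref}.fa", ref],  # index basename = ref
        ["samtools", "faidx", f"{ref}.fa"],   # writes {ref}.fa.fai
    ]
    for sample in samples:
        steps.extend(sample_commands(ref, sample))
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return steps


if __name__ == "__main__":
    # Hypothetical sample list; replace with your own sample names.
    run_pipeline("SampleOne_ref", ["SampleTwo", "SampleThree"])
```

With `dry_run=False`, each step runs in order and the script stops at the first command that exits with an error, which makes it easier to see which sample or step failed.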