I got a SAM file after aligning my NGS data with this command:
bowtie2 -f -x scaffold_filt_PV014_OD_DB.fa PV002_cmv3.fa -S scaffold_nofilt_OD_PV002.sam
After that, I converted it to a BAM file:
samtools view -bS scaffold_nofilt_OD_PV002.sam > scaffold_nofilt_OD_PV002.bam
I removed the unmapped reads:
samtools view -F 0x04 -b scaffold_nofilt_OD_PV002.bam >scaffold_nofilt_OD_rm_PV002.bam
I selected the reads with a mapping quality of 10 or higher:
samtools view -q 10 -b scaffold_nofilt_OD_rm_PV002.bam > scaffold_nofilt_OD_MapQ10_PV002.bam
I sorted and indexed it:
samtools sort scaffold_nofilt_OD_MapQ10_PV002.bam scaffold_nofilt_OD_PV002_index
samtools index scaffold_nofilt_OD_PV002_index.bam
When I visualize these BAM files, I see different coverage across my libraries. I would like to normalize the coverage across them, but I don't know how to do that.
Can you guide me on how to normalize them?
Thanks in advance.
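A note on the -F 0x04 filter in the question: in the SAM FLAG field, bit 0x4 marks a read as unmapped (pairing is a different bit, 0x1), so -F 0x04 keeps only reads that aligned. The following is a minimal sketch of that bit test in plain shell arithmetic. The function name is_unmapped and the example FLAG values are illustrative assumptions, not taken from the poster's data.

```shell
# Succeeds (exit status 0) when bit 0x4 (read unmapped) is set in a SAM FLAG value.
# is_unmapped is a hypothetical helper name used only for this illustration.
is_unmapped() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# Example FLAG values (assumed for illustration):
#   0 and 16 are mapped single-end reads, 4 is an unmapped read,
#   77 is an unmapped first-in-pair read, 99 is a properly paired mapped read.
for flag in 0 4 16 77 99; do
    if is_unmapped "$flag"; then
        echo "FLAG $flag: unmapped, dropped by -F 0x04"
    else
        echo "FLAG $flag: mapped, kept by -F 0x04"
    fi
done
```

samtools view applies the same test to every record, so reads with FLAG 4 or 77 would be excluded and the others retained.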
In Unix, remember that redirections are among the first operations executed, which means
>scaffold_nofilt_OD_PV002.bam
is executed first, obliterating the content of the file scaffold_nofilt_OD_PV002.bam. Your code should stop working at that step unless what you've pasted here is not what you ran.
Well, I ran them using different directories for the input files and output files.
Should have made that clear in your command as the command would not run as-is.
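The redirection point can be checked with a small sketch that does not need samtools. It uses sort on a throwaway file as a stand-in. The file names and the temporary directory are assumptions made up for this illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
printf 'line1\nline2\nline3\n' > "$workdir/data.txt"

# Same file as input and output: the shell truncates data.txt
# before sort even starts, so sort reads an empty file.
sort "$workdir/data.txt" > "$workdir/data.txt"

if [ -s "$workdir/data.txt" ]; then
    clobbered=no
else
    clobbered=yes   # the file is now empty
fi
echo "input clobbered: $clobbered"

# Safer pattern: write to a different file, then rename it.
printf 'b\na\n' > "$workdir/data.txt"
sort "$workdir/data.txt" > "$workdir/data.sorted.txt" \
    && mv "$workdir/data.sorted.txt" "$workdir/data.txt"
cat "$workdir/data.txt"
```

The same reasoning applies to any command of the form tool file > file, which is why writing to a distinct output name (or directory) at each step matters.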
What kind of data are these? Normalization for which purpose (visualisation, some differential analysis, if so which)?
My data come from NGS reads that I aligned against a database. I would like to normalize them in order to find polymorphisms, because I have different coverage in each library.
Saying the data come from NGS is like saying a car comes from a factory without naming the brand; please be more specific so that we can help you. Is this exome sequencing or something similar that is actually intended to find polymorphisms, or is it something like RNA-seq that you are now trying to tweak in order to find them? Given that the files are called scaffold... I guess it is whole-genome or exome sequencing. If so, there is no need for normalization. Use any of the established variant callers such as GATK, Strelka, etc. (please use the search function for more suggestions) to find variants. The tools will handle normalization internally.
This is RNA-seq, and I am trying to modify them.
I don’t understand how coverage (quantitative data) relates to polymorphisms (qualitative data). Why do you need to normalize? What do you hope to achieve?
The concepts "RNAseq", "normalization" and "polymorphisms" don't make sense together. Please explain what you mean by normalization, how it is useful in this scenario and why you're looking for polymorphisms using RNAseq in the first place (taking you further away from changes at the DNA level and being unable to account for any changes at splice sites / RNA modifications)