Dear all, I have a problem: I need to count the number of rows (reads) and the GC content in a BAM file for individual segments, but I do not know how to separate it. The segment size is 60 kb. So, for reads whose start coordinate is up to 60000 on chr1, I would get the count of reads (rows) and the GC content. Could you please help me? I have source code to count NR in awk and to compute GC content, but only for all reads in the BAM file. I also have some source code to split a BED file into 60 kb segments, if that helps.
Count of GC
awk '{ n = length($10); print $10, gsub(/[GCgc]/, "", $10) / n; }' your.sam
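If the data are in a BAM rather than a SAM file, the same one-liner can read from samtools view (a sketch; your.bam is just a placeholder name):

samtools view your.bam | awk '{ n = length($10); print $10, gsub(/[GCgc]/, "", $10) / n; }'   # $10 is the SEQ field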
Count of reads (rows)
awk 'END {print NR}'
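This needs alignments on stdin, so pipe them in the same way (again a sketch with placeholder names):

samtools view your.bam | awk 'END {print NR}'    # number of alignment records
samtools view -c your.bam                        # samtools' built-in record counter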
Separate bed file:
$ awk '{
    regionChromosome = $1;
    regionStart = $2;
    regionStop = $3;
    val = 1;
    # walk the region in 60 kb steps, clipping the last window at the region end
    for (baseStart = regionStart; baseStart < regionStop; baseStart += 60000) {
        baseStop = baseStart + 60000;
        if (baseStop > regionStop) { baseStop = regionStop; }
        print regionChromosome"\t"baseStart"\t"baseStop"\t"val++;
    }
}' hg19.extents.bed \
    | bedmap --echo --echo-map-id --delim "\t" 1.bed - > answer.1.bed
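A possible follow-up (a sketch, not tested on your data): BEDOPS also ships a bam2bed converter, so the reads themselves can be turned into BED and counted per window with bedmap --count. Here windows.bed stands for the 60 kb windows produced by the awk step above, sort-bed-sorted as BEDOPS expects, and all file names are placeholders:

bam2bed < your.bam > reads.bed                   # convert alignments to BED (run sort-bed if not already sorted)
bedmap --echo --count --delim "\t" windows.bed reads.bed > reads.per.window.bed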
Thank you so much.
Yes, I understand you, but I need to do it for the whole BAM file. That means I split the BAM file into separate BAM files by chromosome (I get 25 files), and then I have to split every file into parts: 0-60kb, 60kb-120kb, 120kb-180kb, and so on to the end, and for these partitions count GC and fragments.
You can specify the chromosome you want:
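For example (assuming the BAM is coordinate-sorted and indexed so samtools can take region arguments; your.bam is a placeholder):

samtools view your.bam chr1 | awk 'END {print NR}'
samtools view your.bam chr2 | awk 'END {print NR}'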
etc.
Of course, I'd recommend creating a small script to do it, but you don't need to pull apart the bam file to operate on different chromosomes.
OK, I understand this, but I split it because a for loop will be easier for me than having all chromosomes together; I am a total beginner. I don't have any other idea how to do this, because writing out all the segments by hand is impossible.
This is a great opportunity to learn some programming and scripting. I would suggest identifying a local "mentor" who can work with you; that will be much more efficient than using Biostars. If this is not something you feel comfortable with, then I think you might consider identifying a collaborator who can help you with your data, as data analysis is absolutely FULL of small tasks like this that require scripting.
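For what it is worth, here is a rough, untested sketch of the kind of loop meant above, assuming a coordinate-sorted and indexed your.bam, samtools on the PATH, and chromosome names and lengths taken from the BAM header (every file name is a placeholder):

#!/bin/bash
# For every chromosome in the BAM header, walk 60 kb windows and report
# the number of reads starting in each window and their GC fraction.
samtools view -H your.bam | awk '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) chrom = substr($i, 4);
        if ($i ~ /^LN:/) len = substr($i, 4);
    }
    print chrom, len;
}' | while read chrom len; do
    start=0
    while [ "$start" -lt "$len" ]; do
        stop=$((start + 60000))
        [ "$stop" -gt "$len" ] && stop=$len
        # samtools region queries are 1-based and inclusive
        samtools view your.bam "${chrom}:$((start + 1))-${stop}" \
            | awk -v wstart="$start" -v wstop="$stop" -v region="${chrom}:${start}-${stop}" '
                $4 > wstart && $4 <= wstop {      # keep reads that START in this window
                    reads++;
                    n += length($10);
                    gc += gsub(/[GCgc]/, "", $10);
                }
                END { printf "%s\t%d\t%s\n", region, reads, (n ? gc / n : "NA") }'
        start=$stop
    done
done > per.window.counts.txt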