Dear All,
I have a relatively simple question, but I don't know how to solve it. I want to make a BED file from a SAM or BAM file that includes the nucleotide sequence of each mapped read.
For example, my BAM files contain genomic regions from chromosome 18 covered by short reads obtained from an Illumina NGS experiment (MRE-seq).
I need to create a BED file containing the following columns:
- the first column is of type character and contains the chromosome of the region (e.g. chr1)
- the second column is of type numeric and contains the start position of the mapped read
- the third column is of type numeric and contains the stop position of the mapped read
- the fourth column contains the nucleotide sequence of the mapped read
An example BED file is:
chr18 9954 10104 AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCTACCCTAACCC 0 -
chr18 10053 10203 TAACCCTAACCCTAACCCTAAACCCTAACCCTGAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT 0 -
chr18 10084 10234 CCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTAACCCTAACCA 0 +
chr18 10112 10262 ACCCTAACCCTAACCCTAACCCTAACCCTTAACCTTAACCCTAACCCTTAACCCTAACCCTAACCCTAACCCTAA 0 +
chr18 10114 10264 CCTAACCCTAACCCTAACCCTAACCCTAACCCTTAACCCTAACCCTTAACCCTAACCCTAACCCTAACCCTAACC 0 +
chr18 10116 10266 TAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCT 0 +
chr18 10126 10276 CCTAACCCTAACCCTAACCCTAAACCCTAACCCTAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC 0 +
chr18 10139 10289 CAACCCCAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA 0 -
chr18 10148 10298 AACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCAACCCCAA 0 +
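One way to get this format is to post-process the text output of samtools view. Below is a minimal Python sketch, assuming uncompressed SAM lines as input (for example piped from samtools view). The names sam_line_to_bed and convert are hypothetical helpers, not part of any library. The sketch puts MAPQ in the score column (as bamtobed does) and writes the SEQ field as stored in the SAM record, which for "-" strand alignments is already reverse-complemented to the reference forward strand. The intervals in the example above span 150 bp, longer than the reads themselves, so they may have been extended; this sketch reports only the aligned reference span.

```python
import re

# CIGAR operations that consume reference bases (SAM spec): M, D, N, =, X
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def sam_line_to_bed(line):
    """Convert one SAM line to a BED6 line with the read sequence in column 4.

    Returns None for header lines and for unmapped reads (flag bit 0x4).
    """
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
    mapq, cigar, seq = fields[4], fields[5], fields[9]
    if flag & 0x4 or cigar == "*":
        return None
    start = pos - 1                      # SAM POS is 1-based, BED start is 0-based
    end = start + reference_span(cigar)  # BED end is exclusive
    strand = "-" if flag & 0x10 else "+"
    return "\t".join([rname, str(start), str(end), seq, mapq, strand])


def convert(lines):
    """Yield a BED line for every mapped alignment in an iterable of SAM lines."""
    for line in lines:
        bed = sam_line_to_bed(line)
        if bed is not None:
            yield bed
```

To use it, a small script (hypothetical name sam2bed_seq.py) could loop over convert(sys.stdin) and print each line, run as: samtools view -F 4 data.bam | python sam2bed_seq.py > reads.bed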
Hi Istvan, I applied your instructions and was able to create the file I needed. Thank you very much for the help.
Use samtools view -F 4 data.bam to keep only mapped reads (-F 4 excludes records with the unmapped flag bit, 0x4).
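The -F option drops every record whose FLAG has any of the given bits set; with 4, that is bit 0x4 (segment unmapped). The check can be sketched in Python as follows; passes_exclude_filter and the read names are hypothetical, written for illustration only, not a samtools API.

```python
# SAM FLAG bit 0x4 means "segment unmapped" (SAM specification).
UNMAPPED = 0x4


def passes_exclude_filter(flag, exclude_mask=UNMAPPED):
    """Mimic samtools view -F MASK: keep a record only if none of the MASK bits are set."""
    return (flag & exclude_mask) == 0


# Hypothetical (read name, FLAG) pairs: 0 = mapped forward, 4 = unmapped, 16 = mapped reverse
records = [("readA", 0), ("readB", 4), ("readC", 16)]
kept = [name for name, flag in records if passes_exclude_filter(flag)]
```

Here kept is ["readA", "readC"]: only the record with bit 0x4 set is removed, regardless of its other flag bits.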
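Some reads in the SAM file may not map to the reference, but bamtobed skips them. Below, readname1 (flag 117, CIGAR *) is unmapped and does not appear in the bamtobed output:
SAM input (excerpt):
readname1 117 NC_000915.1 11 0 *
readname2 185 NC_000915.1 11 0 21S78M
bamtobed output (excerpt):
NC_000915.1 10 88 readname2 0 -
NC_000915.1 10 65 readname3 0 -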
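The BED coordinates for readname2 follow from its SAM record: POS 11 is 1-based, so the 0-based BED start is 10, and the end adds the number of reference bases consumed by the CIGAR. The soft clip in 21S78M does not count, giving 10 + 78 = 88. readname1 has flag 117, which includes bit 0x4 (unmapped), so it produces no BED line. A minimal Python sketch of the interval calculation (the helper name sam_to_bed_interval is hypothetical):

```python
import re


def sam_to_bed_interval(pos, cigar):
    """Return the 0-based, half-open (start, end) BED interval of an alignment.

    pos is the 1-based leftmost mapping position from the SAM POS field.
    Only M, D, N, = and X consume reference bases; soft clips (S) do not.
    """
    ref_len = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                  if op in "MDN=X")
    start = pos - 1
    return start, start + ref_len


print(sam_to_bed_interval(11, "21S78M"))  # (10, 88)
```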