8.2 years ago
dr.genetics
I used "sam2bed" to convert a sam file to a bed file, and the bed file looks like the following:
chr1 9995 10070 ATTTGAG_CCCTAAC_TTGAGTT_1 4 - 147 18S34M23S = 10010 -20 TACCCTAACCCTCCCCCTTCCGATAACCCTAACCCTAACCCTAACCCTAACCGTTATTAACATATGACAACTCAA //EAA/E///<///6/EE/AA</A////EEEAEEEEEEEEE/EEEEEEEEEEEEAEEAEAEEAEAEEEEEAAAAA NM:i:0 MD:Z:34 AS:i:34 XS:i:31
According to the UCSC format FAQ (https://genome.ucsc.edu/FAQ/FAQformat#format1), the 7th and 8th fields are supposed to be "thickStart" and "thickEnd". In the line above, the 7th field ("147") could be interpreted as "thickStart", but "18S34M23S" looks nothing like "thickEnd". Also, what does the 5th field, the score ("4"), mean? Is it a count of sequences?
The first three fields are consistent with the minimal BED format (chromosome, start, end), but the remaining fields do not match the UCSC specification for the optional BED fields. For example, field 8 is the CIGAR string.
Edit: see the link from @AlexReynolds for an explanation. Note that several software tools can convert SAM/BAM to BED format (e.g., Bedtools, MACS, BEDOPS), and each has different default behavior.
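A quick way to see which column holds what is to split the line and number the fields. The following is only a minimal sketch in Python, not part of any of the tools mentioned: it uses the first eleven fields of the example line from the question, and the `parse_cigar` helper and its regular expression are hypothetical names introduced here for illustration. It checks that field 8 parses as a CIGAR string rather than a BED "thickEnd" coordinate.

```python
import re

# First eleven whitespace-separated fields of the example line from the question.
line = "chr1 9995 10070 ATTTGAG_CCCTAAC_TTGAGTT_1 4 - 147 18S34M23S = 10010 -20"
fields = line.split()

# Print each field with its 1-based column number.
for number, value in enumerate(fields, start=1):
    print(number, value)

# A CIGAR string is one or more length/operation pairs, e.g. 18S or 34M.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(text):
    """Hypothetical helper: return [(length, op), ...] or raise ValueError."""
    ops = CIGAR_OP.findall(text)
    # Reject the text unless the matched pairs reproduce it exactly.
    if not ops or "".join(n + op for n, op in ops) != text:
        raise ValueError(f"not a CIGAR string: {text!r}")
    return [(int(n), op) for n, op in ops]

cigar = parse_cigar(fields[7])  # field 8
start, end = int(fields[1]), int(fields[2])
total_length = sum(n for n, _ in cigar)
print(cigar)
print(end - start, total_length)
```

For this particular line, the interval length (end minus start) is 75, which equals the sum of all CIGAR operation lengths (18 + 34 + 23), soft clips included. A plain integer such as "147" is rejected by the helper, so it cannot be mistaken for a CIGAR string.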
See: http://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/sam2bed.html#column-mapping
OK, that explains it. Thank you!
Where did you get the file from?
Where did I get the sam files? From fastq files using fq2sam.