bed file format?
8.2 years ago
dr.genetics ▴ 60

I used "sam2bed" to convert a SAM file to a BED file, and each line of the resulting BED file looks like this:

chr1    9995    10070   ATTTGAG_CCCTAAC_TTGAGTT_1       4       -       147     18S34M23S       =       10010   -20   TACCCTAACCCTCCCCCTTCCGATAACCCTAACCCTAACCCTAACCCTAACCGTTATTAACATATGACAACTCAA     //EAA/E///<///6/EE/AA</A////EEEAEEEEEEEEE/EEEEEEEEEEEEAEEAEAEEAEAEEEEEAAAAA     NM:i:0  MD:Z:34 AS:i:34 XS:i:31

According to the description at "https://genome.ucsc.edu/FAQ/FAQformat#format1", the 7th and 8th fields are supposed to be "thickStart" and "thickEnd". In the line above, the 7th field ("147") could conceivably be read as "thickStart", but the 8th field ("18S34M23S") looks nothing like a "thickEnd". Also, what does the 5th field, the score ("4"), mean? Is it a count of sequences?

next-gen sequencing gene • 2.9k views

The first three fields are consistent with the minimal BED format (chromosome, start, end), but the remaining fields do not match the UCSC specification for the optional BED columns. For example, field 8 is the CIGAR string carried over from the SAM record, not "thickEnd".
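
To make the layout easier to see, here is a minimal Python sketch that labels the columns of the example line and decodes the value in field 7 as a SAM flag. The column names in `ASSUMED_COLUMNS` are my assumption, inferred from the values in the question (147 is a typical paired-end SAM flag, and "=", 10010 and -20 look like RNEXT, PNEXT and TLEN); they are not taken from the converter's documentation, so check the documentation of your sam2bed version. The flag bit meanings come from the SAM specification. The sequence and quality strings are replaced by the placeholders `SEQ` and `QUAL` to keep the example short.

```python
# Column names are ASSUMPTIONS inferred from the example line in the question,
# not taken from the documentation of any particular sam2bed version.
ASSUMED_COLUMNS = [
    "chrom", "start", "end", "read_name", "score", "strand",
    "flag", "cigar", "rnext", "pnext", "tlen", "seq", "qual",
]

# Bit meanings of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def label_fields(line):
    """Split one line on whitespace and pair each value with an assumed name."""
    fields = line.split()
    record = dict(zip(ASSUMED_COLUMNS, fields))
    # Anything after the 13th column is kept as a list of SAM-style tags.
    record["tags"] = fields[len(ASSUMED_COLUMNS):]
    return record


def decode_flag(flag):
    """Return the names of all SAM flag bits that are set in `flag`."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# Example line from the question; SEQ and QUAL are placeholders.
line = (
    "chr1 9995 10070 ATTTGAG_CCCTAAC_TTGAGTT_1 4 - 147 18S34M23S = 10010 -20 "
    "SEQ QUAL NM:i:0 MD:Z:34 AS:i:34 XS:i:31"
)

record = label_fields(line)
print(record["cigar"])                   # field 8
print(decode_flag(int(record["flag"])))  # field 7 interpreted as a SAM flag
```

If field 7 really is the SAM flag, the reverse-strand bit (0x10) is set in 147, which agrees with the "-" in field 6. That would make the "4" in field 5 most likely the mapping quality rather than a count, but this reading is an inference from the values.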

Edit: see @AlexReynolds's link for an explanation. Note that several tools can convert SAM/BAM to BED (e.g., Bedtools, MACS, BEDOPS), and each has different default behavior, so the extra columns depend on which converter you used.
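
If a downstream tool expects plain UCSC-style BED, one option is to keep only the first six columns. The sketch below assumes those six columns are chromosome, start, end, name, score and strand, which matches the example line in the question but may not hold for every converter. The helper name `to_bed6` is made up for this illustration.

```python
def to_bed6(line):
    """Keep only the first six whitespace-separated columns, tab-delimited.

    Assumes columns 1-6 are chrom, start, end, name, score, strand.
    """
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 columns, got %d" % len(fields))
    return "\t".join(fields[:6])


# Shortened version of the line from the question.
example = "chr1 9995 10070 ATTTGAG_CCCTAAC_TTGAGTT_1 4 - 147 18S34M23S = 10010 -20"
print(to_bed6(example))
```

This drops the SAM-derived columns (CIGAR, sequence, qualities, tags), so only use it when you do not need them later.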

OK, that explains it. Thank you!

Where did you get the file from?

Do you mean the SAM files? I generated them from FASTQ files using fq2sam.
