What do the "L" and "R" here refer to? Left/right part of the single reads spanning exon-intron junction? Or Left reads and Right reads of the paired-end reads?
How can I convert such bed format into bam? The Epigenome Roadmap doesn't provide sra for this dataset.
What do the "L" and "R" here refer to? Left/right part of the single reads spanning exon-intron junction? Or Left reads and Right reads of the paired-end reads?
I'm going to infer that indeed those are the "left" and "right" paired-end reads, given that the BED name entry seems to indicate a flowcell coordinate, and said coordinate is shared within sets of two reads in your example.
How can I convert such bed format into fastq? The Epigenome Roadmap doesn't provide sra for this dataset.
If you truly wanted FASTQ and not FASTA, and the only source you have for the data is this BED file, then you would have to fake the quality scores. But you could construct the rest of the FASTQ like this:
For the first line of each FASTQ read, use the fourth column of the BED file, prefixed with '@'.
For the second line of each FASTQ read, you would need to extract the portion of the reference genome given by the first three columns of the BED file. Keep in mind that BED start coordinates are 0-based and BED end coordinates are exclusive, so a BED interval running from 24,291,630 to 24,291,704 on chromosome 1 corresponds to 1-based bases 24,291,631 through 24,291,704, inclusive.
For the third line of each FASTQ read, just put a '+' (optionally followed by the read name again).
For the fourth line, you would need to create fake quality scores, the number of which would correspond to the number of bases you extracted from the reference genome for that read.
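The sequence extraction might be made easier with the bedtools getfasta tool; its -name option labels each sequence with the BED name column, and -s reverse-complements minus-strand entries.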
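The steps above can be sketched in Python roughly as follows. This is only an illustration, not a tested pipeline: the function names, the toy `genome` dictionary, and the choice of 'I' as the fake quality character are all assumptions, and it assumes a tab-separated BED whose sixth column (if present) is the strand. Note also that the sequence comes from the reference, so any mismatches in the original reads are lost.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def bed_line_to_fastq(bed_line, genome, fake_qual="I"):
    """Build one four-line FASTQ record from one BED line.

    genome is a dict mapping chromosome name -> sequence string
    (hypothetical; in practice you would load it from a FASTA file).
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
    # Assumption: strand is in column 6; default to '+' if it is missing.
    strand = fields[5] if len(fields) > 5 else "+"

    # BED is 0-based, half-open, which matches Python slicing directly.
    seq = genome[chrom][start:end]
    if strand == "-":
        seq = reverse_complement(seq)

    # Line 1: '@' + name, line 2: bases, line 3: '+', line 4: fake qualities.
    return f"@{name}\n{seq}\n+\n{fake_qual * len(seq)}\n"


if __name__ == "__main__":
    toy_genome = {"chrToy": "ACGTACGTAGCTAGCTAGGATCC"}  # made-up sequence
    print(bed_line_to_fastq("chrToy\t4\t12\tread1_L\t0\t+", toy_genome), end="")
```

For a real genome you would not hold everything in a dictionary like this; an indexed FASTA reader or bedtools getfasta would be the more practical route.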
EDIT: The subject asks for conversion to BAM format, but the question body asks for conversion to FASTQ.
To convert to BAM, there's a tool suite called bedtools, which has a tool, bedtobam (also installed as bedToBam), that should do the job for you if you supply a genome file listing the chromosome names and sizes via its -g option.
Thanks. I also think it's paired-end. The problem is that bedToBam seems to be meant for single reads; how can I convert the "L" and "R" entries here into BAM format as paired-end reads?
I'm running into the same problem. How did you end up dealing with what you described here: 'Thanks. I also think it's paired-end. But problem is, I guess BedToBam is for single reads; how can I convert "L" and "R" here into bam format for paired-end reads? thx'
The way I would tackle this would be to use a BAM library like pysam. Libraries like these make it easy to write data directly to BAM, though it would also be possible to write to SAM without using a focused library. In that case you'd just want to make sure you're adhering to the spec so that downstream processes don't choke on the file you've created.
In the BAM format, read1 vs read2 is specified by bits in the bitwise FLAG field (0x40 marks the first read of a pair, 0x80 the second). Check out the SAM specification, section 1.4.
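As a rough sketch of how those flag bits fit together when writing SAM text by hand, something like the following could work. The function names are made up for illustration, and several things are assumptions: that "L" maps to read1 and "R" to read2, that both mates are given the same read name, that each BED interval is an ungapped alignment (so the CIGAR is just the interval length followed by M), and that the sequence, qualities, and template length are left unset ('*' and 0). A header with @SQ lines would still be needed before converting the result to BAM.

```python
# FLAG bits from the SAM specification (section 1.4).
PAIRED = 0x1         # template has multiple segments
READ_REVERSE = 0x10  # this read aligns to the reverse strand
MATE_REVERSE = 0x20  # the mate aligns to the reverse strand
READ1 = 0x40         # first segment in the template
READ2 = 0x80         # last segment in the template


def pair_flag(is_read1, reverse, mate_reverse):
    """Combine the FLAG bits for one mate of a mapped pair."""
    flag = PAIRED | (READ1 if is_read1 else READ2)
    if reverse:
        flag |= READ_REVERSE
    if mate_reverse:
        flag |= MATE_REVERSE
    return flag


def sam_line(name, chrom, start0, end0, strand, is_read1, mate_start0, mate_strand):
    """Format one SAM alignment line from BED-style (0-based) coordinates.

    Assumes the mate is on the same chromosome (RNEXT '=').
    """
    flag = pair_flag(is_read1, strand == "-", mate_strand == "-")
    fields = [
        name,                  # QNAME: identical for both mates
        str(flag),             # FLAG
        chrom,                 # RNAME
        str(start0 + 1),       # POS: SAM is 1-based, BED start is 0-based
        "255",                 # MAPQ: 255 means "not available"
        f"{end0 - start0}M",   # CIGAR: assumes an ungapped alignment
        "=",                   # RNEXT: mate on the same reference
        str(mate_start0 + 1),  # PNEXT: 1-based mate position
        "0",                   # TLEN: left unset in this sketch
        "*",                   # SEQ: unknown from BED alone
        "*",                   # QUAL: unknown from BED alone
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical pair: "L" mate on '+', "R" mate on '-'.
    print(sam_line("frag1", "chr1", 100, 150, "+", True, 300, "-"))
    print(sam_line("frag1", "chr1", 300, 350, "-", False, 100, "+"))
```

If the reads span exon-intron junctions, a single BED interval per read cannot express the splice, so the plain M CIGAR here would be wrong for those reads.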