I'm constructing an artificial "genome" to align against, and I'd like to keep several of its segments visually distinct, purely for my own reference later (e.g. I use "BC" for "barcode" instead of the usual "chr" for chromosome). My "genome" looks like this:
>BC1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GGGACCGGT
CGAGGTGGTTGAAGGTCCTATAATGTCGCCCTCTCCTTCAT
CAGACCAGTAGACCGATTAGGATAGAAAGGCTTAAAACTTA
GGAGTGTGGTTTGTAATTAGGATAGAAAGGCTTAAAACTTA
CGAGGTGGTTGAAGGTCCTATAATGTCGCCCTCTCCTTCAT
CAGACCAGTAGACCGATTAGGATAGAAAGGCTTAAAACTTA
GGAGTGTGGTTTGTAATTAGGATAGAAAGGCTTAAAACTTA
GGTATAGTTATC
CCTAG
AAAAAAAAAAAAAAAAAAA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>BC2
...(similar to above)
Each "chromosome" that I'm defining (e.g. BC1, BC2, etc.) has a few segments corresponding to restriction sequences, a poly-A tail, and so on. There is no actual biological segmentation within each >BC block; I'm separating the parts by newlines only so that I can quickly come back and visually distinguish them later. Can this create any problems? Are there any indexing or genome-conversion packages that assume fixed line lengths? I'm just wondering if this is bad practice.
*Edit:* I should add that I'm planning on using minimap2 and samtools for indexing and alignment.
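
To make the question concrete, here is a minimal Python sketch of the property I'm asking about. It assumes "fixed line length" means that, within a record, every sequence line except the last has the same length and the last is no longer than the rest. The function names `irregular_records` and `rewrap`, and the default width of 60, are my own placeholders and not part of minimap2 or samtools.

```python
def irregular_records(fasta_text):
    """Return names of records whose sequence lines are not uniformly wrapped.

    Assumption: a record is "uniform" when all sequence lines except the
    last share one length and the last line is no longer than that length.
    """
    records = {}
    name = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None and line.strip():
            records[name].append(len(line.strip()))

    bad = []
    for rec, lengths in records.items():
        body, last = lengths[:-1], lengths[-1:]
        uniform = len(set(body)) <= 1 and (not body or last[0] <= body[0])
        if not uniform:
            bad.append(rec)
    return bad


def rewrap(fasta_text, width=60):
    """Join each record's sequence lines and re-wrap them at a fixed width."""
    out = []
    seq = []

    def flush():
        # Emit the sequence collected so far in chunks of `width` characters.
        joined = "".join(seq)
        out.extend(joined[i:i + width] for i in range(0, len(joined), width))
        seq.clear()

    for line in fasta_text.splitlines():
        if line.startswith(">"):
            flush()
            out.append(line)
        elif line.strip():
            seq.append(line.strip())
    flush()
    return "\n".join(out) + "\n"


# Toy input: BC1 is wrapped at segment boundaries, BC2 is wrapped uniformly.
demo = ">BC1\nNNNN\nGG\nACGTA\n>BC2\nACGT\nACGT\nAC\n"
print(irregular_records(demo))
print(irregular_records(rewrap(demo, width=4)))
```

If uniform wrapping turns out to be required, I could keep my segmented file as the human-readable copy and feed a re-wrapped version to the tools, at the cost of having to track the segment boundaries separately.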