Hi all,
I would like to split a very big BAM file into smaller files so that I can annotate it in parallel. Someone suggested splitting it by tile number, which is a good idea since that guarantees that all the alignments for a given read end up in the same file.
However, I am stuck on how to phrase the awk command for this, since the tile number is contained within the read ID string in the first field of the alignment. Inside that string the pieces are separated by ":", while the read ID itself is separated from the other fields by "\t".
HWI-ST975:104:C0W47ACXX:8:1101:8269:91631
Tile number (encoded) = 1101 (5th ":"-separated field of the read ID).
How could I use awk to write each line to its own output file based on its tile number?
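To make the extraction concrete, here is a minimal sketch of the two-level split I am after, written in Python rather than awk. The function name tile_of is my own, and the columns after the read ID in the example line are made up for illustration.

```python
def tile_of(sam_line):
    """Return the tile number: the 5th ':'-separated part of the read ID."""
    # The read ID is the first tab-separated field of the alignment line.
    read_id = sam_line.split("\t", 1)[0]
    # Within the read ID the parts are separated by ':'; index 4 is the 5th.
    return read_id.split(":")[4]


# Read ID taken from the post; the remaining columns are placeholders.
example = "HWI-ST975:104:C0W47ACXX:8:1101:8269:91631\t99\tchr1\t100"
print(tile_of(example))  # prints 1101
```

The awk equivalent would presumably need the same two steps: split the line on tabs first, then split the first field on ":".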
Thanks, Carmen
I think I may have a Perl solution to this, but I don't know exactly how to phrase the output. Can anybody help me out? :)
I have made a hash of hashes, where all the lines of a file are sorted into a key of the "master" hash depending on the value of their 5th field.
%Tiles has n keys, where each key is a different $Tile_Number.
Each $Tile_Number points to a new hash that contains all the lines whose tile number matches the current key. The value of each of these inner keys (the lines) is just 1.
$Tiles{$Tile_Number}{$Line} = 1, where $Tiles{$Tile_Number} has many $Line => 1 entries.
I want to print each $Tiles{$Tile_Number} hash to a separate file. Preferably, the file would be created when the $Tile_Number key is created, and each line would be printed as its $Tiles{$Tile_Number}{$Line} = 1 entry is added, to save memory. Ideally the final value (1) would not be printed, but I can live with it if necessary.
How can I tell Perl to open a new file for each key in the "master" hash and print all of that key's lines to it?
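For comparison, this is roughly the behaviour I want, sketched in Python rather than Perl. It is only a sketch under a few assumptions: the input is SAM text (for example the output of samtools view), the names split_by_tile, out_dir and prefix are my own, and instead of a hash of hashes it keeps one open file handle per tile, so no lines are held in memory.

```python
import os
from contextlib import ExitStack


def split_by_tile(lines, out_dir, prefix="tile_"):
    """Write each alignment line to <out_dir>/<prefix><tile>.sam as it is read."""
    handles = {}  # tile number -> open file handle
    with ExitStack() as stack:  # closes every opened file when the loop ends
        for line in lines:
            if line.startswith("@"):
                continue  # SAM header lines have no read ID; skipped here
            # 5th ':'-separated part of the first tab-separated field
            tile = line.split("\t", 1)[0].split(":")[4]
            if tile not in handles:
                # First time this tile is seen: create its output file.
                path = os.path.join(out_dir, prefix + tile + ".sam")
                handles[tile] = stack.enter_context(open(path, "w"))
            handles[tile].write(line)  # write immediately, nothing is stored
    return sorted(handles)  # the tile numbers that were encountered
```

Calling split_by_tile(sys.stdin, ".") with the alignments piped in would create one file per tile in the current directory. Because the header lines are skipped, the output files would need the original header added back before being converted to BAM again. One file stays open per tile, which could run into the limit on open files if there are very many tiles.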
Thank you, Carmen