Hi,
I downloaded an interesting dataset from which I need to extract some information. The data looks like this:
readID Seq 0-misHit 1-misHit 2-misHit chr start end strand
HWUSI-EAS230-R:2:99:1151:1802#0/1 GAGCTCATTGGTGGCGTGGTGGCCTTGACCTTCCGG 1 0 0 chr10 70914936 70914971 -
HWUSI-EAS230-R:2:44:642:495#0/1 TTGGCTGCCTTCTGGGGTGAACTTTCTGCTATTTCC 0 0 1 chr7 47298110 47298145 -
...
The GEO record (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22260) states:
"Alignment: Sequence reads were obtained and mapped to the hg18 (March, 2006) using the Illumina Genome Analyzer Pipeline. All reads mapping with two or fewer mismatches were retained"
My aim is to detect a splice variant of a certain gene. I tried to process the data further with Cufflinks, but I get an error saying that the format is not correct (the input is neither SAM nor BAM).
I would really appreciate any hints on which tool is best suited for this. Can I work with this data format as it is, or do I have to convert it first?
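To make the question concrete, here is a minimal sketch of the kind of conversion I imagine would be needed, turning each row into a headerless SAM record. It rests on several assumptions I have not been able to confirm: that start/end are 1-based and inclusive (the example rows give end - start + 1 = 36, which matches the read length), that the Seq column holds the read as sequenced (so minus-strand reads are reverse-complemented for SAM), and that every alignment is ungapped. The function names (`row_to_sam`, `convert`) are my own. Base qualities and mapping quality are not in the file, so they are written as `*` and 255.

```python
# Sketch: convert the whitespace-delimited alignment rows to SAM records.
# Assumptions (not confirmed by the GEO record): 1-based inclusive
# coordinates, ungapped alignments, Seq given as the original read.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def row_to_sam(line, seq_is_original_read=True):
    """Convert one data row into a tab-separated SAM record (no header)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    read_id, seq, _hit0, _hit1, _hit2, chrom, start, end, strand = fields
    start, end = int(start), int(end)
    if end - start + 1 != len(seq):
        # The ungapped, 1-based inclusive assumption does not hold here.
        raise ValueError(f"span does not match read length: {line!r}")
    flag = 16 if strand == "-" else 0  # 16 = read maps to the reverse strand
    if strand == "-" and seq_is_original_read:
        seq = reverse_complement(seq)  # SAM stores SEQ on the forward strand
    cigar = f"{len(seq)}M"  # ungapped match over the whole read
    # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    return "\t".join(
        [read_id, str(flag), chrom, str(start), "255", cigar, "*", "0", "0", seq, "*"]
    )


def convert(lines):
    """Yield SAM records, skipping blank lines, the header row and '...'."""
    for line in lines:
        line = line.strip()
        if not line or line == "..." or line.startswith("readID"):
            continue
        yield row_to_sam(line)


example_rows = [
    "readID Seq 0-misHit 1-misHit 2-misHit chr start end strand",
    "HWUSI-EAS230-R:2:99:1151:1802#0/1 GAGCTCATTGGTGGCGTGGTGGCCTTGACCTTCCGG 1 0 0 chr10 70914936 70914971 -",
    "HWUSI-EAS230-R:2:44:642:495#0/1 TTGGCTGCCTTCTGGGGTGAACTTTCTGCTATTTCC 0 0 1 chr7 47298110 47298145 -",
]

for record in convert(example_rows):
    print(record)
```

This does not write a SAM header or sort the records, so I assume more steps would still be needed before a downstream tool accepts the output.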