Hello,
I have a tab-delimited splice junction file that looks something like this:
chr1 11212 12009 1 1 0 0 2 48
chr1 11672 12009 1 1 0 0 1 31
chr1 11845 12009 1 1 0 0 1 28
chr1 12228 12612 1 1 1 0 1 32
chr1 12722 13220 1 1 1 0 3 9
chr1 14830 14969 2 2 1 0 218 50
chr1 15039 15795 2 2 1 0 98 50
chr1 15948 16606 2 2 1 1 10 48
chr1 16766 16857 2 2 1 0 24 44
chr1 16766 16875 2 2 0 0 2 36
The task is to filter out lines in which column 6 has value 1, column 7 has value 1, and column 8 has a value of 10 or greater.
I have been going through the bedtools documentation, but I am not sure how to get started, so I would appreciate a few pointers. My input file is tab-delimited, and I also have the GENCODE v19 GTF file for annotation.
Thanks!
Edit
- Column 1: chromosome
- Column 2: first base of the intron (1-based)
- Column 3: last base of the intron (1-based)
- Column 4: strand
- Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
- Column 6: 0: unannotated, 1: annotated (only if splice junctions database is used)
- Column 7: number of uniquely mapping reads crossing the junction
- Column 8: number of multi-mapping reads crossing the junction
- Column 9: maximum spliced alignment overhang
Added the field names.
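For what it's worth, a filter on column values like the one described above may not need bedtools at all, since it only compares fields within each line. Below is a minimal Python sketch, assuming the three conditions are combined with AND and that fields are separated by whitespace. The names `SAMPLE`, `matches` and `split_junctions` are made up for illustration. Because "filter out" could mean either keeping or discarding the matching lines, the sketch returns both groups.

```python
# Sample rows copied from the post; fields may be separated by tabs or spaces.
SAMPLE = """\
chr1 11212 12009 1 1 0 0 2 48
chr1 11672 12009 1 1 0 0 1 31
chr1 11845 12009 1 1 0 0 1 28
chr1 12228 12612 1 1 1 0 1 32
chr1 12722 13220 1 1 1 0 3 9
chr1 14830 14969 2 2 1 0 218 50
chr1 15039 15795 2 2 1 0 98 50
chr1 15948 16606 2 2 1 1 10 48
chr1 16766 16857 2 2 1 0 24 44
chr1 16766 16875 2 2 0 0 2 36
"""


def matches(fields):
    """True when column 6 == 1, column 7 == 1 and column 8 >= 10.

    Columns are 1-based in the post, so column 6 is fields[5], and so on.
    Assumes each row has at least 8 fields.
    """
    return fields[5] == "1" and fields[6] == "1" and int(fields[7]) >= 10


def split_junctions(lines):
    """Return (matching, non_matching) lists of the original lines."""
    matching, non_matching = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # splits on tabs or spaces
        (matching if matches(fields) else non_matching).append(line)
    return matching, non_matching


matching, non_matching = split_junctions(SAMPLE.splitlines())
print("\n".join(matching))
```

On the ten sample rows, only the `chr1 15948 16606` junction satisfies all three conditions. For a real file, the same function can be given an open file handle instead of `SAMPLE.splitlines()`; use `non_matching` if the goal is to discard the matching lines rather than keep them.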
Hello ruchiksy!
It appears that your post has been cross-posted to another site: SeqAnswers.
This is typically not recommended as it runs the risk of annoying people in both communities.
Hi
Can you please tell me where I can get the splice junction annotation file for human?
You can download a GTF file from Ensembl.