Question

Splice Junction file intersection with genome annotation

10.8 years ago

ruchiksy ▴ 50

Hello,

I have a tab-delimited splice junction file that looks something like this:

chr1    11212    12009    1    1    0    0    2    48
chr1    11672    12009    1    1    0    0    1    31
chr1    11845    12009    1    1    0    0    1    28
chr1    12228    12612    1    1    1    0    1    32
chr1    12722    13220    1    1    1    0    3    9
chr1    14830    14969    2    2    1    0    218    50
chr1    15039    15795    2    2    1    0    98    50
chr1    15948    16606    2    2    1    1    10    48
chr1    16766    16857    2    2    1    0    24    44
chr1    16766    16875    2    2    0    0    2    36

The task is to filter out lines in which Column 6 has value 1, Column 7 has value 1 and Column 8 has value 10 or greater.

I have been going through the bedtools documentation, but I am not sure how to get started, so I would appreciate a few pointers. My input file will be in this tab-delimited format, and I also have the GENCODE v19 GTF file for annotation.

Thanks!

Edit

Column 1: chromosome
Column 2: first base of the intron (1-based)
Column 3: last base of the intron (1-based)
Column 4: strand
Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
Column 6: 0: unannotated, 1: annotated (only if splice junctions database is used)
Column 7: number of uniquely mapping reads crossing the junction
Column 8: number of multi-mapping reads crossing the junction
Column 9: maximum spliced alignment overhang

Added the field names.
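
Since the coordinates above are 1-based and bedtools expects BED input, a conversion along these lines may be a useful starting point. This is only a rough sketch under stated assumptions: the file names sj_sample.txt and sj_sample.bed are placeholders, the strand codes (0 = undefined, 1 = +, 2 = -) are assumed from STAR's SJ.out.tab convention because the post only says "strand", and using column 7 as the BED score is an arbitrary choice.

```shell
# Hypothetical sample file: two rows copied from the question, tab-separated.
printf 'chr1\t14830\t14969\t2\t2\t1\t0\t218\t50\n' >  sj_sample.txt
printf 'chr1\t11212\t12009\t1\t1\t0\t0\t2\t48\n'  >> sj_sample.txt

# Convert each junction to a BED6 line:
#   start  = column 2 minus 1 (BED starts are 0-based, ends are exclusive)
#   name   = chrom:start-end built from the original 1-based coordinates
#   score  = column 7 (arbitrary choice for this sketch)
#   strand = translated from the numeric code (ASSUMED: 0 = ., 1 = +, 2 = -)
awk 'BEGIN { FS = OFS = "\t" }
{
    strand = "."
    if ($4 == 1) strand = "+"
    if ($4 == 2) strand = "-"
    print $1, $2 - 1, $3, $1 ":" $2 "-" $3, $7, strand
}' sj_sample.txt > sj_sample.bed

cat sj_sample.bed
```

The resulting file could then be passed to bedtools together with intervals derived from the GTF, although which bedtools subcommand fits depends on what kind of overlap is wanted.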

RNA-Seq splice-junction bedtools • 5.5k views

Updated 2.8 years ago by Ram 45k • written 10.8 years ago by ruchiksy ▴ 50

Hello ruchiksy!

It appears that your post has been cross-posted to another site: SeqAnswers.

This is typically not recommended as it runs the risk of annoying people in both communities.

Reply • 10.8 years ago by Devon Ryan 105k

Hi

Could you please tell me where I can get the splice junction annotation file for human?

Reply • 9.0 years ago by Govardhan Anande ▴ 150

You can download a GTF file from Ensembl.

Reply • 9.0 years ago by Devon Ryan 105k

10.8 years ago

Ann ★ 2.4k

It's hard to tell from your example what each field is meant to represent as there are many possible ways you could use BED format to indicate splicing patterns. If you can give more detail it will be easier to recommend your next step.

I have just added the field names; I should have done that in the first place. Thanks!

Reply • 10.8 years ago by ruchiksy ▴ 50

10.0 years ago

shirley0818 ▴ 110

Hi ruchiksy,

I found your input BED file quite useful. May I ask which tool you used to obtain your splice junction file?

Many thanks,
Shirley

Updated 2.8 years ago by Ram 45k • written 10.0 years ago by shirley0818 ▴ 110

I think it's STAR.

Reply • 9.9 years ago by Lhl ▴ 760

Definitely looks like my recent STAR output to me, in case another vote was needed.

Reply • 9.7 years ago by calizarr ▴ 10

score 2 · Accepted Answer · 2014-06-17

10.8 years ago

Devon Ryan 105k

Since this isn't a BED file, it'd be extra work to get bedtools to deal with it. Just use awk:

awk '{if($6!=1 && $7!=1 && $8<10) print $0}' original.txt > filtered.txt

For your example, that would print:

chr1    11212    12009    1    1    0    0    2    48
chr1    11672    12009    1    1    0    0    1    31
chr1    11845    12009    1    1    0    0    1    28
chr1    16766    16875    2    2    0    0    2    36

It worked, thanks Mr. Devon. One small correction, though: I wanted reads greater than 10, not less than, so I changed that when I ran it.

Reply • 10.8 years ago by ruchiksy ▴ 50
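
The modified command that was actually run is not shown in the thread. As one possible reading of the original request, a sketch that keeps only the lines where column 6 is 1, column 7 is 1 and column 8 is 10 or greater might look like the following. The file names sj_sample.txt and kept.txt and the awk variable min are placeholders introduced for illustration.

```shell
# Hypothetical sample file: three rows copied from the question, tab-separated.
printf 'chr1\t11212\t12009\t1\t1\t0\t0\t2\t48\n'   >  sj_sample.txt
printf 'chr1\t14830\t14969\t2\t2\t1\t0\t218\t50\n' >> sj_sample.txt
printf 'chr1\t15948\t16606\t2\t2\t1\t1\t10\t48\n'  >> sj_sample.txt

# Print a line only when all three conditions hold.
# "min" is a placeholder threshold supplied with -v so it is easy to change.
awk -F '\t' -v min=10 '$6 == 1 && $7 == 1 && $8 >= min' sj_sample.txt > kept.txt

cat kept.txt
```

For these three rows, only the chr1 15948 16606 line satisfies every condition. Swapping `>=` for `>` would give a strict "greater than 10" cutoff instead.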