Extracing the regions outside the target(transctip) region in genome
0
0
Entering edit mode
8.7 years ago
EVR ▴ 610

Hi,

I have a bam file aligned to a genome and a gff file generated from PASA which tell about the location of transcripts in the scaffold.

Now I would like to make use of these two files and extract only the regions outside of transript regions(off-target) where the reads from bam file aligned.

I created a genome file using samtools faidx to use it in bedtools complement but when I use it like following bedtools -i bed_file -g genome_file it gave me an error "Exiting...teger conversion of string"

Any suggestions. Thanks in advance

RNA-Seq bedtools samtools bam_file genome • 3.2k views
ADD COMMENT
0
Entering edit mode

Maybe this post helps.

ADD REPLY
0
Entering edit mode

Thanks for the suggestion, but in that post they described a method to find the reads outside the target region. In my case I have only the reads which aligned outside the transcript regions. I would like to find regions and the counts outside the target regions based on the bam file and GFF/bed files. Any guidance will be really useful

ADD REPLY
0
Entering edit mode

Maybe I'm missunderstanding something... but it should be the same procedure regardless of the region you are interested in (target/transcript). I mean, for one hand, you have a bam file with reads mapped against the genome, and for other hand you have a gff3/bed/gtf (I think that PASA outputs the three types) with transcript regions. If you want to extract the number of reads falling outside the transcripts, first you could get to get the complement region of your transcripts, using "complementBed" (bedtools utility). Then, using samtools view -L complement.bed file.bam, you should be able to extract the reads mapping outside transcripts.

Or even simpler, using the following command bedtools intersect -v -abam file.bam -b transcripts.bed you should get your desired output.

ADD REPLY
0
Entering edit mode

@iraun Thanks for pointing out the bedtools intersect with -v option. When I tried complementbed, it throwed me error like I mentioned above. I ran the bedtools like following

    bedtools -v -abam bamfile -b gtf_file -bed | head -20

X80982  8867    8886    H134:251:C8ML1ACXX:6:1302:5543:78403    0   +   8867    8886    0,0,0   1   19, 0,
X80982  8867    8886    H134:251:C8ML1ACXX:7:1106:12234:104069  0   +   8867    8886    0,0,0   1   19, 0,
X80982  8869    8884    H134:251:C8ML1ACXX:7:1205:3156:142743   0   +   8869    8884    0,0,0   1   15, 0,
X80982  8869    8886    H134:251:C8ML1ACXX:7:1207:11670:48174   0   +   8869    8886    0,0,0   1   17, 0,
X80982  8908    8932    H134:251:C8ML1ACXX:5:1107:21133:47870   0   +   8908    8932    0,0,0   1   24, 0,
X80982  8908    8924    H134:251:C8ML1ACXX:5:1205:16482:135462  0   +   8908    8924    0,0,0   1   16, 0,
X80982  8908    8932    H134:251:C8ML1ACXX:5:1206:5259:150448   0   +   8908    8932    0,0,0   1   24, 0,
X80982  8908    8934    H134:251:C8ML1ACXX:5:1207:15592:146682  0   +   8908    8934    0,0,0   1   26, 0,
X80982  8908    8930    H134:251:C8ML1ACXX:7:1102:19242:182117  0   +   8908    8930    0,0,0   1   22, 0,
X80982  8909    8930    H134:251:C8ML1ACXX:5:1108:8801:143653   0   +   8909    8930    0,0,0   1   21, 0,
X80982  8910    8927    H134:251:C8ML1ACXX:5:1307:13400:37053   0   +   8910    8927    0,0,0   1   17, 0,
X80982  8910    8927    H134:251:C8ML1ACXX:7:1303:10625:116903  0   +   8910    8927    0,0,0   1   17, 0,
X80982  8911    8933    H134:251:C8ML1ACXX:5:1103:14340:136181  0   +   8911    8933    0,0,0   1   22, 0,
X80982  8911    8932    H134:251:C8ML1ACXX:5:1104:7391:156872   0   +   8911    8932    0,0,0   1   21, 0,
X80982  8911    8933    H134:251:C8ML1ACXX:5:1105:21118:47352   0   +   8911    8933    0,0,0   1   22, 0,
X80982  8911    8926    H134:251:C8ML1ACXX:5:1203:6810:55506    0   +   8911    8926    0,0,0   1   15, 0,
X80982  8911    8933    H134:251:C8ML1ACXX:5:1307:10209:105518  0   +   8911    8933    0,0,0   1   22, 0,
X80982  8911    8933    H134:251:C8ML1ACXX:5:1308:14706:129329  0   +   8911    8933    0,0,0   1   22, 0,
X80982  8911    8933    H134:251:C8ML1ACXX:6:1208:5898:106696   0   +   8911    8933    0,0,0   1   22, 0,
X80982  8912    8932    H134:251:C8ML1ACXX:5:1203:5034:23711    0   +   8912    8932    0,0,0   1   20, 0,

Now how to count the reads for the for the regions? Can I make use of samtools flagstat ot HTSeq counts? Kindly guide me.

ADD REPLY

Login before adding your answer.

Traffic: 1478 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6