4.0 years ago
supertech
Hi, I have RNA-seq data as genomic positions in BED format: chr, start, stop (stop - start = 1 nt, i.e. a single nucleotide).
How can I find the corresponding NCBI RefSeq transcripts for each position? I need the accession (the mRNA identifier), not the position on the mRNA. In other words, given the genomic positions, I need to find the corresponding RNAs.
Thanks.
Take a look at bedtools annotate and bedtools intersect for this sort of thing. Both of these tools accept a GFF3 file as one of the inputs, so you should be able to use the RefSeq annotation in GFF3 or GTF format.

If what you have is more like a bedGraph file with the columns chr, start, stop, count, where 'count' is the number of reads that align to that position, you can use the UCSC tool bedGraphPack to pack consecutive genomic positions that have the same read count. This can reduce the size of your inputs significantly and improve the speed.
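
To make the intersect idea concrete, here is a minimal pure-Python sketch of the overlap lookup that something like `bedtools intersect -a positions.bed -b annotation.gff3 -wa -wb` performs. It is not how bedtools is implemented; the function names, the choice of mRNA/transcript features, and the toy accessions are all assumptions made for illustration. It also assumes the transcript accession can be read from the `transcript_id`, `Name`, or `ID` attribute of the GFF3 record.

```python
from collections import defaultdict


def parse_gff3_transcripts(gff3_lines, feature_types=("mRNA", "transcript")):
    """Group transcript records by chromosome as (start0, end0, name).

    GFF3 coordinates are 1-based and inclusive; they are converted here to
    BED-style 0-based, half-open coordinates so both inputs are comparable.
    """
    by_chrom = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in feature_types:
            continue  # keep only transcript-level features
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Assumption: one of these attributes carries the accession.
        name = attrs.get("transcript_id") or attrs.get("Name") or attrs.get("ID")
        by_chrom[cols[0]].append((int(cols[3]) - 1, int(cols[4]), name))
    return by_chrom


def transcripts_at(bed_lines, by_chrom):
    """Return (chrom, start, end, [accessions]) for every BED position.

    Uses a simple linear scan per chromosome; fine for a sketch, slow for
    genome-scale input, where bedtools is the better choice.
    """
    results = []
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        hits = sorted(
            {name for s, e, name in by_chrom.get(chrom, []) if s < end and start < e}
        )
        results.append((chrom, start, end, hits))
    return results


# Toy data with made-up accessions (not real RefSeq records).
gff3 = [
    "##gff-version 3",
    "chr1\tRefSeq\tmRNA\t101\t200\t.\t+\t.\tID=rna-NM_TOY0001.1;Name=NM_TOY0001.1",
    "chr1\tRefSeq\tmRNA\t151\t300\t.\t-\t.\tID=rna-NM_TOY0002.1;Name=NM_TOY0002.1",
]
bed = ["chr1\t100\t101", "chr1\t160\t161", "chr1\t400\t401"]

for chrom, start, end, hits in transcripts_at(bed, parse_gff3_transcripts(gff3)):
    print(chrom, start, end, ",".join(hits) or ".")
```

Two things to check before running the real tools: the overlap here is against the whole transcript span, so intronic positions also match (intersect against exon features instead if you only want exonic hits), and the chromosome names must agree between the two files. NCBI RefSeq GFF3 files name sequences by accession (e.g. NC_000001.11) rather than chr1, so one of the inputs may need renaming first.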