How can I get the sequences of the first 20 nt and the last 250 nt of introns in the hg38 genome?
10 months ago
xiaoleiusc ▴ 140

Dear Biostars users,

I have the hg38 genome FASTA file and a BED file of all the introns (GENCODE V44) of the hg38 genome. I would like to get two FASTA files: 1) a FASTA file of the first 20 nt of every intron; and 2) a FASTA file of the last 250 nt of every intron.

What tools (and command lines) should I use to generate these two output fasta files?

Thanks in advance,

Xiao

annotation RNA • 669 views
10 months ago

You can use bedtools flank to generate a BED file of the desired intron intervals, then bedtools getfasta to output the sequences.
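One caveat: as far as I know, bedtools flank reports regions outside each feature, so for windows inside the introns it may be simpler to compute the sub-intervals directly. Below is a minimal Python sketch of that arithmetic. It assumes a six-column, tab-separated BED (chrom, start, end, name, score, strand) with the usual 0-based, half-open coordinates; the helper names are hypothetical, and introns shorter than the window are returned whole.

```python
from typing import NamedTuple


class Bed6(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str
    score: str
    strand: str


def parse_bed6(line: str) -> Bed6:
    """Parse one tab-separated BED6 line (assumed layout, see above)."""
    f = line.rstrip("\n").split("\t")
    return Bed6(f[0], int(f[1]), int(f[2]), f[3], f[4], f[5])


def head_window(rec: Bed6, n: int) -> Bed6:
    """First n nt of the intron in transcript orientation."""
    n = min(n, rec.end - rec.start)          # clip to intron length
    if rec.strand == "-":
        # On the minus strand the intron begins at the right-hand end.
        return rec._replace(start=rec.end - n)
    return rec._replace(end=rec.start + n)


def tail_window(rec: Bed6, n: int) -> Bed6:
    """Last n nt of the intron in transcript orientation."""
    n = min(n, rec.end - rec.start)          # clip to intron length
    if rec.strand == "-":
        # On the minus strand the intron finishes at the left-hand end.
        return rec._replace(end=rec.start + n)
    return rec._replace(start=rec.end - n)


def to_line(rec: Bed6) -> str:
    """Format a record back into a tab-separated BED6 line."""
    return "\t".join(str(x) for x in rec)


# Hypothetical record, only to show the arithmetic.
intron = parse_bed6("chr1\t1000\t2000\tintronA\t0\t+")
print(to_line(head_window(intron, 20)))    # window 1000-1020
print(to_line(tail_window(intron, 250)))   # window 1750-2000
```

Writing the two sets of lines to separate BED files (say first20.bed and last250.bed, names made up here) and running something like bedtools getfasta -fi hg38.fa -bed first20.bed -s -fo first20.fa should then give the sequences, with -s reverse-complementing minus-strand introns.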


The usage line is bedtools flank [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]. In my case, can I use -l -20 and -r -250? In other words, can negative values (e.g. -20 or -250) follow -l or -r? Thanks.

-l  The number of base pairs to subtract from the start coordinate. Integer.

That covers you on the left end. You may need to create a BED file of one-base intervals at the start and end of each intron: one entry for the start and one for the end.
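To illustrate the one-base idea, here is a rough Python sketch of the coordinate arithmetic, again assuming 0-based, half-open BED coordinates. The function names and the example coordinates are made up; the point is only that once each intron end is its own one-base interval, the windows are reached with non-negative extensions, so no negative values are needed.

```python
from typing import Tuple

Interval = Tuple[int, int]  # BED-style: 0-based start, exclusive end


def end_anchors(start: int, end: int) -> Tuple[Interval, Interval]:
    """Return one-base intervals at the first and last base of [start, end)."""
    if end <= start:
        raise ValueError("empty interval")
    return (start, start + 1), (end - 1, end)


def extend(iv: Interval, left: int, right: int, lo: int, hi: int) -> Interval:
    """Grow iv by non-negative left/right amounts, clamped to [lo, hi)."""
    return max(lo, iv[0] - left), min(hi, iv[1] + right)


# Plus-strand intron at chr1:1000-2000 (hypothetical coordinates).
start, end = 1000, 2000
first_base, last_base = end_anchors(start, end)

# 1 anchor base + 19 more to the right = first 20 nt.
first20 = extend(first_base, 0, 19, start, end)

# 1 anchor base + 249 more to the left = last 250 nt.
last250 = extend(last_base, 249, 0, start, end)

print(first20, last250)
```

For minus-strand introns the roles of the two anchors swap. bedtools slop offers -l, -r and -s options for this kind of extension, though it clamps to chromosome ends rather than intron ends; the clamping to the intron in this sketch is my own addition.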

10 months ago
Juke34 8.9k

Another solution is to use AGAT. You can either use the GFF/GTF annotation as input or convert your BED file to GFF with agat_convert_bed2gff. Then you can use agat_sp_extract_sequences, which has parameters for extracting flanking regions: https://agat.readthedocs.io/en/latest/tools/agat_sp_extract_sequences.html
