LncRNA pipeline
11 months ago
Researcher ▴ 30

Hi all, I am working with lncRNA. My pipeline is mapping with STAR and then sorting, followed by StringTie, and from the StringTie output I extract lncRNAs using this awk command:

awk '($3 == "transcript" && $5 - $4 > 200) { exon_count = gsub(/exon_number "[0-9]+"/, "&"); } (exon_count > 2) { print }' 
lncrna RNA-Seq NGS
11 months ago
Prash ▴ 280

Subhikhsa, subtle changes:

The length filter should be >= 200,

and the exon count could be equal to or more than 3, to avoid intergenic boundaries and capture lincRNAs.
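The two criteria above (length of at least 200 and at least 3 exons) can be sketched as a two-pass awk filter. This is only a rough sketch, not a benchmarked method: the function name `filter_lncrna_candidates` and its default thresholds are hypothetical, and it assumes a tab-delimited StringTie GTF in which every `exon` line carries a `transcript_id` attribute. Exons are counted from the `exon` lines, because `transcript` lines carry no `exon_number` attribute. As in the command from the question, length is taken as the genomic span on the `transcript` line, so it includes introns.

```shell
# Sketch: keep transcripts spanning >= 200 bp with >= 3 exons.
# Hypothetical helper; assumes a tab-delimited StringTie GTF.
filter_lncrna_candidates() {
    gtf="$1"
    min_len="${2:-200}"     # assumed default, taken from the thread
    min_exons="${3:-3}"     # assumed default, taken from the thread
    # The file is read twice: pass 1 collects the span and exon count per
    # transcript, pass 2 prints every line of the transcripts that pass.
    awk -F'\t' -v min_len="$min_len" -v min_exons="$min_exons" '
        # Extract the transcript_id value from the attribute column.
        function tid(attrs,    s) {
            if (match(attrs, /transcript_id "[^"]+"/) == 0) return ""
            s = substr(attrs, RSTART, RLENGTH)
            sub(/^transcript_id "/, "", s)
            sub(/"$/, "", s)
            return s
        }
        /^#/ { next }                      # skip header comments
        NR == FNR {                        # pass 1
            id = tid($9)
            if ($3 == "exon") exons[id]++
            else if ($3 == "transcript") span[id] = $5 - $4 + 1
            next
        }
        {                                  # pass 2
            id = tid($9)
            if (span[id] >= min_len && exons[id] >= min_exons) print
        }
    ' "$gtf" "$gtf"
}
```

Usage would be along the lines of `filter_lncrna_candidates stringtie.gtf > candidates.gtf`, where both file names are placeholders. Keeping the exon lines in the output matters if the file is later used for read counting.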

Can you please send the screenshot?

Prash

PS: Please don't type in caps. It means SHOUTING in net jargon


Okay, thank you sir.

1   StringTie   transcript  14529   29358   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "9.250028"; FPKM "1.439988"; TPM "3.138408";
1   StringTie   transcript  14529   24901   1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.2"; cov "15.766621"; FPKM "2.454452"; TPM "5.349400";
1   StringTie   transcript  135141  135895  1000    -   .   gene_id "STRG.2"; transcript_id "STRG.2.1"; reference_id "ENST00000494149"; ref_gene_id "ENSG00000268903"; cov "1.669579"; FPKM "0.259910"; TPM "0.566465";
1   StringTie   transcript  137682  137965  1000    -   .   gene_id "STRG.3"; transcript_id "STRG.3.1"; reference_id "ENST00000595919"; ref_gene_id "ENSG00000269981"; cov "1.830701"; FPKM "0.284992"; TPM "0.621132";
1   StringTie   transcript  164282  168955  1000    -   .   gene_id "STRG.4"; transcript_id "STRG.4.1"; cov "1.205568"; FPKM "0.187676"; TPM "0.409033";
1   StringTie   transcript  184957  197026  1000    -   .   gene_id "STRG.5"; transcript_id "STRG.5.1"; cov "3.289948"; FPKM "0.512159"; TPM "1.116234";
1   StringTie   transcript  185217  195411  1000    -   .   gene_id "STRG.5"; transcript_id "STRG.5.2"; reference_id "ENST00000623083"; ref_gene_id "ENSG00000279457"; ref_gene_name "WASH9P"; cov "8.402778"; FPKM "1.308094"; TPM "2.850948";
1   StringTie   transcript  185217  187958  1000    -   .   gene_id "STRG.5"; transcript_id "STRG.5.3"; cov "4.316052"; FPKM "0.671897"; TPM "1.464378";
1   StringTie   transcript  185217  199874  1000    -   .   gene_id "STRG.5"; transcript_id "STRG.5.4"; cov "2.504329"; FPKM "0.389859"; TPM "0.849685";
1   StringTie   transcript  185217  199874  1000    -   .   gene_id "STRG.5"; transcript_id "STRG.5.5"; cov "4.394231"; FPKM "0.684067"; TPM "1.490903";
1   StringTie   transcript  185217  197026  1000    -   .   gene_id "STRG.5"; transcript_id "STRG.5.6"; cov "1.085559"; FPKM "0.168993"; TPM "0.368315";

These are a few lines from the lncRNA GTF file that I obtained after using the above command.


I also have one more doubt. If this GTF file is right, then to find the expression of the lncRNAs I will have to run HTSeq (htseq-count) on the mapped BAM file together with this lncRNA GTF file to get lncRNA expression counts, which I can then use for DESeq2. Is that right?
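For reference, the counting step described here might look roughly like the sketch below. It is an illustration under assumptions, not a tested workflow: the file names are placeholders, `-s no` assumes an unstranded library, and `merge_htseq_counts` is a hypothetical helper. Note that htseq-count counts `exon` features by default, so a GTF containing only `transcript` lines would assign every read to `__no_feature`; the GTF needs the exon lines of the selected transcripts. The helper merges per-sample htseq-count tables into one matrix and drops the trailing `__` summary rows.

```shell
# Placeholder invocation (one run per sample), shown as a comment because
# the file names are hypothetical:
#   htseq-count -f bam -r pos -s no -t exon -i gene_id sample1.bam lncrna.gtf > sample1.counts

# Hypothetical helper: merge two-column htseq-count outputs into one matrix.
merge_htseq_counts() {
    awk -F'\t' '
        FNR == 1 { n++; name[n] = FILENAME }   # a new input file starts
        /^__/ { next }                         # skip __no_feature, __ambiguous, ...
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++m] = $1 }
            count[$1, n] = $2
        }
        END {
            printf "feature"
            for (j = 1; j <= n; j++) printf "\t%s", name[j]
            printf "\n"
            for (i = 1; i <= m; i++) {
                printf "%s", order[i]
                # A feature missing from a file is written as 0.
                for (j = 1; j <= n; j++) printf "\t%d", count[order[i], j]
                printf "\n"
            }
        }
    ' "$@"
}
```

Calling `merge_htseq_counts sample1.counts sample2.counts > counts_matrix.tsv` would give a feature-by-sample table. The merge is optional, since DESeq2 can also read per-sample htseq-count files directly through `DESeqDataSetFromHTSeqCount`.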

11 months ago
Prash ▴ 280

This is the pipeline I suggest, which we benchmarked. However, it all depends on the questions you are asking: is it about the regulatory potential of lncRNAs, lncRNA x gene interactions, etc.?


Thank you for sharing the pipeline, sir. My end goal is lncRNA and gene interaction.
