I have run StringTie with the -B option (Ballgown tables) on 150 samples. The output for each sample looks like this:
Sample1
|_____ e2t.ctab
|_____ e_data.ctab
|_____ Sample1.gtf
|_____ i2t.ctab
|_____ i_data.ctab
|_____ t_data.ctab
Sample1.gtf looks like this:
# stringtie -e -B -p 8 -G /path/stringtie_output/stringtie_merged.gtf -o /path/Sample1.gtf /path/Sample1.sorted.bam
# StringTie version 1.3.3
chr1 StringTie transcript 10001 10390 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.1"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1 StringTie exon 10001 10101 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.1"; exon_number "1"; cov "0.0";
chr1 StringTie exon 10179 10390 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.1"; exon_number "2"; cov "0.0";
chr1 StringTie transcript 10001 10465 . - . gene_id "MSTRG.6918"; transcript_id "MSTRG.6918.2"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1 StringTie exon 10001 10167 . - . gene_id "MSTRG.6918"; transcript_id "MSTRG.6918.2"; exon_number "1"; cov "0.0";
chr1 StringTie exon 10423 10465 . - . gene_id "MSTRG.6918"; transcript_id "MSTRG.6918.2"; exon_number "2"; cov "0.0";
chr1 StringTie transcript 10001 10467 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.2"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1 StringTie exon 10001 10101 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.2"; exon_number "1"; cov "0.0";
chr1 StringTie exon 10173 10249 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.2"; exon_number "2"; cov "0.0";
chr1 StringTie exon 10398 10467 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.2"; exon_number "3"; cov "0.0";
chr1 StringTie transcript 10001 10467 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.3"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1 StringTie exon 10001 10101 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.3"; exon_number "1"; cov "0.0";
chr1 StringTie exon 10173 10224 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.3"; exon_number "2"; cov "0.0";
chr1 StringTie exon 10391 10467 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.3"; exon_number "3"; cov "0.0";
chr1 StringTie transcript 10005 10467 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.4"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1 StringTie exon 10005 10178 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.4"; exon_number "1"; cov "0.0";
chr1 StringTie exon 10361 10467 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.4"; exon_number "2"; cov "0.0";
chr1 StringTie transcript 10011 10467 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.5"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1 StringTie exon 10011 10178 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.5"; exon_number "1"; cov "0.0";
chr1 StringTie exon 10405 10467 . + . gene_id "MSTRG.6917"; transcript_id "MSTRG.6917.5"; exon_number "2"; cov "0.0";
chr1 StringTie transcript 10001 10465 1000 - . gene_id "MSTRG.6918"; transcript_id "MSTRG.6918.1"; cov "0.567742"; FPKM "0.066922"; TPM "0.283503";
chr1 StringTie exon 10001 10465 1000 - . gene_id "MSTRG.6918"; transcript_id "MSTRG.6918.1"; exon_number "1"; cov "0.567742";
chr1 StringTie transcript 11612 14409 . + . gene_id "MSTRG.7557"; transcript_id "MSTRG.7557.1"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
chr1 StringTie exon 11612 12697 . + . gene_id "MSTRG.7557"; transcript_id "MSTRG.7557.1"; exon_number "1"; cov "0.0";
chr1 StringTie exon 12975 13052 . + . gene_id "MSTRG.7557"; transcript_id "MSTRG.7557.1"; exon_number "2"; cov "0.0";
chr1 StringTie exon 13221 14409 . + . gene_id "MSTRG.7557"; transcript_id "MSTRG.7557.1"; exon_number "3"; cov "0.0";
What I actually want is two columns, transcript_id and TPM, for each of my 150 sample GTFs. How do I do that with awk, or is there another way to export those two columns from all the files into a single file? This is the part I don't know how to do. Could you please help me with an example? Thank you.
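One possible approach, sketched in awk under a few assumptions: GTF is a tab-delimited format, so the spaces in the pasted excerpt are assumed to be tabs in the real files; each file is named <sample>.gtf, as with Sample1.gtf in the listing above; and the goal is a single wide table with one row per transcript_id and one TPM column per sample. The helper name tpm_matrix and the output file name are hypothetical.

```shell
# Sketch: one transcript x sample TPM matrix from StringTie -e -B GTF files.
# Assumptions: GTFs are tab-delimited; each file is named <sample>.gtf.
# tpm_matrix is a hypothetical helper name.
# Usage (hypothetical layout Sample1/Sample1.gtf, Sample2/Sample2.gtf, ...):
#   tpm_matrix */*.gtf > tpm_matrix.tsv
tpm_matrix() {
    awk -F'\t' '
    # Return the quoted value of attribute key from a GTF column-9 string.
    # A leading "; " is prepended so every key is anchored on a semicolon,
    # i.e. a key only matches at the start of an attribute.
    function attr(s, key,    t, m) {
        t = "; " s
        if (!match(t, ";[ ]*" key " \"[^\"]*\"")) return ""
        m = substr(t, RSTART, RLENGTH)
        sub(/^[^"]*"/, "", m)   # strip up to and including the opening quote
        sub(/"$/, "", m)        # strip the closing quote
        return m
    }
    FNR == 1 {
        # First line of a new file: sample name = file name minus path and .gtf
        sample = FILENAME
        sub(/^.*\//, "", sample)
        sub(/\.gtf$/, "", sample)
        samples[++ns] = sample
    }
    $3 == "transcript" {
        id = attr($9, "transcript_id")
        if (!(id in seen)) { seen[id] = 1; ids[++nt] = id }
        tpm[id, sample] = attr($9, "TPM")
    }
    END {
        printf "transcript_id"
        for (j = 1; j <= ns; j++) printf "\t%s", samples[j]
        printf "\n"
        for (i = 1; i <= nt; i++) {
            printf "%s", ids[i]
            for (j = 1; j <= ns; j++) {
                k = ids[i] SUBSEP samples[j]
                # NA only if a transcript is missing from a sample
                printf "\t%s", ((k in tpm) ? tpm[k] : "NA")
            }
            printf "\n"
        }
    }' "$@"
}
```

Since every file was produced with -e against the same merged GTF, the transcript sets should normally match across samples, so NA is only a fallback. Exon lines and the "#" header lines are skipped because their column 3 is not "transcript".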
I tried the command below on one of the files:
It gave output like this:
Read up on awk online. You'll need the awk -v option. And always use the -F option to set the field separator (FS) explicitly; better safe than sorry. If you use cut first and pipe the output to awk, you'll need something like a shell variable to get the sample name from the file name. You need neither grep nor cut: awk can operate on lines that match a pattern, like so: $3 == "transcript" { do_something; }
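The advice above (pass the sample name with -v, set -F explicitly, and let an awk pattern do the filtering instead of grep and cut) could be combined roughly as follows. This is a sketch under stated assumptions rather than a definitive pipeline: it assumes tab-delimited GTFs named <sample>.gtf, uses the hypothetical helper name tpm_long, and matches on column 3, which holds the feature type (transcript, exon) in a GTF line. It writes one long table with a sample column.

```shell
# Sketch: long-format table (sample, transcript_id, TPM) from many GTFs.
# Sample name passed with awk -v, field separator set with -F, and a
# pattern on column 3 (the GTF feature type) instead of grep/cut.
# tpm_long is a hypothetical helper name; files are assumed to be <sample>.gtf.
# Usage: tpm_long */*.gtf > tpm_long.tsv
tpm_long() {
    printf 'sample\ttranscript_id\tTPM\n'
    for gtf in "$@"; do
        sample=$(basename "$gtf" .gtf)   # Sample1/Sample1.gtf -> Sample1
        awk -F'\t' -v sample="$sample" '
        $3 == "transcript" {
            id = ""; tpm = ""
            # Column 9 is a list of  key "value";  pairs
            n = split($9, kv, /; */)
            for (i = 1; i <= n; i++) {
                split(kv[i], p, " ")       # p[1] = key, p[2] = quoted value
                val = p[2]
                gsub(/[";]/, "", val)      # drop quotes and any stray semicolon
                if (p[1] == "transcript_id") id = val
                else if (p[1] == "TPM") tpm = val
            }
            print sample "\t" id "\t" tpm
        }' "$gtf"
    done
}
```

A long table like this (one row per sample and transcript) is easy to filter or sort, and can be reshaped into a transcript-by-sample matrix later if needed, for example in R.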