Essentially, I made a pseudo reference genome using a BED file with specific regions of interest (e.g. TP53 on chr17).
Now I have to generate a genome.fq with splice junctions inserted using STAR before mapping my samples. Easy enough.
But when I try mapping, I get an error stating that there are no valid exon lines in the GTF file. A common cause is a difference in chromosome naming between my FASTA and GTF.
Upon closer inspection, the chrName.txt files that STAR generates for my pseudo reference genome and for the whole reference genome are different:
pseudogenome:
17:7565096-7579937
17:7569403-7579937
17:7571719-7578811
17:7571719-7590868
17:7571719-7590868
17:7577498-7590868
17:7577498-7590868
17:7571719-7576926
17:7571719-7578811
17:7571719-7578811
17:7571719-7590868
17:7571719-7590868
17:7577498-7578554
17:7579311-7590868
17:7577850-7590868
17:7571719-7579937
17:7571719-7590868
whole genome:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y
MT
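A quick way to confirm this kind of mismatch is to compare the sequence names in the FASTA headers with the names in the first column of the GTF. The sketch below is only an illustration: the function names are made up for this example, and the sample lines are trimmed stand-ins for the real files, not actual output.

```python
def names_from_fasta_headers(lines):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def names_from_gtf(lines):
    """Collect the seqname (first tab-separated column) of every GTF record."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def naming_mismatch(fasta_names, gtf_names):
    """Report names that appear on only one side."""
    return {
        "only_in_fasta": sorted(fasta_names - gtf_names),
        "only_in_gtf": sorted(gtf_names - fasta_names),
    }


# Hypothetical, shortened inputs that mimic the situation described above.
fasta_lines = [">17:7565096-7579937", "ACGT", ">17:7569403-7579937", "ACGT"]
gtf_lines = ['17\tsrc\texon\t7565097\t7565332\t.\t-\t.\tgene_id "g"; transcript_id "t";']

report = naming_mismatch(names_from_fasta_headers(fasta_lines), names_from_gtf(gtf_lines))
print(report)
```

If the two sets share no names at all, as here, no GTF exon line can be assigned to any sequence in the genome.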
I'm assuming bedtools fastafrombed is the culprit. Could using BED12 (instead of BED6) be the cause of all this?
BED file:
17 7565096 7579937 uc002gig.1 0 - 7565256 7579912 0 7 236,110,113,184,279,22,99, 0,12402,13080,13274,14215,14603,14742,
17 7569403 7579937 uc002gih.3 0 - 7569523 7579912 0 9 159,74,137,110,113,184,279,22,99, 0,7449,7615,8095,8773,8967,9908,10296,10435,
17 7571719 7578811 uc002gii.2 0 - 7572926 7578452 0 7 1289,107,74,137,110,113,441, 0,2207,5133,5299,5779,6457,6651,
17 7571719 7590868 uc002gij.3 0 - 7572926 7579569 0 11 1289,107,74,137,110,113,184,279,22,102,174, 0,2207,5133,5299,5779,6457,6651,7592,7980,8119,18975,
17 7571719 7590868 uc002gim.3 0 - 7572926 7579912 0 11 1289,107,74,137,110,113,184,279,22,99,174, 0,2207,5133,5299,5779,6457,6651,7592,7980,8119,18975,
17 7577498 7590868 uc002gin.3 0 - 7577500 7579912 0 6 110,113,184,22,102,174, 0,678,872,2201,2340,13196,
17 7577498 7590868 uc002gio.3 0 - 7577500 7578533 0 4 110,113,184,174, 0,678,872,13196,
17 7571719 7576926 uc010cne.1 0 - 7571719 7571719 0 2 1289,74, 0,5133,
17 7571719 7578811 uc010cnf.2 0 - 7576536 7578452 0 8 1289,107,60,74,137,110,113,441, 0,2207,4805,5133,5299,5779,6457,6651,
17 7571719 7578811 uc010cng.2 0 - 7576624 7578452 0 8 1289,107,133,74,137,110,113,441, 0,2207,4805,5133,5299,5779,6457,6651,
17 7571719 7590868 uc010cnh.2 0 - 7576536 7579912 0 12 1289,107,60,74,137,110,113,184,279,22,102,174, 0,2207,4805,5133,5299,5779,6457,6651,7592,7980,8119,18975,
17 7571719 7590868 uc010cni.2 0 - 7576624 7579912 0 12 1289,107,133,74,137,110,113,184,279,22,102,174, 0,2207,4805,5133,5299,5779,6457,6651,7592,7980,8119,18975,
17 7577498 7578554 uc010cnj.1 0 - 7577498 7577498 0 2 110,184, 0,872,
17 7579311 7590868 uc010cnk.2 0 - 7579311 7580659 0 5 279,22,102,103,174, 0,388,527,1331,11383,
17 7577850 7590868 uc010vug.3 0 - 7578137 7579569 0 5 439,184,279,241,174, 0,520,1461,1849,12844,
17 7571719 7579937 uc031qyp.1 0 - 7571719 7571719 0 10 1289,107,74,137,110,118,184,279,22,99, 0,2207,5133,5299,5779,6452,6651,7592,7980,8119,
17 7571719 7590868 uc031qyq.1 0 - 7572926 7579569 0 10 1289,107,74,137,110,113,184,279,241,174, 0,2207,5133,5299,5779,6457,6651,7592,7980,18975,
I did not completely understand what you are trying to do, but when you use fastafrombed, the sequence names are renamed to the chr:start-end format, which differs from the names in your original FASTA files.
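One possible way to make the annotation agree with the renamed sequences is to write the exons in the coordinates of each extracted region, using the chr:start-end header as the sequence name. The following is a minimal sketch, not a tested pipeline. It assumes the FASTA was extracted as the full chromStart-chromEnd span on the forward strand (no block splitting, no strand-aware extraction), and the function name and the "bed12" source label are invented for the example.

```python
def bed12_to_gtf_exons(bed_line, source="bed12"):
    """Turn one BED12 record into GTF exon lines relative to the extracted region.

    Assumption: the region was extracted as chrom:chromStart-chromEnd, so the
    BED12 blockStarts (already relative to chromStart) map directly onto it.
    """
    fields = bed_line.split()
    if len(fields) < 12:
        raise ValueError("expected a BED12 record with 12 columns")
    chrom, start, end, name, _score, strand = fields[:6]
    block_sizes = [int(x) for x in fields[10].split(",") if x]
    block_starts = [int(x) for x in fields[11].split(",") if x]

    seqname = f"{chrom}:{start}-{end}"  # same form as the renamed FASTA header
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    gtf_lines = []
    for size, offset in zip(block_sizes, block_starts):
        gtf_start = offset + 1   # BED is 0-based, GTF is 1-based inclusive
        gtf_end = offset + size  # the BED end is exclusive, so no +1 here
        gtf_lines.append("\t".join([
            seqname, source, "exon", str(gtf_start), str(gtf_end),
            ".", strand, ".", attributes,
        ]))
    return gtf_lines


# One record taken from the BED file in the question.
record = "17 7577498 7578554 uc010cnj.1 0 - 7577498 7577498 0 2 110,184, 0,872,"
for gtf_line in bed12_to_gtf_exons(record):
    print(gtf_line)
```

Note that several records in the BED file share the same span (for example 17:7571719-7590868), so they would produce identical sequence names; that would probably need to be resolved separately, for instance by extracting each distinct span only once.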