Modifying GenBank Contamination File for Bedtools
3.4 years ago
selplat21 ▴ 20

I received a contamination file after submitting a genome assembly to GenBank. Does anyone have experience converting the list of intervals it contains into a form that bedtools maskfasta can use?

The following are the contents of the contamination.txt file:

[] We ran your sequences through our Contamination Screen. The screen found 
contigs that need to be trimmed and/or excluded. The results are in the 
Contamination.txt file posted in your submission on the WGS submission portal 
https://submit.ncbi.nlm.nih.gov/subs/genome/.  

GenBank staff will automatically remove contaminants that are found to be 
the entire sequence or at the end of a sequence, and will post the reports 
and edited fasta file to the submission portal. Note that internal contamination 
will not be automatically removed since the sequence may be misassembled and 
therefore should be split at the contamination and resubmitted as separate sequences.
In addition, we do not automatically remove mitochondrial sequences in 
eukaryotic submissions. 

If you selected the submission portal option "Do not automatically trim or 
remove sequences identified as contamination" then you will need 
to adjust the sequences appropriately and then resubmit your sequences. 
After you remove the contamination, trim any Ns at the ends of the sequence 
and remove any sequences that are shorter than 200 nt and not part of a 
multi-component scaffold.

Note that mismatches between the name of the adaptor/primer identified in the screen 
and the sequencing technology used to generate the sequencing data should not be used 
to discount the validity of the screen results as the adaptors/primers of many 
different sequencing platforms share sequence similarity.


Mitochondrion:
[] Some sequences are similar to a mitochondrial sequence. There are three options:
 (A) If you are not sequencing mitochondrial DNA, remove these sequences
 (B) If you want to include the mitochondrial sequences in the genome submission, 
 label them as mitochondrial. 
- To label the sequences in a BATCH submission, add a source qualifier in the fasta  
 definition line [location=mitochondrion].  See "IMPORTANT: Additional requirements 
 for batch submissions" at https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/#batch_assignment
- If this is not a BATCH submission, indicate that these sequences are mitochondrial in the 
 Assignment tab of the submission portal.
- In addition, it is recommended but not required that you move these sequences to the end 
of your submission or put them in a separate file so that  they are clustered together.
(C) If these are nuclear mitochondrial pseudogene regions (numt's) that should remain 
 in the submission, please notify us by including a comment in the submission portal 
or by emailing us. Note that numts should be integrated into the genome and therefore 
should only be marked as "Trim".  Any sequence marked as "Exclude" because of a hit 
to mitochondria must either be removed (option A) or labeled as mitochondrial (option B).
The numt sequences will still appear in the contamination report, but once you have 
removed all those marked to exclude and told us that the rest are numts, you can ignore 
the errors. We will manually override the error when we review your submission.


Screened 1,340 sequences, 450,786,658 bp.
Note: 119 sequences with runs of Ns 10 bp or longer (or those longer than 20 MB) were split before screening.
82 sequences with locations to mask/trim
(85 split spans with locations to mask/trim)

Trim:
Sequence name, length, span(s), apparent source
000000F 10718699    1845347..1845478,3309508..3310062,3686734..3687014,4566627..4566818,4582486..4582837    mitochondrion
000001F 10374927    1243815..1244105,2211823..2212157,9224026..9225141  mitochondrion
000002F 9682142 6370591..6371853,8801874..8802128   mitochondrion
000003F 7674730 4397340..4397673    mitochondrion
000005F 2418719 1516348..1516987    mitochondrion
000006F 6130004 175695..176200,1368648..1368883,3578718..3579667    mitochondrion
000008F 6665304 4761589..4762119    mitochondrion
000009F 6358232 1807815..1807973,4641072..4641792,5428340..5428953,6047984..6048418 mitochondrion
000010F 6389901 125850..126094,734615..734785,3342761..3343000,4223963..4224288,4574033..4574247    mitochondrion
000011F 6114160 3569219..3569742,3831575..3832275,4024782..4024931  mitochondrion
000012F 6036981 362201..362901,4404978..4405102 mitochondrion
000014F 5682764 1281919..1282272,1542973..1546176   mitochondrion
000016F 5634279 3981592..3981804    mitochondrion
000017F 5750491 2812567..2812725,4850337..4850469   mitochondrion
000019F 5112736 2348398..2348536,3050337..3050496   mitochondrion
000020F 4868406 1067760..1068057    mitochondrion
000021F 5059621 2129428..2129566,2129772..2129915,2290329..2290617,2799695..2800583,3482351..3483948,4411681..4412559   mitochondrion
000022F 4776965 2818919..2820038,4671479..4671847   mitochondrion
000024F 4579434 2831427..2831869,3567969..3568394,3568477..3569059  mitochondrion
000027F 4339752 789609..789795,1859523..1859682,3773127..3773442    mitochondrion
000028F 3985823 2808230..2808648    mitochondrion
000029F 3964867 1158046..1158174    mitochondrion
000031F 3187727 1256145..1256401,2263958..2264177   mitochondrion
000032F 3811840 1599793..1600201,1670679..1670849,1814206..1814465,3447311..3449636,3450583..3451644    mitochondrion
000034F 1089070 1048969..1049389    mitochondrion
000037F 3207319 128503..128625  mitochondrion
000039F 3240232 2640248..2640458    mitochondrion
000041F 2450599 6165..6364  mitochondrion
000044F 3118497 2040129..2040647,2050634..2050780,2052035..2053538,2057614..2058735,2061235..2061465,2061745..2062442,2062838..2063368,2073753..2074168,2199171..2199291    mitochondrion
000049F 2859690 1387483..1387688    mitochondrion
000051F 2879007 1067486..1067707    mitochondrion
000054F 973803  802110..802413  mitochondrion
000056F 413187  123424..123612  mitochondrion
000059F 2699107 915725..915869,2191943..2192110 vector/etc
000065F 2564273 243386..243550,1925197..1925339 mitochondrion
000070F 2229282 242194..242402  mitochondrion
000072F 2391111 927082..936209,2067053..2067291 mitochondrion
000073F 2206652 810626..810938  mitochondrion
000075F 2264779 798452..798601  mitochondrion
000078F 2277580 311353..311502  mitochondrion
000079F 1897683 433075..433197  mitochondrion
000081F 1182999 217336..217595  mitochondrion
000085F 2139560 1007648..1008344,1508098..1508358   mitochondrion
000088F 1057830 539702..539889  mitochondrion
000094F 1916475 315..2828,5901..6138,9087..9847,11159..11280,1192361..1192573   vector/etc
000101F 1913017 1836912..1837265    mitochondrion
000106F 544451  465130..465426  mitochondrion
000107F 369373  207136..207549  mitochondrion
000111F 1862678 914190..914401  mitochondrion
000124F 1008822 538706..538873  mitochondrion
000126F 1697619 718111..718304  mitochondrion
000130F 996569  909966..910091  mitochondrion
000134F 1401600 225246..225418  mitochondrion
000152F 1393241 496944..497062,497544..497694,497912..498297    mitochondrion
000153F 145345  16164..17437,19532..20818,45089..45248,45468..45936 mitochondrion
000155F 1120018 1079313..1081130,1083130..1083260,1100918..1102204,1105792..1106264 mitochondrion
000176F 1108250 925549..926112  mitochondrion
000193F 544079  441736..441872,442932..443182   mitochondrion
000214F 671671  395139..396598,397750..398633,403849..403989,407040..407170,408501..408855,410142..410288,426065..427367,434035..434181,437765..440529,441249..441646,442958..443079,449066..449825,467011..468364,488540..490517,491663..491815,494263..496240,501342..501800,503141..503287,505112..507095,545573..545888,555106..557404,561338..562703,574191..575182,579479..581456,583281..583427,593852..594321,596071..597560,598713..599999,626530..627729,627877..628743,639847..640606,641962..642637,644460..644606,645618..645767   mitochondrion
000228F 681544  623035..623552  mitochondrion
000232F 703900  293603..293727  mitochondrion
000250F 601353  130722..130911  mitochondrion
000279F 455376  169724..170056  mitochondrion
000300F 371277  368532..369191  mitochondrion
000310F 274701  195298..195560,195835..196583,197476..197622,199798..200210,206114..206731,209044..209281,212557..213260,213354..214430,217838..217968,220149..221462   mitochondrion
000350F 293903  68656..69013,85101..85458,154665..155022    mitochondrion
000377F 261818  213534..213777  mitochondrion
000418F 226830  4793..5150  mitochondrion
000428F 135233  94503..94727    mitochondrion
000451F 209099  186575..186697  mitochondrion
000473F 166641  54212..55527,56679..57971   mitochondrion
000559F 76849   5249..5486,8518..9278,15994..16198  mitochondrion
000583F 99957   41053..41875    mitochondrion
000814F 104678  94246..94603,98318..98675   mitochondrion
000843F 93854   4727..5031,5705..6012   mitochondrion
000849F 45017   349..676,4818..6530,9459..9589,13997..14460 mitochondrion
000973F 77717   1..144  mitochondrion
001063F 69923   69635..69857    mitochondrion
001229F 63775   53560..55537,57362..57508,58849..58979,61904..63624 mitochondrion
001413F 45160   1284..1415  mitochondrion
001459F 47867   24759..24889,28935..31448,34518..34755,37747..38507,39794..39933    mitochondrion
001533F 39796   27600..27876    mitochondrion
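
For reference, here is a rough sketch of the kind of conversion I have in mind, not a tested pipeline. It assumes each row under "Trim:" is whitespace-separated as name, length, comma-separated spans, source, and that the `start..end` spans are 1-based and inclusive (so the BED start is `start - 1` and the BED end is unchanged). The function name `trim_lines_to_bed` is made up for illustration.

```python
import re

# A span looks like "4397340..4397673" (assumed 1-based, inclusive).
SPAN_RE = re.compile(r"^(\d+)\.\.(\d+)$")


def trim_lines_to_bed(lines):
    """Turn 'name length spans source' rows into (chrom, start, end, label) BED tuples."""
    records = []
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # blank lines, "Trim:", and similar
        name, length, spans = fields[0], fields[1], fields[2]
        if not length.isdigit():
            continue  # header or summary lines, not a data row
        label = fields[3].strip() if len(fields) > 3 else "."
        for span in spans.split(","):
            match = SPAN_RE.match(span)
            if match is None:
                continue  # not a start..end span
            start, end = int(match.group(1)), int(match.group(2))
            # BED is 0-based, half-open: shift the start, keep the end.
            records.append((name, start - 1, end, label))
    return records


# A few rows copied from the report above.
example = [
    "Trim:",
    "Sequence name, length, span(s), apparent source",
    "000003F 7674730 4397340..4397673    mitochondrion",
    "000059F 2699107 915725..915869,2191943..2192110 vector/etc",
    "000973F 77717   1..144  mitochondrion",
]

bed = trim_lines_to_bed(example)
for chrom, start, end, label in bed:
    print(f"{chrom}\t{start}\t{end}\t{label}")
```

If the printed lines are saved to a file (say `trim.bed`, a placeholder name), something like `bedtools maskfasta -fi assembly.fasta -bed trim.bed -fo assembly.masked.fasta` should replace those intervals with Ns, provided the sequence names match the FASTA headers. Masking alone may not be what GenBank wants for every row, since the report says internal contamination should be split and resubmitted, and numts can be left in place once reported.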
Genome-Assembly Linux • 701 views