RepeatModeler Version 2.0.7
===========================
Using output directory = /dev/shm/rModeler.idZXKu/RM_755533.MonSep82031352025
Search Engine = rmblast 2.14.1+
Threads = 32
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.7, RepeatMasker 4.2.1, RepeatAfterMe 0.0.7
LTR Structural Analysis: Disabled [use -LTRStruct to enable]
Random Number Seed: 1757388694
Database = /dev/shm/rModeler.idZXKu/GCA_003317015.2_ASM331701v2
  - Sequences = 153
  - Bases = 42398840
  - N50 = 4235512
  - Contig Histogram:
  Size(bp)                                                            Count
  -----------------------------------------------------------------------
  5702085-6109303 |                                                   [ 1 ]
  5294867-5702084 |                                                   [  ]
  4887650-5294867 |                                                   [  ]
  4480432-4887649 |                                                   [  ]
  4073215-4480432 |*                                                  [ 4 ]
  3665997-4073214 |                                                   [ 1 ]
  3258779-3665996 |                                                   [  ]
  2851562-3258779 |                                                   [ 1 ]
  2444344-2851561 |                                                   [ 2 ]
  2037127-2444344 |                                                   [ 1 ]
  1629909-2037126 |                                                   [ 1 ]
  1222691-1629908 |                                                   [  ]
  815474-1222691  |                                                   [  ]
  408256-815473   |                                                   [  ]
  1039-408256     |************************************************** [ 142 ]
Storage Throughput = excellent ( 1098.75 MB/s )

RepeatModeler Round # 1
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 40000000 bp
   - Final Sample Size = 40099249 bp ( 40024449 non ambiguous )
   - Num Contigs Represented = 148
   - Sequence extraction : 00:00:08 (hh:mm:ss) Elapsed Time
 -- Running RepeatScout on the sequences...
   - RepeatScout: Running build_lmer_table ( l = 14, min = 10 )..
   - RepeatScout: Running RepeatScout.. : 41 raw families identified
   - RepeatScout: Running filtering stage.. 24 families remaining
   - RepeatScout: 00:01:09 (hh:mm:ss) Elapsed Time
   - Collecting repeat instances...
   - Refining 21 families...
     00:02:11 (hh:mm:ss) Elapsed Time
   - Redundant Families and Large Satellite Filtering.. : 0 satellite(s), 6 contained, found in 00:00:03 (hh:mm:ss) Elapsed Time
Family Refinement: 00:00:03 (hh:mm:ss) Elapsed Time
Round Time: 00:03:36 (hh:mm:ss) Elapsed Time : 15 families discovered.

RepeatModeler Round # 2
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 10000000 bp
   - Sequence extraction : 00:00:02 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:00:13 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   373 repeats masked totaling 143128 bp(s).
   - TE Masking time 00:00:21 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
       Sample Size 10039758 bp
       Num Contigs Represented = 46
       Non ambiguous bp:
             Initial: 10020025 bp
             After Masking: 9868348 bp
             Masked: 1.51 %
 -- Input Database Coverage: 10039758 bp out of 42398840 bp ( 23.68 % )
Sampling Time: 00:00:38 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
   - Total Comparisons = 37950
Comparison Time: 00:05:52 (hh:mm:ss) Elapsed Time, 2742 HSPs Collected
Number of families returned by RECON: 574
Round Time: 00:10:28 (hh:mm:ss) Elapsed Time : 3 families discovered.

RepeatModeler Round # 3
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 30000000 bp
   - Sequence extraction : 00:00:06 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:00:35 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   1604 repeats masked totaling 461326 bp(s).
   - TE Masking time 00:00:46 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
       Sample Size 30059435 bp
       Num Contigs Represented = 121
       Non ambiguous bp:
             Initial: 30004368 bp
             After Masking: 29510814 bp
             Masked: 1.64 %
 -- Input Database Coverage: 40099193 bp out of 42398840 bp ( 94.58 % )
Sampling Time: 00:01:30 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
   - Total Comparisons = 360825
Comparison Time: 00:31:08 (hh:mm:ss) Elapsed Time, 16440 HSPs Collected
Number of families returned by RECON: 3460
Round Time: 00:36:41 (hh:mm:ss) Elapsed Time : 14 families discovered.
  - Increasing sample size to include end piece now = 92359082

RepeatModeler Round # 4
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 92359082 bp
   - Sequence extraction : 00:00:01 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:00:03 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
   158 repeats masked totaling 21745 bp(s).
   - TE Masking time 00:00:10 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
       Sample Size 2299439 bp
       Num Contigs Represented = 19
       Non ambiguous bp:
             Initial: 2296439 bp
             After Masking: 2273051 bp
             Masked: 1.02 %
 -- Input Database Coverage: 42398632 bp out of 42398840 bp ( 100.00 % )
Sampling Time: 00:00:14 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
   - Total Comparisons = 1891
Comparison Time: 00:00:52 (hh:mm:ss) Elapsed Time, 72 HSPs Collected
Number of families returned by RECON: 56
Round Time: 00:01:07 (hh:mm:ss) Elapsed Time : 0 families discovered.

RepeatScout/RECON discovery complete: 32 families found

#
# RepeatClassifier
#
# Version 2.0.7
# Threads: 32
# Current Working Directory: /dev/shm/rModeler.idZXKu/RM_755533.MonSep82031352025
# Protein Library: /hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/Libraries/RepeatPeps.lib
#  - 18011 proteins
# Consensi Library: /hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/Libraries/RepeatMasker.lib
#  - 26292 consensus sequences
  - Looking for simple/tandem and low complexity sequences..
  - Looking for similarity to known repeat proteins..
  - Looking for similarity to known repeat consensi..
Classification Time: 00:00:23 (hh:mm:ss) Elapsed Time
Program Time: 00:52:15 (hh:mm:ss) Elapsed Time