Hi All,
I have around 2.1 million records that I am trying to index. I've been experimenting with tuning (i.e. threads and batch size). No matter the settings, after a few hours the speed settles at around 20 documents per second.
I had the threads at 30 (my connection pool size was 40 min, 100 max), but kept getting connection resets or "no JDBC connections" errors, so I reduced it to 10 to stop the errors.
public void buildIndex() {
    FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
    try {
        fullTextEntityManager
                .createIndexer( Interaction.class )
                .batchSizeToLoadObjects( 100 )
                .cacheMode( CacheMode.IGNORE )
                .threadsToLoadObjects( 10 )
                .idFetchSize( 300 )
                .transactionTimeout( 691200 ) // in seconds, i.e. 8 days
                //.progressMonitor( monitor ) //a MassIndexerProgressMonitor implementation
                .startAndWait();
    } catch (InterruptedException e) {
        // Restore the interrupt flag so callers can still see the interruption
        Thread.currentThread().interrupt();
        logger.error("Caught Exception: ", e);
    }
}
Example speed at start:
2022-02-11 22:55:15.631 INFO 96088 --- [ntifierloader-1] o.h.s.b.i.SimpleIndexingProgressMonitor : HSEARCH000027: Going to reindex 2135095 entities
2022-02-11 23:02:18.521 INFO 96088 --- [ entityloader-3] o.h.s.b.i.SimpleIndexingProgressMonitor : HSEARCH000030: 21450 documents indexed in 403867 ms
2022-02-11 23:02:18.522 INFO 96088 --- [ entityloader-3] o.h.s.b.i.SimpleIndexingProgressMonitor : HSEARCH000031: Indexing speed: 53.111546 documents/second; progress: 1.00%
Then after a few hours:
2022-02-12 09:17:34.491 INFO 96088 --- [ entityloader-5] o.h.s.b.i.SimpleIndexingProgressMonitor : HSEARCH000030: 851900 documents indexed in 37319836 ms
2022-02-12 09:17:34.491 INFO 96088 --- [ entityloader-5] o.h.s.b.i.SimpleIndexingProgressMonitor : HSEARCH000031: Indexing speed: 22.827003 documents/second; progress: 39.90%
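As a sanity check on these log lines, here is a minimal sketch of the arithmetic, assuming the logged "Indexing speed" is simply documents divided by elapsed time since the start (the numbers above are consistent with that). The class `IndexingThroughput` and its methods are hypothetical helpers written for this post; they are not part of Hibernate Search.

```java
import java.time.Duration;

// Hypothetical helper for reading the progress lines above; not part of Hibernate Search.
final class IndexingThroughput {

    // Cumulative average in documents/second since indexing started.
    static double averageRate(long documents, long elapsedMillis) {
        return documents * 1000.0 / elapsedMillis;
    }

    // Rate between two progress samples, which reflects the speed over that
    // interval rather than the average since the start.
    static double intervalRate(long docsBefore, long millisBefore,
                               long docsAfter, long millisAfter) {
        return (docsAfter - docsBefore) * 1000.0 / (millisAfter - millisBefore);
    }

    // Time left if the given rate were to hold for all remaining documents.
    static Duration remaining(long totalDocuments, long indexedDocuments, double docsPerSecond) {
        long seconds = Math.round((totalDocuments - indexedDocuments) / docsPerSecond);
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        long total = 2_135_095L;                          // from HSEARCH000027
        long docsEarly = 21_450L, msEarly = 403_867L;     // first HSEARCH000030 line
        long docsLate = 851_900L, msLate = 37_319_836L;   // later HSEARCH000030 line

        double lateAverage = averageRate(docsLate, msLate);
        System.out.printf("average at start:     %.2f docs/s%n", averageRate(docsEarly, msEarly));
        System.out.printf("average after hours:  %.2f docs/s%n", lateAverage);
        System.out.printf("rate between samples: %.2f docs/s%n",
                intervalRate(docsEarly, msEarly, docsLate, msLate));
        System.out.printf("hours left at that average: %d%n",
                remaining(total, docsLate, lateAverage).toHours());
    }
}
```

With the figures from the logs, the two averages match the logged 53.11 and 22.83 documents/second, and the rate between the two samples comes out at about 22.5. At that pace the remaining ~1.28 million documents would need roughly 15 to 16 more hours.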
I did initially test on a smaller data set (same entities etc.) of around 100k records, and I was able to get through that in 10 minutes, at around 100 documents per second.
Am I missing something here? Can anything be done to speed this up?
Here are the indexed entities (they are pretty big, but only a few fields are indexed). I've not included getters/setters.
@Entity
@Indexed
@NormalizerDef(name = "lowercase", filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
})
@Table(name = "Interaction")
public class Interaction {

    private static Logger logger = LoggerFactory.getLogger(Interaction.class);

    @Id
    @Column(name="Id")
    private String id;

    @Field
    @Column(name="Status")
    private Short status;

    @Column(name="EntityTypeId")
    private Short entityTypeId;

    @Column(name="MediaTypeId")
    private String mediaTypeId;

    @Column(name="TypeId")
    private String typeId;

    @Lob
    @Column(name="AllAttributes")
    private byte[] allAttributes;

    @Column(name="CanBeParent")
    private Boolean canBeParent;

    @Column(name="CategoryId")
    private String categoryId;

    @Column(name="ContactId")
    private String contactId;

    @Column(name="CreatorAppId")
    private Integer creatorAppId;

    @Field
    @Column(name="EndDate")
    @DateBridge(resolution=Resolution.SECOND)
    private Timestamp endDate;

    @Column(name="ExternalId")
    private String externalId;

    @Column(name="IntAttribute1")
    private Integer intAttribute1;

    @Column(name="IntAttribute2")
    private Integer intAttribute2;

    @Column(name="IntAttribute3")
    private Integer intAttribute3;

    @Column(name="IntAttribute4")
    private Integer intAttribute4;

    @Column(name="IntAttribute5")
    private Integer intAttribute5;

    @Column(name="IsCategoryApproved")
    private Boolean isCategoryApproved;

    @Column(name="IsSpam")
    private Boolean isSpam;

    @Column(name="Lang")
    private String lang;

    @Column(name="ModifiedDate")
    @DateBridge(resolution=Resolution.SECOND)
    private Timestamp modifiedDate;

    @Column(name="OwnerId")
    private Integer ownerId;

    @Column(name="ParentId")
    private String parentId;

    @Column(name="QueueName")
    private String queueName;

    @Field
    @SortableField
    @Column(name="StartDate")
    @DateBridge(resolution=Resolution.SECOND)
    private Timestamp startDate;

    @Column(name="StoppedReason")
    private String stoppedReason;

    @Column(name="StrAttribute1")
    private String strAttribute1;

    @Column(name="StrAttribute10")
    private String strAttribute10;

    @Column(name="StrAttribute2")
    private String strAttribute2;

    @Column(name="StrAttribute3")
    private String strAttribute3;

    @Column(name="StrAttribute4")
    private String strAttribute4;

    @Column(name="StrAttribute5")
    private String strAttribute5;

    @Column(name="StrAttribute6")
    private String strAttribute6;

    @Column(name="StrAttribute7")
    private String strAttribute7;

    @Column(name="StrAttribute8")
    private String strAttribute8;

    @Column(name="StrAttribute9")
    private String strAttribute9;

    @Column(name="StructTextMimeType")
    private String structTextMimeType;

    @Column(name="StructuredText")
    private String structuredText;

    @Field
    @Column(name="Subject")
    private String subject;

    @Column(name="SubTenantId")
    private Integer subTenantId;

    @Column(name="SubtypeId")
    private String subtypeId;

    @Column(name="TenantId")
    private Integer tenantId;

    @Field
    @Column(name="Text")
    private String text;

    @Field
    @Column(name="TheComment")
    private String theComment;

    @Column(name="ThreadHash")
    private Integer threadHash;

    @Column(name="ThreadId")
    private String threadId;

    @Column(name="Timeshift")
    private Short timeshift;

    @Column(name="WebSafeEmailStatus")
    private String webSafeEmailStatus;

    @OneToOne
    @JoinColumn(name = "id", insertable = false, updatable = false)
    @IndexedEmbedded
    @NotFound(action=NotFoundAction.IGNORE)
    private PhoneCall phoneCall;

    @OneToOne
    @JoinColumn(name = "id", insertable = false, updatable = false)
    @IndexedEmbedded
    @NotFound(action=NotFoundAction.IGNORE)
    private EmailIn emailIn;

    @OneToOne
    @JoinColumn(name = "ownerId", insertable = false, updatable = false)
    @IndexedEmbedded
    @NotFound(action=NotFoundAction.IGNORE)
    private CfgPerson cfgPerson;

    // getters and setters omitted
}

@Entity
@Table(name = "EmailIn")
public class EmailIn {

    @Id
    @Column(name="Id")
    private String id;

    @Field(normalizer = @Normalizer(definition = "lowercase"))
    @Column(name="FromAddress")
    private String fromAddress;

    @Column(name="FromPersonal")
    private String fromPersonal;

    @Column(name="ReplyToAddress")
    private String replyToAddress;

    @Column(name="ToAddresses")
    private String toAddresses;

    @Column(name="CcAddresses")
    private String ccAddresses;

    @Column(name="BccAddresses")
    private String bccAddresses;

    @Column(name="SentDate")
    private Timestamp sentDate;

    @Column(name="Mailbox")
    private String mailbox;

    @Column(name="WhichRuleMatched")
    private String whichRuleMatched;

    @Column(name="EmailOutId")
    private String emailOutId;

    @OneToOne(mappedBy = "emailIn")
    @NotFound(action=NotFoundAction.IGNORE)
    @ContainedIn
    private Interaction interaction;

    // getters and setters omitted
}

@Entity
@Table(name = "PhoneCall")
public class PhoneCall {

    @Id
    @Column(name="Id")
    private String id;

    @Column(name="Duration")
    private Integer duration;

    @Column(name="Outcome")
    private String outcome;

    @Field
    @Column(name="Phonenumber")
    private String phoneNumber;

    @Column(name="TConnectionId")
    private String tConnectionId;

    @OneToOne(mappedBy = "phoneCall")
    @NotFound(action=NotFoundAction.IGNORE)
    @ContainedIn
    private Interaction interaction;

    // getters and setters omitted
}

@Entity
@Table(name = "cfg_person")
public class CfgPerson {

    @Id
    @Column(name="dbid")
    private Integer dbid;

    @Column(name="tenant_dbid")
    private Integer tenantDbid;

    @Field(normalizer = @Normalizer(definition = "lowercase"))
    @Column(name="last_name")
    private String lastName;

    @Field(normalizer = @Normalizer(definition = "lowercase"))
    @Column(name="first_name")
    private String firstName;

    @Column(name="address_line1")
    private String addressLine1;

    @Column(name="address_line2")
    private String addressLine2;

    @Column(name="address_line3")
    private String addressLine3;

    @Column(name="address_line4")
    private String addressLine4;

    @Column(name="address_line5")
    private String addressLine5;

    @Column(name="office")
    private String office;

    @Column(name="home")
    private String home;

    @Column(name="mobile")
    private String mobile;

    @Column(name="pager")
    private String pager;

    @Column(name="fax")
    private String fax;

    @Column(name="modem")
    private String modem;

    @Column(name="phones_comment")
    private String phonesComment;

    @Column(name="birthdate")
    private String birthdate;

    @Column(name="comment_")
    private String comment;

    @Column(name="employee_id")
    private String employeeId;

    @Field
    @Field(normalizer = @Normalizer(definition = "lowercase"))
    @Column(name="user_name")
    private String userName;

    @Column(name="password")
    private String password;

    @Column(name="is_agent")
    private Integer isAgent;

    @Column(name="state")
    private Integer state;

    @Column(name="csid")
    private Integer csid;

    @Column(name="tenant_csid")
    private Integer tenantCsid;

    @Column(name="place_dbid")
    private Integer placeDbid;

    @Column(name="place_csid")
    private Integer placeCsid;

    @Column(name="capacity_dbid")
    private Integer capacityDbid;

    @Column(name="site_dbid")
    private Integer siteDbid;

    @Column(name="contract_dbid")
    private Integer contractDbid;

    @Column(name="salted_string")
    private String saltedString;

    @Column(name="ch_pass_on_login")
    private Integer chPassOnLogin;

    @Column(name="pass_updating")
    private Integer passUpdating;

    @Column(name="pass_hash_alg")
    private Integer passHashAlg;

    @OneToMany(mappedBy = "ownerId")
    @NotFound(action=NotFoundAction.IGNORE)
    @ContainedIn
    private Set<Interaction> interaction;

    // getters and setters omitted
}