Features
- Replication without requiring external scripts
- Configuration in solrconfig.xml only
- Replicates configuration files also
- Works across platforms with same configuration
- No reliance on OS-dependent hard links
- Tightly integrated with Solr; an admin page offers fine-grained control of each aspect of replication
SOLR-561 tracks the original development of this feature.
Configuration
The new Java-based replication feature is implemented as a RequestHandler. Configuring replication is therefore similar to any normal RequestHandler.
Master
<requestHandler name="/replication" class="solr.ReplicationHandler" > <lst name="master"> <!--Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. --> <str name="replicateAfter">startup</str> <str name="replicateAfter">commit</str> <!--Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string. Note that this is just for backup, replication does not require this. --> <!-- <str name="backupAfter">optimize</str> --> <!--If configuration files need to be replicated give the names here, separated by comma --> <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str> <!--The default value of reservation is 10 secs.See the documentation below . Normally , you should not need to specify this --> <str name="commitReserveDuration">00:00:10</str> </lst> <!-- keep only 1 backup. Using this parameter precludes using the "numberToKeep" request parameter. (Solr3.6 / Solr4.0)--> <!-- (For this to work in conjunction with "backupAfter" with Solr 3.6.0, see bug fix https://issues.apache.org/jira/browse/SOLR-3361 )--> <str name="maxNumberOfBackups">1</str> </requestHandler>
Note:
If your commits are very frequent and network is particularly slow, you can tweak an extra attribute <str name="commitReserveDuration">00:00:10</str>. This is roughly the time taken to download 5MB from master to slave. Default is 10 secs.
If you are using startup option for replicateAfter, it is necessary to have a commit/optimize entry also, if you want to trigger replication on future commits/optimizes. If only the startup option is given, replication will not be triggered on subsequent commits/optimizes after it is done for the first time at the start.
Replicating solrconfig.xml
In solrconfig.xml on the master server, in the replication request handler, include a line like the following:
<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>
This ensures that the local configuration 'solrconfig_slave.xml' will be saved as 'solrconfig.xml' on the slave. All other files will be saved with their original names. (Note that the "confFiles" value is set to something else in the example shown above.)
On the master server, the file name of the slave configuration file can be anything, as long as the name is correctly identified in the "confFiles" string; then it will be saved as whatever file name appears after the colon ':'.
Slave
<requestHandler name="/replication" class="solr.ReplicationHandler" > <lst name="slave"> <!--fully qualified url for the replication handler of master . It is possible to pass on this as a request param for the fetchindex command--> <str name="masterUrl">http://master_host:port/solr/corename/replication</str> <!--Interval in which the slave should poll master .Format is HH:mm:ss . If this is absent slave does not poll automatically. But a fetchindex can be triggered from the admin or the http API --> <str name="pollInterval">00:00:20</str> <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED--> <!--to use compression while transferring the index files. The possible values are internal|external if the value is 'external' make sure that your master Solr has the settings to honour the accept-encoding header. see here for details http://wiki.apache.org/solr/SolrHttpCompression If it is 'internal' everything will be taken care of automatically. USE THIS ONLY IF YOUR BANDWIDTH IS LOW . THIS CAN ACTUALLY SLOWDOWN REPLICATION IN A LAN--> <str name="compression">internal</str> <!--The following values are used when the slave connects to the master to download the index files. Default values implicitly set as 5000ms and 10000ms respectively. The user DOES NOT need to specify these unless the bandwidth is extremely low or if there is an extremely high latency--> <str name="httpConnTimeout">5000</str> <str name="httpReadTimeout">10000</str> <!-- If HTTP Basic authentication is enabled on the master, then the slave can be configured with the following --> <str name="httpBasicAuthUser">username</str> <str name="httpBasicAuthPassword">password</str> </lst> </requestHandler>
Note: If you are not using cores, then you simply omit the "corename" parameter above in the masterUrl. To ensure that the url is correct, just hit the url with a browser. You must get a status OK response.
Setting up a Repeater
A master may be able to serve only so many slaves without affecting performance. Some organizations have deployed slave servers across multiple data centers. If each slave downloads the index from a remote data center, the resulting download may consume too much network bandwidth. To avoid performance degradation in cases like this, you can configure one or more slaves as repeaters. A repeater is simply a node that acts as both a master and a slave.
Example configuration of a repeater:
<requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> <str name="replicateAfter">commit</str> <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str> </lst> <lst name="slave"> <str name="masterUrl">http://master.solr.company.com:8080/solr/replication</str> <str name="pollInterval">00:00:60</str> </lst> </requestHandler>
enable/disable master/slave in a node
If a server needs to be turned into a master from a slave or if you wish to use the same solrconfig.xml for both master and slave, do as follows,
<requestHandler name="/replication" class="solr.ReplicationHandler" > <lst name="master"> <str name="enable">${enable.master:false}</str> <str name="replicateAfter">commit</str> <str name="confFiles">schema.xml,stopwords.txt</str> </lst> <lst name="slave"> <str name="enable">${enable.slave:false}</str> <str name="masterUrl">http://master_host:8983/solr/replication</str> <str name="pollInterval">00:00:60</str> </lst> </requestHandler>
NOTE: Be sure to add a 'false' default to enable.master and enable.slave, or deploying will fail and the log will shows Solr gets a '.' instead of correct core instance dir. In Solr Admin main page, you won't see links with core names (Clicking into the only link, you'll get 'missing core in path' error).
NOTE: If deploying is ok but you'll still see no links with core name, it could be permission problem. This normally happens when you set instance dir under AP server's default webapps folder. The workaround is to create another folder with good permissions, and put solr.xml into root of that folder. Edit context file to point solr/home to that folder. Now undeploy previous webapp and redeploy again.
When the master is started, pass in -Denable.master=true and in the slave pass in -Denable.slave=true. Alternately, these values can be stored in a solrcore.properties file as follows:
#solrcore.properties in master enable.master=true enable.slave=false
and in slave
#solrcore.properties in slave enable.master=false enable.slave=true
NOTE: Each core has its own solrcore.properties that is located under the conf directory of each core's instance directory.
Replication with MultiCore
Add the replication request handler to solrconfig.xml for each core (i.e. /usr/share/tomcat6/solr/CORENAME/conf/solrconfig.xml). You can use ${solr.core.name} to avoid hard coding the core name in your config.
Master configuration remains the same:
<requestHandler name="/replication" class="solr.ReplicationHandler" > <lst name="master"> <str name="replicateAfter">commit</str> <str name="confFiles">schema.xml,mapping-ISOLatin1Accent.txt,protwords.txt,stopwords.txt,synonyms.txt,elevate.xml</str> </lst> </requestHandler>
And slave just needs the core.name added.
<requestHandler name="/replication" class="solr.ReplicationHandler" > <lst name="slave"> <str name="masterUrl">http://${MASTER_CORE_URL}/${solr.core.name}/replication</str> <str name="pollInterval">${POLL_TIME}</str> </lst> </requestHandler>
Variables (i.e. $POLL_TIME $MASTER_CORE_URL) can be defined in a solrcore.properties file for each core.
Replication Dashboard
This shows the following information
- status of current replication
- percentage/size downloaded/to be downloaded
- Current file being downloaded
- Time taken/Time remaining
The following actions can be performed from the dashboard
- Enable/Disable polling
- Force start replication
- Abort an ongoing replication