[Solr Source Code Analysis] Major Changes in Solr 7


https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-7.html

Major Changes in Solr 7

Solr 7 is a major new release of Solr which introduces new features and a number of other changes that may impact your existing installation.

Upgrade Planning

There are major changes in Solr 7 to consider before starting to migrate your configurations and indexes. This page is designed to highlight the biggest changes - new features you may want to be aware of, but also changes in default behavior and deprecated features that have been removed.

There are many hundreds of changes in Solr 7, however, so a thorough review of the Solr Upgrade Notes as well as the CHANGES.txt file in your Solr instance will help you plan your migration to Solr 7. This section attempts to highlight some of the major changes you should be aware of.

You should also consider all changes that have been made to Solr in any version you have not upgraded to already. For example, if you are currently using Solr 6.2, you should review changes made in all subsequent 6.x releases in addition to changes for 7.0.

Reindexing your data is considered the best practice and you should try to do so if possible. However, if reindexing is not feasible, keep in mind you can only upgrade one major version at a time. Thus, Solr 6.x indexes will be compatible with Solr 7 but Solr 5.x indexes will not be.

If you do not reindex now, keep in mind that you will need to either reindex your data or upgrade your indexes before you will be able to move to Solr 8 when it is released in the future. See the section IndexUpgraderTool for more details on how to upgrade your indexes.
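The one-major-version-at-a-time rule above can be sketched as a small helper. This is only an illustration of the rule as stated in this section; the function names are hypothetical and nothing here calls Solr or the index upgrade tool.

```python
from typing import List, Tuple


def upgrade_steps(index_major: int, target_major: int) -> List[Tuple[int, int]]:
    """Return the (from, to) major-version hops needed to reach target_major.

    Encodes the rule from the text: an index can only be carried forward
    one major version at a time, so every hop advances by exactly one.
    """
    if index_major > target_major:
        raise ValueError("moving an index to an older major version is not covered here")
    return [(v, v + 1) for v in range(index_major, target_major)]


def can_open_directly(index_major: int, solr_major: int) -> bool:
    """True if solr_major can read the index without an intermediate upgrade."""
    return 0 <= solr_major - index_major <= 1


print(upgrade_steps(5, 7))
print(can_open_directly(6, 7))
print(can_open_directly(5, 7))
```

Under these assumptions a 5.x index needs two hops (5 to 6, then 6 to 7) or a full reindex, while a 6.x index can be opened by Solr 7 directly.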

See also the section Upgrading a Solr Cluster for details on how to upgrade a SolrCloud cluster.

New Features & Enhancements

Replication Modes

Until Solr 7, the SolrCloud model for replicas has been to allow any replica to become a leader when a leader is lost. This is highly effective for most users, providing reliable failover in case of issues in the cluster. However, it comes at a cost in large clusters because all replicas must be in sync at all times.

To provide additional flexibility, two new types of replicas have been added, named TLOG & PULL. These new types provide options to have replicas which only sync with the leader by copying index segments from the leader. The TLOG type has an additional benefit of maintaining a transaction log (the "tlog" of its name), which would allow it to recover and become a leader if necessary; the PULL type does not maintain a transaction log, so cannot become a leader.

As part of this change, the traditional type of replica is now named NRT. If you do not explicitly define a number of TLOG or PULL replicas, Solr defaults to creating NRT replicas. If this model is working for you, you will not have to change anything.

See the section Types of Replicas for more details on the new replica modes, and how to define the replica type in your cluster.
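As a rough sketch of how the replica types might be requested when creating a collection, the helper below builds a Collections API CREATE URL. The parameter names nrtReplicas, tlogReplicas and pullReplicas come from the Collections API documentation rather than from this page, and the host, collection name and helper name are placeholders. The code only builds the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, nrt=0, tlog=0, pull=0):
    """Build a Collections API CREATE URL with per-type replica counts."""
    params = {"action": "CREATE", "name": name, "numShards": num_shards}
    # Only send the replica-type counts that were actually requested.
    # If none is given, Solr falls back to NRT replicas, as described above.
    for key, count in (("nrtReplicas", nrt), ("tlogReplicas", tlog), ("pullReplicas", pull)):
        if count > 0:
            params[key] = count
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical cluster address and collection name.
print(create_collection_url("http://localhost:8983/solr", "mycollection", 2, tlog=2, pull=1))
```

Mixing TLOG and PULL replicas in this way gives leader-eligible replicas (TLOG) alongside replicas that only copy index segments (PULL).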

Autoscaling

Solr autoscaling is a new suite of features in Solr to make managing a SolrCloud cluster easier and more automated.

At its core, Solr autoscaling provides users with a rule syntax to define preferences and policies for how to distribute nodes and shards in a cluster, with the goal of maintaining a balance in the cluster. As of Solr 7, Solr will take any policy or preference rules into account when determining where to place new shards and replicas created or moved with various Collections API commands.
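To give a feel for the rule syntax, the sketch below builds two JSON command bodies of the kind the Solr 7 autoscaling documentation describes: a cluster preference and a cluster policy. The command names (set-cluster-preferences, set-cluster-policy) and the rule keys are taken from that documentation, not from this page, so treat them as assumptions to verify against your Solr version. The function names are hypothetical and nothing is sent to a server.

```python
import json


def cluster_preferences_command() -> str:
    """Prefer nodes that currently host fewer cores."""
    return json.dumps({"set-cluster-preferences": [{"minimize": "cores"}]})


def cluster_policy_command(max_replicas_per_shard_per_node: int) -> str:
    """Limit how many replicas of the same shard may share one node."""
    rule = {
        # "<2" means "fewer than 2", i.e. at most one replica.
        "replica": f"<{max_replicas_per_shard_per_node + 1}",
        "shard": "#EACH",  # apply the limit to each shard separately
        "node": "#ANY",    # on any node in the cluster
    }
    return json.dumps({"set-cluster-policy": [rule]})


print(cluster_preferences_command())
print(cluster_policy_command(1))
```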

Other Features & Enhancements

  • The Analytics Component has been refactored.

  • There were several other new features released in earlier 6.x releases, which you may have missed:

Configuration and Default Changes

New Default Configset

Several changes have been made to configsets that ship with Solr; not only their content but how Solr behaves in regard to them:

  • The data_driven_configset and basic_configset have been removed, and replaced by the _default configset. The sample_techproducts_configset also remains, and is designed for use with the example documents shipped with Solr in the example/exampledocs directory.

  • When creating a new collection, if you do not specify a configset, the _default will be used.

  • If you use SolrCloud, the _default configset will be automatically uploaded to ZooKeeper.

  • If you run a user-managed cluster or a single-node installation, the instanceDir will be created automatically, using the _default configset as its basis.

Schemaless Improvements

To improve the functionality of Schemaless Mode, Solr now behaves differently when it detects that data in an incoming field should have a text-based field type.

  • Incoming fields will be indexed as text_general by default (you can change this). The name of the field will be the same as the field name defined in the document.

  • A copy field rule will be inserted into your schema to copy the new text_general field to a new field with the name <name>_str. This field’s type will be a strings field (to allow for multiple values). The first 256 characters of the text field will be inserted to the new strings field.

This behavior can be customized if you wish to remove the copy field rule, or to change the number of characters inserted to the string field, or the field type used. See the section Schemaless Mode for details.

Because copy field rules can slow indexing and increase index size, it’s recommended you only use copy fields when you need to. If you do not need to sort or facet on a field, you should remove the automatically-generated copy field rule.
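The effect of the default copy field rule can be illustrated with a small simulation. This is not Solr code: it only mimics, under the defaults described above, which values end up in the text field and in the generated <name>_str field. The function name and the sample field are hypothetical.

```python
from typing import Dict, List


def schemaless_text_fields(name: str, values: List[str], max_chars: int = 256) -> Dict[str, List[str]]:
    """Simulate the fields produced for a newly detected text field."""
    return {
        # Indexed as text_general under the original field name.
        name: list(values),
        # Copy field target: a multi-valued "strings" field holding only
        # the first max_chars characters of each value.
        f"{name}_str": [value[:max_chars] for value in values],
    }


doc = schemaless_text_fields("description", ["x" * 300, "short value"])
print(len(doc["description"][0]), len(doc["description_str"][0]))
```

The truncated copy is what makes sorting and faceting practical on long text, which is also why the rule can be removed when neither is needed.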

Automatic field creation can be disabled with the update.autoCreateFields property. To do this, you can use the Config API with a command such as:

V1 API

curl http://host:8983/solr/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'

V2 API

curl http://host:8983/api/collections/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'

Changes to Default Behaviors

  • JSON is now the default response format. If you rely on XML responses, you must now define wt=xml in your request. In addition, line indentation is enabled by default (indent=on).

  • The sow parameter (short for "Split on Whitespace") now defaults to false, which allows support for multi-word synonyms out of the box. This parameter is used with the eDisMax and standard/"lucene" query parsers. If this parameter is not explicitly specified as true, query text will not be split on whitespace before analysis.

  • The legacyCloud parameter now defaults to false. If an entry for a replica does not exist in state.json, that replica will not get registered.

    This may affect users who bring up replicas and expect them to be registered automatically as part of a shard. It is possible to fall back to the old behavior by setting the property legacyCloud=true in the cluster properties, using the following command:

    ./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd clusterprop -name legacyCloud -val true

  • The eDisMax query parser parameter lowercaseOperators now defaults to false if the luceneMatchVersion in solrconfig.xml is 7.0.0 or above. Behavior for luceneMatchVersion lower than 7.0.0 is unchanged (so, true). This means that clients must send boolean operators (such as AND, OR and NOT) in upper case in order to be recognized, or you must explicitly set this parameter to true.

  • The handleSelect parameter in solrconfig.xml now defaults to false if the luceneMatchVersion is 7.0.0 or above. This causes Solr to ignore the qt parameter if it is present in a request. If you have request handlers without a leading '/', you can set handleSelect="true" or consider migrating your configuration.

    The qt parameter is still used as a SolrJ special parameter that specifies the request handler (tail URL path) to use.

  • The lucenePlusSort query parser (aka the "Old Lucene Query Parser") has been deprecated and is no longer implicitly defined. If you wish to continue using this parser until Solr 8 (when it will be removed), you must register it in your solrconfig.xml, as in: <queryParser name="lucenePlusSort" class="solr.OldLuceneQParserPlugin"/>.

  • The name of TemplateUpdateRequestProcessorFactory is changed to template from Template and the name of AtomicUpdateProcessorFactory is changed to atomic from Atomic.

  • Also, TemplateUpdateRequestProcessorFactory now uses {} instead of ${} for template.
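Several of the defaults listed above can be overridden per request. The sketch below builds a select URL that explicitly asks for the pre-7 behavior of wt, sow and lowercaseOperators. The parameter names are the ones named in the list; the host, collection name and helper name are placeholders, and the code only constructs the URL.

```python
from urllib.parse import urlencode


def legacy_style_query_url(base_url: str, collection: str, query: str) -> str:
    """Build a /select URL that restores several pre-7 request defaults."""
    params = {
        "q": query,
        "defType": "edismax",          # lowercaseOperators is an eDisMax parameter
        "wt": "xml",                   # JSON is the default response format in Solr 7
        "sow": "true",                 # split on whitespace before analysis
        "lowercaseOperators": "true",  # treat lowercase "and"/"or" as operators again
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection.
print(legacy_style_query_url("http://localhost:8983/solr", "mycollection", "solr and lucene"))
```

Setting sow=true gives up the out-of-the-box multi-word synonym support mentioned above, so it is worth restoring only the defaults a client really depends on.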

Deprecations and Removed Features

Point Fields Are Default Numeric Types

Solr has implemented *PointField types across the board, to replace Trie* based numeric fields. All Trie* fields are now considered deprecated, and will be removed in Solr 8.

If you are using Trie* fields in your schema, you should consider moving to PointFields as soon as feasible. Changing to the new PointField types will require you to reindex your data.
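A first pass over a schema could be sketched as a simple class-name substitution. The mapping below uses the Point field class names that ship with Solr 7. The regex-based rewrite is a deliberately naive assumption of how a schema might be scanned, not a migration tool, and it leaves other attributes (such as precisionStep) untouched for manual review.

```python
import re

# Trie* classes and their Point-based counterparts.
TRIE_TO_POINT = {
    "solr.TrieIntField": "solr.IntPointField",
    "solr.TrieLongField": "solr.LongPointField",
    "solr.TrieFloatField": "solr.FloatPointField",
    "solr.TrieDoubleField": "solr.DoublePointField",
    "solr.TrieDateField": "solr.DatePointField",
}

_TRIE_CLASS = re.compile(r'class="(solr\.Trie\w+Field)"')


def suggest_point_types(schema_xml: str) -> str:
    """Replace known Trie* class names with their *PointField equivalents."""
    def swap(match):
        old = match.group(1)
        # Unknown Trie* classes are left as-is rather than guessed at.
        return f'class="{TRIE_TO_POINT.get(old, old)}"'
    return _TRIE_CLASS.sub(swap, schema_xml)


print(suggest_point_types('<fieldType name="int" class="solr.TrieIntField" precisionStep="0"/>'))
```

Changing the class only updates the schema definition; as noted above, existing data still has to be reindexed.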

Spatial Fields

The following spatial-related fields have been deprecated:

  • LatLonType

  • GeoHashField

  • SpatialVectorFieldType

  • SpatialTermQueryPrefixTreeFieldType

Choose one of these field types instead:

  • LatLonPointSpatialField

  • SpatialRecursivePrefixTreeField

  • RptWithGeometrySpatialField

See the section Spatial Search for more information.

JMX Support and MBeans

  • The <jmx> element in solrconfig.xml has been removed in favor of <metrics><reporter> elements defined in solr.xml.

    Limited back-compatibility is offered by automatically adding a default instance of SolrJmxReporter if it’s missing AND when a local MBean server is found. A local MBean server can be activated either via ENABLE_REMOTE_JMX_OPTS in solr.in.sh or via system properties, e.g., -Dcom.sun.management.jmxremote. This default instance exports all Solr metrics from all registries as hierarchical MBeans.

    This behavior can also be disabled by specifying a SolrJmxReporter configuration with a boolean init argument enabled set to false. For more fine-grained control, users should explicitly specify at least one SolrJmxReporter configuration.

    See also the section The <metrics><reporters> Element, which describes how to set up Metrics Reporters in solr.xml. Note that back-compatibility support may be removed in Solr 8.

  • MBean names and attributes now follow the hierarchical names used in metrics. This is reflected also in /admin/mbeans and /admin/plugins output, and can be observed in the UI Plugins tab, because now all these APIs get their data from the metrics API. The old (mostly flat) JMX view has been removed.

SolrJ

The following changes were made in SolrJ.

  • HttpClientInterceptorPlugin is now HttpClientBuilderPlugin and must work with a SolrHttpClientBuilder rather than an HttpClientConfigurer.

  • HttpClientUtil now allows configuring HttpClient instances via SolrHttpClientBuilder rather than an HttpClientConfigurer. The environment variable SOLR_AUTHENTICATION_CLIENT_CONFIGURER no longer works; use SOLR_AUTHENTICATION_CLIENT_BUILDER instead.

  • SolrClient implementations now use their own internal configuration for socket timeouts, connect timeouts, and allowing redirects rather than what is set as the default when building the HttpClient instance. Use the appropriate setters on the SolrClient instance.

  • HttpSolrClient#setAllowCompression has been removed and compression must be enabled as a constructor parameter.

  • HttpSolrClient#setDefaultMaxConnectionsPerHost and HttpSolrClient#setMaxTotalConnections have been removed. These now default very high and can only be changed via parameter when creating an HttpClient instance.

Other Deprecations and Removals

  • The defaultOperator parameter in the schema is no longer supported. Use the q.op parameter instead. This option had been deprecated for several releases. See the section Standard Query Parser Parameters for more information.

  • The defaultSearchField parameter in the schema is no longer supported. Use the df parameter instead. This option had been deprecated for several releases. See the section Standard Query Parser Parameters for more information.

  • The mergePolicy, mergeFactor and maxMergeDocs parameters have been removed and are no longer supported. You should define a mergePolicyFactory instead. See the section mergePolicyFactory for more information.

  • The PostingsSolrHighlighter has been deprecated. It’s recommended that you move to using the UnifiedHighlighter instead. See the section Unified Highlighter for more information about this highlighter.

  • Index-time boosts have been removed from Lucene, and are no longer available from Solr. If any boosts are provided, they will be ignored by the indexing chain. As a replacement, index-time scoring factors should be indexed in a separate field and combined with the query score using a function query. See the section Function Queries for more information.

  • The StandardRequestHandler is deprecated. Use SearchHandler instead.

  • To improve parameter consistency in the Collections API, the parameter names fromNode for the MOVEREPLICA command and source and target for the REPLACENODE command have been deprecated and replaced with sourceNode and targetNode instead. The old names will continue to work for back-compatibility but they will be removed in Solr 8.

  • The unused valType option has been removed from ExternalFileField. If you have this in your schema, you can safely remove it.
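Several of the removals above move a setting from the schema to the request. The sketch below shows one way the df and q.op parameters, plus a query-time boost that stands in for index-time boosts, might be assembled. The {!boost b=...} local-params syntax is the boost query parser from the Solr function query documentation. The field names text and popularity and the helper name are hypothetical.

```python
from typing import Dict, Optional


def query_params(query: str, default_field: str, default_operator: str = "OR",
                 boost_function: Optional[str] = None) -> Dict[str, str]:
    """Build request parameters replacing the removed schema-level defaults."""
    if default_operator not in ("AND", "OR"):
        raise ValueError("q.op must be AND or OR")
    q = query
    if boost_function:
        # Multiply the query score by a function of an indexed field,
        # instead of relying on a boost stored at index time.
        q = f"{{!boost b={boost_function}}}{query}"
    return {
        "q": q,
        "df": default_field,       # replaces defaultSearchField
        "q.op": default_operator,  # replaces defaultOperator
    }


print(query_params("solr rocks", "text", "AND", boost_function="popularity"))
```

Sending these values with each request (or setting them as request handler defaults in solrconfig.xml) keeps the behavior explicit now that the schema no longer carries it.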

Major Changes in Earlier 6.x Versions

The following summary of changes in earlier 6.x releases highlights significant changes released between Solr 6.0 and 6.6 that were listed in earlier versions of this Guide. Mentions of deprecations are likely superseded by removal in Solr 7, as noted in the above sections.

Note again that this is not a complete list of all changes that may impact your installation, so a thorough review of CHANGES.txt is highly recommended if upgrading from any version earlier than 6.6.

  • The Solr contribs map-reduce, morphlines-core and morphlines-cell have been removed.

  • JSON Facet API now uses hyper-log-log for numBuckets cardinality calculation and calculates cardinality before filtering buckets by any mincount greater than 1.

  • If you use historical dates, specifically on or before the year 1582, you should reindex for better date handling.

  • If you use the JSON Facet API (json.facet) with method=stream, you must now set sort='index asc' to get the streaming behavior; otherwise it won’t stream. Reminder: method is a hint that doesn’t change defaults of other parameters.

  • If you use the JSON Facet API (json.facet) to facet on a numeric field and if you use mincount=0 or if you set the prefix, you will now get an error as these options are incompatible with numeric faceting.

  • Solr’s logging verbosity at the INFO level has been greatly reduced, and you may need to update the log configs to use the DEBUG level to see all the logging messages you used to see at INFO level before.

  • We are no longer backing up solr.log and solr_gc.log files in date-stamped copies forever. If you relied on the solr_log_<date> or solr_gc_log_<date> being in the logs folder that will no longer be the case. See the section Configuring Logging for details on how log rotation works as of Solr 6.3.

  • The create/deleteCollection methods on MiniSolrCloudCluster have been deprecated. Clients should instead use the CollectionAdminRequest API. In addition, MiniSolrCloudCluster#uploadConfigDir(File, String) has been deprecated in favour of #uploadConfigSet(Path, String).

  • The bin/solr.in.sh (bin/solr.in.cmd on Windows) is now completely commented by default. Previously, this wasn’t so, which had the effect of masking existing environment variables.

  • The _version_ field is no longer indexed and is now defined with indexed=false by default, because the field has DocValues enabled.

  • The /export handler has been changed so it no longer returns zero (0) for numeric fields that are not in the original document. One consequence of this change is that you must be aware that some tuples will not have values if there were none in the original document.

  • Metrics-related classes in org.apache.solr.util.stats have been removed in favor of the Dropwizard metrics library. Any custom plugins using these classes should be changed to use the equivalent classes from the metrics library. As part of this, the following changes were made to the output of Overseer Status API:

      • The "totalTime" metric has been removed because it is no longer supported.

      • The metrics "75thPctlRequestTime", "95thPctlRequestTime", "99thPctlRequestTime" and "999thPctlRequestTime" in Overseer Status API have been renamed to "75thPcRequestTime", "95thPcRequestTime" and so on for consistency with stats output in other parts of Solr.

      • The metrics "avgRequestsPerMinute", "5minRateRequestsPerMinute" and "15minRateRequestsPerMinute" have been replaced by corresponding per-second rates viz. "avgRequestsPerSecond", "5minRateRequestsPerSecond" and "15minRateRequestsPerSecond" for consistency with stats output in other parts of Solr.

  • A new highlighter named UnifiedHighlighter has been added. You are encouraged to try out the UnifiedHighlighter by setting hl.method=unified and report feedback. It’s more efficient/faster than the other highlighters, especially compared to the original Highlighter. See HighlightParams.java for a listing of highlight parameters annotated with which highlighters use them. hl.useFastVectorHighlighter is now considered deprecated in lieu of hl.method=fastVector.

  • The maxWarmingSearchers parameter now defaults to 1, and more importantly commits will now block if this limit is exceeded instead of throwing an exception (a good thing). Consequently there is no longer a risk in overlapping commits. Nonetheless users should continue to avoid excessive committing. Users are advised to remove any pre-existing maxWarmingSearchers entries from their solrconfig.xml files.

  • The Complex Phrase query parser now supports leading wildcards. Because leading wildcards can be expensive to evaluate, users are encouraged to use ReversedWildcardFilter in index-time analysis.

  • The JMX metric "avgTimePerRequest" (and the corresponding metric in the metrics API for each handler) used to be a simple non-decaying average based on total cumulative time and the number of requests. The Codahale Metrics implementation applies exponential decay to this value, which heavily biases the average towards the last 5 minutes.

  • Parallel SQL now uses Apache Calcite as its SQL framework. As part of this change the default aggregation mode has been changed to facet rather than map_reduce. There have also been changes to the SQL aggregate response and some SQL syntax changes. Consult the SQL Query Language documentation for full details.
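The two JSON Facet API notes in the list above (streaming requires sort='index asc', and numeric fields reject mincount=0 and prefix) can be sketched as a small request builder. The facet keys are standard JSON Facet API options. The helper name, the numeric flag and the cat field are assumptions for illustration, and the check is done client-side only.

```python
import json
from typing import Optional


def terms_facet(field: str, *, numeric: bool = False, method: Optional[str] = None,
                mincount: Optional[int] = None, prefix: Optional[str] = None) -> dict:
    """Build a terms facet, applying the constraints described in the notes."""
    if numeric and (mincount == 0 or prefix is not None):
        # Solr reports an error for these combinations on numeric fields.
        raise ValueError("mincount=0 and prefix are incompatible with numeric faceting")
    facet = {"type": "terms", "field": field}
    if method is not None:
        facet["method"] = method
    if method == "stream":
        # method is only a hint; streaming also needs an index-order sort.
        facet["sort"] = "index asc"
    if mincount is not None:
        facet["mincount"] = mincount
    if prefix is not None:
        facet["prefix"] = prefix
    return facet


body = {"query": "*:*", "facet": {"categories": terms_facet("cat", method="stream")}}
print(json.dumps(body))
```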
