The latest version of Azure Search Client Library (version 0.6.5370.1398) supports scoring profiles. But what are scoring profiles anyway?
Scoring profiles let you configure how results are ranked, based on one or more custom-defined criteria. Azure Search supports several types of scoring configuration, which means that you can define a fairly complex algorithm for ranking your results. Specifically, your results can be boosted by keyword matches in specific fields, by freshness, by geographic distance, or by the magnitude of a numeric value.
All of these boosting types are also supported in the Azure Search Client Library.
One of the coolest things about scoring profiles is that a single profile can define multiple functions for boosting results and, moreover, each function used when calculating the score can have its own boost value.
The most common way you'd probably boost your results is by having specific keywords in specific fields. For example, if you're querying for football matches, match names which contain your keyword would probably be boosted compared to matches where only the description contains that keyword.
Using Azure Search Client Library, this is done by instantiating a Scoring object and specifying the weight of the fields.
Here's an example:
// "scoreByName" is the profile name that queries reference later.
var scoringProfile1 = new Scoring("scoreByName", SearchableEvent.GetSearchableEventFields())
{
    FunctionAggregation = FunctionAggregationTypes.Sum
};
// Matches in the "name" field weigh 100 times more than matches in other fields.
scoringProfile1.Text.Weights["name"] = 100;
In this example, a new Scoring object is instantiated with the scoring profile's name, "scoreByName", and a list of fields corresponding to the index. The name is required because the scoring profile is referenced by name when querying data.
Afterwards, a weight is applied to the field named "name". This specifies that when this scoring profile is used in a query, documents containing the keyword in the name field will be boosted by 100 compared to documents that contain the keyword in other fields.
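To make the effect of a field weight concrete, here is a minimal sketch that does not use the library at all. It assumes, purely for illustration, that each field contributes a raw match score and that the weight multiplies it. The raw scores and the WeightedScore helper are hypothetical, and the service's real formula is more involved. The snippet is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical weights: "name" is boosted, every other field defaults to 1.
var weights = new Dictionary<string, double> { ["name"] = 100 };

// Hypothetical helper: multiplies each field's raw match score by its weight and sums the results.
double WeightedScore(Dictionary<string, double> rawScoresByField)
{
    double total = 0;
    foreach (var pair in rawScoresByField)
    {
        double weight = weights.TryGetValue(pair.Key, out var w) ? w : 1.0;
        total += pair.Value * weight;
    }
    return total;
}

// The keyword is found in the name of one match and only in the description of the other.
var matchInName = new Dictionary<string, double> { ["name"] = 1 };
var matchInDescription = new Dictionary<string, double> { ["description"] = 1 };

Console.WriteLine(WeightedScore(matchInName));        // 100
Console.WriteLine(WeightedScore(matchInDescription)); // 1
```

With identical raw scores, the match found in the name field ends up 100 times ahead of the one found only in the description.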
Another common scenario in search systems is to have newer documents boosted compared to stale documents. In other words, if a new document is added to the index, this specific document could be ranked higher. Considering our examples of football matches, freshness boosting is useful in two ways:
Using Azure Search Client Library, freshness boosting is applied by adding a FreshnessFunction to the list of functions within a scoring profile. Building on the previous example, it looks like this:
// Boost documents for 13 hours, 15 minutes and 18 seconds after their "dateadded" value.
var function1 = new FreshnessFunction()
{
    Boost = 20,
    BoostingDuration = new TimeSpan(0, 13, 15, 18),
    FieldName = "dateadded",
    Interpolation = InterpolationTypes.Logarithmic
};
scoringProfile1.Functions = new List() { function1 };
In this example, a new FreshnessFunction is instantiated with the following properties: the boost applied to any search results that match the keywords is 20 and the boosting is applied to the field named "dateadded" but only for 13 hours, 15 minutes and 18 seconds (according to the BoostingDuration property) after the date and time value specified in the "dateadded" field.
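The four-argument TimeSpan constructor takes days, hours, minutes and seconds, which is how the value above becomes 13 hours, 15 minutes and 18 seconds. The following sketch, independent of the library, shows that duration and a hypothetical IsFresh helper that checks whether a document is still inside the boosting window. It assumes the window simply starts at the "dateadded" value, as described above, and the sample date is made up. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;

// Same constructor call as in the FreshnessFunction: days, hours, minutes, seconds.
var boostingDuration = new TimeSpan(0, 13, 15, 18);

// Hypothetical helper: true while "now" falls inside the boosting window after dateAdded.
bool IsFresh(DateTime dateAdded, DateTime now) =>
    now >= dateAdded && now - dateAdded <= boostingDuration;

// Made-up "dateadded" value for one document.
var dateAdded = new DateTime(2014, 9, 15, 8, 0, 0, DateTimeKind.Utc);

Console.WriteLine(boostingDuration.TotalSeconds);              // 47718
Console.WriteLine(IsFresh(dateAdded, dateAdded.AddHours(13))); // True
Console.WriteLine(IsFresh(dateAdded, dateAdded.AddHours(14))); // False
```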
Considering our football matches example, whenever users search for their favorite team's matches, matches which occur closer to their location could be boosted compared to matches which occur farther away. This is also a particularly useful feature for mobile applications or location-aware web applications.
Using Azure Search Client Library, geolocation boosting can be applied by instantiating a DistanceFunction object. Here's an example:
// Boost documents whose "geolocation" lies within 150 km of the "mylocation" query parameter.
var function2 = new DistanceFunction()
{
    Boost = 10,
    BoostingDistance = 150,
    ReferencePointParameter = "mylocation",
    FieldName = "geolocation",
    Interpolation = InterpolationTypes.Constant,
};
scoringProfile1.Functions = new List() { function2 };
In the previous example, a DistanceFunction is used when calculating a query's results with the scoringProfile1 scoring profile. This function instructs the scoring calculator to boost results located within 150 km of a location that is sent with the query through a parameter called "mylocation". Because of this parameter, the DistanceFunction is special: it lets search results be scored dynamically based on user input other than keywords. The "geolocation" value of FieldName specifies the name of the field that contains the location of the football match. Keep in mind, though, that this field must be of type GeographyPoint. (Note: with Azure Search Client Library version 0.6.5370.1398, you can save location data using the GeographyPoint model class. It exposes Latitude and Longitude properties, which saves you the trouble of serializing and deserializing geolocation data.)
It's common for large index repositories to boost search results based on a specific value within a range. For example, in a movie database, a movie rated higher by viewers would be boosted compared to poorly rated movies (e.g. an IMDb search for "love" returns the 1969 movie "Women in Love", rated 7.8 at the time of this writing, in 3rd position, while the 2011 title "Love Birds", rated only 5.9, is positioned at the end of the search results page).
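To illustrate what "within 150 km" means, here is a rough sketch that computes the great-circle distance with the haversine formula and compares it to the boosting distance. It is not how the service or the library computes distances internally. The DistanceKm helper and the two venue coordinates are hypothetical, and the reference point is the same one used in the query example later in this article (where it is written in longitude,latitude order). It is written as C# top-level statements (C# 9 or later).

```csharp
using System;

// Great-circle distance in kilometres between two points, using the haversine formula.
double DistanceKm(double lat1, double lon1, double lat2, double lon2)
{
    const double EarthRadiusKm = 6371.0;
    double ToRadians(double degrees) => degrees * Math.PI / 180.0;

    double dLat = ToRadians(lat2 - lat1);
    double dLon = ToRadians(lon2 - lon1);
    double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
             + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
             * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
    return 2 * EarthRadiusKm * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
}

const double boostingDistanceKm = 150;

// Reference point sent as "mylocation" (latitude and longitude split out here).
double myLat = 47.6148481, myLon = -122.3358423;

// Hypothetical venues: one roughly 40 km away, one more than 200 km away.
bool nearbyBoosted = DistanceKm(myLat, myLon, 47.2529, -122.4443) <= boostingDistanceKm;
bool farBoosted = DistanceKm(myLat, myLon, 45.5152, -122.6784) <= boostingDistanceKm;

Console.WriteLine(nearbyBoosted); // True
Console.WriteLine(farBoosted);    // False
```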
Boosting results based on a specific value within a specific range is called magnitude boosting and this is done by using a MagnitudeFunction. Here's an example using Azure Search Client Library:
// Boost documents whose "rating" falls between 9 and 10.
var function3 = new MagnitudeFunction()
{
    Boost = 1000,
    BoostingRangeStart = 9,
    BoostingRangeEnd = 10,
    ConstantBoostBeyondRange = false,
    FieldName = "rating",
    Interpolation = InterpolationTypes.Constant
};
scoringProfile1.Functions = new List() { function3 };
In this example, the magnitude function boosts documents where the field named "rating" contains a value between 9 and 10, with a boost of 1000.
Even though all the previous examples assign a single function to the Functions list, the Azure Search service allows you to use several (or even all) of these functions simultaneously. Moreover, there's no restriction on using the same function type more than once, as long as the field and/or dynamic parameters used within the function are different.
To use all these functions simultaneously, simply populate the Functions list with all of them, like this:
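As a rough mental model of the settings above, the sketch below treats the boost as a multiplier that applies in full inside the range (constant interpolation) and, when requested, beyond the end of the range. The MagnitudeBoost helper is hypothetical and only mirrors the property names used above; it is not the service's actual implementation. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical sketch of a constant-interpolation magnitude boost.
double MagnitudeBoost(double value, double rangeStart, double rangeEnd,
                      double boost, bool constantBoostBeyondRange)
{
    bool inRange = value >= rangeStart && value <= rangeEnd;
    bool beyondRange = value > rangeEnd;
    if (inRange || (beyondRange && constantBoostBeyondRange))
        return boost;
    return 1.0; // neutral multiplier: the score is left unchanged
}

Console.WriteLine(MagnitudeBoost(9.5, 9, 10, 1000, false)); // 1000
Console.WriteLine(MagnitudeBoost(7.8, 9, 10, 1000, false)); // 1
```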
scoringProfile1.Functions = new List() { function1, function2, function3 };
Keep in mind though that the booster applied to a field containing a keyword is not considered a function, due to a few reasons:
When you specify more than one function within a scoring profile, these functions will be aggregated in order to get the final result score. By default, Azure Search aggregates the function results by summing them. However, you can instruct the score calculator to use other aggregation mechanisms:
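For reference, the Azure Search REST API names its aggregation modes sum, average, minimum, maximum and firstMatching. Apart from FunctionAggregationTypes.Sum used earlier, the corresponding member names in the client library are not shown in this article. The sketch below only illustrates the arithmetic of the first four modes on hypothetical function results; it is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical results of three scoring functions for a single document.
double[] functionResults = { 20, 10, 30 };

Console.WriteLine(functionResults.Sum());     // 60 (the default aggregation)
Console.WriteLine(functionResults.Average()); // 20
Console.WriteLine(functionResults.Min());     // 10
Console.WriteLine(functionResults.Max());     // 30
```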
If no scoring profile is used, Azure Search uses a model based on term frequency-inverse document frequency (tf-idf for short), which, according to Wikipedia, is 'a numerical statistic that is intended to reflect how important a word is to a document in a collection'. More specifically, Azure Search currently uses Lucene's implementation of an algebraic model called Vector Space Model.
In other words, the model checks how frequent a given word is across the index (global frequency) and within the field (local frequency) and thus determines how special a given word is. From this result, Azure Search derives a specific value.
The implications of this model are:
All the results of these calculations are then summed to produce the score you get when you query for a specific document without using any scoring profiles.
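To give a feel for the local and global frequencies mentioned above, here is a textbook tf-idf calculation on a tiny made-up index. It is only a sketch: Lucene's actual scoring adds normalization and other factors, and the documents and the TfIdf helper are hypothetical. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical mini index: the words of one searchable field per document.
string[][] documents =
{
    new[] { "football", "match", "final" },
    new[] { "football", "friendly", "match" },
    new[] { "football", "derby" },
};

// Textbook tf-idf: occurrences in the document times log(N / number of documents containing the term).
double TfIdf(string term, string[] document)
{
    int termFrequency = document.Count(word => word == term);
    int documentFrequency = documents.Count(doc => doc.Contains(term));
    if (termFrequency == 0 || documentFrequency == 0) return 0;
    return termFrequency * Math.Log((double)documents.Length / documentFrequency);
}

// "football" appears in every document, so it carries no weight.
Console.WriteLine(TfIdf("football", documents[0])); // 0

// "final" is rarer than "match", so it scores higher in the first document.
Console.WriteLine(TfIdf("final", documents[0]) > TfIdf("match", documents[0])); // True
```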
Using Azure Search Client Library, when you query an index you simply have to specify a scoring profile's name in the QueryParameters object's property named ScoringProfile. If a scoring profile parameter is required, then you also have to send out a Dictionary<string,string> object, where the key will correspond to the parameters' names and the value will correspond to the parameters' values. Here's an example:
var scoringParams = new Dictionary<string, string>();
scoringParams.Add("mylocation", "-122.3358423,47.6148481");
var result = await _azureSearchService.Indexes[searchIndex].QueryAsync(new QueryParameters()
{
    QueryText = searchText,
    ScoringProfile = searchScoringProfile,
    ScoringParameters = scoringParams
});
Note: using Azure Search Client Library version 0.6.5370.1398, when you're sending out a geolocation value as a scoring parameter, keep in mind that:
As an additional note, also keep in mind that all scoring parameters defined within a scoring profile must be sent with the query when using that scoring profile. There's currently no way of specifying a default scoring parameter value.
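Since every scoring parameter has to be supplied, one option is to check the dictionary before sending the query. The sketch below shows such a check with a hypothetical MissingScoringParameters helper. The list of required names is something you would maintain yourself alongside the scoring profile; it is not read from the library here. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the required parameter names that were not supplied.
List<string> MissingScoringParameters(IEnumerable<string> requiredNames,
                                      IDictionary<string, string> supplied) =>
    requiredNames.Where(name => !supplied.ContainsKey(name)).ToList();

// Parameter names required by the scoring profile (maintained by hand in this sketch).
var requiredNames = new[] { "mylocation" };
var scoringParams = new Dictionary<string, string>();

Console.WriteLine(MissingScoringParameters(requiredNames, scoringParams).Count); // 1

scoringParams.Add("mylocation", "-122.3358423,47.6148481");
Console.WriteLine(MissingScoringParameters(requiredNames, scoringParams).Count); // 0
```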
This article is based on the original blog post from Sep 15th, 2014 on http://alexmang.ro. The blogpost is available at http://alexmang.ro/archives/1541.