How to process crawled data using an external SOAP service callout in the content enrichment pipeline
Introduction
SharePoint 2013 (RTM) search has been substantially revamped from its predecessor, SharePoint 2010 search. In SharePoint 2013, FAST Search is integrated with the SharePoint 2010 search platform, so a single platform combines the backbone of SharePoint 2010 Search with the capabilities of FAST Search.
Background
Some time ago I had a requirement to manipulate crawled data before it was indexed. SharePoint 2013 search has a content processing pipeline with a provision for invoking an external SOAP service (in my case, a WCF service). The service can manipulate the raw crawled data and return it to the pipeline, where it continues through the remaining processing steps.
Using the code
My requirement was to make records whose "Location" property is "Calcutta" searchable when someone queries for "Kolkata". My target was therefore to add the value "Kolkata" to the "Location" field of every record (item) whose "Location" value is "Calcutta". The core steps for calling an external WCF service (SOAP web service) from the content processing pipeline are given below.
The development steps are:
- Create a WCF application in VS 2012 and add a reference to Microsoft.office.server.search.contentprocessingenrichment.dll, which you can find in C:\\program files\Microsoft office servers\15.\Search\Application\External.
- Delete the default interface (e.g., IService1).
- Add the following references to the Service1.svc.cs file:
  - Microsoft.Office.Server.Search.ContentProcessingEnrichment
  - Microsoft.Office.Server.Search.ContentProcessingEnrichment.PropertyTypes
- Inherit IContentProcessingEnrichmentService in the Service1.svc.cs file.
- Implement the method ProcessItem. This is the method where you get the required properties for each item. Sample code for ProcessItem is the first listing after these steps.
- Add the binding from the second listing to <system.serviceModel> in the web.config file.
- Host this WCF service in IIS (create a virtual directory, map it to the physical path of the WCF application, then right-click the virtual directory and click "Convert to Application").
- Browse and get the URL for the hosted .svc file.
- Execute the PowerShell script from the third listing to map "Content Enrichment" to the hosted custom WCF service.
- Run a full crawl on the content source.
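Sample ProcessItem implementation (Service1.svc.cs):
// Requires: using System; using System.Collections.Generic; using System.Linq;
// plus the two ContentProcessingEnrichment namespaces listed in the steps.
private const string LocationProperty = "Location";
// Defines the error code for managed properties with an unexpected type.
private const int UnexpectedType = 1;
// Defines the error code for encountering unexpected exceptions.
private const int UnexpectedError = 2;
// Reused for every call; cleared at the start of ProcessItem.
private readonly ProcessedItem processedItemHolder = new ProcessedItem
{
    ItemProperties = new List<AbstractProperty>()
};

public ProcessedItem ProcessItem(Item item)
{
    processedItemHolder.ErrorCode = 0;
    processedItemHolder.ItemProperties.Clear();
    try
    {
        var locationProperty = item.ItemProperties.FirstOrDefault(p => p.Name == LocationProperty);
        var locProp = locationProperty as Property<List<string>>;
        if (locationProperty != null && locProp == null)
        {
            // The property exists but is not a multi-valued string.
            processedItemHolder.ErrorCode = UnexpectedType;
        }
        else if (locProp != null && locProp.Value != null && locProp.Value.Count > 0)
        {
            // Only the first ';'-separated token of the first value is inspected.
            string locName = locProp.Value.First().Split(';')[0].Trim();
            if (string.Equals(locName, "Calcutta", StringComparison.OrdinalIgnoreCase))
            {
                // Return the property with the extra value so that it is indexed too.
                locProp.Value.Add("KOLKATA");
                processedItemHolder.ItemProperties.Add(locProp);
            }
        }
    }
    catch (Exception)
    {
        processedItemHolder.ErrorCode = UnexpectedError;
    }
    // The original listing omitted the return statement, which the method signature requires.
    return processedItemHolder;
}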
web.config binding (inside <system.serviceModel>):
<bindings>
  <basicHttpBinding>
    <!-- The service will accept a maximum blob of 8 MB. -->
    <binding maxReceivedMessageSize="8388608">
      <readerQuotas maxDepth="32"
                    maxStringContentLength="2147483647"
                    maxArrayLength="2147483647"
                    maxBytesPerRead="2147483647"
                    maxNameTableCharCount="2147483647" />
      <security mode="None" />
    </binding>
  </basicHttpBinding>
</bindings>
PowerShell script to register the service:
$ssa = Get-SPEnterpriseSearchServiceApplication
$config = New-SPEnterpriseSearchContentEnrichmentConfiguration
# Replace with the URL of the hosted .svc file.
$config.Endpoint = "http://Site_URL/<service name>.svc"
$config.InputProperties = "Location"
$config.OutputProperties = "Location"
$config.SendRawData = $True
$config.MaxRawDataSize = 8192
Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
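The matching logic inside ProcessItem can also be isolated from the SharePoint types, which makes it easy to try out before deploying the service. The following is only a sketch under assumptions: the class name LocationSynonyms and the Expand method are hypothetical, the synonym is added in mixed case, and every value is checked rather than only the first one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the SharePoint API.
public static class LocationSynonyms
{
    // Maps an old place name to the name users are expected to search for.
    private static readonly Dictionary<string, string> Synonyms =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Calcutta", "Kolkata" }
        };

    // Returns the original values plus any synonym that is not already present.
    public static List<string> Expand(IEnumerable<string> values)
    {
        var source = values.ToList();
        var result = new List<string>(source);
        foreach (string raw in source)
        {
            // A value may arrive as "a;b"; only the first token is compared.
            string first = raw.Split(';')[0].Trim();
            string synonym;
            if (Synonyms.TryGetValue(first, out synonym) &&
                !result.Contains(synonym, StringComparer.OrdinalIgnoreCase))
            {
                result.Add(synonym);
            }
        }
        return result;
    }
}
```

Inside ProcessItem, the result of Expand could replace the Value of the Location property before that property is added to ItemProperties.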
Points of Interest
I have used the Location property here, which is a default managed property in SharePoint Search. If you want to process a custom property, you need to crawl your content source first and find your property in the crawled properties section, where its name may look like "ows_<your custom field name>". Then create a managed property named after your field and map it to the "ows_<your custom field name>" crawled property to make it available in the content processing pipeline.
This is the first part of the content enrichment callout; I will post more advanced aspects in later articles.
Using Content Enrichment Web Service Callout in SharePoint 2013 Preview
The SharePoint 2013 Preview release introduced new functionality called the content enrichment web service callout. It provides the ability to inspect and manipulate managed property values for each item before it is added to the search index. Prior to SharePoint 2013, the only way to accomplish something similar was in FAST Search for SharePoint, by extending the item processing pipeline. Clients using SharePoint Server search were out of luck, as the functionality was not available to them.
The process of building and configuring a web service callout is relatively straightforward. These are the high-level steps to follow:
- Build a web service by implementing the IContentProcessingEnrichmentService interface. Add logic to manipulate managed property values.
- Run PowerShell commands to configure the callout web service endpoint address, input and output managed properties, trigger condition and a few other things.
- Execute a full crawl.
In this blog post I’ll show an example of developing a web service that populates a new managed property value which is then used as a refiner on the search results page. Let’s say we have a number of project sites in SharePoint where each site contains information about a specific bike model.
Each bike model belongs to a product category such as Mountain Bikes, Road Bikes and Touring Bikes. We'd like to be able to refine search results by product category, but unfortunately that metadata is not available in SharePoint at this point. What we are going to do next is create a new managed property called ProductCategory and build a web service to populate the managed property values based on our custom business logic. The ProductCategory managed property can then be used as a refiner on the search results page.
To create the managed property, navigate to Central Administration > Search Service Application > Search Schema > New Managed Property.
- Property name: ProductCategory
- Type: Text
- Searchable: checked
- Queryable: checked
- Retrievable: checked
- Refinable: Yes – active
- Token Normalization: checked
In Visual Studio 2012, create the web service: New Project > Visual C# > WCF > WCF Service Application.
Delete the Service1 created by default or rename it to EnrichmentService. Delete the IService1 or IEnrichmentService interface.
Add an assembly reference to C:\Program Files\Microsoft Office Servers\15.0\Search\Applications\External\microsoft.office.server.search.contentprocessingenrichment.dll.
Open EnrichmentService.svc.cs and add the following using statements:
using Microsoft.Office.Server.Search.ContentProcessingEnrichment;
using Microsoft.Office.Server.Search.ContentProcessingEnrichment.PropertyTypes;
Replace the class implementation:
public class EnrichmentService : IContentProcessingEnrichmentService
{
    // Maps a model name that appears in the item path to its product category.
    private Dictionary<string, string> productModels = new Dictionary<string, string>()
    {
        { "mountain-100", "Mountain Bikes" },
        { "mountain-500", "Mountain Bikes" },
        { "road-150", "Road Bikes" },
        { "road-450", "Road Bikes" },
        { "touring-1000", "Touring Bikes" },
        { "touring-2000", "Touring Bikes" }
    };

    public ProcessedItem ProcessItem(Item item)
    {
        ProcessedItem processedItem = new ProcessedItem();
        processedItem.ItemProperties = new List<AbstractProperty>();

        AbstractProperty pathProperty = item.ItemProperties.Where(p => p.Name == "Path").FirstOrDefault();
        if (pathProperty != null)
        {
            Property<string> pathProp = pathProperty as Property<string>;
            if (pathProp != null)
            {
                foreach (var productModel in productModels)
                {
                    if (pathProp.Value.Contains(productModel.Key))
                    {
                        // Emit the category as the ProductCategory managed property.
                        Property<string> modelProp = new Property<string>()
                        {
                            Name = "ProductCategory",
                            Value = productModel.Value
                        };
                        processedItem.ItemProperties.Add(modelProp);
                    }
                }
            }
        }
        return processedItem;
    }
}
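The path-to-category lookup can be tried on its own, without a crawl. The sketch below is an illustration, not the service itself: the class ProductCategoryLookup and its Resolve method are hypothetical names, and unlike the listing above it compares case-insensitively and stops at the first matching model key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mirrors the lookup performed in ProcessItem.
public static class ProductCategoryLookup
{
    // Same model-to-category pairs as the productModels dictionary.
    private static readonly Dictionary<string, string> ProductModels =
        new Dictionary<string, string>
        {
            { "mountain-100", "Mountain Bikes" },
            { "mountain-500", "Mountain Bikes" },
            { "road-150", "Road Bikes" },
            { "road-450", "Road Bikes" },
            { "touring-1000", "Touring Bikes" },
            { "touring-2000", "Touring Bikes" }
        };

    // Returns the category for a model key found in the path, or null if none is found.
    public static string Resolve(string path)
    {
        if (string.IsNullOrEmpty(path))
        {
            return null;
        }
        foreach (KeyValuePair<string, string> model in ProductModels)
        {
            if (path.IndexOf(model.Key, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return model.Value;
            }
        }
        return null;
    }
}
```

Returning after the first match means at most one ProductCategory value is produced per item; whether that is preferable to emitting one property per match depends on how the model keys overlap in your site URLs.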
Now the web service is ready, and the next step is to configure SharePoint to call the web service during the crawl. That is done using PowerShell. To minimize the performance impact of the web service callout, we only want it to be called under a certain condition; this condition is defined in the Trigger property. More information about the syntax can be found in the Trigger expression syntax article on MSDN. The expected input and output managed properties are configured via InputProperties and OutputProperties. When debugging the web service, the DebugMode property can be set to $true, in which case SharePoint ignores the InputProperties value and sends all available managed properties for each item to the service. Any managed property values returned by the web service in debug mode are ignored by SharePoint.
$ssa = Get-SPEnterpriseSearchServiceApplication
$config = New-SPEnterpriseSearchContentEnrichmentConfiguration
# The Endpoint and Trigger assignments described above are not shown in this listing;
# set them on $config before calling Set-SPEnterpriseSearchContentEnrichmentConfiguration.
$config.DebugMode = $false
$config.FailureMode = "WARNING"
$config.InputProperties = "Path"
$config.OutputProperties = "ProductCategory"
$config.SendRawData = $false
Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
Finally, launch the EnrichmentService web service created earlier and start a new full crawl. Once the crawl is complete, the ProductCategory managed property should be populated and searchable.
The final step is to add a Product Category search refiner. Edit the search results page, edit the Refinement web part, click the Choose Refiners… button within the Properties for Search Refinement section, select the ProductCategory managed property in the Available refiners list and press the Add > button. Move ProductCategory to the top of the Selected refiners list, then scroll down, set the Display name to Product Category and save your changes.
Run a search for "bike" and you should now be able to refine the search results by product category.
References:
- Custom content processing with the Content Enrichment web service callout
- Trigger expressions syntax in SharePoint 2013
Using Multiple Endpoints as a Content Enrichment Web Service in SharePoint 2013 Search
Introduction
SharePoint 2013 Search enables users to modify the managed properties of crawled items before they are indexed by calling out to an external content enrichment web service. The ability to modify managed properties for items during content processing is helpful when performing tasks such as data cleansing, entity extraction, classification and tagging.
The content processing component is designed over a fixed pipeline, which in turn is made of several processing stages arranged in sequence to perform distinct activities while processing a document for indexing.
While the content processing component provides several improvements over SharePoint 2010 enterprise search (not FAST Search for SharePoint 2010), it introduces a bottleneck where custom processing is needed in the pipeline. For custom processing, SharePoint 2013 provides the Content Enrichment Web Service (CEWS), also called the Web Service Callout. This is in principle a hook in the pipeline for an external WCF service.
The two major drawbacks of this process are:
- Whatever custom processing we need must be performed within this single WCF service call. Only one CEWS registration is allowed per pipeline, and only one pipeline is allowed per Search Service Application. This becomes a bottleneck when documents passing through the pipeline require several kinds of external processing.
- Once registered, the CEWS applies to all content sources in a specific Search Service Application. In practice, a Search Service Application may have multiple content sources with different requirements for each, and there is no built-in way to handle them differently. An example is given below.
Content Source 1 -> required to process Managed Property values from Repository 1
Content Source 2 -> do not need to process any values from external repository
Content Source 3 -> required to process Managed Property values from Repository 2
Content Source 2 needs no Content Enrichment Web Service call at all, while Content Source 1 and Content Source 3 need to call two completely different repositories to get their managed property values. A single web service call cannot serve all of them. Registering a WCF Routing Service as the CEWS can resolve this problem, as discussed in the sections below.
Solution Overview
In our demo solution there are two different content sources in one Search Service Application. The requirements are:
- For one content source, we need to generate the document preview using the Longitude Preview Service from BA-Insight and, at the same time, populate some managed property values coming from a SQL Server database. Let's name it "Content Source CEWS Multiple". BA-Insight uses its own Content Enrichment Web Service to generate the document preview.
- For the other content source, we need to generate the document preview only. Let's name it "Content Source CEWS Single". It only needs to call the BA-Insight preview generator Content Enrichment Service.
Introducing WCF Workflow Service
A WCF Workflow Service can call more than one WCF service. Instead of registering a simple WCF service as the endpoint for the Content Enrichment Web Service, we can register a WCF Workflow Service as the endpoint and call our custom WCF services from it. The Workflow Service first calls the BA-Insight preview generator service to generate the preview. It then calls our custom WCF service, which gets the managed property values from the SQL Server database. After getting the values, the Workflow Service creates the output properties and sends them back to the SharePoint pipeline, where SharePoint populates the managed property values. However, this would apply to both content sources, which is not what we want: the second content source, "Content Source CEWS Single", only needs to call the BA-Insight preview service to generate document previews. To resolve this we need the WCF Routing Service, described below.
WCF Routing Service
WCF 4.0 introduces a new service called the Routing Service. Its purpose is to pick up a request from the client and, based on the routing logic, direct the request to the proper endpoints or downstream services. These downstream services may be hosted on the same machine or distributed across several machines in a server farm. So instead of registering the WCF Workflow Service as the endpoint in our CEWS, we register the WCF Routing Service as the endpoint. The SharePoint pipeline calls the WCF Routing Service during the crawl with the configured input and output properties. Based on an input property value, the routing service then forwards the request to either the WCF Workflow Service or the BA-Insight Preview Service. To understand this in detail, we need to look at some details of the SharePoint Content Enrichment Service.
SharePoint Content Enrichment Web Service Components
The following are key parameters of the Content Enrichment Web Service, which can be defined when the service is registered.
1. InputProperties: The InputProperties parameter specifies the managed properties sent to the service.
2. OutputProperties: The OutputProperties parameter specifies the managed properties returned by the service.
Note that both are case sensitive. All managed properties referenced need to be created in advance.
3. Trigger: A trigger condition that represents a predicate to execute for every item being processed. If a trigger condition is used, the external web service is called only when the trigger evaluates to true. If no trigger condition is used, all items are sent to the external web service.
4. SendRawData: A SendRawData switch that sends the raw data of an item in binary form. This is useful when more metadata is required than what can be retrieved from the parsed version of the item. In our case we need to set it to true since the BA
5. TimeOut: The amount of time, in milliseconds, until the web service times out. Valid range: 100 to 30000. In our case we'll set it to a higher value, since we are using multiple services and at some point the service will be heavily loaded.
The details of the configuration options and of the Content Enrichment Web Service can be found on MSDN. The following is a sample PowerShell script to deploy the CEWS.
$ssa = Get-SPEnterpriseSearchServiceApplication
$config = New-SPEnterpriseSearchContentEnrichmentConfiguration
# Replace with the URL of the service registered as the endpoint.
$config.Endpoint = "http://Site_URL/<service name>.svc"
# Each managed property name is passed as a separate string.
$config.InputProperties = "OriginalPath", "Body"
$config.OutputProperties = "OpProp1", "OpProp2", "OpProp3", "OpProp4"
$config.SendRawData = $True
$config.MaxRawDataSize = 8192
$config.TimeOut = 10000
Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
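Because property names are case sensitive and the timeout has a fixed valid range, it can help to sanity-check the intended settings before running the registration script. The following is a minimal sketch under assumptions: CalloutSettingsCheck and Validate are hypothetical names, and the list of existing managed properties is assumed to be supplied by the caller rather than read from SharePoint.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pre-flight check; it does not call any SharePoint API.
public static class CalloutSettingsCheck
{
    // Valid timeout range in milliseconds, as stated in the parameter list.
    private const int MinTimeoutMs = 100;
    private const int MaxTimeoutMs = 30000;

    // Returns human-readable problems; an empty list means no problem was found.
    public static List<string> Validate(
        IEnumerable<string> inputProperties,
        IEnumerable<string> outputProperties,
        int timeoutMs,
        IEnumerable<string> existingManagedProperties)
    {
        var problems = new List<string>();
        if (timeoutMs < MinTimeoutMs || timeoutMs > MaxTimeoutMs)
        {
            problems.Add("Timeout " + timeoutMs + " ms is outside " +
                         MinTimeoutMs + "-" + MaxTimeoutMs + ".");
        }

        // Ordinal comparison is case sensitive: "body" does not match "Body".
        var known = new HashSet<string>(existingManagedProperties, StringComparer.Ordinal);
        foreach (string name in inputProperties.Concat(outputProperties))
        {
            if (!known.Contains(name))
            {
                problems.Add("Managed property '" + name +
                             "' was not found (names are case sensitive).");
            }
        }
        return problems;
    }
}
```

An empty result only means the names and the timeout look plausible; it says nothing about whether the endpoint itself is reachable.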
Putting It All Together
Schematic flow diagram for Overall Search Enrichment Process
The schematic diagram above summarizes the overall logic of the search enrichment process.
The WCF Routing Service is configured as the endpoint of the Content Enrichment configuration. Only the content in the "Content Source CEWS Multiple" content source needs to be updated with managed property values from the SQL Server database, so we forward the content processing request to the WCF Workflow Service only when the document being crawled belongs to that content source. For documents in the other content sources, we only need to generate the document preview from BA-Insight. Therefore, instead of routing the request to the WCF Workflow Service, we simply send it to the BA-Insight Longitude Preview Generation Service.
The routing service routes the request based on the routing filter. In this case the filter is configured on the managed property named "ContentSource" and its value. The same concept can be applied when there are more content sources that need different repositories to populate their managed property values. One thing to keep in mind is that the code needs to be very efficient, because document processing involves heavy work and the SharePoint Search service (noderunner.exe) is itself very memory hungry.
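The routing decision described above can be summarized in a few lines of code. This is only a sketch of the decision, not of the WCF Routing Service, where the same rule is expressed as message filters in configuration. The class EnrichmentRouter and both endpoint addresses are hypothetical placeholders.

```csharp
using System;

// Hypothetical illustration of the routing rule; not the WCF Routing Service itself.
public static class EnrichmentRouter
{
    // Placeholder addresses; replace them with the real service URLs.
    public const string WorkflowEndpoint = "http://localhost/WorkflowService.xamlx";
    public const string PreviewEndpoint = "http://localhost/PreviewService.svc";

    // Name of the content source that also needs values from SQL Server.
    private const string MultipleSource = "Content Source CEWS Multiple";

    // Chooses the downstream service from the ContentSource managed property value.
    public static string SelectEndpoint(string contentSource)
    {
        bool needsSqlProperties =
            string.Equals(contentSource, MultipleSource, StringComparison.Ordinal);
        return needsSqlProperties ? WorkflowEndpoint : PreviewEndpoint;
    }
}
```

Every content source other than "Content Source CEWS Multiple", including a missing value, falls through to the preview-only service, which matches the behaviour described for the remaining content sources.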
-----------------------------------------------------------------------------------
In this article we explore an advanced search feature of SharePoint 2013.
Content Enrichment Web Service
SharePoint 2013 allows developers to add a custom step to content processing. During the crawl of a SharePoint content source, the content processing component calls our custom web service, where we can create or change the metadata properties associated with the crawled item.
Under the hood, the web service implements a pre-defined interface. Each crawled item is passed to the interface method, where the item URL and properties can be accessed.
Scenarios
We can use a Content Enrichment Web Service for the following scenarios:
- You have multiple web content sources configured. You need to create new Managed Properties based on each URL. Example: Blog, Website and so on.
- You have a content source with limited metadata information. You need to add new Managed Properties based on the content.
- You have a content source with metadata information. You need to correct the Managed Properties.
- You have a SharePoint list where an Age column exists. You need to classify the content as Minor and Major using Managed Properties.
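As an illustration of the last scenario, the classification step could look like the following sketch. The threshold of 18 and the names AgeClassifier and Classify are assumptions made for this example; in a real service the Age value would arrive as an input property and the returned text would be written to an output Managed Property.

```csharp
using System;

public static class AgeClassifier
{
    // Assumed cut-off for this sketch; adjust to the actual business rule.
    public const int MajorityAge = 18;

    // Maps an Age column value to the text stored in the Managed Property.
    public static string Classify(int age)
    {
        if (age < 0)
        {
            throw new ArgumentOutOfRangeException("age", "Age cannot be negative.");
        }
        return age < MajorityAge ? "Minor" : "Major";
    }
}
```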
Infrastructure
The following is the Infrastructure needed:
- Visual Studio 2012
- IIS website to host the service
- WCF service
Practical
To follow the practical walkthrough, we need to set up the following.
- SharePoint 2013 Enterprise site
- Create Enterprise Search Center site
- Central Administration > Search > Create Content Source > Add jeanpaulva
- Central Administration > Search > Create Content Source > Add sharepointcto
- Central Administration > Create Crawl Rules > Include query strings (?) for both URLs
- Server > IE Enhanced Security Configuration > Off
- Perform Full Crawl for both content sources
After the preceding steps, try searching for sharepointcto from the Enterprise Search Center site and you should be able to see results from both the URLs as shown below.
Our purpose is to provide refiners as in the following.
Steps
Step 1: Create Service
Create a new WCF service application project.
Delete the existing Service files and create a new Service.
You can delete the interface file as well. The Solution Explorer looks as in the following.
Add Reference to the following assembly:
15HIVE\Search\Applications\External\Microsoft.Office.Server.Search.ContentProcessingEnrichment.dll
Step 2: Implement Interface
Open the SVC file and implement a pre-defined interface IContentProcessingEnrichmentService as shown below.
The interface's only method, ProcessItem, takes an Item argument and returns a ProcessedItem.
The interface IContentProcessingEnrichmentService belongs to the ContentProcessingEnrichment assembly.
The Item type has a property named ItemProperties that contains pre-defined Name-Value pairs of:
Property | Description |
ContentType | Access the Content Type of the item |
Name | Access the Name of the item |
Title | Access the Title of the item |
Path | Access the Path of the item |
ContentSource | Access the ContentSource name |
We can specify the input and output properties when registering the service. In our case we are using the Path property.
Step 3: Create Method
Replace the method with the following code. Right-click and Resolve type name errors.
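// Needs: using System; using System.Collections.Generic; using System.Linq;
// plus the ContentProcessingEnrichment and ContentProcessingEnrichment.PropertyTypes namespaces.
public ProcessedItem ProcessItem(Item item)
{
    ProcessedItem processedItem = new ProcessedItem();
    processedItem.ItemProperties = new List<AbstractProperty>();

    // Read the Path input property; FirstOrDefault avoids an exception when it is missing.
    var p = item.ItemProperties.FirstOrDefault(n => n.Name == "Path");
    string url = (p != null && p.ObjectValue != null) ? p.ObjectValue.ToString() : null;

    if (!string.IsNullOrEmpty(url))
    {
        if (url.StartsWith("http://www.sharepointcto.com", StringComparison.OrdinalIgnoreCase))
        {
            // Items crawled from the website are tagged as Site.
            Property<string> property = new Property<string>()
            {
                Name = "WebType",
                Value = "Site"
            };
            processedItem.ItemProperties.Add(property);
        }
        else if (url.StartsWith("http://www.jeanpaulva.com", StringComparison.OrdinalIgnoreCase))
        {
            // Items crawled from the blog are tagged as Blog.
            Property<string> property = new Property<string>()
            {
                Name = "WebType",
                Value = "Blog"
            };
            processedItem.ItemProperties.Add(property);
        }
    }

    // Items that match neither URL are returned without a WebType value.
    return processedItem;
}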
The preceding code performs the following:
- Get the Path property value
- If the Path value starts with http://www.sharepointcto.com, then set the Managed Property WebType to Site
- If the Path value starts with http://www.jeanpaulva.com, then set the Managed Property WebType to Blog
- The processedItem object will be returned by the method
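The URL check can also be pulled out into a small helper that is easy to test without SharePoint. The following is a minimal sketch under that assumption; WebTypeClassifier and Classify are hypothetical names, and returning null for unmatched or empty URLs is a design choice of this example rather than something the service requires.

```csharp
using System;

public static class WebTypeClassifier
{
    // Returns "Site" or "Blog" for the two crawled hosts, or null when the URL matches neither.
    public static string Classify(string url)
    {
        if (string.IsNullOrEmpty(url))
        {
            return null;
        }
        if (url.StartsWith("http://www.sharepointcto.com", StringComparison.OrdinalIgnoreCase))
        {
            return "Site";
        }
        if (url.StartsWith("http://www.jeanpaulva.com", StringComparison.OrdinalIgnoreCase))
        {
            return "Blog";
        }
        return null;
    }
}
```

ProcessItem could then call Classify with the Path value and add a WebType property only when the result is not null.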
Step 4: Run the Service
Save and Run the Service.
Step 5: Create Managed Property
Open Central Administration > Service Applications > Search Service Application > Search Schema > New Managed Property page. Enter the following details.
- Name as WebType
- Type as Text
- Searchable enable
- Queryable enable
- Refinable yes-active
Step 6: Register Service using PowerShell
Open the PowerShell ISE editor and copy and paste the following code.
if ((Get-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue) -eq $null)
{
Add-PSSnapin "Microsoft.SharePoint.PowerShell"
}
# Get SearchServiceApplication
$ssa = Get-SPEnterpriseSearchServiceApplication
Remove-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa
$c = New-SPEnterpriseSearchContentEnrichmentConfiguration
$c.Endpoint = "http://localhost:52156/WebsiteEnrichmentService.svc"
$c.DebugMode = $false
$c.SendRawData = $false
$c.InputProperties = "Path"
$c.OutputProperties = "WebType"
Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $c
Please remember to use the correct URL for the service.
Save and run the script. If there are no errors, the service is registered successfully.
Step 7: Perform Full Crawl
Open Central Administration and Perform a Full Crawl for the following content sources.
- Web content source for sharepointcto
- Web content source for jeanpaulva
Step 8: Add Refiners to the Result Page
Open the Enterprise Search Center result page, bring the page to edit mode, edit the refiner web part and add the WebType refiner. You can view the References section on how to add Search Refiners.
Step 9: Test Search
Open the Enterprise Search Center Site and try searching for content. You can see the Web Type refiner appearing and try clicking on the values.
If the result is filtered based on Blog or Site values then it means the Refiners are working correctly.
Because the Content Enrichment Web Service is called for all the Content Sources, crawl performance can become sluggish owing to the delay involved in each service invocation. It is recommended that we deploy the service close to the SharePoint system. Alternatively, one can try Asynchronous mode too.
We can also debug the service. Ensure the Service is running and set breakpoints.
References
Summary
In this article we have explored the Content Enrichment Web Service. The source code and script are attached for download.