This article discusses how the Microsoft Bot Framework SDK v4 and the Microsoft Computer Vision SDK can be used together to create a fun bot that can analyze an image, generate a thumbnail, and extract text from printed and handwritten content in images (OCR). The bot discussed here performs the following operations.
The following tools and frameworks were used while developing the bot.
Some basic information that is required before the code is presented is discussed here.
The Computer Vision API is a Cognitive Service offered by Microsoft under the Azure umbrella. This API helps perform various tasks such as analyzing images, extracting text from handwritten and printed images, recognizing famous faces, breaking down images to extract information based on the coordinates of various facial aspects, determining gender, and so on. More information about the Computer Vision API is available at Computer Vision API - Home.
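To get a feel for what the service does before using the SDK, the analyze operation can also be called directly over REST with HttpClient. The following is a minimal sketch; the region, the v2.0 API path, and the image URL are assumptions that must be adjusted to match your own subscription.

using System.Net.Http;
using System.Text;

// Minimal sketch of a raw REST call to the Computer Vision analyze endpoint.
// The region, API version, and image URL below are placeholders.
using (var client = new HttpClient())
{
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<subscription key>");

    var body = new StringContent(
        "{\"url\":\"https://example.com/sample.jpg\"}",
        Encoding.UTF8,
        "application/json");

    HttpResponseMessage response = await client.PostAsync(
        "https://southeastasia.api.cognitive.microsoft.com/vision/v2.0/analyze?visualFeatures=Description,Tags",
        body);

    // The response is a JSON document describing the image.
    string json = await response.Content.ReadAsStringAsync();
}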
The following steps show how the Computer Vision API can be subscribed to from the Azure Portal.
The Microsoft Bot Framework version 4 SDK is used to develop the bot. This framework is built on ASP.NET Core 2.0 and above and makes developing chat bots a pleasant experience. The framework does require some time and effort to learn, but once that investment is made, building bots with it is straightforward. Official documentation for the bot framework can be found at Azure Bot Service Documentation.
The following is the flow of conversation that we want to achieve.
The Echo Bot template in Visual Studio can be used to create a skeleton bot for the project, which can then be modified as required. The skeleton bot is nothing more than an ASP.NET Core MVC based web application with classes attributed to the bot. Before coding, the NuGet packages for the Microsoft Bot Framework and the Microsoft Azure Computer Vision API need to be added to the project. This can be done by adding the following package references to the .csproj file; the solution will then restore the NuGet packages.
<ItemGroup>
  <PackageReference Include="Microsoft.AspNetCore" Version="2.1.3" />
  <PackageReference Include="Microsoft.AspNetCore.All" Version="2.0.9" />
  <PackageReference Include="AsyncUsageAnalyzers" Version="1.0.0-alpha003" PrivateAssets="all" />
  <PackageReference Include="Microsoft.Bot.Builder" Version="4.0.8" />
  <PackageReference Include="Microsoft.Bot.Builder.Integration.AspNet.Core" Version="4.0.6" />
  <PackageReference Include="Microsoft.Bot.Configuration" Version="4.0.6" />
  <PackageReference Include="Microsoft.Bot.Connector" Version="4.0.6" />
  <PackageReference Include="Microsoft.Bot.Schema" Version="4.0.6" />
  <PackageReference Include="Microsoft.Bot.Builder.Dialogs" Version="4.0.6" />
  <PackageReference Include="Microsoft.Extensions.Logging.AzureAppServices" Version="2.1.1" />
  <PackageReference Include="Microsoft.AspNetCore.App" />
  <PackageReference Include="Microsoft.Azure.CognitiveServices.Vision.ComputerVision" Version="3.2.0" />
</ItemGroup>
The next step is to set up the appsettings.json file to include the Computer Vision API endpoint and the subscription key. The appsettings.json file will look like the following.
{
  "botFilePath": "ImageProcessingBot.bot",
  "botFileSecret": "",
  "computerVisionKey": "Enter Key Here",
  "computerVisionEndpoint": "https://southeastasia.api.cognitive.microsoft.com"
}
These values are required at runtime to call the Computer Vision API through the SDK. They can be made available by injecting IConfiguration in the ConfigureServices method of the Startup class, as shown below.
public IConfiguration Configuration { get; }

services.AddSingleton<IConfiguration>(Configuration);
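For context, the property and the registration shown above live in the Startup class. The following is a minimal sketch assuming the standard Echo Bot template layout; only the IConfiguration registration is essential here.

public class Startup
{
    public Startup(IConfiguration configuration)
    {
        Configuration = configuration;
    }

    public IConfiguration Configuration { get; }

    public void ConfigureServices(IServiceCollection services)
    {
        // Registering IConfiguration as a singleton makes the Computer Vision
        // key and endpoint from appsettings.json resolvable anywhere in the bot.
        services.AddSingleton<IConfiguration>(Configuration);

        // The bot, its state, and the accessors are registered here as well (shown below).
    }
}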
The bot needs to store the command given by the user so that the uploaded image can be routed to the correct function. For this, the bot accessors class is used.
#region References
using System;
using System.Collections.Generic;
using Microsoft.Bot.Builder;
using Microsoft.Bot.Builder.Dialogs;
#endregion

namespace ImageProcessingBot
{
    public class ImageProcessingBotAccessors
    {
        public ImageProcessingBotAccessors(ConversationState conversationState, UserState userState)
        {
            ConversationState = conversationState ?? throw new ArgumentNullException(nameof(conversationState));
            UserState = userState ?? throw new ArgumentNullException(nameof(userState));
        }

        // Unique names under which the state properties are stored.
        public static readonly string CommandStateName = $"{nameof(ImageProcessingBotAccessors)}.CommandState";
        public static readonly string DialogStateName = $"{nameof(ImageProcessingBotAccessors)}.DialogState";

        // Accessor for the command selected by the user.
        public IStatePropertyAccessor<string> CommandState { get; set; }

        // Accessor for the dialog state of the conversation.
        public IStatePropertyAccessor<DialogState> ConversationDialogState { get; set; }

        public ConversationState ConversationState { get; }

        public UserState UserState { get; }
    }
}
These accessors are added to the bot by injecting them at runtime (using the ConfigureServices method of the Startup.cs class).
services.AddSingleton<ImageProcessingBotAccessors>(sp =>
{
    var options = sp.GetRequiredService<IOptions<BotFrameworkOptions>>().Value;
    if (options == null)
    {
        throw new InvalidOperationException("BotFrameworkOptions must be configured prior to setting up the state accessors.");
    }

    var conversationState = options.State.OfType<ConversationState>().FirstOrDefault();
    if (conversationState == null)
    {
        throw new InvalidOperationException("ConversationState must be defined and added before adding conversation-scoped state accessors.");
    }

    var userState = options.State.OfType<UserState>().FirstOrDefault();
    if (userState == null)
    {
        throw new InvalidOperationException("UserState must be defined and added before adding user-scoped state accessors.");
    }

    // Create the custom state accessors.
    // State accessors enable other components to read and write individual properties of state.
    var accessors = new ImageProcessingBotAccessors(conversationState, userState)
    {
        ConversationDialogState = conversationState.CreateProperty<DialogState>(ImageProcessingBotAccessors.DialogStateName),
        CommandState = userState.CreateProperty<string>(ImageProcessingBotAccessors.CommandStateName),
    };

    return accessors;
});
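Note that the OfType<ConversationState>() and OfType<UserState>() lookups above only succeed if the state objects were added to BotFrameworkOptions when the bot was registered. A minimal sketch of that registration follows; the in-memory storage choice is an assumption suitable only for local testing.

services.AddBot<ImageProcessingBot>(options =>
{
    // In-memory storage loses all state on restart; use durable storage in production.
    IStorage dataStore = new MemoryStorage();

    // These registrations are what the accessor factory above finds via options.State.
    options.State.Add(new ConversationState(dataStore));
    options.State.Add(new UserState(dataStore));
});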
A wrapper helper class is created to consume the Computer Vision SDK in the bot. The SDK uses key-based authorization to authenticate to the Computer Vision API and hence requires the API endpoint and the subscription key. These values are available in the appsettings.json file and can be picked up at runtime because the IConfiguration interface was already registered as a singleton in the previous step. The Computer Vision SDK can be referenced by including the following namespaces in the class.
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
The following are the properties and the constructor used to access the values from appsettings.json and the ComputerVisionClient used to communicate with the API.
public class ComputerVisionHelper
{
    private IConfiguration _configuration;
    private ComputerVisionClient _client;

    // Visual features requested from the analyze operation.
    private List<VisualFeatureTypes> features = new List<VisualFeatureTypes>()
    {
        VisualFeatureTypes.Categories, VisualFeatureTypes.Description,
        VisualFeatureTypes.Faces, VisualFeatureTypes.ImageType,
        VisualFeatureTypes.Tags
    };

    public ComputerVisionHelper(IConfiguration configuration)
    {
        _configuration = configuration ?? throw new ArgumentNullException(nameof(configuration));

        // The subscription key and endpoint come from appsettings.json.
        _client = new ComputerVisionClient(
            new ApiKeyServiceClientCredentials(_configuration["computerVisionKey"].ToString()),
            new System.Net.Http.DelegatingHandler[] { });

        _client.Endpoint = _configuration["computerVisionEndpoint"].ToString();
    }
}
The functions to analyze the image, generate the thumbnail, and extract text from images are as follows.
public async Task<ImageAnalysis> AnalyzeImageAsync(Stream image)
{
    ImageAnalysis analysis = await _client.AnalyzeImageInStreamAsync(image, features);
    return analysis;
}

public async Task<string> GenerateThumbnailAsync(Stream image)
{
    Stream thumbnail = await _client.GenerateThumbnailInStreamAsync(100, 100, image, smartCropping: true);

    // Convert the thumbnail stream to a Base64 string so it can be sent inline as a data URI.
    byte[] thumbnailArray;
    using (var ms = new MemoryStream())
    {
        thumbnail.CopyTo(ms);
        thumbnailArray = ms.ToArray();
    }

    return System.Convert.ToBase64String(thumbnailArray);
}

public async Task<IList<Line>> ExtractTextAsync(Stream image, TextRecognitionMode recognitionMode)
{
    RecognizeTextInStreamHeaders headers = await _client.RecognizeTextInStreamAsync(image, recognitionMode);
    IList<Line> detectedLines = await GetTextAsync(_client, headers.OperationLocation);
    return detectedLines;
}

private async Task<IList<Line>> GetTextAsync(ComputerVisionClient client, string operationLocation)
{
    _client = client;

    // The operation ID is the trailing GUID (last 36 characters) of the Operation-Location URL.
    string operationId = operationLocation.Substring(operationLocation.Length - 36);

    TextOperationResult result = await _client.GetTextOperationResultAsync(operationId);

    // Poll until the text recognition operation completes (or the retry budget runs out).
    int i = 0;
    int maxRetries = 5;
    while ((result.Status == TextOperationStatusCodes.Running ||
            result.Status == TextOperationStatusCodes.NotStarted) && i++ < maxRetries)
    {
        await Task.Delay(1000);
        result = await _client.GetTextOperationResultAsync(operationId);
    }

    var lines = result.RecognitionResult.Lines;
    return lines;
}
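As a quick sanity check, the helper can also be exercised outside the bot, for example from a console application. The following is a hypothetical sketch; the file name is a placeholder, and the configuration is assumed to contain the two keys shown earlier (it requires the Microsoft.Extensions.Configuration.Json package).

// Hypothetical usage sketch of ComputerVisionHelper outside the bot.
var configuration = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json")
    .Build();

var helper = new ComputerVisionHelper(configuration);

using (Stream image = File.OpenRead("sample.jpg")) // placeholder file name
{
    ImageAnalysis analysis = await helper.AnalyzeImageAsync(image);
    Console.WriteLine(analysis.Description.Captions[0].Text);
}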
These helper methods are called based upon the command given by the user.
The configuration and the bot accessors that were injected during startup are accessible through the bot's constructor, as follows.
public class ImageProcessingBot : IBot
{
    private readonly ImageProcessingBotAccessors _accessors;
    private readonly IConfiguration _configuration;
    private readonly DialogSet _dialogs;

    public ImageProcessingBot(ImageProcessingBotAccessors accessors, IConfiguration configuration)
    {
        _accessors = accessors ?? throw new ArgumentNullException(nameof(accessors));
        // The configuration is kept so the Computer Vision helper can be created when needed.
        _configuration = configuration ?? throw new ArgumentNullException(nameof(configuration));
        _dialogs = new DialogSet(_accessors.ConversationDialogState);
    }
}
To help the user select a command, a hero card is sent to them. It is created as follows.
private async Task<Activity> CreateReplyAsync(ITurnContext context, string message)
{
    var reply = context.Activity.CreateReply();

    // Hero card with one button per supported operation; the Value is the command
    // text the bot receives back when the user taps a button.
    var card = new HeroCard()
    {
        Text = message,
        Buttons = new List<CardAction>()
        {
            new CardAction { Text = "Process Image", Value = "ProcessImage", Title = "Process Image", DisplayText = "Process Image", Type = ActionTypes.ImBack },
            new CardAction { Text = "Get Thumbnail", Value = "GetThumbnail", Title = "Get Thumbnail", DisplayText = "Get Thumbnail", Type = ActionTypes.ImBack },
            new CardAction { Text = "Extract Printed Text", Value = "printedtext", Title = "Extract Printed Text", DisplayText = "Extract Printed Text", Type = ActionTypes.ImBack },
            new CardAction { Text = "Extract Hand Written Text", Value = "handwrittentext", Title = "Extract Hand Written Text", DisplayText = "Extract Hand Written Text", Type = ActionTypes.ImBack }
        }
    };

    reply.Attachments = new List<Attachment>() { card.ToAttachment() };

    return reply;
}
The bot class implements the IBot interface and must implement the OnTurnAsync method. This method is invoked by the bot framework every time either the user or the bot sends messages, events, etc. to the other. We send the welcome message to the user using the ConversationUpdate event in our code as follows.
case ActivityTypes.ConversationUpdate:
    foreach (var member in turnContext.Activity.MembersAdded)
    {
        // Greet only the user, not the bot itself.
        if (member.Id != turnContext.Activity.Recipient.Id)
        {
            var reply = await CreateReplyAsync(turnContext, "Welcome. Please select an operation.");
            await turnContext.SendActivityAsync(reply, cancellationToken: cancellationToken);
        }
    }
    break;
The logic to read the commands and ask the user to upload the images is as follows.
case ActivityTypes.Message:
    int attachmentCount = turnContext.Activity.Attachments != null ? turnContext.Activity.Attachments.Count() : 0;

    // Use the text of the current activity as the command if present,
    // otherwise fall back to the command stored in state earlier.
    var command = !string.IsNullOrEmpty(turnContext.Activity.Text)
        ? turnContext.Activity.Text
        : await _accessors.CommandState.GetAsync(turnContext, () => string.Empty, cancellationToken);
    command = command.ToLowerInvariant();

    if (attachmentCount == 0)
    {
        if (string.IsNullOrEmpty(command))
        {
            await turnContext.SendActivityAsync("Please select an operation before uploading the image.", cancellationToken: cancellationToken);
        }
        else
        {
            // Remember the command so it can be applied to the image uploaded next.
            await _accessors.CommandState.SetAsync(turnContext, turnContext.Activity.Text, cancellationToken);
            await _accessors.UserState.SaveChangesAsync(turnContext, cancellationToken: cancellationToken);
            await turnContext.SendActivityAsync("Please upload the image using the upload button.", cancellationToken: cancellationToken);
        }
    }
    else
    {
        HttpClient client = new HttpClient();
        Attachment attachment = turnContext.Activity.Attachments[0];

        if (attachment.ContentType == "image/jpeg" || attachment.ContentType == "image/png")
        {
            Stream image = await client.GetStreamAsync(attachment.ContentUrl);
            if (image != null)
            {
                ComputerVisionHelper helper = new ComputerVisionHelper(_configuration);
                IList<Line> detectedLines;

                switch (command)
                {
                    case "processimage":
                        ImageAnalysis analysis = await helper.AnalyzeImageAsync(image);
                        await turnContext.SendActivityAsync($"I think the image you uploaded is a {analysis.Tags[0].Name.ToUpperInvariant()} and it is {analysis.Description.Captions[0].Text.ToUpperInvariant()}.", cancellationToken: cancellationToken);
                        break;

                    case "getthumbnail":
                        string thumbnail = await helper.GenerateThumbnailAsync(image);
                        var reply = turnContext.Activity.CreateReply();
                        reply.Text = "Here is your thumbnail.";
                        reply.Attachments = new List<Attachment>()
                        {
                            new Attachment()
                            {
                                ContentType = "image/jpeg",
                                Name = "thumbnail.jpg",
                                // The Base64 thumbnail is sent inline as a data URI.
                                ContentUrl = string.Format("data:image/jpeg;base64,{0}", thumbnail)
                            }
                        };
                        await turnContext.SendActivityAsync(reply, cancellationToken: cancellationToken);
                        break;

                    case "printedtext":
                    case "handwrittentext":
                        // Choose the recognition mode based on the command.
                        TextRecognitionMode mode = command == "printedtext" ? TextRecognitionMode.Printed : TextRecognitionMode.Handwritten;
                        detectedLines = await helper.ExtractTextAsync(image, mode);
                        var sb = new StringBuilder("I was able to extract the following text.\n");
                        foreach (Line line in detectedLines)
                        {
                            sb.AppendFormat("{0}.\n", line.Text);
                        }
                        await turnContext.SendActivityAsync(sb.ToString(), cancellationToken: cancellationToken);
                        break;

                    default:
                        await turnContext.SendActivityAsync("Please select an operation and upload the image.", cancellationToken: cancellationToken);
                        break;
                }

                // Clear out the command as the task for this command is finished.
                await _accessors.CommandState.DeleteAsync(turnContext, cancellationToken: cancellationToken);
            }
            else
            {
                await turnContext.SendActivityAsync("Incorrect image. \nPlease select an operation and upload the image.", cancellationToken: cancellationToken);
            }
        }
        else
        {
            await turnContext.SendActivityAsync("Only image attachments (.jpeg or .png) are supported. \nPlease select an operation and upload the image.", cancellationToken: cancellationToken);
        }
    }
    break;
The above bot code works in tandem with the Computer Vision SDK helper class to process the images as per the user's direction.
The following are the test results for various operations.
The following screenshots show how the bot recognized famous personalities and also deduced the type of image that was used.
Bot recognizes the portrait of Dr APJ Abdul Kalam (11th President of India)
Bot recognizes Daniel Craig (6th Actor To Play James Bond)
In case the person is not famous, the bot will share the attributes of the person in the image. A sample follows.
This article described how to create a fun image-processing bot using the Microsoft Bot Framework and the Computer Vision SDK.
A working sample of the code discussed in this article can be downloaded from the TechNet Gallery at Image Processing Bot using MS Bot Framework and Computer Vision SDK. To work with the sample, follow the steps below.
The following articles on the TechNet Wiki discuss the bot framework and can be referred to for additional reading.
The Microsoft Bot Framework makes it easy to create chat bots. That said, the framework requires a fair amount of initial learning, and developers need to invest quality time to understand its concepts and nuances. Readers of this article are strongly encouraged to familiarize themselves with these concepts. The following resources will prove invaluable in the learning process.
To learn more about adaptive cards and to play around with them, the following resources will prove useful.
To learn about how ASP.NET Core uses its built-in dependency injection, refer to Dependency Injection in ASP.NET Core.
The following articles were referred to while writing this article.