{"id":15600,"date":"2024-08-09T00:00:00","date_gmt":"2024-08-09T07:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/ise\/?p=15600"},"modified":"2024-08-09T06:57:58","modified_gmt":"2024-08-09T13:57:58","slug":"ai-ad-image-differential","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/ai-ad-image-differential\/","title":{"rendered":"Measuring Differentials of Product Images in AI-generated Ads"},"content":{"rendered":"<h2>Introduction<\/h2>\n<p>During one of our recent projects, we were working with an advertising customer. They were aiming for 1:1 ad personalization, where each content piece is tailored to each unique customer, but that&#8217;s done in a repeatable and measurable way.<\/p>\n<p>Our defined experiments aimed to prove that we can create AI-generated ads that contain real images of products, without editing the supplied image of the product so that it is properly represented.<\/p>\n<p>The following hypothesis is considered a crucial prerequisite in order to be able to reach their goal around effectively scaling image personalization, while maintaining the integrity of the item being advertised.<\/p>\n<p><em>\u201cGiven an image of a product and a text description of its surroundings, we can generate a high-fidelity image with visual elements representing the described environment that includes an unmodified version of the supplied product&#8217;s image\u201d<\/em><\/p>\n<p>The most promising technology that can accomplish this AI-generated background is \u2018inpainting\u2019, a feature of multi-modal text-to-image models. With inpainting, the inputs are:<\/p>\n<ol>\n<li>An image of the product<\/li>\n<li>A mask representing the area of the background to be generated<\/li>\n<li>A textual prompt describing the background to be generated.<\/li>\n<\/ol>\n<p>The output is an image where the masked area has been generated as a background to the product, according to the textual prompt. 
The product content may have been slightly modified to fit into the background, especially if the masked areas are close to the product content.<\/p>\n<p>Since inpainting cannot guarantee that the product was not modified, we needed a way to measure whether the generated image altered the product.<\/p>\n<h2>Experiments<\/h2>\n<p>Three measurement techniques were used to determine the delta between the base product image and the generated ad image:<\/p>\n<h3>Mean Squared Error (MSE)<\/h3>\n<p>MSE measures the average squared difference between the estimated values (predicted values) and the actual values (ground truth). We calculate the squared differences pixel by pixel. This works well only when the goal is an image whose pixel colors conform to the ground truth image.<\/p>\n<p><!--- cspell:disable --><\/p>\n<pre><code class=\"language-python\">import numpy as np  \r\n\r\nproduct_image_pixels = [[1, 2, 3, 4],[5, 6, 7, 8]]  \r\ngenerated_image_pixels = [[1, 1, 3, 5],[5, 6, 7, 9]] \r\n\r\n# Mean Squared Error  \r\nmse = np.square(np.subtract(product_image_pixels, generated_image_pixels)).mean()   <\/code><\/pre>\n<p><!--- cspell:enable --><\/p>\n<h3>Peak Signal to Noise Ratio (PSNR)<\/h3>\n<p>To use this estimator, all pixel values must be expressed in bit form: with 8-bit pixels, each channel takes values from 0 to 255. The red, green, blue (RGB) color model fits PSNR best. 
PSNR shows the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation.<\/p>\n<p><!--- cspell:disable --><\/p>\n<pre><code class=\"language-python\">from math import log10, sqrt \r\n\r\nimport numpy as np  \r\n\r\nproduct_image_pixels = [[1, 2, 3, 4],[5, 6, 7, 8]]  \r\ngenerated_image_pixels = [[1, 1, 3, 5],[5, 6, 7, 9]] \r\n\r\nmse = np.square(np.subtract(product_image_pixels, generated_image_pixels)).mean() \r\n\r\n# Peak Signal to Noise Ratio (capped at 100 when the images are identical) \r\npsnr = 100 if mse == 0 else 20 * log10(255.0 \/ sqrt(mse))<\/code><\/pre>\n<p><!--- cspell:enable --><\/p>\n<h3>Cosine Similarity<\/h3>\n<p>Cosine similarity can be used to measure the likeness between different images or shapes. Vector embeddings for images are typically generated using Convolutional Neural Networks (CNNs) or other deep learning techniques to capture the visual patterns in the images.<\/p>\n<p>We only use the \u2018feature learning\u2019 part of the CNN flow for the purpose of comparing the vectors representing the features.<\/p>\n<p><!--- cspell:disable --><\/p>\n<pre><code class=\"language-python\">import numpy as np \r\nfrom tensorflow.keras.applications.vgg16 import VGG16 \r\nfrom keras.models import Model \r\nfrom keras.preprocessing import image \r\nfrom keras.applications.imagenet_utils import preprocess_input \r\nfrom scipy import spatial \r\n\r\ndef get_feature_vector(img_path): \r\n    # Name of last layer for feature extraction \r\n    feature_extraction_layer = 'fc2' \r\n    # VGG16 expects 224x224 RGB inputs \r\n    img_size_model = (224, 224) \r\n    model = VGG16(weights='imagenet', include_top=True) \r\n    model_feature_vect = Model(inputs=model.input, outputs=model.get_layer(feature_extraction_layer).output) \r\n    img = image.load_img(img_path, target_size=img_size_model) \r\n    img_arr = np.array(img) \r\n    img_arr = np.expand_dims(img_arr, axis=0) \r\n    processed_img = preprocess_input(img_arr) \r\n    feature_vect = model_feature_vect.predict(processed_img)  \r\n    return feature_vect \r\n\r\n# Compute 
the feature vector for each image \r\nfea_vec_img1 = get_feature_vector('product.png') \r\nfea_vec_img2 = get_feature_vector('generated-image.png') \r\n\r\n# Flatten the array from 2d to 1d for cosine processing \r\nfea_vec_img1 = fea_vec_img1.flatten()  \r\nfea_vec_img2 = fea_vec_img2.flatten()  \r\n\r\n# Cosine Similarity \r\ncos_sim = 1 - spatial.distance.cosine(fea_vec_img1, fea_vec_img2)<\/code><\/pre>\n<p><!--- cspell:enable --><\/p>\n<h3>Template Matching<\/h3>\n<p>We also ran experiments on detecting where the product exists within the generated image, so that same-sized regions containing the product can be compared. This was done using template matching, a method for locating a smaller image (the \u2018template\u2019) within a larger image. It works by sliding the template image across the larger image and calculating a similarity score at each position to find a match.<\/p>\n<p><!--- cspell:disable --><\/p>\n<pre><code class=\"language-python\">import cv2 as cv\r\n\r\ntemplate_img = cv.imread('product.png', cv.IMREAD_UNCHANGED) \r\ngenerated_img = cv.imread('generated-image.png', cv.IMREAD_UNCHANGED) \r\n\r\n# Template Match - Find the location of the product within the generated image \r\nres = cv.matchTemplate(generated_img, template_img, cv.TM_CCOEFF) \r\n# minMaxLoc returns (min_val, max_val, min_loc, max_loc) \r\n_, _, _, max_loc = cv.minMaxLoc(res) \r\ntop_left = max_loc \r\nh, w = template_img.shape[:2] \r\nbottom_right = (top_left[0] + w, top_left[1] + h) \r\n\r\n# Draw a rectangle around the matched region \r\ncv.rectangle(generated_img, top_left, bottom_right, (255, 0, 0), 5)<\/code><\/pre>\n<p><!--- cspell:enable --><\/p>\n<h2>Results<\/h2>\n<h2>Comparators<\/h2>\n<p>To test the capabilities of each image comparison technique, a series of benchmark images was used. These are images where the delta between the images is known, so that we can accurately compare expected vs. actual results. 
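<\/p>\n<p>As a hedged sketch of how such a benchmark pair can be constructed (the arrays and the delta here are hypothetical, not the actual benchmark images): a product patch is placed at a known position, then translated by a known number of pixels, so the expected comparator behavior is known in advance:<\/p>

```python
import numpy as np

# Baseline: a 'product' patch at a known position in an otherwise empty image
baseline = np.zeros((32, 32), dtype=np.uint8)
baseline[8:16, 8:16] = 200

# Benchmark: the same patch translated right by a known delta of 4 pixels
translated = np.zeros((32, 32), dtype=np.uint8)
translated[8:16, 12:20] = 200

# Expected: plain pixelwise MSE flags a large difference even though the
# product itself is unchanged, illustrating the MSE limitation measured below
mse = np.square(np.subtract(baseline.astype(float), translated.astype(float))).mean()  # 2500.0
```

<p>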
The images include tests such as various levels of image fill, transparency fill, content rotation, translation, coloring, and image scaling.<\/p>\n<h3>Mean Squared Error<\/h3>\n<p>As expected, MSE performed well at accurately detecting exact pixel differences. It considers all pixel RGBA values when evaluating the difference between the images. This means it performed well at detecting product color and disproportionate scale differences, assuming the product is in the same location in each image.<\/p>\n<p>MSE isn\u2019t useful when the product is translated, unless a template matcher is used. This is expected, as it simply compares each pixel value of the image. If the product is rotated or proportionately scaled, MSE will again simply report the differences between the pixel values, even if those differences are acceptable.<\/p>\n<h3>Peak Signal-to-Noise Ratio<\/h3>\n<p>Like MSE, PSNR performed well in detecting pixel differences, although the logarithmic nature of the measurement can make the results harder to interpret. For example, comparing a fully white image against a half-white, half-transparent image, one might expect a value of <code>47<\/code> since the range of PSNR is between <code>-6.02<\/code> and <code>100<\/code>. However, it is <code>-3.01<\/code> due to the logarithmic part of the calculation. The advantage of a logarithmic result, though, is that it is easy to tell how many orders of magnitude apart the images are. For example, if MSE is <code>10<\/code>, PSNR is <code>38.13<\/code>; if MSE is <code>100<\/code>, PSNR is <code>28.13<\/code>, and so on. PSNR performs well in detecting product color and scale differences, assuming the product is in the same location in each image.<\/p>\n<p>Like MSE, PSNR isn\u2019t useful when the product is translated, unless a template matcher is used. 
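<\/p>\n<p>The order-of-magnitude behavior described above can be checked with a small sketch (the helper function name is ours, for illustration only):<\/p>

```python
from math import log10, sqrt

def psnr_from_mse(mse, max_val=255.0):
    # PSNR in dB for a given MSE; 100 is used as a cap for identical images
    return 100 if mse == 0 else 20 * log10(max_val / sqrt(mse))

print(round(psnr_from_mse(10), 2))   # 38.13
print(round(psnr_from_mse(100), 2))  # 28.13 (10x the MSE costs 10 dB)
```

<p>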
It also can\u2019t discern a product rotation or proportionate scaling and, like MSE, will simply report the raw pixel differences.<\/p>\n<h3>Cosine Similarity<\/h3>\n<p>Cosine similarity performed well in detecting changes in the edges and curves that define the product, which means it has less need for a template matcher. If the image is translated or proportionately scaled, it will still accurately determine whether changes were made to the product itself rather than to the whole image.<\/p>\n<p>Like MSE and PSNR, though, it isn\u2019t useful when the product is rotated, unless detecting such rotation is itself a desired discrepancy to report against the baseline product image. With a disproportionate scaling comparison, it determines the images to be similar, which is not desired for our use case. This is expected given how feature matching works, but it is not particularly useful for detecting when a change was made to a product. It also won\u2019t detect pixel color changes as well as MSE or PSNR, since it only compares features.<\/p>\n<h2>Template Matching<\/h2>\n<p>We used OpenCV to find a template in a target image. The template can be located if it is the same size as or smaller than the target image and has the same resolution. If the target image and the template have different resolutions, there is no guarantee the template can be found without some resizing techniques. OpenCV does not keep track of resolution, so implementing this functionality would require additional knowledge about the image resolutions. We recommend keeping the resolution the same between the template and the target image. 
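<\/p>\n<p>One way to approach differing resolutions is to try several scaled copies of the template and keep the best-scoring placement. The sketch below is a dependency-free stand-in that uses nearest-neighbor resizing and mean squared difference instead of OpenCV&#8217;s resize and TM_CCOEFF; the function names and scoring choice are ours for illustration, not the project&#8217;s implementation:<\/p>

```python
import numpy as np

def nn_resize(img, scale):
    # Nearest-neighbor resize, a minimal stand-in for cv.resize
    rows = (np.arange(int(img.shape[0] * scale)) / scale).astype(int)
    cols = (np.arange(int(img.shape[1] * scale)) / scale).astype(int)
    return img[np.ix_(rows, cols)]

def match_score(target, template):
    # Lowest mean squared difference over all placements, with its (row, col)
    th, tw = template.shape
    scores = {}
    for r in range(target.shape[0] - th + 1):
        for c in range(target.shape[1] - tw + 1):
            diff = target[r:r + th, c:c + tw].astype(float) - template.astype(float)
            scores[(r, c)] = float(np.mean(diff ** 2))
    pos = min(scores, key=scores.get)
    return scores[pos], pos

def multi_scale_match(target, template, scales=(0.5, 0.75, 1.0)):
    # Try each template scale; keep the (score, position, scale) with lowest score
    best = None
    for s in scales:
        score, pos = match_score(target, nn_resize(template, s))
        if best is None or score < best[0]:
            best = (score, pos, s)
    return best
```

<p>In practice, OpenCV&#8217;s cv.resize and a normalized score such as TM_CCOEFF_NORMED would replace these stand-ins.<\/p>\n<p>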
The resolution of the target image will be dictated by the particular GenAI model used, so several resolutions of the same template image may be necessary to match the output resolution of the GenAI models.<\/p>\n<h2>Experiment Screenshots<\/h2>\n<h2>Comparator<\/h2>\n<p>Here are a few results of the comparator color, translation, rotation, and scale delta experiments:<\/p>\n<h3>Color<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/color.png\" alt=\"\" \/>\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/color-chart.png\" alt=\"\" \/><\/p>\n<h3>Translation<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/translation.png\" alt=\"\" \/>\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/translation-chart.png\" alt=\"\" \/><\/p>\n<h3>Rotation<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/rotation.png\" alt=\"\" \/>\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/rotation-chart.png\" alt=\"\" \/><\/p>\n<h3>Scale<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/scale.png\" alt=\"\" \/>\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/scale-chart.png\" alt=\"\" \/><\/p>\n<h2>Template Matching<\/h2>\n<h3>Template image and AI generated image<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/teapot-template.png\" alt=\"\" \/>\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/teapot-generated-background.png\" 
alt=\"\" \/><\/p>\n<h3>Template match detected the bounding box at which the template image was found within the AI generated image<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/template-matcher.png\" alt=\"\" \/><\/p>\n<h3>Side-by-side comparison of original product image against the detected product in the generated image<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/teapot-detected.png\" alt=\"\" \/><\/p>\n<h3>Visualization of the differences between them<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2024\/08\/teapot-mse-comparison.png\" alt=\"\" \/><\/p>\n<h2>Conclusion<\/h2>\n<p>With the combination of template matching, MSE, PSNR, and Cosine Similarity, we can achieve a strong system of image comparison to determine whether the product has been edited in the AI-generated image output.<\/p>\n<p>A combination of MSE (or PSNR) and Cosine Similarity comparators can give us meaningful metrics to determine undesired changes between the baseline product image and the generated image. The strength of combining multiple image comparators is that we can leverage the benefits of each, and they complement each other&#8217;s shortcomings. With MSE and PSNR, the key benefit is in determining color differences and disproportionate scaling. With Cosine Similarity, the key benefit is determining differences in the edges and curves that define the product, regardless of its location or proportionate scale in the generated image.<\/p>\n<p>Template matching serves to provide a baseline of which to compare the images. By finding where the product is in the generated image, we can scope the comparison of only the relevant pixels. 
This helps when running MSE and PSNR, given how those techniques compare images pixel by pixel, and gives us a more meaningful result from those comparators.<\/p>\n<h2>Further Considerations<\/h2>\n<p>Further considerations include handling scaled and rotated images as part of the template matching, as well as researching methods of detecting additions to the image outside of the bounding box of the template. Regarding the latter, for example, the template matching generated image contains a slight, aesthetically pleasing, yet inaccurate addition to the top of the teapot, which unfortunately wouldn&#8217;t be included in the comparison. Solving this would require further research. For the scaling problem, the recommendation is to keep the template and generated image&#8217;s resolutions the same, or else resort to resizing techniques. Rotation can be handled with additional code that tests various angles of product rotation against the generated image, though this takes longer depending on how many rotation angles are tried per position.<\/p>\n<h2>References<\/h2>\n<ul>\n<li><a href=\"https:\/\/pyimagesearch.com\/2014\/09\/15\/python-compare-two-images\">MSE<\/a><\/li>\n<li><a href=\"https:\/\/www.geeksforgeeks.org\/python-peak-signal-to-noise-ratio-psnr\">PSNR<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/lbrejon\/Compute-similarity-between-images-using-CNN\">Cosine Similarity<\/a><\/li>\n<li><a href=\"https:\/\/docs.opencv.org\/3.4\/d4\/dc6\/tutorial_py_template_matching.html\">Template Matching<\/a><\/li>\n<\/ul>\n<p><em>All product-related images in this post were AI-generated by OpenAI via ChatGPT.<\/em>\n<em>Benchmark images were manually created in GIMP.<\/em>\n<em>Chart images were generated with the matplotlib library in a Jupyter notebook.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Various methodologies of measuring differentials between a product image and an AI-generated ad for the purpose 
of product representation integrity in serving AI-generated ads.<\/p>\n","protected":false},"author":162076,"featured_media":15601,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,3451],"tags":[33,3547,3545,3546,206],"class_list":["post-15600","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cse","category-ise","tag-ai","tag-cnn","tag-computervision","tag-cv","tag-image"],"acf":[],"blog_post_summary":"<p>Various methodologies of measuring differentials between a product image and an AI-generated ad for the purpose of product representation integrity in serving AI-generated ads.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/15600","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/162076"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=15600"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/15600\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/15601"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=15600"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=15600"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=15600"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}