{"id":3161,"date":"2023-04-20T11:59:36","date_gmt":"2023-04-20T18:59:36","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3161"},"modified":"2024-01-03T16:26:06","modified_gmt":"2024-01-04T00:26:06","slug":"android-openai-chatgpt-4","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-4\/","title":{"rendered":"Does OpenAI on Android dream of electronic sheep"},"content":{"rendered":"<p>\n  Hello prompt engineers,\n<\/p>\n<p>\n  We\u2019re back with another blog post on using OpenAI in Android applications!\n<\/p>\n<p>\n  So far in this blog series, we\u2019ve covered:\n<\/p>\n<ul>\n<li><a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-1\/\">OpenAI developer assistance<\/a>\n  <\/li>\n<li>\n     <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-2\/\">ChatGPT on Android with OpenAI<\/a>\n  <\/li>\n<li><a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-3\/\">OpenAI API endpoints<\/a>\n  <\/li>\n<\/ul>\n<p>\n  Last week, we talked about the different API endpoint options and showed you some examples of how to use the <code>Edits<\/code> API. Today, we\u2019ll be focusing more on the <code>Images<\/code> API and how you can set up some interesting interactions in your Android apps, such as a camera filter with editing capabilities.\n<\/p>\n<h2>Images API overview<\/h2>\n<p>\n   The <a href=\"https:\/\/platform.openai.com\/docs\/guides\/images\">Images API<\/a> uses <a href=\"https:\/\/openai.com\/product\/dall-e-2\">DALL-E models<\/a> to interact with or generate images based on user prompts. 
You can use the API in three ways:\n<\/p>\n<ol>\n<li><strong>Generations<\/strong>: provide a text prompt to generate a new image\n<\/li>\n<li><strong>Edits<\/strong>: provide a text prompt and an existing image to generate an edited image\n<\/li>\n<li><strong>Variations<\/strong>: provide an existing image to generate random variations of the image\n<\/li>\n<\/ol>\n<p>\n  If you\u2019re curious about the output from the <code>Images<\/code> API, you can check out the <a href=\"https:\/\/labs.openai.com\/\">DALL-E preview app<\/a> and provide some of your own prompts. You can describe image style and content in plenty of detail, or provide more abstract concepts to see what the model will come up with! The preview app shows examples of prompts and generated images to give you an idea of where to start \u2013 here are some image prompts we tested out ourselves:\n<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image1.jpg\"><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image1.jpg\" alt=\"four different images generated by DALL-E: a 3D cartoonish dog, a coffee shop, a snowglobe, and a cartoon sasquatch poster\" width=\"1016\" height=\"1252\" class=\"alignnone size-full wp-image-3165\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image1.jpg 1016w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image1-243x300.jpg 243w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image1-831x1024.jpg 831w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image1-768x946.jpg 768w\" sizes=\"(max-width: 1016px) 100vw, 1016px\" \/><\/a><\/p>\n<p>\n  In this blog post, we\u2019ll be focusing specifically on the <a 
href=\"https:\/\/platform.openai.com\/docs\/api-reference\/images\/create-edit\">image edits endpoint<\/a>. As mentioned above, this endpoint accepts two inputs: a text prompt and an image. The text prompt describes what the final edited image should look like, while the image provided should contain a masked transparent area to show which parts of the image can be edited. As with all the image APIs, the image input must be square and less than 4MB. To learn more, check out the <a href=\"https:\/\/platform.openai.com\/docs\/api-reference\/images\/create-edit\">API reference<\/a> and use the <a href=\"https:\/\/labs.openai.com\/editor\">editor portion of the preview app<\/a>.\n<\/p>\n<h2>Integrating image editing into an Android app<\/h2>\n<p>\n  Now that we understand what the image editing endpoint requires as inputs, let\u2019s try to build an Android app that targets this endpoint. Our end goal will be to create a camera filter that lets users capture and edit their photos with OpenAI.\n<\/p>\n<p>\n  We\u2019ll need to write code to:\n<\/p>\n<ul>\n<li>\n    Capture images\n  <\/li>\n<li>\n    Apply a transparent mask to an image\n  <\/li>\n<li>\n    Build an OpenAI request to the image editing endpoint\n  <\/li>\n<\/ul>\n<h3>Capture images<\/h3>\n<p>\n  There are multiple ways you can do this in Android, but today we\u2019ll be using CameraX. 
CameraX is a Jetpack library designed to make camera app development easier, with lots of <a href=\"https:\/\/developer.android.com\/training\/camerax\">documentation<\/a>, <a href=\"https:\/\/github.com\/android\/camera-samples\/\">samples<\/a>, and <a href=\"https:\/\/developer.android.com\/codelabs\/camerax-getting-started\">codelabs<\/a> available.\n<\/p>\n<p>\n  To capture images with CameraX, you must:\n<\/p>\n<ul>\n<li>\n    Declare the <code>android.hardware.camera.any<\/code> feature and request the necessary permissions &#8211; <code>android.permission.CAMERA<\/code>, <code>android.permission.RECORD_AUDIO<\/code>, and <code>android.permission.WRITE_EXTERNAL_STORAGE<\/code>\n  <\/li>\n<li>\n    Add a <code>PreviewView<\/code> so users can see a preview of their photo \u2013 this requires setting up a <code>ProcessCameraProvider<\/code>, a <code>CameraSelector<\/code>, and a <code>Preview<\/code> object with a <code>SurfaceProvider<\/code>\n  <\/li>\n<li>\n    Add a capture button and call <code>takePhoto<\/code> with the appropriate image file metadata\n  <\/li>\n<\/ul>\n<p>\n  For more detailed implementation information, check out the <a href=\"https:\/\/developer.android.com\/training\/camerax\/take-photo\">documentation on image capture with CameraX<\/a>.\n<\/p>\n<h3>Create image mask<\/h3>\n<p>\n  The next step is to let users apply a transparent mask to part of the captured image, which will allow OpenAI to fill in the masked area based on the text prompt.\n<\/p>\n<p>\n  We can do this by creating a simple custom view that overrides the <code>onDraw<\/code> and <code>onTouchEvent<\/code> methods to let the user color in the desired mask area. 
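<\/p>\n<p>\n  As a rough sketch, such a view could look like this (the class name <code>MaskView<\/code> and the brush settings are our own illustrative choices, not from any library):\n<\/p>\n<pre>\/\/ simple view that lets the user paint the mask area with a finger\r\nclass MaskView(context: Context, attrs: AttributeSet?) : View(context, attrs) {\r\n    private val path = Path()\r\n    private val brush = Paint().apply {\r\n        style = Paint.Style.STROKE\r\n        strokeWidth = 60f\r\n        strokeCap = Paint.Cap.ROUND\r\n        strokeJoin = Paint.Join.ROUND\r\n    }\r\n\r\n    override fun onDraw(canvas: Canvas) {\r\n        super.onDraw(canvas)\r\n        canvas.drawPath(path, brush)\r\n    }\r\n\r\n    override fun onTouchEvent(event: MotionEvent): Boolean {\r\n        when (event.action) {\r\n            MotionEvent.ACTION_DOWN -&gt; path.moveTo(event.x, event.y)\r\n            MotionEvent.ACTION_MOVE -&gt; path.lineTo(event.x, event.y)\r\n        }\r\n        invalidate() \/\/ redraw with the updated path\r\n        return true\r\n    }\r\n}<\/pre>\n<p>\n  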
To learn more about setting up custom views for drawing input, check out the documentation on <a href=\"https:\/\/developer.android.com\/develop\/ui\/views\/layout\/custom-views\/custom-drawing\">custom drawing with Canvas<\/a> and <a href=\"https:\/\/developer.android.com\/develop\/ui\/views\/touch-and-input\/gestures\/movement\">tracking touch input<\/a>.\n<\/p>\n<p>\n  Once the user is satisfied with the mask they\u2019ve drawn, all that\u2019s left to do is combine the original image with the mask drawing. We can accomplish this with the help of the <a href=\"https:\/\/developer.android.com\/reference\/android\/graphics\/PorterDuff.Mode#SRC_OUT\"><code>PorterDuff.Mode.SRC_OUT<\/code><\/a> transfer mode, which will delete pixels from the original image wherever the mask overlaps it.\n<\/p>\n<pre>\/\/ customView \u2013 reference to custom view that tracks user input\r\n\/\/ imageBitmap \u2013 reference to (cropped) original image bitmap\r\n\/\/ extract masked area from custom view\r\nval maskBitmap = Bitmap.createBitmap(imageBitmap.width, imageBitmap.height, Bitmap.Config.ARGB_8888)\r\nval canvas = Canvas(maskBitmap)\r\ncustomView?.draw(canvas)\r\n\r\n\/\/ combine mask with original image\r\nval paint = Paint()\r\npaint.xfermode = PorterDuffXfermode(PorterDuff.Mode.SRC_OUT)\r\n\r\ncanvas.drawBitmap(imageBitmap, 0f, 0f, paint)\r\n\r\n\/\/ maskBitmap now holds the original image with transparent\r\n\/\/ holes wherever the user drew the mask<\/pre>\n<h2>\n  Build OpenAI request\n<\/h2>\n<p>Finally, once we\u2019ve processed the masked image and collected the user\u2019s text prompt, we just need to build an OpenAI request to the image editing endpoint. 
If you need help getting started with OpenAI API keys and other setup, please refer to our <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-2\/#get-started-with-openai\">previous blog post<\/a>.\n<\/p>\n<p>\n  As described in the <a href=\"https:\/\/platform.openai.com\/docs\/api-reference\/images\/create-edit\">API reference<\/a>, the image editing endpoint accepts up to seven fields in the request body, only two of which are required:\n<\/p>\n<ul>\n<li>\n    image (string) \u2013 <strong>required<\/strong>, must be square PNG &lt; 4MB\n  <\/li>\n<li>\n    prompt (string) \u2013 <strong>required<\/strong>, max length 1000 characters\n  <\/li>\n<li>\n    mask (string) \u2013 must be square PNG &lt; 4MB\n  <\/li>\n<li>\n    n (integer) \u2013 must be between 1 (default) and 10\n  <\/li>\n<li>\n    size (string) \u2013 must be <code>256x256<\/code>, <code>512x512<\/code>, or <code>1024x1024<\/code> (default)\n  <\/li>\n<li>\n    response_format (string) \u2013 must be <code>url<\/code> (default) or <code>b64_json<\/code>\n  <\/li>\n<li>\n    user (string)\n  <\/li>\n<\/ul>\n<p>\n  All of the default values work well for our purposes, so we\u2019ll only need to supply the <code>image<\/code> and <code>prompt<\/code> fields when building our request. 
Since an image file is included in the request, we will need to build a multipart request.\n<\/p>\n<p>\n  This code snippet shows how you can use the <a href=\"https:\/\/github.com\/square\/okhttp\">OkHttp library<\/a> to build a multipart request with a masked image bitmap and prompt string:\n<\/p>\n<pre>val maskedImage = ByteArrayOutputStream().use {\r\n    maskBitmap.compress(Bitmap.CompressFormat.PNG, 100, it)\r\n    it.toByteArray().toRequestBody(\"image\/png\".toMediaType())\r\n}\r\n\r\nval requestBody = MultipartBody.Builder()\r\n    .setType(MultipartBody.FORM)\r\n    .addFormDataPart(\"prompt\", prompt)\r\n    .addFormDataPart(\"image\", \"image.png\", maskedImage)\r\n    .build()\r\n\r\nval request = Request.Builder()\r\n    .url(\"https:\/\/api.openai.com\/v1\/images\/edits\")\r\n    .addHeader(\"Authorization\", \"Bearer ${OPENAI_KEY}\")\r\n    .post(requestBody)\r\n    .build()\r\n\r\nval client = OkHttpClient.Builder().build()\r\nval response = client.newCall(request).execute()<\/pre>\n<p>\n  The image edits endpoint response will send back <code>n<\/code> images in the specified <code>response_format<\/code>. Requests often take 5-10 seconds to complete, so don\u2019t worry if the response isn\u2019t immediate. Since we used the default values for the request, we\u2019ll only be getting back one image in url format. To parse this from the response body, we can add just a few simple lines of code:\n<\/p>\n<pre>\/\/ Json is kotlinx.serialization.json.Json\r\nval jsonContent = response.body?.string() ?: \"\"\r\nval data = Json.parseToJsonElement(jsonContent).jsonObject[\"data\"]?.jsonArray\r\n\/\/ jsonPrimitive.content returns the raw url string without surrounding quotes\r\nval imageUrl = data?.firstOrNull()?.jsonObject?.get(\"url\")?.jsonPrimitive?.content<\/pre>\n<p>\n  And just like that, you have your edited image! 
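<\/p>\n<p>\n  For example, a minimal way to download the result into a <code>Bitmap<\/code> (a sketch, assuming the <code>imageUrl<\/code> value parsed above and that this code runs on a background thread):\n<\/p>\n<pre>\/\/ uses java.net.URL and android.graphics.BitmapFactory\r\n\/\/ network access must happen off the main thread\r\nval editedBitmap: Bitmap? = URL(imageUrl).openStream().use { stream -&gt;\r\n    BitmapFactory.decodeStream(stream)\r\n}<\/pre>\n<p>\n  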
You can display this in your app with Coil\u2019s <a href=\"https:\/\/coil-kt.github.io\/coil\/compose\/#asyncimage\">AsyncImage<\/a> if you\u2019re using Jetpack Compose, or open an <code>InputStream<\/code> to download the image bitmap from the url \u2013 in either case, you\u2019ll need to add the <code>android.permission.INTERNET<\/code> permission to your Android manifest. \n<\/p>\n<p>\n  Here are some examples of the image editing camera filter in action:\n<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2.jpg\"><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2.jpg\" alt=\"Four images showing different edits applied to a photo of a coffee mug\" width=\"1429\" height=\"1429\" class=\"alignnone size-full wp-image-3166\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2.jpg 1429w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2-300x300.jpg 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2-1024x1024.jpg 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2-150x150.jpg 150w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2-768x768.jpg 768w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2-24x24.jpg 24w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2-48x48.jpg 48w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image2-96x96.jpg 96w\" sizes=\"(max-width: 1429px) 100vw, 1429px\" \/><\/a><\/p>\n<p><br\/><\/p>\n<p><a 
href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image3.jpg\"><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image3.jpg\" alt=\"Two images showing before and after applying a DALL-E image edit\" width=\"1430\" height=\"726\" class=\"alignnone size-full wp-image-3167\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image3.jpg 1430w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image3-300x152.jpg 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image3-1024x520.jpg 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/04\/image3-768x390.jpg 768w\" sizes=\"(max-width: 1430px) 100vw, 1430px\" \/><\/a><\/p>\n<h2>Additional thoughts on crafting prompts<\/h2>\n<p>\n  As we have discussed in <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-2\/\">other blogs<\/a> related to ChatGPT and DALL-E 2, crafting the perfect prompt for generating content is as much an art as a science. When working with the image editing endpoint in particular, there are some general tips that can help improve the quality of generated images.\n<\/p>\n<ol>\n<li>\n<p>Describe the entire image, including the part that you want replaced and how it interacts with the rest of the image.\n<\/p>\n<p>\n  In the images above, prompts like \u201cCat with beret in mug\u201d produced better results than \u201cDrawing of a man climbing a mountain\u201d. Because the mug was not part of the masked-out area, the model had more context for fitting the cat into the image as a whole.\n<\/p>\n<p>\n  Imagine someone handing you a drawing of a computer and saying, \u201cnow draw a robot\u201d. 
With all that artistic freedom, you might draw a robot typing on the computer, or a robot movie playing on the computer screen. By comparison, if someone came to you with the computer drawing and requested \u201cnow draw a robot carrying a computer like a backpack\u201d, you would have much more information about what the person wants. The image editing model operates in a similar fashion, producing better results when given context not just on what you want to create, but also on how it interacts with everything that is already there.\n<\/p>\n<\/li>\n<li>\n<p>Sometimes the model will ignore prompts if they are too far from expectation.\n<\/p>\n<p>\n  Say, for example, that instead of giving the prompt \u201cCat with beret in mug\u201d for the images above, we used \u201cMug of coffee, digital art\u201d. It is very likely that the model cannot reconcile the realistic photograph with the digital art request. Additionally, the mug is at an angle where coffee would not even be visible.\n<\/p>\n<p>\n  In this case, no part of the prompt can be used to fill in the masked area of the image. 
The masked area becomes open for interpretation and is often just filled with what DALL-E 2 determines to be a reasonable background.\n<\/p>\n<\/li>\n<\/ol>\n<h2>Resources and feedback<\/h2>\n<p>\n  To learn more about OpenAI and DALL-E, check out these resources:\n<\/p>\n<ul>\n<li><a href=\"https:\/\/platform.openai.com\/docs\/introduction\">OpenAI documentation<\/a>\n  <\/li>\n<li><a href=\"https:\/\/platform.openai.com\/docs\/api-reference\/introduction\">OpenAI API reference<\/a>\n  <\/li>\n<li><a href=\"https:\/\/openai.com\/product\/dall-e-2\">DALL-E Product Description<\/a>\n  <\/li>\n<\/ul>\n<p>\n  If you have any questions, use the <a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\">feedback forum<\/a> or message us on <a href=\"https:\/\/twitter.com\/surfaceduodev\">Twitter @surfaceduodev<\/a>.\n<\/p>\n<p>\n  We won\u2019t be livestreaming this week, but you can check out the <a href=\"https:\/\/youtube.com\/c\/surfaceduodev\">archives on YouTube<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello prompt engineers, We\u2019re back with another blog post on using OpenAI in Android applications! So far in this blog series, we\u2019ve covered: OpenAI developer assistance ChatGPT on Android with OpenAI OpenAI API endpoints Last week, we talked about the different API endpoint options and showed you some examples of how to use the Edits [&hellip;]<\/p>\n","protected":false},"author":90683,"featured_media":3167,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[741],"tags":[734,735,733],"class_list":["post-3161","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-chatgpt","tag-dalle","tag-openai"],"acf":[],"blog_post_summary":"<p>Hello prompt engineers, We\u2019re back with another blog post on using OpenAI in Android applications! 
So far in this blog series, we\u2019ve covered: OpenAI developer assistance ChatGPT on Android with OpenAI OpenAI API endpoints Last week, we talked about the different API endpoint options and showed you some examples of how to use the Edits [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3161","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/90683"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3161"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3161\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3167"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3161"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3161"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3161"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}