
Businesses can generate customizable avatars using OctoAI’s Photo Merge feature

Blog Author - Janisha Anand
Blog Author - Josh Fromm
Blog Author - Brunno Goldstein

Feb 16, 2024

6 minutes
OctoAI Image Gen Solution introduces Photo Merge, which lets you integrate a photo’s subject into high-quality AI-generated output. It removes the need for custom facial fine-tunes, which require numerous tuning images and the 15-30 minutes of training typically associated with SDXL LoRAs. Photo Merge needs only 1-4 images and delivers precise results within a few seconds. Businesses can now apply GenAI-powered imagery to needs ranging from realistic CGI characters to personalized product recommendations to digital avatars.

Photo Merge is accessed through the "transfer_images" parameter of OctoAI’s Image Generation API. The parameter accepts a key-value pair consisting of a trigger word and an array of up to 4 images. It works exclusively with SDXL models, and it can be combined with style presets, ControlNets, checkpoints, and LoRAs.
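As a minimal sketch of that key-value shape, the helper below builds a transfer_images mapping from raw image bytes. The function name is hypothetical, and base64 encoding is an assumption inferred from the luis_b64_images variable used later in this post; check the API documentation for the exact encoding expected.

```python
import base64
from typing import Dict, List

MAX_TRANSFER_IMAGES = 4  # the post states "an array of up to 4 images"


def build_transfer_images(trigger_word: str, images: List[bytes]) -> Dict[str, List[str]]:
    """Return a {trigger_word: [base64 strings]} mapping (hypothetical helper)."""
    if not trigger_word.strip():
        raise ValueError("trigger word must be non-empty")
    if not 1 <= len(images) <= MAX_TRANSFER_IMAGES:
        raise ValueError(f"expected 1-{MAX_TRANSFER_IMAGES} images, got {len(images)}")
    # Assumption: images are sent as base64-encoded strings.
    encoded = [base64.b64encode(raw).decode("ascii") for raw in images]
    return {trigger_word: encoded}


if __name__ == "__main__":
    # Placeholder bytes stand in for photo files read with open(path, "rb").
    fake_photos = [b"photo-one", b"photo-two"]
    print(build_transfer_images("luis", fake_photos))
```

Validating the image count on the client side is a design choice here, so that an oversized list fails before any request is sent.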

In this post, we will walk you through how to use Photo Merge in the OctoAI Image Gen API through the new transfer_images parameter. We also show how it compares to custom fine-tuning.

Solution overview

For our use case, we are assuming the role of a Retail Media Marketing & eCommerce Advertising Platform. With access to a real model's photo, our goal is to seamlessly integrate it into a variety of products. First, we'll generate AI-powered images of our human model. For this, we will try both the traditional approach of creating a custom fine-tune with the human model’s images and the new approach of using Photo Merge functionality. We will compare the results and latency of the two approaches. Next, we will create a custom fine-tune for the products our model will represent and lastly, we will showcase the seamless integration of the model’s face with the corresponding product.

Workflow steps

  1. Create a custom facial fine tune of the human model with 10-12 portrait images.

  2. Leverage Photo Merge (the transfer_images parameter in the OctoAI SDXL Image Gen API) with only 1-4 images of the human model instead of the custom facial fine tune created in step 1.

  3. Compare the results between the two approaches.

  4. Create custom fine tunes (SDXL LoRAs) for a retail product.

  5. Integrate the human model images (generated in step 2) with the images of retail product (generated in step 4).


For this walkthrough, make sure you have generated an OctoAI API token and set it in your environment. You may call OctoAI’s Image Generation API from any of our supported clients: the Python SDK, the TypeScript SDK, the CLI, or curl. Refer to our API documentation.
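For readers calling the API over plain HTTP from Python, here is a minimal sketch of reading the token from the environment and preparing a request. The environment variable name OCTOAI_TOKEN, the Bearer authorization scheme, and the build_request helper are assumptions for illustration; the endpoint URL is left as an argument, so take it from the API documentation.

```python
import json
import os
import urllib.request
from typing import Optional


def build_request(endpoint_url: str, payload: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying the API token."""
    # Assumption: the token lives in an environment variable named OCTOAI_TOKEN.
    token = token or os.environ.get("OCTOAI_TOKEN")
    if not token:
        raise RuntimeError("set OCTOAI_TOKEN or pass token explicitly")
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The prepared request can then be sent with urllib.request.urlopen(request), and the response body parsed with json.loads.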


Create custom fine tune (SDXL LoRAs) of a human model: We have taken 10 images of OctoAI’s CEO, Luis Ceze, as our tuning image dataset.

Next, we will create a custom fine tune from OctoAI’s WebUI. Navigate to Image Generation → Tuning & Datasets.

Click the ‘+New Tune’ button to begin. Adjust fine-tuning settings and upload your images. This involves selecting a base checkpoint (in our case, default SDXL), assigning a trigger word (to customize images with your subject), and specifying the number of steps. A range of 400 to 1,200 steps generally yields optimal results. Upload the image dataset and submit the fine tuning job.
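If you want to keep a record of these choices alongside your code, a small sketch like the following can flag step counts outside the 400 to 1,200 range mentioned above. The class and field names are hypothetical and are not part of OctoAI's API; the tune itself is still created in the WebUI.

```python
from dataclasses import dataclass
from typing import List

# Step range the post describes as generally yielding optimal results.
MIN_RECOMMENDED_STEPS = 400
MAX_RECOMMENDED_STEPS = 1200


@dataclass(frozen=True)
class TuneSettings:
    """Hypothetical record of the choices made in the '+New Tune' form."""

    base_checkpoint: str
    trigger_word: str
    steps: int

    def warnings(self) -> List[str]:
        """Return human-readable notes about settings that look off."""
        notes = []
        if not self.trigger_word.strip():
            notes.append("trigger word is empty")
        if not MIN_RECOMMENDED_STEPS <= self.steps <= MAX_RECOMMENDED_STEPS:
            notes.append(f"steps={self.steps} is outside the recommended 400-1200 range")
        return notes


if __name__ == "__main__":
    # "mysubject" is an illustrative trigger word, not one taken from the post.
    print(TuneSettings("default SDXL", "mysubject", 800).warnings())
```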

With 800 steps and 10 tuning images, the job takes approximately 20-30 minutes to complete. Once it completes, let’s evaluate the effectiveness of our custom facial fine-tune.

Navigate to Image Generation → Image Tools and click on Text to Image tile card. Here, let’s use the following parameters:

The output images only partly resemble our tuning dataset's human model, Luis Ceze: the man in the output shares some features with the tuning images but is not a close likeness of Luis.

Achieving a closer resemblance would require a larger tuning dataset (64-100 images) and/or more steps, which would be time- and cost-intensive and would not scale.

Utilize Photo Merge feature: Let’s now try the new Photo Merge feature and compare the output of the two approaches. We’ll use the transfer_images parameter in OctoAI’s Image Gen API to showcase this functionality.

Let us start by uploading 4 images of our human model, Luis Ceze.

Next, we utilize the transfer_images={"triggerword": list of images} parameter within the payload of OctoAI’s SDXL Image Gen API at

In this example, we use the trigger word ‘luis’ and associate it with the four images mentioned earlier. We then write the prompt so that it includes the trigger word.

Prompt: A man luis sitting in a coffee shop.

The remaining parameters are the same as in approach 1. Note that no LoRA is used here. We use a checkpoint named ‘RealVisXL’, an OctoAI asset checkpoint specifically optimized for the Photo Merge feature; however, Photo Merge also works with the base SDXL checkpoint.
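The request body for this approach can be sketched as follows. Only the "prompt" and "transfer_images" keys appear in this post; the "checkpoint" field name and the photo_merge_payload helper are assumptions for illustration, and the image strings are placeholders rather than real data.

```python
from typing import Dict, List, Optional


def photo_merge_payload(
    prompt: str,
    trigger_word: str,
    b64_images: List[str],
    checkpoint: Optional[str] = None,
) -> Dict[str, object]:
    """Build a request body that uses Photo Merge and no LoRA."""
    # The trigger word must appear in the prompt for the images to be applied.
    if trigger_word not in prompt.split():
        raise ValueError(f"prompt must contain the trigger word {trigger_word!r}")
    payload: Dict[str, object] = {
        "prompt": prompt,
        "transfer_images": {trigger_word: list(b64_images)},
    }
    if checkpoint is not None:
        # Assumed field name; use the identifier listed in your asset library.
        payload["checkpoint"] = checkpoint
    return payload


payload = photo_merge_payload(
    "A man luis sitting in a coffee shop.",
    "luis",
    ["<base64 image 1>", "<base64 image 2>"],  # placeholders, not real image data
    checkpoint="RealVisXL",  # name from the post; the exact asset identifier may differ
)
```

Leaving checkpoint unset omits the key entirely, which corresponds to the base SDXL case described above.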

The request takes approximately 8.8 seconds and generates the following output:
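If you want to measure latency for your own requests, one simple client-side approach is to wrap the call in a timer. This is a generic sketch, not a description of how the figure above was obtained; the timed helper is hypothetical, and the lambda stands in for whatever function sends your request.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run call() and return its result together with elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Replace the lambda with the function that sends your image request.
    result, seconds = timed(lambda: sum(range(1000)))
    print(f"completed in {seconds:.3f} s")
```

Client-side timing includes network round trips, so it will usually read slightly higher than server-side generation time.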

Pretty accurate, isn’t it? Let’s try it with a few different prompts and combine it with other style presets, LoRAs and checkpoints to confirm whether we consistently get accurate results.

Let us use the transfer_images parameter in conjunction with the ‘Graffiti’ style preset. We keep all other parameter values the same as in the payload above.
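A sketch of that variation: start from the Photo Merge request body and add the preset, leaving everything else untouched. The "style_preset" field name is an assumption, as is the helper; confirm the key in the API documentation.

```python
from typing import Any, Dict


def with_style_preset(payload: Dict[str, Any], preset: str) -> Dict[str, Any]:
    """Return a copy of payload with a style preset; the input is not modified."""
    # Assumed field name for the style preset.
    return {**payload, "style_preset": preset}


base = {
    "prompt": "A man luis sitting in a coffee shop.",
    "transfer_images": {"luis": ["<base64 image>"]},  # placeholder image data
}
graffiti = with_style_preset(base, "Graffiti")
```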

The request takes approximately 8.7 seconds and generates the following output:

Let’s now use the transfer_images parameter with a pre-trained style LoRA. We have already imported a pre-trained style LoRA into OctoAI’s Asset Library. In the payload below, we use the corresponding asset’s ID and assign it a weight of 1.0.
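The LoRA portion of such a request can be sketched as a mapping from asset ID to weight, the same shape as the "loras" entry that appears later in this post. The with_lora helper is hypothetical, and the asset ID is a placeholder because the style LoRA's ID is not given here.

```python
from typing import Any, Dict


def with_lora(payload: Dict[str, Any], asset_id: str, weight: float = 1.0) -> Dict[str, Any]:
    """Return a copy of payload with one more LoRA in its "loras" mapping."""
    loras = dict(payload.get("loras", {}))  # copy so the input stays unchanged
    loras[asset_id] = weight
    return {**payload, "loras": loras}


base = {
    "prompt": "A man luis sitting in a coffee shop.",
    "transfer_images": {"luis": ["<base64 image>"]},  # placeholder image data
}
# Placeholder asset ID; substitute the ID shown in your Asset Library.
styled = with_lora(base, "asset_<style-lora-id>", 1.0)
```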

The request takes approximately 16.9 seconds and generates the following output:

You'll notice that the AI-generated images of our human model closely resemble his actual photos. The results of Photo Merge are significantly more precise, and they do not require the 20-30 minutes of fine-tuning a custom LoRA.

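Comparison between custom fine tunes for faces (SDXL LoRAs) vs OctoAI’s Photo Merge

| Approach | Tuning image dataset | Steps in fine-tune | Time for fine-tune | Inference latency | Results |
| --- | --- | --- | --- | --- | --- |
| Custom Fine Tune for Face (SDXL LoRAs) | 10 images in this walkthrough | 800 in this walkthrough | 20-30 minutes, increasing linearly with more tuning data and number of steps | Few seconds | Poor to mediocre quality |
| Photo Merge | 1-4 images | None (no fine-tune) | None (no fine-tune) | Few seconds | Precise and accurate |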
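To put the comparison in numbers, the short calculation below uses only figures quoted in this post: 20-30 minutes of fine-tuning, and 8.8, 8.7 and 16.9 seconds for the three Photo Merge requests. It compares the fastest fine-tune against the slowest Photo Merge request, so the ratio is a conservative lower bound; the fine-tune path also needs inference time on top.

```python
# Figures quoted in the post.
FINE_TUNE_MINUTES = (20, 30)
PHOTO_MERGE_REQUEST_SECONDS = (8.8, 8.7, 16.9)

# Most favorable case for fine-tuning vs. least favorable for Photo Merge.
fastest_fine_tune_s = min(FINE_TUNE_MINUTES) * 60
slowest_photo_merge_s = max(PHOTO_MERGE_REQUEST_SECONDS)
min_ratio = fastest_fine_tune_s / slowest_photo_merge_s

print(f"Fastest fine-tune: {fastest_fine_tune_s} s")
print(f"Slowest Photo Merge request: {slowest_photo_merge_s} s")
print(f"Fine-tuning alone takes at least {min_ratio:.0f}x longer")
```

Keep in mind that fine-tuning is a one-time cost per subject, so the gap matters most when you need images of many different subjects.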

Now that we've determined the best approach for generating accurate images of our human model, let's bring it all together. We'll create a custom fine-tune for our retail product and seamlessly integrate our AI-generated human model's image with it.

Create custom fine tunes (SDXL LoRAs) for retail product: The steps to create a custom fine tune are similar to those shown earlier in the blog. We will upload 10-12 images of our product, which in our case are Lacoste polo shirts for men in different colors.

We will then create a custom fine tune by configuring the appropriate fine-tuning parameters (as shown earlier), assigning it a different trigger word, and uploading our tuning dataset.

After approximately 20-30 mins, our custom LoRA fine tuned on our branded polo-shirts will be available.

We are now ready to bring everything together. Let us use the transfer_images parameter (Photo Merge) to generate accurate images of our human model, Luis, and apply the ‘lacosteshirt-finetune’ LoRA to the shirt he is wearing.

"prompt": "A man luis wearing a pink T-shirt lacosteshirt1:1, sitting in a coffee shop"
"loras": {"asset_01hp5hsn6mfh6b0zf47q862a6b": 1.0}

"transfer_images": {"luis": luis_b64_images}

Please note that "luis" serves as the trigger word associated with Luis’s images in transfer_images. We position this trigger word immediately after the subject, "man," enabling our human model to inherit the facial attributes of Luis’s images. Additionally, we input the asset ID of the custom LoRA tuned for Lacoste polo shirts for men, which is associated with the trigger word "lacosteshirt1." This trigger word is placed immediately after the word "T-shirt" in our prompt, ensuring that the required attributes are applied to the shirt.
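The trigger-word placement described above can be sketched as a small helper that inserts each trigger word right after the word it should modify and then assembles the three payload keys shown earlier. The add_trigger helper is hypothetical, and the image list is a placeholder for the base64-encoded photos of Luis.

```python
def add_trigger(prompt: str, after_word: str, trigger: str) -> str:
    """Insert trigger immediately after the first occurrence of after_word."""
    words = prompt.split()
    for i, word in enumerate(words):
        core = word.rstrip(",.")
        trailing = word[len(core):]  # punctuation that followed the word, if any
        if core.lower() == after_word.lower():
            # Move trailing punctuation so it follows the trigger word instead.
            words[i] = core
            words.insert(i + 1, trigger + trailing)
            return " ".join(words)
    raise ValueError(f"{after_word!r} not found in prompt")


prompt = "A man wearing a pink T-shirt, sitting in a coffee shop"
prompt = add_trigger(prompt, "man", "luis")
prompt = add_trigger(prompt, "T-shirt", "lacosteshirt1:1")

luis_b64_images = ["<base64 image 1>", "<base64 image 2>"]  # placeholders
payload = {
    "prompt": prompt,
    "loras": {"asset_01hp5hsn6mfh6b0zf47q862a6b": 1.0},
    "transfer_images": {"luis": luis_b64_images},
}
```

Building the prompt this way keeps the subject and product trigger words in one place, which makes it easier to swap in a different model or product.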

The request takes seconds and generates the following output:

Voila! The generated output integrates our human model’s face (in this case, Luis's) with the corresponding product: a pink Lacoste polo shirt.

This blog showcases just one use of OctoAI’s Photo Merge feature. In entertainment, gaming, marketing, fashion, and retail, Photo Merge can help craft personalized avatars, advertisements, and brand ambassador representations. It can also enable virtual try-ons and lifelike digital product showcases. To learn more, refer to our documentation.

Get started using Photo Merge today

Sign up and try Photo Merge for free on the OctoAI Image Gen Solution today.

Please join us on Discord to engage with the team and our community. We’ll use the Discord channel to share upcoming features, promotions and competitions. Stay tuned to learn more; we look forward to seeing the applications and imagery you build using OctoAI Image Gen Solution.