AI Generated Captions for Photobooks

Client Background

Customizable photo books for digital and physical experiences

Mixbook, a company specializing in creating customizable photo books, sought to enhance its product offerings and create a more engaging user experience. The company aimed to transition from a purely physical product to an integrated digital and physical experience.

Challenge

Enhancing shareability and user engagement

Mixbook faces intense competition in a highly seasonal market. To differentiate itself, the company needed to innovate on its product and give customers added value. Its goal was to develop a digital companion product that complements its physical photo books, enhancing shareability and user engagement.

Solution

Developing AI-driven functionalities

KUNGFU.AI was tasked with developing AI-driven functionalities to meet Mixbook's needs. The project involved two major phases:

AI-Powered Caption Generation:
  • Implemented a visual question answering (VQA) model, InstructBLIP, to create an auto-suggest feature for image captions in the photo book creation tool.
  • Conducted extensive prompt engineering to ensure that the captions generated were fun and inspiring, aligning with Mixbook’s brand tone.
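
As a rough illustration of the caption auto-suggest flow described above, the sketch below wraps a VQA-style model behind a prompt template and a small post-processing step. Everything named here is an assumption for illustration: the prompt wording, the `suggest_captions` helper, and the `generate` callable (which stands in for a call to a model such as InstructBLIP) are hypothetical, and the prompts actually used in the project are not published.

```python
from typing import Callable, List

# Hypothetical prompt template; not the project's actual prompt.
PROMPT_TEMPLATE = (
    "Write a short, {tone} caption for this photo in a photo book. "
    "Keep it under {max_words} words."
)


def build_prompt(tone: str = "fun and inspiring", max_words: int = 12) -> str:
    """Fill the template with the desired brand tone and length limit."""
    return PROMPT_TEMPLATE.format(tone=tone, max_words=max_words)


def clean_caption(raw: str, max_words: int) -> str:
    """Collapse whitespace, strip surrounding quotes, truncate to max_words."""
    text = " ".join(raw.split()).strip("\"'“” ")
    return " ".join(text.split()[:max_words])


def suggest_captions(
    image: object,
    generate: Callable[[object, str], List[str]],
    max_words: int = 12,
    limit: int = 3,
) -> List[str]:
    """Return up to `limit` cleaned, de-duplicated caption suggestions.

    `generate` is an assumed interface: it takes an image and a prompt and
    returns raw candidate strings from the underlying model.
    """
    prompt = build_prompt(max_words=max_words)
    seen = set()
    suggestions: List[str] = []
    for raw in generate(image, prompt):
        caption = clean_caption(raw, max_words)
        key = caption.lower()
        if caption and key not in seen:  # skip empties and case-insensitive repeats
            seen.add(key)
            suggestions.append(caption)
    return suggestions[:limit]


def stub_generate(image: object, prompt: str) -> List[str]:
    """Stand-in for a real model call, used only to exercise the pipeline."""
    return [
        '  "Sun, sand, and zero plans" ',
        "sun, sand, and zero plans",
        "",
        "Chasing sunsets together",
    ]


captions = suggest_captions(None, stub_generate)
```

Keeping the model behind a plain callable means the prompt wording and the clean-up rules can be iterated on and tested without loading the model itself.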
Intelligent Digital Experience Creation:
  • Developed a tool to automatically generate intelligent digital experiences from photo books.
  • Developed a way to automatically segment user photo books into chapter-level memories and the moments within those chapters, using LLMs to synthesize image captions along with associated image metadata.
  • Utilized a BiRefNet model for salient object detection to identify focal points in images, enabling the system to emphasize key elements in photos (e.g., zooming into the Eiffel Tower in a picture of people pointing at it).
  • Implemented video orchestration techniques, aligning digital experience moments with musical beats and incorporating simple camera effects.
  • Developed a method for automatically assessing image significance by combining photo book metadata with outputs of vision models trained to predict human-rated aesthetic scores (NIMA) and memorability scores (ViTMem).
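
One way to picture the image-significance step is as a weighted blend of model scores and layout metadata. The sketch below is a minimal illustration under stated assumptions, not the project's method: the `PhotoSignals` fields, the choice of page coverage as the metadata signal, the score ranges (a 1 to 10 mean rating for the aesthetic score, 0 to 1 for memorability), and the weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PhotoSignals:
    aesthetic: float     # NIMA-style mean rating, assumed on a 1-10 scale
    memorability: float  # memorability score, assumed in [0, 1]
    page_area: float     # assumed metadata: fraction of the page the photo covers


def _clamp01(value: float) -> float:
    """Keep a signal inside [0, 1] so one outlier cannot dominate."""
    return min(1.0, max(0.0, value))


def significance(
    photo: PhotoSignals,
    w_aesthetic: float = 0.4,     # hypothetical weights, chosen for illustration
    w_memorability: float = 0.3,
    w_layout: float = 0.3,
) -> float:
    """Blend the three signals into a single score in [0, 1]."""
    aesthetic_norm = (photo.aesthetic - 1.0) / 9.0  # map 1-10 onto 0-1
    signals = (aesthetic_norm, photo.memorability, photo.page_area)
    weights = (w_aesthetic, w_memorability, w_layout)
    total = sum(w * _clamp01(s) for w, s in zip(weights, signals))
    return total / sum(weights)


def rank_photos(photos: Dict[str, PhotoSignals]) -> List[str]:
    """Return photo names ordered from most to least significant."""
    return sorted(photos, key=lambda name: significance(photos[name]), reverse=True)


book = {
    "cover_sunset": PhotoSignals(aesthetic=7.5, memorability=0.9, page_area=1.0),
    "blurry_lunch": PhotoSignals(aesthetic=3.0, memorability=0.4, page_area=0.2),
    "group_shot": PhotoSignals(aesthetic=6.0, memorability=0.8, page_area=0.5),
}
ranking = rank_photos(book)
```

A ranking like this could then decide which photos get more screen time in the digital experience. In practice the weights would need tuning against human judgments.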

Implementation

A collaborative implementation

The collaboration involved Mixbook's engineers focusing on video production, while the KUNGFU.AI team concentrated on AI approaches for story understanding.

Conclusion

AI-driven innovation

The collaboration between KUNGFU.AI and Mixbook successfully integrated AI-driven features into Mixbook’s platform, significantly enhancing the user experience and setting the stage for future growth. This case study exemplifies how leveraging advanced technologies can drive business innovation and competitive advantage.

“It definitely added to the creativity of the book without me having to think about it. Both captions I used were right on target.”

AI
Machine Learning
Generative
