AI Weekly 019

🆕 What's New?

New tutorials:

Warming Up Your Logo: How to Create a Plush Texture with ComfyUI?

This week's AI highlights

🪐 Workflows worth trying

IMAGE TO CLAY STYLE

Upload any character image to this workflow and it generates a version of that image in a clay-art style.

cool Ice style logo v0.3

This workflow generates cool ice-style icons and works best with gradient, single-subject, or solid-color logos. If the main subject is blue or white, the result is very likely to have a transparent texture. After you upload an image, generation takes only 10 steps.

You can subscribe to our newsletter or join our Discord to get the latest tutorials.

🏗️ Plugins worth trying

ComfyUI_VisualStylePrompting

This plugin is an image generation and style transfer tool: you import a reference image, and the AI generates new content in a similar style. For example, as shown below, a picture of an origami rabbit was imported, and the AI generated an orange origami-style fox based on it.

ComfyUI-post-processing-nodes

ComfyUI-post-processing-nodes is a plugin that gives AI-generated images specific visual styles, such as motion blur or a frosted-glass effect, to enhance their visual quality and artistic expressiveness.

📄 Noteworthy papers and techniques

StoryDiffusion

StoryDiffusion is a method for generating long-range image sequences and videos. Through a consistent self-attention mechanism, it keeps comic and cartoon characters unified across frames, maintaining their style and clothing, so that the story reads coherently. It is well suited to creating long narrative content.

MaPa

MaPa automatically designs realistic materials for 3D models based on your text descriptions. Instead of using traditional texturing methods, it generates materials procedurally. The results not only look more realistic, but also let you freely adjust the details of the materials.

B-LoRA

This paper introduces B-LoRA, a method that uses LoRA (Low-Rank Adaptation) to implicitly separate the style and the content of a single image into two B-LoRAs. This approach can significantly improve style manipulation and overcome the overfitting issues common in model fine-tuning. Once trained, the two B-LoRAs can be used as independent components for various image stylization tasks, including image style transfer, text-based image stylization, consistent style generation, and style-content mixing.
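For readers new to LoRA, here is a minimal sketch of the generic low-rank update that B-LoRA builds on. It is not the paper's code: the matrices and the helper names (`matmul`, `lora_effective_weight`) are hypothetical, and it assumes the common convention of scaling the update by alpha / r. The idea is that a frozen weight W is adjusted by the product of two small trained matrices, B and A.

```python
# Illustrative sketch of a generic LoRA update (not B-LoRA's actual code).
# All values and helper names below are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank."""
    rank = len(a)              # A has shape (r, in_features)
    scale = alpha / rank       # assumed scaling convention
    delta = matmul(b, a)       # shape (out_features, in_features)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Frozen 2x3 weight and a rank-1 adapter (only B and A would be trained).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]         # shape (out_features=2, r=1)
A = [[0.5, 0.0, 1.0]]      # shape (r=1, in_features=3)

W_adapted = lora_effective_weight(W, B, A, alpha=1.0)
print(W_adapted)
```

Because only B and A are trained, an adapter can be stored and swapped independently of the base model, which is what makes it practical to treat separate adapters as reusable components.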

MagicDance

This paper proposes MagicPose, a diffusion-model-based technique for retargeting human poses and facial expressions. It generates new images by changing a character's pose and facial expression while keeping the character's appearance consistent. As shown in the diagram below, given the pose image in the upper left corner and the reference images (first row), the AI generates images that match the style of each reference image and the pose of the pose image (second row).

Visual Fact Checker

VisualFactChecker is a visual model from Nvidia that reads 2D images or 3D objects and generates detailed descriptions of them. Compared with other available models such as GPT-4V and Cap3D, its descriptions are more precise.

Capabilities of Gemini Models in Medicine

Med-Gemini is a multimodal AI model designed specifically for the medical field and built on the Gemini architecture. It improves performance on text, multimodal, and long-context tasks through self-training, integrated web search, and customized encoders. Med-Gemini can answer medical questions, analyze images such as X-rays, and work with surgical videos, genomic data, extensive health records, electrocardiograms, and more to assist doctors in diagnosis.

🛠️ Products you should try

Amazon Q

Amazon Q is a generative AI assistant that can generate highly accurate code and offers testing, debugging, multi-step planning, and reasoning capabilities. By connecting to corporate data warehouses, Amazon Q helps employees summarize data, analyze trends, and ask questions about their data, simplifying queries about company policies, product information, business outcomes, code repositories, employees, and other topics.

Logo Diffusion

Logo Diffusion is a platform that creates unique, customized logos. Users can generate original designs through simple text prompts, ranging from basic sketches to detailed logo designs, and can even convert 2D images or logos into 3D illustrations. Logo Diffusion offers a range of tools, such as AI-to-vector file conversion, background removal, and an integrated browser editor, eliminating the need for Photoshop or Illustrator.
