Kaggle Competition: Stable Diffusion Data Analysis and Baseline

Original article

First published 2023-04-16 22:21:46; last modified 2023-04-17 00:19:07

Tags: #kaggle #ai-competition #image #text-generation

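This competition asks participants to build a model that predicts the text prompt behind a generated image. The data consists of images and prompts generated with Stable Diffusion 2.0, and submissions are scored by the mean cosine similarity between the embeddings of the predicted prompts and the embeddings of the actual prompts. This article walks through an exploration of the dataset, an example model, and training and inference code snippets.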


Your goal is to predict the prompts that were used to generate the images.

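1. Competition goal

The goal of this competition is not to generate images from text prompts, but to build a model that predicts the text prompt given a generated image (in other words, you see only the image and must recover the prompt behind it). You will make predictions on a dataset of (prompt, image) pairs generated by Stable Diffusion 2.0, in order to understand how reversible the latent relationship is.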

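2. Context

The popularity of text-to-image models has spawned an entirely new field: prompt engineering. Part art and part unsettled science, it has ML practitioners and researchers working quickly to understand the relationship between prompts and the images they generate. Is adding "4k" to a prompt the best way to make the result more photographic? Do small perturbations in a prompt lead to highly divergent images? How does the order of prompt keywords affect the generated scene? The task in this competition is to create a model that can reliably invert the diffusion process that generated a given image.

To compute prompt similarity in a robust way, meaning that "epic cat" is scored as similar to "majestic kitten" in spite of character-level differences, you submit embeddings of your predicted prompts. Whether you model the embeddings directly or first predict prompts and then convert them to embeddings is up to you. Good luck, and may you create "highly quality, sharp focus, intricate, detailed, in unreal robust cross validation style" models.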

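3. Evaluation metric

Submissions are evaluated using the mean cosine similarity between the predicted and the actual prompt embedding vectors. The exact details of how embeddings are computed for the ground-truth prompts are described in a notebook referenced on the competition page.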
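As a rough illustration of the metric, the sketch below computes the mean row-wise cosine similarity between two arrays of embeddings. The function name `mean_cosine_similarity` and the toy 3-dimensional vectors are assumptions made for this example; the official scoring code may differ in details such as numerical precision.

```python
import numpy as np


def mean_cosine_similarity(pred: np.ndarray, target: np.ndarray) -> float:
    """Average row-wise cosine similarity between two (n, d) arrays."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Normalise each row to unit length; eps guards against zero vectors.
    eps = 1e-12
    pred_unit = pred / np.maximum(np.linalg.norm(pred, axis=1, keepdims=True), eps)
    target_unit = target / np.maximum(np.linalg.norm(target, axis=1, keepdims=True), eps)
    # The dot product of two unit vectors is their cosine similarity.
    return float(np.mean(np.sum(pred_unit * target_unit, axis=1)))


# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
target = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pred = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
score = mean_cosine_similarity(pred, target)
print(score)
```

Note that `sklearn.metrics.pairwise.cosine_similarity`, which the EDA code imports, returns the full matrix of similarities between every pair of rows; for this metric only the matching pairs (the diagonal of that matrix) matter.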

Data

images/ - images generated from the prompts; your task is to predict the prompt that was used to generate each image. The hidden test set contains approximately 16,000 images.
prompts.csv - the prompts used to generate the images. These are provided as illustrative examples only. It is up to each competitor to develop their own strategy for creating a training set of images, using pre-trained models, and so on. Note that this file is not contained in the re-run test set, so referencing it in a Notebook submission will result in a failure.
sample_submission.csv - a sample submission file in the correct format. The values in this file are embeddings of the prompts in prompts.csv and can therefore be used to validate your embedding pipeline. The competition provides a notebook that demonstrates how to calculate embeddings.
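To make the submission layout more concrete, here is a minimal sketch that flattens one embedding per image into long-format rows. The column names `imgId_eId` and `val`, and the `<imgId>_<index>` row ID pattern, are assumptions about the sample submission and should be checked against the actual file. The helper name, the image IDs and the embedding size of 4 are made up for illustration (the code later in this article sets `EMB_SIZE = 384`).

```python
import numpy as np
import pandas as pd


def embeddings_to_submission(img_ids, embeddings) -> pd.DataFrame:
    """Flatten an (n_images, emb_size) array into one row per embedding entry."""
    embeddings = np.asarray(embeddings)
    n_images, emb_size = embeddings.shape
    if n_images != len(img_ids):
        raise ValueError("Need exactly one embedding row per image ID")
    # Row IDs follow the assumed "<imgId>_<index>" pattern, image by image.
    row_ids = [f"{img_id}_{i}" for img_id in img_ids for i in range(emb_size)]
    # reshape(-1) is row-major, so the values line up with row_ids.
    return pd.DataFrame({"imgId_eId": row_ids, "val": embeddings.reshape(-1)})


# Hypothetical IDs and tiny embeddings, for illustration only.
img_ids = ["aaa", "bbb"]
embeddings = np.arange(8, dtype=np.float32).reshape(2, 4)
submission = embeddings_to_submission(img_ids, embeddings)
print(submission)
```

One way to validate an embedding pipeline under these assumptions is to embed the prompts from prompts.csv, flatten them in the same way, and compare the `val` column against sample_submission.csv with `np.allclose`.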

Exploratory Data Analysis (EDA)

import os
import glob
import math
import random

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import cv2
import matplotlib.pyplot as plt

df_prompts = pd.read_csv("../input/stable-diffusion-image-to-prompts/prompts.csv")
df_prompts
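A natural first look at the prompts is how long they are. The sketch below is one possible way to summarise that; it uses a small made-up DataFrame as a stand-in for `df_prompts` (only the `imgId` and `prompt` column names come from the code in this article), and the `n_chars` and `n_words` columns are names chosen for this example.

```python
import pandas as pd

# Stand-in for df_prompts; these rows are invented for illustration.
df_demo = pd.DataFrame(
    {
        "imgId": ["img_a", "img_b", "img_c"],
        "prompt": [
            "epic cat",
            "majestic kitten on a throne",
            "a robot painting a sunset over the sea",
        ],
    }
)

# Character and whitespace-separated word counts per prompt.
df_demo["n_chars"] = df_demo["prompt"].str.len()
df_demo["n_words"] = df_demo["prompt"].str.split().str.len()

# Summary statistics (count, mean, quartiles, min and max) for both lengths.
length_summary = df_demo[["n_chars", "n_words"]].describe()
print(length_summary)
```

Applied to the real `df_prompts`, the same two lines give a quick sense of how much text a model has to predict per image.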


Converting an image ID to a file path

def image_id2path(
    img_id: str, 
    folder: str = "stable-diffusion-image-to-prompts"
) -> str:
    return f"../input/{folder}/images/{img_id}.png"

Displaying images

def show_images_and_prompts(
    df: pd.DataFrame, 
    folder: str = "stable-diffusion-image-to-prompts",
    n: int = 10,
) -> None:
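    # n == -1 means "show every row".
    if n == -1:
        n = df.shape[0]
    # Use a positional counter rather than the DataFrame index, so the
    # two-images-per-figure layout also works when the index is not 0..n-1.
    for pos, (_, row) in enumerate(df[:n].iterrows()):
        img_id = row["imgId"]
        prompt = row["prompt"]
        path = image_id2path(img_id, folder)
        image = cv2.imread(path)
        if image is None:
            raise FileNotFoundError(f"Could not read image: {path}")
        # OpenCV loads images as BGR; matplotlib expects RGB.
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Start a new figure for every pair of images.
        if pos % 2 == 0:
            plt.figure(figsize=(16, 8))
            plt.subplot(1, 2, 1)
        else:
            plt.subplot(1, 2, 2)

        plt.imshow(image)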
        list_prompt_words = prompt.split()
        if len(prompt) > 100:
            _len = len(list_prompt_words)
            prompt = "{}\n{}\n{}".format(
                " ".join(list_prompt_words[:_len // 3]),
                " ".join(list_prompt_words[_len // 3 : 2 * _len // 3]),
                " ".join(list_prompt_words[2 * _len // 3:]),
            )
        elif len(prompt) > 50:
            _len = len(list_prompt_words)
            prompt = "{}\n{}".format(
                " ".join(list_prompt_words[:_len // 2]),
                " ".join(list_prompt_words[_len // 2:])
            )
        plt.title(prompt, fontsize=14)
        plt.axis("off")

if df_prompts is not None:
    show_images_and_prompts(df_prompts, n=7)
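The title-wrapping logic above splits long prompts into two or three chunks by word count. As an alternative sketch, the standard library's `textwrap` module can wrap by character width instead; the helper name `wrap_title` and the default width and line limit are choices made for this example, not part of the original code.

```python
import textwrap


def wrap_title(prompt: str, width: int = 50, max_lines: int = 3) -> str:
    """Wrap a prompt to at most max_lines lines of at most width characters."""
    lines = textwrap.wrap(
        prompt,
        width=width,
        max_lines=max_lines,
        placeholder=" ...",  # appended when text is cut off at max_lines
    )
    return "\n".join(lines)


# Sample string for illustration only.
sample_prompt = "hyper realistic photo of very friendly and dystopian crater"
wrapped = wrap_title(sample_prompt, width=30)
print(wrapped)
```

The result can be passed straight to `plt.title(wrapped, fontsize=14)`. Unlike the word-count split, very long prompts are truncated with a placeholder rather than producing overly wide lines.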


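From top-left to bottom-right, the prompts read roughly as follows:
A hyper-realistic photo of a very friendly and dystopian crater
Ramen carved out of fractal rose ebony, in the style of the Hudson River School
An ultrasaurus holding a black bean taco in the woods, next to an identical horned dinosaur
A thundering retro robot crane drawing on parchment with a droopy French bulldog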

import sys

sys.path.append('../input/sentence-transformers-222/sentence-transformers')
from sentence_transformers import SentenceTransformer, models
EMB_SIZE = 384
df_sample_submission = pd.rea