AI4VA @ CVPR 2026 | The 3rd AI for Visual Arts Workshop

Abstraction

Heritage

Narrative

About

Where machines learn to perceive abstraction

The 3rd AI for Visual Arts workshop at CVPR 2026 explores how computer vision perceives, interprets, and models abstraction across artistic and cultural imagery—paintings, comics, sculptures, and installations that challenge assumptions of conventional models.

These stylized domains expose weaknesses in robustness, generalization, and interpretability, providing a principled framework for evaluating perception beyond realism. Can models "see" abstraction like humans, recognizing the dog-ness in a Picasso or inferring motion from comic lines?

02 Research Areas

Topics of Interest

Visual Arts Segmentation · Depth Estimation in Visual Arts · Saliency Detection in Narrative Visual Forms · Vision-Language Alignment in Arts / Multimodal Learning · Generative Models and 3D Modeling in Visual Arts · Provenance Tracking · Computer Vision for Museum Analysis and Restoration · AI Ethics in Art · Image Composition

001 Robust segmentation & depth estimation in stylized imagery

002 Saliency detection in narrative visual forms

003 Vision–language alignment in art interpretation

004 Generative models & 3D modeling from artworks

005 Provenance tracking & authenticity verification

006 Watermark robustness in AI-generated art

007 Computer vision for museum analysis & digitization

008 Art restoration & preservation using computer vision

009 AI ethics, authorship & responsible AI in creative contexts

010 Multimodal reasoning in art & cultural heritage

011 Art & animation

012 Image composition for visual arts

013 Any AI × Visual Art intersection

★ Congratulations

Award Winners

Congratulations to the recipients of the Best Paper and Best Poster awards at the 3rd AI for Visual Arts Workshop. We thank all authors for their outstanding contributions, and our sponsor Meitu for making these awards possible.

🏆

Best Paper Award

$1,000 · Sponsored by Meitu

Form and Void: Entangled Composition through an Autonomous AI Agent

Shiwen Wang, Jian Yang, Xu Wang, Xincan Wang, Weiming Dong

🎖️

Best Poster Award

$500 · Sponsored by Meitu

Attention-Enhanced Multi-ControlNet for Artist-Aligned Manga Background Generation

Louis King

03 Timeline

Important Dates

June 3, 2026 · 8:30 AM – 12:30 PM

Mile High 4AB
Colorado Convention Center
Denver, Colorado, USA

Full Papers (Archival) — Deadlines Passed

Mar 13, 2026

Full Paper Submission Deadline

Friday, 11:59 PM AoE

Mar 30, 2026

Acceptance Notification

Announced

Mar 30, 2026

Camera-Ready Site Opens

Monday — for accepted full papers only

Apr 10, 2026

Camera-Ready & Copyright Due

Friday — final papers and forms

Extended Abstracts (Non-Archival) — Deadlines Passed

Apr 25, 2026

Extended Abstract Submission

Saturday, 11:59 PM AoE — new deadline

May 05, 2026

Acceptance Notification

Tuesday, 11:59 PM AoE — no camera-ready required

PortraitCraft Challenge — Accepting Submissions

Apr 5, 2026

Challenge Start

Competition opens on CodaBench

May 17, 2026

Challenge End

Final submission deadline

Jun 3, 2026

Workshop Day

8:30 AM – 12:30 PM · Room Mile High 4AB, Colorado Convention Center, Denver

All deadlines are AoE (Anywhere on Earth). Full papers appear in CVPR Workshop Proceedings. Extended abstracts are non-archival.
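For authors converting the deadlines above to their own time zone, the following is a minimal sketch. It assumes the usual reading of AoE as a fixed UTC-12 offset; the helper name `aoe_deadline_to_utc` is illustrative and not part of any official tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Anywhere on Earth" is treated as a fixed UTC-12 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int,
                        hour: int = 23, minute: int = 59) -> datetime:
    """Convert an AoE deadline (default 11:59 PM) to an aware UTC datetime."""
    local = datetime(year, month, day, hour, minute, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Full paper deadline listed above: Mar 13, 2026, 11:59 PM AoE.
full_paper_utc = aoe_deadline_to_utc(2026, 3, 13)
print(full_paper_utc.isoformat())  # 2026-03-14T11:59:00+00:00
```

Calling `.astimezone()` with no argument on the result gives the same instant in the local time zone of the machine running the script.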

04 Schedule

Workshop Program

08:30 – 08:35	Welcome and Introduction
08:35 – 09:00	Keynote 1: Luba Elliott	20 min talk + 5 min Q&A
09:00 – 09:25	Keynote 2: Hadar Averbuch-Elor	20 min talk + 5 min Q&A
09:25 – 09:55	Sponsor Presentation & Challenge Winner Presentations	4 × 5 min talks + 5 min shared Q&A
09:55 – 10:05	Award Ceremony
10:05 – 11:25	Coffee Break & Poster Session (Exhibit Hall A)
11:25 – 11:50	Keynote 3: Pinar Yanardag	20 min talk + 5 min Q&A
11:50 – 12:10	Archival Paper Presentations	2 papers · 7 min presentation + 3 min Q&A each
12:10 – 12:30	Panel Discussion and Closing Remarks

05 Call for Papers

Submissions Closed

Track I — Archival

Full Papers — Submissions Closed

Novel, previously unpublished research. Accepted papers appear in the official CVPR 2026 Workshop Proceedings.

Length: 5–8 pages
Format: CVPR 2026 Style
Review: Double-blind
Supplementary: Optional

Track II — Non-Archival

Extended Abstracts — Submissions Closed

New, previously published, or concurrently published research, as well as work in progress. Will not appear in proceedings.

Length: 2 pages
Format: CVPR 2026 Style
Review: Double-blind
Camera-Ready: Not required

Submissions for both tracks are now closed. Thank you to all authors who submitted their work.

06 Proceedings

Accepted Papers

Congratulations to all authors whose work was accepted to the 3rd AI for Visual Arts Workshop. Full papers (archival) appear in the official CVPR 2026 Workshop Proceedings; extended abstracts are non-archival. A total of 10 submissions have been accepted: 4 full papers and 6 extended abstracts.

Full Papers — Archival

Enhancing Spatial Understanding in Vision-Language Segmentation via Diffusion-Based Pipelines · Filippo Santiano, Zejun Zhang, Deblina Bhattacharjee
Form and Void: Entangled Composition through an Autonomous AI Agent · Shiwen Wang, Jian Yang, Xu Wang, Xincan Wang, Weiming Dong
Conversational Inquiry vs. Explanatory Narration: How AI Mediation Shapes Viewer Engagement with Art · Blessin Varkey, Shrayon Bose, Akash Mohan, Kalipatnapu Sarma, Vikram Jamwal, Anil Sharma
Attention-Enhanced Multi-ControlNet for Artist-Aligned Manga Background Generation · Louis King

Extended Abstracts — Non-Archival

The Abstraction Gap: A Spectral Theory of Vision Encoder Robustness to Artistic Stylization · Kaustubh Bukkapatnam, Siddharth Karuturi
The RenAIssance: Style-Consistent Generation of Classical Aesthetics using BLIP and LoRA-Adapted Stable Diffusion · TUO Zhanyu
The Artist's Mandate: Human-Aligned Adversarial Protection for Provenance · Nese Alyuz, Sinem Aslan, Ilke Demir
Teaching an Agent to Sketch One Part at a Time · Xiaodan Du, Ruize Xu, David Yunis, Yael Vinker, Greg Shakhnarovich
Sculpting Equations, Not Pixels: Data-Free Generative Aesthetics with StackGPArt · Ruchika Gupta, Wolfgang Banzhaf
PainterBench: Benchmarking Controllability in Image Generation with Painter-Aligned Metrics · Jeffrey Liu, Surendra Pathak

07 Competition

PortraitCraft Challenge - Accepting Submissions

The AI4VA Image Composition Challenge (PortraitCraft) features two competitive tracks that push the boundaries of AI-driven portrait composition understanding and generation. Participants are invited to develop novel methods for analysing and composing portrait imagery in artistic contexts.

PortraitCraft Challenge

Image Composition · Portrait Understanding & Generation


Benchmark

Dataset Introduction

PortraitCraft is a benchmark dataset for portrait composition understanding and portrait composition generation. It is designed to support models in learning and evaluating composition quality in portrait images. The dataset focuses on images with humans as the primary subjects and covers a wide range of scenarios, including single-person and multi-person portraits, as well as half-body and full-body compositions. It emphasizes key composition factors such as subject prominence, pose quality, image layout, and overall visual atmosphere.

PortraitCraft supports two task directions: portrait composition understanding and portrait composition generation. The dataset is constructed through large-model-assisted filtering combined with evaluation by professional designers, ensuring high-quality samples and fine-grained composition annotations.

Dataset & Baseline

Download. The benchmark is hosted on Hugging Face: zijielou/PortraitCraft.
Baseline code. Official Qwen-VL fine-tuning framework: github.com/yytang25/qwen-vl-ft.
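A minimal sketch of fetching the benchmark locally with the `huggingface_hub` client. It assumes that `zijielou/PortraitCraft` is published as a dataset repository (hence `repo_type="dataset"`) and that it is publicly readable; the target directory `data/PortraitCraft` and the function name are illustrative choices.

```python
from pathlib import Path

# Repository name as given above.
REPO_ID = "zijielou/PortraitCraft"


def download_portraitcraft(local_dir: str = "data/PortraitCraft") -> Path:
    """Download a snapshot of the benchmark and return its local path.

    Assumption: the repository is a Hugging Face *dataset* repo.
    """
    # Imported lazily so this module loads even if huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # assumption, see the note above
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_portraitcraft()` requires network access and `pip install huggingface_hub`. If the repository turns out to be gated, authenticating first with a Hugging Face access token would also be needed.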

Timeline

Challenge Period

Key dates for the PortraitCraft challenge. See the workshop timeline for full paper and workshop dates.

Start: April 5, 2026
End: May 17, 2026

Competition

Two Challenge Tracks

The challenge is organized into two complementary tracks. Participants may choose to focus on structured analysis of existing portraits, generation from composition-oriented specifications, or both.

Challenge Track 1

Portrait Composition Understanding

Given a portrait image, predict the overall composition quality score, provide fine-grained attribute judgments, and answer a challenging visual question.

CodaBench: Track 1 competition →

Challenge Track 2

Portrait Composition Generation

Given structured composition descriptions, generate portrait images that accurately realize the specified layout and aesthetic intent.

CodaBench: Track 2 competition →

Track 1

Portrait Composition Understanding

Participate: Track 1 on CodaBench

Introduction

Portrait Composition Understanding aims to evaluate a model's ability to understand portrait composition in a structured and interpretable way. Given a portrait image, participants are required to produce three types of outputs: a predicted overall composition score, ternary judgments on dozens of predefined fine-grained composition attributes, and an answer to a carefully designed visual question that tests detailed understanding of the image content. Unlike traditional aesthetic assessment tasks that focus only on global quality prediction, this track emphasizes both attribute-level composition analysis and fine-grained visual comprehension. The goal is to encourage models that not only estimate how good a portrait is, but also explain why it is good and demonstrate that they truly understand the image.

Model Input and Output

In this track, the model takes a portrait image as input and is required to perform structured composition analysis. The task consists of a training stage and a testing stage.

(1) Training stage. The training data for Track 1 are provided in the form of image-text pairs. Each training sample consists of a portrait image and a corresponding text description. The text includes an overall composition score, the scores of 13 composition attributes, and explanations of these attribute-level judgments.

(2) Test stage. During testing, the model receives a single portrait image as input and is required to produce three types of outputs:

Overall composition score. The model should predict a single score that reflects the overall quality of the portrait composition.
Fine-grained attribute judgments. The model should provide ternary judgments for dozens of predefined composition attributes, indicating whether each attribute is good or poor for the given image.
Visual question answering. The model should answer a carefully designed multiple-choice question related to the image, requiring genuine understanding of the image content.

Together, these three outputs assess global composition evaluation, attribute-level composition reasoning, and detailed visual understanding within a unified framework.
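The three outputs described above can be sketched as a single prediction record. This is an illustration only: the field names, the `-1/0/1` label encoding, the example attribute names, and the score scale are all assumptions, and the official submission format on CodaBench may differ.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical encoding of the three judgment levels; official labels may differ.
TERNARY_LABELS = (-1, 0, 1)


@dataclass
class Track1Prediction:
    """One model output for a single test image (illustrative field names)."""
    image_id: str
    overall_score: float                 # predicted overall composition score
    attribute_judgments: Dict[str, int]  # attribute name -> ternary label
    vqa_answer: str                      # chosen option for the multiple-choice question

    def validate(self) -> None:
        """Raise ValueError if the record is structurally incomplete."""
        if not self.attribute_judgments:
            raise ValueError("at least one attribute judgment is required")
        for name, label in self.attribute_judgments.items():
            if label not in TERNARY_LABELS:
                raise ValueError(f"{name}: label {label!r} not in {TERNARY_LABELS}")
        if not self.vqa_answer.strip():
            raise ValueError("vqa_answer must not be empty")


# Example record; attribute names and the score value are made up for illustration.
example = Track1Prediction(
    image_id="example_0001",
    overall_score=7.5,
    attribute_judgments={"subject_prominence": 1, "pose_quality": 0},
    vqa_answer="B",
)
example.validate()  # passes silently for a well-formed record
```

Keeping all three outputs in one record makes it easy to check completeness before a submission file is written.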

Evaluation Metrics

Submissions will be evaluated using a unified score that jointly considers performance across all three required outputs. The exact weighting and implementation details of the evaluation protocol are not disclosed to ensure fairness and robustness of the benchmark.

Example

The pair below illustrates an input portrait alongside a visualization of annotation results, highlighting the kinds of fine-grained composition cues the challenge considers.

This figure is for illustration purposes only. Please refer to the official challenge documentation for detailed evaluation specifications.

Figure 1: Input Image (example input portrait for Track 1)

Figure 2: Annotation Results for Track 1

Track 2

Portrait Composition Generation

Participate: Track 2 on CodaBench

Introduction

Portrait Composition Generation evaluates a model's ability to generate portrait images from structured composition-oriented descriptions. Participants are provided with training data consisting of portrait images paired with composition-focused annotations. At test time, only the structured descriptions are given, and models must generate corresponding portraits. This track emphasizes whether models can accurately interpret and realize composition requirements in generated portraits.

Model Input and Output

In this track, the model takes structured composition-oriented descriptions as input and is required to generate corresponding portrait images. The task consists of two stages:

(1) Training stage. Participants are provided with portrait images paired with structured textual descriptions that focus on composition and aesthetic attributes such as subject placement, spatial organization, visual center, negative space, and overall composition style.

(2) Test stage. During testing, only structured composition descriptions are provided. Participants are required to generate portrait images that reflect the specified composition requirements.

Evaluation Metrics

The final score is computed based on the consistency between the generated images and the target structured composition descriptions. The evaluation focuses on whether the generated results accurately reflect the key composition and aesthetic characteristics specified in the descriptions.

Example

Figure 3 shows a reference portrait; Figure 4 shows an image regenerated from a structured composition description.

This figure is for illustration purposes only. Please refer to the official challenge documentation for detailed evaluation specifications.

Figure 3: Original Image (reference portrait for Track 2)

Figure 4: Regenerated Image

Structured Description (Excerpt)

Full-body portrait photograph of a young woman dancing gracefully on a long wooden pier extending into shallow turquoise sea water at sunset.

Strong central perspective composition. The pier forms leading lines toward the horizon. The subject is positioned near the center axis of the frame. She stands on one foot with the other leg lifted and bent, arms extended outward in an expressive balanced pose.

A soft, faint rainbow appears diagonally across the sky from the upper-right corner to the lower-left region of the frame, forming a subtle diagonal compositional structure that enhances visual flow without dominating the scene.

…

professional photography, cinematic lighting, diagonal composition, leading lines, central perspective symmetry, elegant movement, balanced composition, subtle rainbow accent, high aesthetic quality, sharp details, realistic style.

Prizes & Presentation

Challenge Awards

Each challenge track will recognise its top two leaderboard submissions with cash prizes. Winners and runners-up across both tracks will also receive a dedicated presentation slot during the workshop.

🏆 Winner · Per Track

1st Place

$1,000

Awarded to the top leaderboard submission in each track

🥈 Runner-up · Per Track

2nd Place

$500

Awarded to the second-place submission in each track

Presentation slot. Each challenge winner and runner-up will present their work at the workshop — a 3-minute talk followed by a 2-minute Q&A per presenter.

Co-organised by Meitu

08 Recognition

Workshop Awards

🏆

Best Paper Award

$1,000

Awarded to the highest-quality full paper submission demonstrating novelty, rigour, and impact at the intersection of AI and visual arts.

🎖️

Best Poster Award

$500

Awarded for an outstanding poster presentation that effectively communicates innovative research to the workshop audience.

All winners will be recognised with certificates at the workshop award ceremony.

09 Keynotes

Invited Speakers

Confirmed

Hadar Averbuch-Elor

Cornell University & Cornell Tech

Assistant Professor in Computer Science. Her research combines images, language and 3D geometry for building multimodal perception systems that can handle the full complexity of the real 3D world.

Confirmed

Luba Elliott

Curator & Researcher, Creative AI

Curator, producer and researcher specialising in AI in the creative industries. Member of the CVPR Organising Committee, leading the CVPR Art Gallery each year.

Confirmed

Pinar Yanardag

Virginia Tech

Assistant Professor of Computer Science at Virginia Tech, leading GEMLAB on controllable, personalized generative AI. Previously a Fulbright PhD Fellow at Purdue and postdoc at MIT Media Lab. NSF CAREER awardee; Emmy-nominated Creative Director for HBO's Westworld.

10 Attending

Logistics for Presenters

Registration requirement: Each accepted archival paper must be covered by an author registration by May 5, 2026, or the paper will be removed from the proceedings. No exceptions. Non-archival papers do not need to be covered by an author registration.

◎

Presentation Guidelines

All posters: 42″ × 21″ (W × H, aspect ratio 2:1, landscape format). All accepted papers present a poster. Selected oral spotlight presenters will deliver a 7-minute presentation followed by a 3-minute Q&A, in addition to their poster presentation.

Poster Board Assignments

All posters are in Exhibit Hall A. The boards will be available from 10:00 to 11:00. Note that each board face holds two posters. The assignments are as follows:

Board	Poster(s)
100	Enhancing Spatial Understanding in Vision-Language Segmentation via Diffusion-Based Pipelines; Form and Void: Entangled Composition through an Autonomous AI Agent
101	Conversational Inquiry vs. Explanatory Narration: How AI Mediation Shapes Viewer Engagement with Art; Attention-Enhanced Multi-ControlNet for Artist-Aligned Manga Background Generation
102	The Abstraction Gap: A Spectral Theory of Vision Encoder Robustness to Artistic Stylization
103	The RenAIssance: Style-Consistent Generation of Classical Aesthetics using BLIP and LoRA-Adapted Stable Diffusion
104	The Artist's Mandate: Human-Aligned Adversarial Protection for Provenance
105	Teaching an Agent to Sketch One Part at a Time
106	Sculpting Equations, Not Pixels: Data-Free Generative Aesthetics with StackGPArt
107	PainterBench: Benchmarking Controllability in Image Generation with Painter-Aligned Metrics

General Poster Information

Posters will be: 42″ × 21″ (W × H, aspect ratio 2:1, landscape format).

Logos and poster templates for Main + Findings & Workshops can be found at Google Drive →

Feel free to use your own artwork, but we recommend a 3- or 4-column layout with little text and a few large, expressive figures. The poster should not be a copy-paste of your paper; it should give you the "tools" to deliver a 5–10 minute presentation of your work to any attendee. We recommend looking at posters from previous years for inspiration. Templates and logos are linked above.

Poster Printing Information

CVPR 2026 offers a poster printing service for attendees who would like to collect their printed poster onsite at the Colorado Convention Center in Denver. The link is active now:

https://cvprus.myprintdesk.net/login

You will be prompted during the order process to enter the presenter's name and contact information; make sure you have this information correct. It is very important to enter accurate information for proper distribution of the posters. Please list the Workshop and your paper ID number along with your full name.

All posters are printed on 8mil Satin Poster Paper, Full Colour and Single Sided. Posters are delivered rolled and delivery to the convention center is included. A PDF file is preferred. 100 DPI or vector art at full size is recommended but the file will automatically be scaled if needed. Please do not include bleed.
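As a quick check before uploading a raster file, the poster size and the recommended 100 DPI translate into pixel dimensions as sketched below. The helper names and the 1% ratio tolerance are illustrative assumptions, not printer requirements.

```python
from typing import Tuple

POSTER_WIDTH_IN = 42.0   # poster width in inches, as stated above
POSTER_HEIGHT_IN = 21.0  # poster height in inches, as stated above


def poster_pixels(dpi: int = 100) -> Tuple[int, int]:
    """Return (width, height) in pixels for a full-size poster at the given DPI."""
    return round(POSTER_WIDTH_IN * dpi), round(POSTER_HEIGHT_IN * dpi)


def matches_poster_ratio(width_px: int, height_px: int,
                         tolerance: float = 0.01) -> bool:
    """Check that an image is landscape 2:1, within a small tolerance."""
    return abs(width_px / height_px - 2.0) <= tolerance


print(poster_pixels())                   # (4200, 2100)
print(matches_poster_ratio(8400, 4200))  # True
```

A file exported at 4200 × 2100 pixels or larger with a 2:1 ratio should therefore not need upscaling when printed.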

Tier	Price	Applies
Early Bird	$55	Standard rate
Rush	$77	After May 17, 2026 5 PM EST
Express	$121	After May 24, 2026 5 PM EST

Online orders will close Friday, May 29 at 12:00 PM EST.
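The printing rates and cut-offs above can be sketched as a small lookup. This is an illustration under two assumptions: "EST" is taken literally as a fixed UTC-5 offset, and "after" is read as strictly later than the cut-off. Confirm the actual price on the print service before ordering.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "EST" is taken literally as UTC-5 (no daylight saving adjustment).
EST = timezone(timedelta(hours=-5), name="EST")

RUSH_START = datetime(2026, 5, 17, 17, 0, tzinfo=EST)     # rush rate applies after this
EXPRESS_START = datetime(2026, 5, 24, 17, 0, tzinfo=EST)  # express rate applies after this
ORDERS_CLOSE = datetime(2026, 5, 29, 12, 0, tzinfo=EST)   # online orders close


def printing_rate(order_time: datetime) -> Optional[int]:
    """Return the price in USD for a timezone-aware order time, or None if closed."""
    if order_time >= ORDERS_CLOSE:
        return None
    if order_time > EXPRESS_START:
        return 121
    if order_time > RUSH_START:
        return 77
    return 55


print(printing_rate(datetime(2026, 5, 20, 9, 0, tzinfo=EST)))  # 77
```

The function expects timezone-aware datetimes; comparing a naive datetime against the aware cut-offs raises a `TypeError` in Python.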

Local orders for Workshop Posters from ARC Denver are not available.

Put your FULL NAME, Workshop acronym, and PAPER ID Number in the Job Name field when submitting the poster. Please note the earliest date you need your poster.

UPLOADED FILE IS FINAL. FILE CHANGES NOT POSSIBLE!

Poster Pickup Hours — Exhibit Hall A

Dates	Hours
Wednesday, June 3 & Thursday, June 4	7:30 AM – 3:00 PM
Friday, June 5 – Sunday, June 7	7:30 AM – 5:00 PM

Once you pick up your poster, it is your responsibility. CVPR will not hold your poster for you. Please remember to remove your poster at the end of your session or it will be discarded.

Questions & Support

Pre-Submission Questions

Marissa — marissa@ctocevents.com

For questions before submission only. Printer questions will not be answered at this address.

Order & Production Questions

RiotColor — riotmax@riotcolor.com

Phone: 410-992-9898

Please have your order number ready.

Website & Tech Support

DSF Team — dsf.team@e-arc.com

Phone: 866-414-6967

11 Organizers

Organizing Committee

Lead Organizer

Deblina Bhattacharjee

Assistant Professor, University of Bath, UK (Publicity Co-Chair, CVPR)

Organizers

Iris (Yin) Zhang

Bingchen Zhao

Haoxiang Li

Luoqi Liu

Questions?

db2466[at]bath[dot]ac[dot]uk