Like a lot of folks, I’ve been messing about with the various AI image generators as they open up.
While at Google I got to play with language model work quite a bit, and we worked on a series of projects exploring AI tools as ‘thought partners’ – mainly in the space of language, with some multimodal components.
As a result perhaps – the things I find myself curious about are not so much the models or the outputs – but the interfaces to these generator systems and the way they might inspire different creative processes.
For instance – Midjourney operates through a Discord chat interface – reinforcing perhaps the notion that there is a personage at the other end crafting these things and sending them back to you in a chat. I found the turn-taking dynamic underpins play and iteration – creating an initially addictive experience despite the clunkiness of the UI. It feels like an infinite game. You’re also exposed (whether you like it or not…) to what others are producing – and the prompts they are using to do so.
DALL·E and Stable Diffusion (via DreamStudio) have more of a ‘traditional’ tool UI, with a canvas where the prompt is rendered and which the user can tweak through various settings and sliders. It feels (to me) less open-ended – but more tunable, more open to ‘mastery’ as a useful tool.
All three, to varying extents, resurface prompts and output from fellow users – creating a ‘view-source’ loop for newbies and dilettantes like me.
Gerard Serra – who we were lucky to host as an intern while I was at Google AIUX – has been working on perhaps another possibility for ‘co-working with AI’.
While this is back in the realm of LLMs and language rather than image generation, I am a fan of the approach: creating a shared canvas that humans and AI co-work on. How might this extend to image generator UI?