Hareesh Ravi



avatar



FB profile      LinkedIn profile      Google Scholar Profile      GitHub page

               
Twitter page         Email address
I am a Senior Applied Research Scientist at Adobe's Applied Research team working on language-vision research and Generative AI. I completed my PhD from the CS department at the Intelligent Visual Interfaces lab in Rutgers University . My PhD thesis was on Multimodal Story Comprehension, under the supervision of Dr. Mubbasir Kapadia and Dr. Gerard De Melo. My research interests are on vision-language understanding text with applications to multimodal story comprehension including, story illustration, visual storytelling, image captioning and text-to-image retreival/generation. My current work is particulaly focused on text to image generation with diffusion/flow based models, interactive editing and large multimodal models.

Internships


We are always on the lookout for summer PhD research interns to work on text to image generation, interactive and multi-turn editing and Multimodal models. If you are interested, reach out to me via email or linkedin with your CV. Below is a list of current and past summer interns.

2024
Maitreya Jitendra Patel (ASU) - Disentangled Representation Learning
Song Wen (Rutgers) - Object Centric Representation Learning
Manuel Brack (TU Darmstadt) - Text to Image Generation
Xianghao Kong (UC Riverside) - Image Generation as Scene Composition

2023
Wonwoong Cho (Purdue) - Controllability of Diffusion Models (ECCV 2024)



Research Projects / Industry


Highlighting projects and softwares that I developed or led the development of.

avatar
Adobe's latest text to image generative model, FireFly Image Model 3 with higher quality and text to image alignment. News coverage in tech radar, PC Mag, Venture Beat and The Verge. I was the technical lead of this project and contributed to the core architecture, training and inference improvements.


avatar
State of the art zero shot stylized image generation enables generating an image conditioned on text prompt and a style reference image released in FireFly Image Model 2. Selected news articles that talk about the model and its capabilities are CG Channel, Tech crunch, AI Business. The feature is based on my patent on zero-shot stylized image generation on Diffusion models.


avatar
Structure Match enables generating an image conditioned on a text prompt and a reference image to match the structure released in FireFly Image Model 2. Structure and Style match both were rated as best in class by ZDNet. I was the technical lead of this project.


avatar
Photographic and content type control on Adobe's FireFly Image Model 2 based on our research. Here is the release blog from Adobe. Project was part of the FireFly 2 project that I co-led.


avatar
FireFly Image Model 2, Adobe's text to image generative model that gives high quality, aesthetics and reaslism and trained only on copyrighted data. Selectd news articales that covered the model include CNET, Tech crunch, AI Business. I contributed to the core architectural design choices that improved realism, text to image alignment and aesthetics and was the co technical lead of the project.





Selected Research Publications


avatar
Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods
Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale
ECCV 2024

avatar
PREDITOR: Text Guided Image Editing with Diffusion Prior
Hareesh Ravi, Sachin Kelkar, Midhun Harikumar, Ajinkya Kale
Preprint 2023

avatar
Controlled and Conditional Text to Image Generation with Diffusion Prior
Pranav Aggarwal, Hareesh Ravi, ..., Ajinkya Kale
Preprint 2023

avatar
Cross-Modal Coherence Model for Text to Image Retrieval
Hareesh Ravi, Malihe Alikhani, Fangda Han, Mubbasir Kapadia Vladimir Pavlovic Mathew Stone
AAAI 2022

avatar
AESOP: Abstract Encoding of Storied Objects and Pictures
Hareesh Ravi, Kushal Kafle, Scott Cohen, Jonathan Brandt Mubbasir Kapadia
ICCV 2021


avatar
Show Me a Story: Towards Coherent Neural Story Illustration
Hareesh Ravi, Lezi Wang, Carlos Muniz, Leonid Sigal Dimitris Metaxas Mubbasir Kapadia
CVPR 2017




Work Experience


May, 2020 - Aug, 2020
Deep Learning Research Intern
Advisers: Dr. Scott Cohen, Dr. Kushal Kafle, Dr. Jonathan Brandt
June, 2017 - Sep, 2017
Project Associate Intern
Adviser: Dr. Mubbasir Kapadia
Nov, 2013 - June, 2016
Research Associate
Adviser: Dr. A. V. Subramanyam




Teaching Experience


Teaching Assistant (Fall 2016)
Intro to Discrete Structures
Teaching Assistant (Spring 2017)
Principles of Programming Languages
Teaching Assistant (Fall 2017)
Topics in AI - Data StoryTelling
Teaching Assistant (Spring 2021)
Intro to Discrete Structures




Professional Experience


Program Committee
CVPR 2024
Program Committee
AAAI 2023, CVPR 2023
Program Committee
AAAI 2022, CVPR 2022, WACV 2022, ECCV 2022
Program Committee
NAACL 2021, ACL 2021
Reviewer
EMNLP 2020