Shreya Gupta (@ShreyaByte):

Exciting AI trends for 2024! 🌟 Quantum AI shaping the future 🌐 AI jobs on the rise 🤖 Multimodal AI integration 🧠 Health care AI advancements 🏥 AI in customer service improving 🤝 Deep learning progress 📚 AI and robotics innovation 🤖 Ethical AI practices evolving 🌱 #AI…

Shreya Gupta (@ShreyaByte):

The Memory-Augmented Large Multimodal Model (MA-LMM) by AI at Meta is designed to enhance long-term video understanding by overcoming the memory and context limitations of previous models. Unlike traditional methods that struggle with large data sets, MA-LMM utilizes a memory bank…
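
The paper's key idea is an online memory bank that stores frame features and is compressed once it grows past a budget. Below is a minimal sketch of that idea in PyTorch, merging the most similar adjacent entries to stay within a fixed length; the function and variable names are my own, not the paper's actual API:

    import torch
    import torch.nn.functional as F

    def update_memory_bank(bank: torch.Tensor, frame_feat: torch.Tensor, max_len: int) -> torch.Tensor:
        """Append one frame feature to the (T, D) bank; if the bank exceeds
        max_len, average the most similar adjacent pair to compress it."""
        bank = torch.cat([bank, frame_feat.unsqueeze(0)], dim=0)
        if bank.size(0) > max_len:
            sims = F.cosine_similarity(bank[:-1], bank[1:], dim=-1)  # adjacent-pair similarity
            i = int(sims.argmax())                                   # most redundant pair
            merged = ((bank[i] + bank[i + 1]) / 2).unsqueeze(0)      # average the pair
            bank = torch.cat([bank[:i], merged, bank[i + 2:]], dim=0)
        return bank

Compression like this keeps memory constant regardless of video length, which is what lets such a model process long videos online instead of holding every frame in context.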

Juan Nathaniel (@juannat7):

We present our #CVPR2024 work on #EarthVision: Deep Generative Data Assimilation in Multimodal Setting that calibrates Earth system model state🌎 with diverse observations🛰️📡 using diffusion

Paper: arxiv.org/abs/2404.06665
Code: github.com/yongquan-qu/SL…

🧵1/6
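
In spirit, the method treats assimilation as conditional generation: a diffusion model denoises a model-state field while being conditioned on encoded observations. A rough sketch under my own assumptions (DDIM-style deterministic sampling; none of these names come from the paper's code):

    import torch

    def assimilate(denoiser, obs_encoder, x_T, obs, alphas_bar):
        """Denoise a noisy Earth-system state x_T conditioned on observations.
        alphas_bar: (T,) tensor of cumulative alpha products; eta = 0 (DDIM)."""
        x = x_T
        cond = obs_encoder(obs)                  # embed satellite / in-situ observations
        for t in reversed(range(len(alphas_bar))):
            eps = denoiser(x, t, cond)           # predicted noise, given state + observations
            a_t = alphas_bar[t]
            x0 = (x - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)   # denoised estimate
            a_prev = alphas_bar[t - 1] if t > 0 else torch.tensor(1.0)
            x = torch.sqrt(a_prev) * x0 + torch.sqrt(1 - a_prev) * eps
        return x                                 # calibrated ("analysis") state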

Shreya Gupta (@ShreyaByte):

🧠 Multimodal Large Language Models (MLLMs) are revolutionizing AI by integrating various data types, such as text, images, and audio, for more human-like interactions.
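
The usual recipe behind that integration is to encode each modality separately and project the result into the language model's token-embedding space. A minimal sketch (module names and dimensions are illustrative assumptions, not any particular model's architecture):

    import torch
    import torch.nn as nn

    class VisionToLLMProjector(nn.Module):
        """Project vision-encoder features into an LLM's embedding space and
        prepend them to the text embeddings as extra 'visual tokens'."""
        def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096, num_tokens: int = 32):
            super().__init__()
            self.proj = nn.Linear(vision_dim, llm_dim)
            self.num_tokens = num_tokens

        def forward(self, image_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
            # image_feats: (B, N_patches, vision_dim); text_embeds: (B, T, llm_dim)
            vis_tokens = self.proj(image_feats[:, : self.num_tokens])  # (B, K, llm_dim)
            return torch.cat([vis_tokens, text_embeds], dim=1)         # joint sequence for the LLM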

AK (@_akhaliq):

Apple presents Ferret-UI

Grounded Mobile UI Understanding with Multimodal LLMs

Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with…

Shreya Gupta (@ShreyaByte):

Comparative advantages:

MA-LMM outperforms other models by integrating advanced memory management and multimodal learning, which are crucial for tasks like video summarization and event detection where understanding extended sequences is key.

Shreya Gupta (@ShreyaByte):

Meta to launch small versions of Llama 3 next week

1. Meta Platforms is set to launch two small non-multimodal versions of Llama 3 next week, with the biggest Llama 3 model expected to be multimodal and released later this summer.

2. There's a general buzz around…

Poe (@poe_platform):

Now on Poe: Gemini 1.5 Pro! This powerful new multimodal model from Google introduces an industry-leading context window of 1 million tokens (~700,000 words) with near-perfect retrieval, enabling complex tasks that require long-context understanding. (1/4)
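
For scale: the ~700,000-word figure follows from the common rule of thumb of roughly 0.7 English words per token (an approximation, not Google's stated conversion), i.e. 1,000,000 tokens × 0.7 ≈ 700,000 words.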

Bindu Reddy (@bindureddy):

Grok 1.5 Vision Preview

Very cool! Grok 1.5 Vision is a multimodal model that is competitive with GPT-4 in multimodal capabilities, including image and document understanding.

Here is an example of translating a sketch to Python code...

This model is a baby step in…
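
To make the sketch-to-code idea concrete, here is the kind of small program such a model might produce from a hand-drawn flowchart of a number-guessing game (purely illustrative, not actual model output):

    import random

    def guessing_game():
        """Guess-the-number loop of the sort a flowchart sketch might describe."""
        target = random.randint(1, 100)
        while True:
            guess = int(input("Guess a number between 1 and 100: "))
            if guess < target:
                print("Too low!")
            elif guess > target:
                print("Too high!")
            else:
                print("Correct!")
                break

    if __name__ == "__main__":
        guessing_game()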

Gabriel Ilharco (@gabriel_ilharco):

Grok is going multimodal!

It’s incredible to see how fast a small, focused team can move. Kudos to the amazing team @xAI that made this possible

x.ai/blog/grok-1.5v

Adrian Dittmann (@AdrianDittmann):

Grok-1.5V is xAI's first-generation multimodal model with a wide array of capabilities, such as real-world understanding.

(See example images and link to blog post)

Nuke (@CryptonianNuke):

Have a read through this. If you are not bullish on what xAI is developing and how they are championing open-source AI, I can't help you.

If you need help, ask @GROK what the significance of multimodal is.

It will be able to process a wide variety of visual information and…

Amna Al-Busaidi (@AmnaBusaidi):

When no gold standard exists, @YasMoayedi, @shelleyhallmd, Farhana Latif, and @JeffTeuteberg are setting the silver standard in cardiac allograft surveillance with multimodal molecular testing! 🔬💓 #ISHLT2024

Tim Zaman (@tim_zaman):

The Grok 1.5 announcement includes the examples that made me feel like self-driving can eventually be done (better) as a subset of a more generic AI, e.g. a multimodal LLM as shown here. When I was at Autopilot, this made me feel a bit worried.
Such models can do examples like the…

Zhenhailong Wang (@zhenhailongW):

Large multimodal models often lack precise low-level perception needed for high-level reasoning, even with simple vector graphics. We bridge this gap by proposing an intermediate symbolic representation that leverages LLMs for text-based reasoning. mikewangwzhl.github.io/VDLM 🧵1/4
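
The core move is to swap raw pixels for a symbolic, text-based description of the vector primitives so an off-the-shelf LLM can reason over them. A toy sketch of that conversion (field names and output format are my own assumptions, not the paper's actual representation):

    def svg_to_symbolic(primitives: list[dict]) -> str:
        """Render vector primitives as text an LLM can reason over, e.g.
        {"type": "circle", "cx": 10, "cy": 10, "r": 5}."""
        lines = []
        for p in primitives:
            if p["type"] == "circle":
                lines.append(f"circle(center=({p['cx']}, {p['cy']}), radius={p['r']})")
            elif p["type"] == "line":
                lines.append(f"line(({p['x1']}, {p['y1']}) -> ({p['x2']}, {p['y2']}))")
        return "\n".join(lines)

    # prints: circle(center=(10, 10), radius=5)
    print(svg_to_symbolic([{"type": "circle", "cx": 10, "cy": 10, "r": 5}]))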

Sasha Sheng 🫶🏼 (@hackgoofer):

Super pumped for @aiengfoundation's hackathon this Saturday (April 13th) on Realtime Voice and Multimodal AI. Grateful to @Cloudflare, our location sponsor.

Prizes include a 4090 GPU and an Apple Vision Pro or cash equivalent. Thanks to our sponsors: @trydaily, @OracleCloud,…