Shreya Gupta (@ShreyaByte):

Exciting AI trends for 2024! 🌟 Quantum AI shaping the future 🌐 AI jobs on the rise 🤖 Multimodal AI integration 🧠 Health care AI advancements 🏥 AI in customer service improving 🤝 Deep learning progress 📚 AI and robotics innovation 🤖 Ethical AI practices evolving 🌱 #AI…

Shreya Gupta (@ShreyaByte):

The Memory-Augmented Large Multimodal Model (MA-LMM) by AI at Meta is designed to enhance long-term video understanding by overcoming the memory and context limitations of previous models. Unlike traditional methods that struggle with large data sets, MA-LMM utilizes a memory bank…
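
The paper's key idea is an online memory bank that stores frame features and is compressed once it grows past a budget. Below is a minimal sketch of that idea in PyTorch, merging the most similar adjacent entries to stay within a fixed length; the function and variable names are my own, not the paper's actual API:

    import torch
    import torch.nn.functional as F

    def update_memory_bank(bank: torch.Tensor, frame_feat: torch.Tensor, max_len: int) -> torch.Tensor:
        """Append one frame feature to the (T, D) bank; if the bank exceeds
        max_len, average the most similar adjacent pair to compress it."""
        bank = torch.cat([bank, frame_feat.unsqueeze(0)], dim=0)
        if bank.size(0) > max_len:
            sims = F.cosine_similarity(bank[:-1], bank[1:], dim=-1)  # adjacent-pair similarity
            i = int(sims.argmax())                                   # most redundant pair
            merged = ((bank[i] + bank[i + 1]) / 2).unsqueeze(0)      # average the pair
            bank = torch.cat([bank[:i], merged, bank[i + 2:]], dim=0)
        return bank

Compression like this keeps memory constant regardless of video length, which is what lets such a model process long videos online instead of holding every frame in context.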

Juan Nathaniel (@juannat7):

We present our #CVPR2024 work on #EarthVision: Deep Generative Data Assimilation in Multimodal Setting that calibrates Earth system model state🌎 with diverse observations🛰️📡 using diffusion

Paper: arxiv.org/abs/2404.06665
Code: github.com/yongquan-qu/SL…

🧵1/6
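
In spirit, the method treats assimilation as conditional generation: a diffusion model denoises a model-state field while being conditioned on encoded observations. A rough sketch under my own assumptions (DDIM-style deterministic sampling; none of these names come from the paper's code):

    import torch

    def assimilate(denoiser, obs_encoder, x_T, obs, alphas_bar):
        """Denoise a noisy Earth-system state x_T conditioned on observations.
        alphas_bar: (T,) tensor of cumulative alpha products; eta = 0 (DDIM)."""
        x = x_T
        cond = obs_encoder(obs)                  # embed satellite / in-situ observations
        for t in reversed(range(len(alphas_bar))):
            eps = denoiser(x, t, cond)           # predicted noise, given state + observations
            a_t = alphas_bar[t]
            x0 = (x - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)   # denoised estimate
            a_prev = alphas_bar[t - 1] if t > 0 else torch.tensor(1.0)
            x = torch.sqrt(a_prev) * x0 + torch.sqrt(1 - a_prev) * eps
        return x                                 # calibrated ("analysis") state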

Shreya Gupta (@ShreyaByte):

🧠 Multimodal Large Language Models (MLLMs) are revolutionizing AI by integrating various data types, such as text, images, and audio, for more human-like interactions.
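
The usual recipe behind that integration is to encode each modality separately and project the result into the language model's token-embedding space. A minimal sketch (module names and dimensions are illustrative assumptions, not any particular model's architecture):

    import torch
    import torch.nn as nn

    class VisionToLLMProjector(nn.Module):
        """Project vision-encoder features into an LLM's embedding space and
        prepend them to the text embeddings as extra 'visual tokens'."""
        def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096, num_tokens: int = 32):
            super().__init__()
            self.proj = nn.Linear(vision_dim, llm_dim)
            self.num_tokens = num_tokens

        def forward(self, image_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
            # image_feats: (B, N_patches, vision_dim); text_embeds: (B, T, llm_dim)
            vis_tokens = self.proj(image_feats[:, : self.num_tokens])  # (B, K, llm_dim)
            return torch.cat([vis_tokens, text_embeds], dim=1)         # joint sequence for the LLM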

AK (@_akhaliq):

Apple presents Ferret-UI

Grounded Mobile UI Understanding with Multimodal LLMs

Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with…

Shreya Gupta (@ShreyaByte):

Comparative advantages:

MA-LMM outperforms other models by integrating advanced memory management and multimodal learning, which are crucial for tasks like video summarization and event detection where understanding extended sequences is key.

Shreya Gupta (@ShreyaByte):

Meta to launch small versions of Llama 3 next week

1. Meta Platforms is set to launch two small non-multimodal versions of Llama 3 next week, with the biggest Llama 3 model expected to be multimodal and released later this summer.

2. There's a general buzz around…

Poe (@poe_platform):

Now on Poe: Gemini 1.5 Pro! This powerful new multimodal model from Google introduces an industry-leading context window of 1 million tokens (~700,000 words) with near-perfect retrieval, enabling complex tasks that require long-context understanding. (1/4)
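
For scale: the ~700,000-word figure follows from the common rule of thumb of roughly 0.7 English words per token (an approximation, not Google's stated conversion), i.e. 1,000,000 tokens × 0.7 ≈ 700,000 words.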

Bindu Reddy (@bindureddy):

Grok 1.5 Vision Preview

Very cool! Grok 1.5 Vision is a multimodal model that is competitive with GPT-4 in multimodal capabilities, including image and document understanding.

Here is an example of translating a sketch to Python code...

This model is a baby step in…
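
To make the sketch-to-code idea concrete, here is the kind of small program such a model might produce from a hand-drawn flowchart of a number-guessing game (purely illustrative, not actual model output):

    import random

    def guessing_game():
        """Guess-the-number loop of the sort a flowchart sketch might describe."""
        target = random.randint(1, 100)
        while True:
            guess = int(input("Guess a number between 1 and 100: "))
            if guess < target:
                print("Too low!")
            elif guess > target:
                print("Too high!")
            else:
                print("Correct!")
                break

    if __name__ == "__main__":
        guessing_game()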

Gabriel Ilharco (@gabriel_ilharco):

Grok is going multimodal!

It’s incredible to see how fast a small, focused team can move. Kudos to the amazing team @xAI that made this possible

x.ai/blog/grok-1.5v

Adrian Dittmann (@AdrianDittmann):

Grok-1.5V is xAI's first-generation multimodal model with a wide array of capabilities, such as real-world understanding.

(See example images and link to blog post)

Nuke (@CryptonianNuke):

Have a read through this. If you are not bullish on what xAI is developing and how they are championing open-source AI, I can't help you.

If you need help, ask @GROK what the significance of multimodal is.

It will be able to process a wide variety of visual information and…

Amna Al-Busaidi (@AmnaBusaidi):

When no gold standard exists, @YasMoayedi, @shelleyhallmd, Farhana Latif, and @JeffTeuteberg are setting the silver standard in cardiac allograft surveillance with multimodal molecular testing! 🔬💓 #ISHLT2024

Tim Zaman (@tim_zaman):

The Grok 1.5 announcement includes the examples that made me feel like self-driving can eventually be done (better) as a subset of a more generic AI, e.g. a multimodal LLM as shown here. When I was at Autopilot, this made me feel a bit worried.
Such models can do examples like the…

Zhenhailong Wang (@zhenhailongW):

Large multimodal models often lack precise low-level perception needed for high-level reasoning, even with simple vector graphics. We bridge this gap by proposing an intermediate symbolic representation that leverages LLMs for text-based reasoning. mikewangwzhl.github.io/VDLM 🧵1/4
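
The core move is to swap raw pixels for a symbolic, text-based description of the vector primitives so an off-the-shelf LLM can reason over them. A toy sketch of that conversion (field names and output format are my own assumptions, not the paper's actual representation):

    def svg_to_symbolic(primitives: list[dict]) -> str:
        """Render vector primitives as text an LLM can reason over, e.g.
        {"type": "circle", "cx": 10, "cy": 10, "r": 5}."""
        lines = []
        for p in primitives:
            if p["type"] == "circle":
                lines.append(f"circle(center=({p['cx']}, {p['cy']}), radius={p['r']})")
            elif p["type"] == "line":
                lines.append(f"line(({p['x1']}, {p['y1']}) -> ({p['x2']}, {p['y2']}))")
        return "\n".join(lines)

    # prints: circle(center=(10, 10), radius=5)
    print(svg_to_symbolic([{"type": "circle", "cx": 10, "cy": 10, "r": 5}]))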

Sasha Sheng 🫶🏼 (@hackgoofer):

Super pumped for @aiengfoundation's hackathon this Saturday (April 13th) on Realtime Voice and Multimodal AI. Grateful to @Cloudflare, our location sponsor.

Prizes include a 4090 GPU and an Apple Vision Pro or cash equivalent. Thanks to our sponsors: @trydaily, @OracleCloud,…