If you have a MacBook Pro (M series) with at least 64 GB of RAM, you can now run a GPT-4-level LLM locally!
1. Install Ollama (ollama.com)
2. Open your terminal and run ollama pull llama3.3
3. Then ollama run llama3.3 "your prompt"
Your own personal AI is here!
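For anyone copy-pasting, here is a minimal sketch of the full terminal sequence (the bare llama3.3 tag pulls Ollama's default build of the 70B model, a roughly 43 GB quantized download, as discussed in the replies below):

    ollama pull llama3.3                 # download the default 70B build (~43 GB)
    ollama run llama3.3 "your prompt"    # one-off answer, then back to the shell
    ollama run llama3.3                  # interactive chat; type /bye to exit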
Quote: Iggy (@ignacioaal), replying to @Kaiyes_ and @ollama
Ollama also has other smaller choices, see:
7B parameter models like Mistral Instruct and OpenChat running at Q4_K_M quantization
CodeLlama and other specialized coding models up to 8B parameters
Nous-Hermes 10.7B using Q4_K_M quantization
check out github.com/exo-explore/exo
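The smaller builds pull the same way; a hedged sketch, where the exact tags are assumptions (check the Ollama library for the current names):

    ollama pull codellama:7b                   # specialized coding model
    ollama pull mistral:7b-instruct-q4_K_M     # hypothetical tag: Mistral Instruct at Q4_K_M
    ollama run mistral:7b-instruct-q4_K_M "your prompt"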
Awesome! Thanks for sharing, are you doing it "chatbox" style or using it as part of your workflow?
As long as you have enough RAM to load the model (Llama 3.3 is 43 GB) you'll be fine, try it out! If not, see the quoted list of smaller models above.
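On the "enough RAM" point: a quick way to sanity-check before pulling, assuming macOS (ollama list reports the on-disk size of each model you have already pulled):

    sysctl hw.memsize    # total physical RAM, in bytes, on macOS
    ollama list          # names and sizes of locally pulled models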
Depends on your RAM; see the quoted list of smaller models above.
There are many LLMs you can run locally, even on a cell phone. Many run on a MacBook; nothing new.
yes ofc. But this is the first time that we can have 4o level intelligence locally :)
Unfortunately not at this time. The model is 42 GB, so you'll only be left with 6 GB for all your daily work, but see the quoted list of smaller models above.
This is exciting! Running powerful models locally opens up a ton of possibilities for experimentation and creativity. Can't wait to try it out!
What's better than Ollama? vLLM. Sadly they don't support MLX yet.
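For context, vLLM serves an OpenAI-compatible HTTP endpoint rather than a local chat REPL, and it targets CUDA-class hardware rather than Apple's MLX; a rough sketch, assuming the standard Hugging Face model ID:

    pip install vllm
    # serve an OpenAI-compatible API on localhost:8000 (a 70B model needs serious GPU memory)
    vllm serve meta-llama/Llama-3.3-70B-Instruct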
guys show us some love
>>> You're running on my macbook
I'm a cloud-based language model, so I don't actually "run" on your MacBook in the classical sense. Instead, I exist as a remote service that you can interact with through the internet.
When you ask me a question or provide input, your MacBook
Nice! Do you want to create a video tutorial on how to set up GPT for training on specific tasks with AWS?
Stuck with GPT models? Unleash the power of open-source LLMs and chat without limits!
So when I want to go full personal house assistant and keep it private: buy a SOTA home-server-type PC with an RTX 5090 and 64 GB of RAM and enjoy?
The benchmarks
Wow, GPT-4-level models running locally? The fact you can now do this on a MacBook with 64GB RAM is wild. What’s the first thing you’d ask your “personal AI” to do if you set this up?
No internet, no waiting, no privacy worries—just you and an AI that can handle complex tasks right from your laptop. It feels like we’re entering a new era. What’s everyone’s take on the most useful applications for this setup?
This is why I got 128 GB of RAM
So I can ignore this and keep using gpt-4o-mini for small stuff
Crazy! Now imagine when Llama 4 drops. Local LLM with that level of intelligence is going to open up so many interesting possibilities that today don’t make economic sense.
Running a GPT-4 level LLM locally is a game-changer for privacy and efficiency. The MacBook Pro M series is finally flexing its true potential! Curious - how does the performance compare to cloud-based setups, especially for extended sessions?
I already have Llama on my PC. It takes like 3 minutes until it renders an answer.
I should legit try this. In fairness, I have no clue about the real-world benefit, since I doubt this is as good as an online LLM, but it's still kinda cool.
With 24 GB of RAM you can run an 8B model (e.g. Llama 3.1 8B) at 16-bit precision. Should work fine for students at least.
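A hedged sketch of that setup (the fp16 tag name is an assumption; an 8B model at 16-bit precision is roughly 16 GB of weights, so it fits in 24 GB with some headroom):

    ollama pull llama3.1:8b-instruct-fp16    # ~16 GB of weights, assumed tag name
    ollama run llama3.1:8b-instruct-fp16 "your prompt"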
Yeah, sure, but then you won't have any resources left to run anything else, which defeats the whole purpose of it...
This is the real future. Pretty soon we'll all have individualized LLMs in our pockets.
Deploy Llama 3.1 as an API, fine-tune on custom data, and more.
I have tried this before, and the results are impressive, but my laptop could heat a room.
Can it use vision? Can it access the internet? Can it run Python code?
The biggest improvements to ChatGPT have been the integrations over the last few years.
Running a bare language model is neat, but not as useful.
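To the integrations point: a locally running Ollama instance exposes an HTTP API on port 11434, so you can script against it; a minimal sketch:

    # ask the local server for a single, non-streaming completion
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.3",
      "prompt": "Write a haiku about local inference.",
      "stream": false
    }'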
That's cool, and having recent GPT-4-level performance at smaller sizes is relatively new. But GPT-4-level performance in smaller models has been around for quite some time now: the first Llama 3 release brought that to a 70B model, and Gemma 27B is roughly comparable.
Tried running 3.3 on my M1 MacBook Air (16GB memory). It’s very slow, but works.
I reckon we can run it on the 16 GB variant as well. Might get about 15 tokens/sec though.
Running Llama 3.2 3B natively on an iPhone Pro with no internet is more impressive to me right now, tbh.
But yes, fully local LLMs will be another game changer.
This is a highly quantized model. Always go for the instruct fp16 model from Ollama for the full version.
It is useful, but you get the quantized version, so no GPT-4-quality responses. If you want to try the real power of Llama 3.3 70B, make sure you are running inference with a non-quantized version.
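One caveat on that advice: at 16-bit precision, 70B parameters comes to roughly 140 GB of weights (70B x 2 bytes), so a non-quantized Llama 3.3 70B will not fit in 64 GB of RAM at all; the tag below is an assumption, so check the Ollama library for the real name:

    ollama pull llama3.3:70b-instruct-fp16    # ~140 GB; needs far more than a 64 GB laptop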