I honestly can't believe anyone would fall for this nonsense. But if you are willing to listen for a second to a real explanation, read on...
Quote
Elon Musk
@elonmusk
Replying to @SawyerMerritt and @Tesla
Lidar and radar reduce safety due to sensor contention. If lidars/radars disagree with cameras, which one wins? This sensor ambiguity causes increased, not decreased, risk. That's why Waymos can't drive on highways. We turned off the radars in Teslas to increase safety.
David Watson 🥑

Sensor fusion theory is all about handling these disagreements. The theory is that *IF* these uncertainties can be modeled properly, then the resultant combined estimate will be more accurate than relying on any one sensor.
Sensor fusion (e.g. Kalman filtering, as an example) is used heavily in engineering, and it is USED AT TESLA AND SPACEX! Tesla estimating what torque to drive the car with when the inclination or wind speed changes? That's using sensor fusion of GPS, inertial sensors, etc.
You think SpaceX trying to land a rocket is based all on one sensor type? If more than one, how do you put all that information together? Sensor fusion...
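For anyone unfamiliar, the core of a Kalman-style update is just inverse-variance weighting. Here is a minimal one-dimensional sketch; the sensor values and variances are invented purely for illustration, and this is of course not Tesla's or SpaceX's actual implementation:

```python
def kalman_update(estimate, variance, measurement, meas_variance):
    """One scalar Kalman update: blend the current estimate with a new
    measurement, each weighted by the inverse of its variance."""
    gain = variance / (variance + meas_variance)   # Kalman gain in [0, 1]
    fused_estimate = estimate + gain * (measurement - estimate)
    fused_variance = (1.0 - gain) * variance       # smaller than either input
    return fused_estimate, fused_variance

# Hypothetical speed estimate: GPS says 20 m/s (variance 4), wheel
# odometry says 22 m/s (variance 1).
est, var = kalman_update(20.0, 4.0, 22.0, 1.0)
# Fused estimate leans toward the more trusted sensor, and the fused
# variance (0.8) is lower than either sensor's variance alone.
```

The point of the math: when the uncertainties are modeled, disagreement between sensors doesn't create ambiguity, it gets resolved by the weights, and the combined estimate is provably no worse than the best single sensor.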
This is how a dumb version works: take each sensor's estimate, multiply it by the confidence you have in that estimate, and add them together. Take an example with a camera and a microphone. In normal situations, you would give the camera high confidence and the microphone medium.
So the camera will heavily dominate the decision, usually. However, say low sun is occluding oncoming traffic. Suddenly camera confidence is low. Meanwhile the microphone is picking up a semi-truck honking its horn. Do you want your car to pull out because the camera thinks it's safe?
No, you would prefer to use the adjusted uncertainty estimates to hold back in this case. And because the audio would only dominate in such cases, the net result over all instances is a MORE ACCURATE MODEL.
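The "dumb version" above fits in a few lines. The scores and confidences below are invented purely to mirror the camera/microphone scenario, not real values from any system:

```python
def fuse(estimates, confidences):
    """Confidence-weighted average: scale each sensor's estimate by our
    confidence in it, then normalize so the weights sum to one."""
    total = sum(confidences)
    return sum(e * c for e, c in zip(estimates, confidences)) / total

# "Is it safe to pull out?" -- 1.0 means definitely safe, 0.0 definitely not.
# Sensor order: [camera, microphone].

# Normal daylight: the camera sees clearly and dominates the decision.
normal = fuse([0.9, 0.5], confidences=[0.9, 0.3])   # high score -> go

# Low sun blinds the camera while the microphone hears a truck horn:
# camera confidence collapses, so the audio estimate takes over.
glare = fuse([0.9, 0.1], confidences=[0.1, 0.6])    # low score -> hold back
```

Same function both times; only the confidence weights changed. That is the whole argument: the fused decision is camera-driven in the common case and audio-driven exactly when the camera is unreliable.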
Can the estimation of uncertainty be difficult? Absolutely. If only you had the most compute and data in the industry to do it. Oh wait...
So what's the deal with blatantly lying and disrespecting his own intelligent engineers? The reality is that sensor fusion: 1) uses more sensors = more $$$, 2) requires more inference compute, 3) requires more training compute, 4) requires more engineering time.
But proper engineering of it means it produces a MORE ACCURATE ALGORITHM. Elon simply does not want to incur the higher costs for that accuracy. And perhaps that makes sense, especially at this stage. But don't lie about it.
Even mobile telematics apps use sensor fusion with Kalman filters. If one does not want to rely on more sensors for mission-critical ops, it's either the cost or their infrastructure can't handle it.
I think the real reason is the first-principles stubbornness about humans driving with eyes. That, and the advantage of all the cars they have on the road would go away if they needed simultaneously recorded data from other sensors to train on.
Elon says he tried it and it made things worse. You say that's a lie because your theory is correct and he has a motive to not spend money on lidar. It's not a very strong argument.
lol, any bad engineer could say they tried to fuse 2 signals and it made the result worse. Does that mean you trust them?
The more correct statement would be "given limited computing resources, it is better to spend those resources on inference than sensor fusion." Now refute that.
Your examples of torque sensing, microphones are sensors other than the ones he mentions: lidar/radar. I think you're misinterpreting Elon's point: he isn't broadly discrediting sensor fusion, just the fusion of those specific sensors, which are both effectively visual modalities.
It doesn't make sense. They have multiple redundant cameras and are fusing those. Why can you fuse multiple visual modalities (two cameras) but not camera / lidar?
You have to make a distinction between sensor categories. Camera, lidar and radar are all vision sensors. There is not much point trying to combine sensors of the same category.
There still is a point, I mean overlapping cameras from different angles give rise to better depth estimation. I wonder if he thinks we should discard overlapping cameras too. After all, it might confuse the algorithm!
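Worth spelling out: with two overlapping cameras, the disagreement between the views is not noise to be suppressed, it is the depth signal itself. A toy triangulation sketch with invented rig numbers:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Classic stereo triangulation: depth = f * B / d, where the
    disparity d is how far the same point shifts between the two images."""
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 1000 px focal length, cameras mounted 0.5 m apart.
# A point that shifts 25 px between the two views is far away;
# a point that shifts 100 px is close.
far = stereo_depth(1000.0, 0.5, 25.0)    # 20.0 m
near = stereo_depth(1000.0, 0.5, 100.0)  # 5.0 m
```

Bigger disagreement between the cameras means a closer object. By the "disagreement causes ambiguity" logic, stereo vision itself would be unsafe.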
No one claimed sensor fusion is not possible or not needed. You seem to be conveniently overlooking the aspect of domain differences. Estimating torque for a vehicle, landing a rocket, or lining up Dragon to the space station are relatively structured and predictable, where the fusion can
if only there was a scalable learning technique and model structure to handle massive diverse sets of input data...
I honestly can't believe anyone would fall for the nonsense you just wrote. Low complexity, high efficiency and simplicity are what make the best systems work best, in nature and in technology, and that is why sensor fusion for autonomous driving is a dead end and doomed.
I've seen your posts on TMC from many years ago, Alex, and you are neither a Signal Processing nor a Machine Learning expert. Your biased opinion means very little on this topic.
It will become evidently clear in the next 12-18 months. FSD V13 is already much smoother and less robotic than Waymo. And it can operate in other countries that Waymo can't even think about going to. has many videos of V13 already performing in NYC. Waymo just got
Yeah, you can always make a competitive model using MORE information. Although his argumentation is flawed, the conclusion holds.
Yeah, my point is simply that his argument is very flawed. Not that I wouldn't follow the same developmental steps.
This is a dumb argument. Sensor fusion is less safe in the end because it requires more parts, which makes things more complex and relies less on intelligence/AI. They are both safe, but when you get to very complex use cases, the less you rely on sensors and the more you rely on
I worked in industrial automation when these sensors and vision were in their initial stages. Musk is correct - and you are a fool to think that Waymo isn't retooling for the exact same solution. Any engineer understands this. Short sellers don't. If you think that the general
Stupid take. Your error is in your second post: "if these uncertainties can be modeled properly". Musk's point is they can't, and you did not understand that. As a tracking engineer I agree with Musk. Your "blatantly lying" statement reveals you as a stupid Musk hater.
lol they've been modeled properly manually years ago, let alone utilizing deep learning to improve on that.
Congratulations: using correct partial statements you built a provably false thesis. While sensor fusion has indeed been studied for more than 70 years, and has reached many goals that you correctly mentioned, the thesis fails to recognize that the quantity of information that
Laughable. A forward-looking point cloud from a single Lidar is massively less data ingested than the raw camera data that then has to go through many layers before the equivalent density map is generated. People are missing that the camera-to-depth map is a major cog.
In the era of deep learning, a scalable auto-labelled dataset is crucial. For FSD, human drivers provide exactly this: implicit labels when they intervene. For lidar/radar, how do you generate large-scale labeled data for the (measured_scene, state) → action?
It's a trade-off: the more in-vehicle compute you waste on noisy radar and contention, the less you have for vision, which makes your vision worse. That's why removing radar made things better.
You fail to mention one of the most important factors for self-driving which is latency. How long will it take to (1) get a good enough fusion model AND (2) have it be fast enough for split second decisions on the road?
Just reflect on how long you needed to realise this... back when Valueanalyst was a fanboy too, how much did you cheerlead for Elon? Never too late anyway.
What you are overlooking is that lidar and radar are way less accurate, lower resolution, and have a much higher noise variance than photon sensors, aka cameras. Hence, any mechanism to unify an accurate sensor with less accurate sensors worsens the result more than it helps.
Tesla's perception stack at this point is most probably an end to end neural network. Not sure if Kalman filters working together with NNs is a thing anymore in Teslas, because their steering wheel is no longer controlled by handwritten code.
No Kalman filters, but the concept underpinning them is being used in the NNs, even vision-only, when integrating multiple cameras.
It may be the case that for this specific application, sensor fusion is worse. Especially with limited compute: 2x compute for video only might be better than adding new input weights from a lidar system. It's not the best explained, but it isn't nonsense.
Quote
Raines
@raines1220
Elon always explains this without a concrete example, and that's why most people still don't really get the point, and fail to understand why disagreement would reduce safety. Let me provide one simple real-world example: At night, LiDAR usually will have difficulty "seeing" a x.com/wholemarsblog/…
Not all wrong, and it's why many experts agree lidar is needed. I believe it's more about lidar being a crutch, and the constraint of camera-only forces innovation into the solution that actually matters: understanding all of the complexities of driving in the real world. You
I think you would have been better off telling people Kalman is a 60+ year old algorithm and was used in the Apollo program and extensively in finance since the 80s. People are duped into thinking this is way too hard or expensive to fuse uncertain sensor data..
Regular cars would also be safer with the lidar, but we don't have those cars in general because of the cost of the equipment. So why the double standards?
The guy who's famous for slave driving his companies to remove people, parts, and processes (if you don't have to add back 10% you didn't cut enough) is also opposed to the cost and complexity of lidar? I'm shocked, shocked!
He doesn't want his engineers to feel that there is any other way than vision, so that they can solve vision the best it can be solved. Also, split second decision making may actually degrade with sensor contention.
Elon is wrong about the why. Sensor fusion isn't insurmountable; it's the latency introduced and the excessive neurons wasted that make it a bad idea to include LiDAR long term. But Waymo needs to get on highways before Tesla to prove it.
// I honestly can't believe anyone would fall for this nonsense. But if you are willing to listen for a second to a real explanation, read on... // He's also missing the point that "Radar" can see behind obstacles giving the AI additional data to avoid potential accidents that
I think latency is another issue sensor fusion brings, which is critical for FSD. I'm sure ULTIMATELY more sensors is better. But within modern constraints, it is not.
After seeing how my Model 3 handled storm-level rain at night, when I couldn't see a thing even with the wipers maxed at highway speed, my confidence in cameras shot up. While my eyes only picked up blurry lights, the car on v12 with HW3 kept mapping lanes and vehicles perfectly.
Musk certainly knows and uses sensor fusion, maybe better than anyone. One should assume he is informed on the subject and has concluded that, for self-driving, the error is higher with multiple sensor modalities than with pure vision.
The beauty of capitalism is competition, where each company pursues what they think works best: Tesla bets on cameras, while others bet on lidar/radar. This clash drives innovation; society gains knowledge and customers get the safest, most economical option in the end.
Everybody knows that the rockets at SpaceX only have one single sensor on them for ultimate safety and efficiency..
It's like wandering in those mirror mazes at the Tivoli, where you use your stretched-out hand because the vision system is corrupted. But once in a while the mirrors are dirty and you can use your vision.
It seems cameras are way ahead here. It's almost as if LiDAR/Radar is the desktop prior to the laptop, or the television channel book prior to the visual guide on the TV. It seems so good, so promising, that change isn't necessary, until it's too late. Only time will tell.
It's an intuitive concept actually. How many humans would like to rely on sight alone to live out their lives? No touch, taste, or sound: imagine how difficult it would be to navigate on a busy sidewalk or even tie your shoes.
I agree with you, but it's easier to "quickly" achieve consistent good results by focusing on 1 input. The only relevant question is, is this enough to already be safer than a human? If you nailed the one-sensor version and exhausted all the capabilities, you have a strong
The proof is in the product! I've been driven around by my personal chauffeur for the last 6 months on V13 ...it works phenomenally and is getting better with every release. The game is over, my friend. You just don't know it yet.
With driving being entirely about what drivers can see (and hear), what is the benefit trying to solve for sensor fusion in this use case?
The problem is, for complex decision making, the best way is to use a learning system to map from pixel input from the camera to a decision, end to end. Current uses of lidar mostly do not plug the raw data into the neural net; it's the output of objects from the lidar independently, then use
Except that I use FSD every day and it is SENSATIONAL 😄👍 If we believed the experts rather than the doers we would still be in an ICE car paradigm.
This works if you can model uncertainty. If not, you're screwed. Active sensors likely can't scale. Imagine dozens or hundreds of Lidars and Radars interfering at a busy intersection.
The ambiguity resolves such that the camera is superior in most cases. Lidar: not relevant as long as you have good cameras with proper camera placement. I think radar is where it's at, but resolution is poor. I believe that's why Tesla considered HD radar. Curious if cost was still
So, why does Waymo have to map every parking lot, freeway, side road or private road before it operates on them?
So it's a guesstimate at the end of the day, and whoever cracks it into a workable model before any accident happens is likely to be trusted and used more.
LiDAR doesn't have the resolution to read signs. That means cameras are needed. If you solve the range estimates with vision, then LiDAR has no need. Trying to engineer multiple sensors together is a nightmare, and why Waymo is moving so slow.
As someone who knows a little about lidar, I am falling for what he says way more than what you're saying. Maybe the lasers see through moisture now. Idk. Lucky us, he's the only one that has to execute. He seems ok with his approach.
Yes, they can be fused for increased accuracy, but at the cost of reaction time. That split-second delay can be life and death, and the increased accuracy is not necessarily needed. Obviously, the increase in reaction time may not be an issue very often.
Voting redundancy is fundamental in spaceflight, where you need failsafes. To think there's something about cars where voting disagreement would suddenly make everything dangerous, because what if the car can't decide what to do, is BS. Yeah, if lidar and cameras disagreed so much to
Love the thread and I agree with your points. Side question: why do you use Mark Zuckerberg's eye in your profile picture 🫣
I think we'll listen to the person with actual credibility behind them… Appreciate your thoughts though.
If Tesla can get their system to drive 10x better than humans it won't matter if Waymo's system can do 15-20x when the ride costs more and can't go everywhere. You don't need a perfect system.
In simplifying your stance: you're saying that the car is under-engineered. His stance is that adding more sensors would put it in the over-engineered category and add needless costs. My stance: the answer is probably somewhere in the middle of more software and hardware -
Elon doesnโ€™t disagree with sensor fusion. Heโ€™s just pushing for the most scalable and simplistic method that can achieve the same results. The goal is ultimately safety.
I know of Elon's credentials and his achievements disrupting several industries and doing what others can't. What are your credentials?
How is it nonsense? What happens when your inner ear disagrees with your field of vision? I know, because it happened to me... you get freaking dizzy. Your body doesn't know what is right. My inner ear had to be retrained because it was damaged by a virus. This is similar.
The actual question is: ARE THEY NEEDED? Since countless millions of humans drive every day without a spinning KFC bucket on their head, the answer is obviously: NO. Using the same inputs humans use, but instead of two gimbaling cameras, FSD uses a constant, unblinking,
Yeah, I wonder how they integrate sensory data from more than one camera; this may be contradictory too. What an idiotic statement to make.
I have the BMW LiDAR. It's a pain in the ass in real-world daily practical driving. I disconnected the entire set of annoying autonomous driving functions.
If that's true then why don't we put more sensors into everything? Washer, dryer and microwave could use more sensors too. Obviously there are so many variables you're ignoring, and you're taking Musk's statement at face value, which shows lack of intelligence.
Generally agree. As you point out, there are two real issues here: 1. Lidar is expensive (~$1000?) 2. Lidar and camera BOTH require massive computation. As long as the camera is good enough, lidar is less necessary. And modern neural nets do the sensor fusion inside a single
Ok, so you're saying all situations and solutions should use all possible available sensors? Should SpaceX use a hall sensor to dock with the ISS? Well, probably not, because it would give something, but something unnecessary.
I think the reason that Tesla dropped radar is because when in conflict they found vision was nearly always correct, which means radar wasn't adding any value. They didn't have lidar, but if they did the results would have been very different, as lidar is much better than radar.
On one hand we have Elon, the guy who lands rockets, built the electric car, Neuralink, etc., proven right multiple times and who recognises when he is not, saying vision is enough. And on the other side someone who makes... euuuuuuuh, I searched but I can't find anything. Sorry, I will trust Elon, just going by the facts.
What is the optimal number of different modal sensors then? In other words, how do you know when to stop adding sensors? More parts, more problems… you're arguing against a fundamental idea in math and physics.
Reading replies to this thread shows how easily Musk can deceive followers. Sensor fusion is key to robotics and enables SpaceX rockets to land (using GPS, inertial units, radar, etc.). It's sad that people ignore math and trust a grifter.
When different sensors disagree, vision provides a single coherent frame of reference, the actual scene itself. Whereas multimodal fusion can only reconcile conflicting abstractions of that same scene. The ground truth still lies in vision.
Contention could be useful if properly implemented, especially if it relates to safety. I'd like to see how vision sensors on their own work during a Chicago winter with heavy rain and icing.
Agree, more information is better than less. The statement that additional sensors "confuse" the algorithm may be true for an existing system but not as universal truth. Instead we should explore Pareto optimality between performance and cost as number and types of sensors grow.
Thanks for this. It's amazing how people fall for Musk's crap and ever-changing arguments. The inane comments people make in this very thread are also amazing. (I drive a Tesla and I like it, but the more I learn who Musk really is, the less inclined I am to get a new one.)