Behind the end-to-end endgame: open source takes center stage, and the new focus shifts to the synthetic data Tesla champions?

#Car ·2024-06-14


Author | Hua Wei


With the development of AI and large-model technology, autonomous driving has entered a new stage. Recently, "end-to-end autonomous driving" has become the focus of the industry as one of its most important technological evolution trends.

 

On June 12, at the "End-to-End Leading a New Era of Autonomous Driving" summit forum hosted by Chen Tao Capital and the Autonomous Driving Branch of the Nanjing University Shanghai Alumni Association, representatives of leading intelligent-driving companies, along with experts from investment and research institutions, shared their views on the future of end-to-end technology and the shortage of data, and held a roundtable dialogue on "Large models and AGI trends in the physical world."

 

Dr. Liu Yudong, investment manager at Chen Tao Capital, said that leading autonomous driving companies have accumulated rich end-to-end R&D experience, and mass-producible technical solutions such as UniAD and FSD have emerged, with mass production expected within the next six months to a year. This year or next, OEMs will put preliminary end-to-end systems on vehicles.

 

At the event, Chen Tao Capital and the Autonomous Driving Branch of the Nanjing University Shanghai Alumni Association also jointly released the 2024 "End-to-End Autonomous Driving Industry Research Report." According to its survey, 90% of respondents said their companies have invested in developing end-to-end technology. End-to-end has gradually become the consensus of the autonomous driving industry, but its deployment still faces many challenges, including the choice of technical route, data and computing requirements, testing and verification, and organizational resource investment.

 

At present, autonomous driving players such as Huawei, Xpeng, Yuanrong Qixing, and SenseTime Jueying have announced end-to-end mass-production plans, and modular end-to-end systems are expected to start going into vehicles in 2025.

 

End-to-end future evolution

 

"The architecture evolution of autonomous driving is divided into four stages, from modular/rule-based to end-to-end/data-driven, and the end-to-end definition category is the third and fourth stages: modular end-to-end, One model end-to-end. A world model that emphasizes generative power can provide training data and can also be a way to implement one model."

 

Liu Yudong said that end-to-end currently faces six deployment challenges: the technical route is not yet fully settled; training data must be of high quality; training compute requires tens of thousands to hundreds of thousands of GPUs, which will limit development progress; testing and verification methods are immature, since traditional methods test single modules; and the focus of organizational resources is shifting from engineers to data infrastructure and data investment. On-board chip compute and interpretability, in his view, will not limit end-to-end deployment.

 

In the future, the open-source community will play an important role in the end-to-end shift, just as it did in the evolution of BEV algorithms. Closed-loop simulation will become an important foundation, and it is the most important technological change besides end-to-end itself. As for chip architecture, the chip's raw computing power is not the limitation; what matters more is how the chip design supports rapid algorithm iteration, including flexible chip IP and architectures that support the Transformer.

 

He also pointed out that end-to-end autonomous driving is closely related to the robotics industry and will go through three stages: 1. autonomous driving borrows technology from robotics; 2. end-to-end technology feeds back into robotics; 3. autonomous driving and robotics compete over physical-world AGI. Autonomous driving's advantages are structured scenarios and established data-acquisition paths, while robotics' advantage is lower safety requirements.

 

Du Dalong, co-founder and CTO of Jianzhi Robot, elaborated on this point. The reason general robotic systems need an end-to-end "perception-decision-planning" model is that there is an endless stream of problems that rules cannot solve and that can only be handled end to end. In the future, a world model could serve as the autonomous driving model, but not yet, because such models are too large; end-to-end autonomous driving is the final route.

 

Synthetic data vs. real data

 

"Synthetic data is the most effective way to address the end-to-end data shortage." Xie Chen, founder and CEO of Nimbus Intelligence, pointed out that Sora uses a lot of synthetic data for training; About 30% of Tesla uses synthetic data; About 30% of NIO uses synthetic data; About 50% of Cruise uses synthetic data; About 80% of Nvidia self-driving uses synthetic data.

 

Tesla, for its part, believes the autonomous driving paradigm is Transformer plus data: build a closed data loop and develop end-to-end algorithms through the vehicle-side data loop. "With a million cars you can feel the power of the data loop: change the code, push it to the European fleet, and the data comes back within a day." Tesla has also accumulated synthetic data, used first for perception and later for end-to-end training.

 

Xie Chen explained that end-to-end autonomous driving mainly needs data with three properties: visual and physical realism, agent interaction, and scale efficiency, and traditional synthetic data struggles to satisfy all three at once. Within three years, synthetic data will become the primary source of large-model data.

 

In Du Dalong's view, BEV does not need that much data, and Tesla CEO Elon Musk is exaggerating somewhat. Doing occupancy (OCC) with a stereo-camera scheme needs only 1% of the data: first add some reasonable constraints, then use a graph to model the relationships between dynamic and static targets, which improves both data efficiency and compute efficiency. What matters is ensuring the modeling is differentiable so that it can be optimized end to end.
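As a rough illustration of that idea, here is a minimal sketch, under my own assumptions rather than Du Dalong's actual design, of a differentiable graph layer that relates dynamic targets (vehicles, pedestrians) to static targets (lanes, obstacles). Because every step is a tensor operation, a downstream planning loss can backpropagate through the relational structure.

```python
import torch
import torch.nn as nn

class RelationGraphLayer(nn.Module):
    """Message passing from static map elements to dynamic agents."""
    def __init__(self, dim=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, dynamic, static):
        # dynamic: (B, Nd, dim) agent features; static: (B, Ns, dim) map features
        B, Nd, D = dynamic.shape
        Ns = static.shape[1]
        # Form every dynamic-static pair and compute an edge message for it.
        d = dynamic.unsqueeze(2).expand(B, Nd, Ns, D)
        s = static.unsqueeze(1).expand(B, Nd, Ns, D)
        messages = self.edge_mlp(torch.cat([d, s], dim=-1))   # (B, Nd, Ns, D)
        agg = messages.mean(dim=2)                            # static context per agent
        return self.node_mlp(torch.cat([dynamic, agg], dim=-1))

# Usage: the refined agent features would feed a planner; gradients from the
# planning loss flow back through the graph, keeping the whole chain optimizable.
layer = RelationGraphLayer()
dyn = torch.randn(2, 5, 64)    # 5 hypothetical dynamic targets
sta = torch.randn(2, 20, 64)   # 20 hypothetical static map elements
refined = layer(dyn, sta)      # (2, 5, 64)
```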

 

On the proportion and importance of synthetic versus real data, Zhang Peng, vice president of products at Zhisquare Technology, said synthetic data is definitely needed right now, but the way data is sourced may change in the future, and the demand for data is itself changing. Humans go through a process of discovering, verifying, and applying laws of nature, and models may need a similar process. At bottom, the question is still how to use data.

 

"High-quality data is the most important, and the ratio between synthetic and real data depends on the scene." Professor Dai Xinyu, deputy dean of the School of Artificial Intelligence at Nanjing University, said that for example, in terms of text, synthetic data may not be a good scene because it does not conform to human values, but synthetic data can simulate more scenes in automatic driving.

 

Wang Panqu, intelligent-driving partner at Zero One Automobile, pointed out that reinforcement learning played a major role in GPT-3.5 and GPT-4, and that introducing an end-to-end closed loop is an efficient way to use simulation data. The combination of simulation data and reinforcement learning is where effort is needed.

 

Trends in large models and AGI

 

Q: Is the Transformer the foundational architecture for future large models? Can you briefly share your projection of how model architectures will evolve?

 

Dai Xinyu, deputy dean of the School of Artificial Intelligence, Nanjing University: Since it was proposed in 2017, the Transformer has been validated in NLP and multimodal tasks and has become the mainstream neural network architecture. It works well but has not yet reached its full potential. Its drawbacks are high training energy consumption, heavy reliance on multiplication operations, and mediocre interpretability. Although the Transformer can use chain-of-thought, its reasoning ability is still not strong. The Transformer still has plenty of room to develop over the next 3-5 years, but other models are also worth exploring by the academic community; attention is currently on whether neuro-symbolic models, quantum computing, and other architectures might yield promising alternatives to the Transformer.

 

Wang Panqu, intelligent-driving partner at Zero One Automobile: The Transformer is highly versatile and generalizes well; its advantage is that any modality, whether image, sound, or text, can be deeply encoded through queries, and its outputs are equally diverse. This versatility lets various tasks be migrated and extended painlessly and allows multi-task networks to be unified under one model. The Transformer has a lot of potential going forward, but it will not dominate everything. Right now it excels at large models and decision making, while models like diffusion and 3DGS (3D Gaussian Splatting) will be more useful for simulation and real-world rendering.
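A hedged sketch of the property Wang Panqu describes: a shared set of learned queries cross-attends to tokens of any modality, so the same Transformer block encodes images, sound, or text into one representation. The names, sizes, and token sources below are illustrative assumptions, not a production design.

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    """Learned queries cross-attend to tokens from any modality."""
    def __init__(self, dim=256, num_queries=32, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens):
        # tokens: (B, N, dim) from any modality's tokenizer (image patches,
        # spectrogram frames, or word embeddings) projected to the shared width.
        B = tokens.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        fused, _ = self.cross_attn(q, tokens, tokens)   # queries attend to the modality
        return fused + self.ffn(fused)                  # (B, num_queries, dim)

encoder = QueryEncoder()
image_tokens = torch.randn(2, 196, 256)   # e.g. ViT-style patch tokens
audio_tokens = torch.randn(2, 400, 256)   # e.g. spectrogram frames
img_repr = encoder(image_tokens)          # same module handles both modalities
aud_repr = encoder(audio_tokens)
```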

 

Zhang Peng, vice president of products at Zhisquare Technology: The Transformer is currently the more effective basis for unifying the outputs of multiple modalities, while diffusion and 3DGS have already been applied in specific sub-fields and have advantages in deployment. What matters in a given scenario is what ceiling can be reached at what cost; the Transformer may be just one stage in the process.

 

Zhou Chongjie, investment director at Honghui Fund: Compared with the human brain, the Transformer has shortcomings in reasoning efficiency and compute utilization. Its current performance is very impressive, but whether through Transformer-based optimization, hybrid models, or entirely new architectures, I think new things will emerge in the future.

 

Q: Will the scaling law hit a wall? Can the scaling law of language be carried over to multimodality?

 

Wang Panqu: For language itself, perhaps 90% of GPT-5's data will come from simulation data; if simulation data is not capped, then the scaling law has no ceiling. In other areas, the question is whether data can keep up with demand; autonomous driving data, for example, is expensive to collect and involves safety concerns. The question going forward is whether a data-collection bottleneck will prevent the scaling law from being validated.

 

Dai Xinyu: How much data it takes to produce a multimodal leap is an open question, and more data is not necessarily better. Elephants, for example, have far more neurons than humans, yet their intelligence is much lower. A saturation effect may mean that once a model reaches a certain size, it can no longer make large leaps.

 

Zhang Peng: The scaling law has already been verified on large language models, and there is more multimodal data than language data, but we have not yet found a multimodal paradigm for scaling up data volume. We must first find that path and then verify whether the scaling law holds for multimodality. In addition, compute has to reach a balance point in a given scenario; in autonomous driving, for instance, there is no need to put a huge model in the car. So in some scenarios it may not be necessary to have so much data; it is more about finding the rule first and then looking for the data.

 

Zhou Chongjie: At present, models with more parameters perform better. The scaling law holds to a certain extent, but it is constrained by data, especially high-quality data, and needs further verification. In addition, compute and electricity may be far from sufficient; electricity demand in 2026 may reach 860 billion kWh, which would be hard to support and also challenges the scaling law. So finding better data, cleaning data, and low-rank reduction and model distillation are all directions worth exploring.
