Spending 1 billion a year is just the beginning: can the "best practice" of the "mythologized" end-to-end approach in China's autonomous driving circle make money?

#Car ·2024-09-23


Author | Hua Wei


Tesla FSD is expected to officially enter China in less than half a year. On September 5, Tesla announced that FSD will launch in China and Europe in the first quarter of 2025.


Not long ago, the end-to-end Tesla FSD V12 won praise from many inside and outside the industry after its rollout. Even He Xiaopeng, chairman of XPeng Motors, who has publicly traded barbs with Tesla many times, posted that Tesla's autonomous driving "performed excellently" and said excitedly that "2025 will be the ChatGPT moment for fully autonomous driving!"


Large models, represented by GPT, are reshaping how autonomous driving solutions are researched and developed, with an unprecedented pace of innovation and a new technical architecture, and the global industry is responding quickly to this wave. Judging by the current focus of domestic carmakers, end-to-end has also become their technology route for the next generation of autonomous driving.


Passenger-car autonomous driving players such as Huawei, XPeng, Pony.ai, Momenta, GigaAI, and Horizon Robotics are actively following up and have launched end-to-end autonomous driving solutions and models aimed at mass production. In commercial vehicles, Zero One Automobile has also announced a clear timetable for putting its end-to-end large model on vehicles. Li Xiang, founder and CEO of Li Auto, has likewise publicly claimed that Li Auto will achieve L4 autonomous driving within three years by relying on end-to-end and world models.


Even the L4 autonomous driving market, which had been through a "cold spell", has rebounded with the arrival of end-to-end technology. Wayve, which raised $1 billion on the strength of this technical concept, is a good example. "End-to-end opens a second growth curve for the commercialization of L4," said Liu Yudong, investment manager at Chentao Capital.


With FSD's capabilities taking a leap thanks to end-to-end, Tesla has also announced that it will unveil a Robotaxi model on October 10. He Xiaopeng has publicly revealed that XPeng Motors will launch a Robotaxi in 2026. However, the recent moves and expectations of carmakers and autonomous driving companies that hope to reach mass-produced L4 through end-to-end solutions have prompted many practitioners to ask: is end-to-end being excessively "mythologized"?


Why has end-to-end become the "top trend" in the smart driving circle?


End-to-end did not emerge only in the last year or two; many companies were exploring this technical route as early as 2017. This year, "end-to-end" has become a hit in the autonomous driving circle and is regarded as the industry's killer technology. That owes something to the innovation brought by large language models such as ChatGPT, but also to the appeal of the approach itself.


"The birth of the end-to-end model is the only way for autonomous driving technology to be commercialized at scale." Lou Tiancheng, co-founder and chief technology officer of Pony Wisdom, said that one of the biggest advantages of the end-to-end model is generalization, and the generalization performance is enough to improve the speed of commercialization of automatic driving and accelerate the popularization of automatic driving.


According to Wang Panqu, head of intelligent driving at Zero One Automobile, traditional non-end-to-end autonomous driving systems not only generalize poorly compared with end-to-end ones. When they are extended to new scenarios, many of the rule-based schemes used previously also fail, and the newly added code makes the system harder to maintain, which drives up marginal cost.


In addition, traditional autonomous driving systems have two further disadvantages. The first is architectural complexity: a multi-module system is more expensive to develop, each module is allocated fewer computing resources so its performance ceiling is relatively low, and communication between modules creates many engineering optimization problems. The second is the high cost that this complexity brings: every module needs its own development, maintenance, project management, and integration, which is why traditional autonomous driving companies have very large teams.


"In my opinion, end-to-end is a good way to solve these problems." Wang Panqu said that from the perspective of architecture, there is only one module end-to-end, which can solve complex problems of architecture well, and also has the advantage of cost reduction and efficiency increase. The end-to-end generalization based on data or even knowledge is very strong, and it is very possible to quickly achieve mass production, which can not only reduce the cost of adapting L2 to various models, but also help L4 reduce the time to adapt different scenarios.


In addition, Lou Tiancheng pointed out that the biggest benefit of end-to-end is that it prevents information from being lost between different modules and functions. In a multi-module system, information is lost each time it is handed from one module to the next, whereas the unified end-to-end scheme has no such loss, which helps improve the final performance of the algorithm.


There are further benefits. The first concerns module error: because end-to-end is a single module, there is no error amplification across multiple modules, and the capability ceiling of the overall intelligent driving algorithm can be raised as far as possible. Second, in a multi-module architecture each module has its own development rhythm and optimization objectives, which cannot always be strictly aligned with the global optimization objective of the whole intelligent driving system, leading to potentially ineffective optimization and wasted R&D resources. The end-to-end architecture has only one module and a single, clear optimization goal, which effectively avoids this kind of internal friction.
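The error-amplification argument can be illustrated with a toy calculation. The sketch below assumes, as a simplification that is not from the article, that each module in a cascade is correct independently with some probability, so the reliability of the whole pipeline is the product of the per-module figures. The function name and all numbers are hypothetical.

```python
from math import prod

def pipeline_success_rate(module_success_rates):
    """Probability that every stage of a cascade is correct,
    assuming (hypothetically) that stages fail independently."""
    return prod(module_success_rates)

# Hypothetical per-module success rates: perception, prediction, planning, control.
modular = [0.99, 0.98, 0.98, 0.99]
print(f"{pipeline_success_rate(modular):.4f}")  # 0.9413
```

Even when every module looks strong on its own, the cascade as a whole is weaker than any single stage under this assumption, which is the intuition behind the "no multi-module error amplification" claim.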


Another point is that the components of a modular architecture naturally tend to form multiple rule-driven "domains" among themselves, which brings a series of maintenance challenges and corner cases to solve. As a typical, fully data-driven architecture, end-to-end encourages developers to approach problems with a data-driven and model-driven mindset, which raises the level of understanding across the whole algorithm team.


"Overall, the end-to-end system is more efficient to develop and consumes less resources." Liu Yudong said that the end-to-end pure data-driven development paradigm will reduce a lot of the original heavy engineering resource investment, and shift the enterprise's resource focus to data-driven high talent density and data accumulation investment.


The user value that end-to-end brings is also drawing attention. Liu Yudong pointed out two aspects. First, in handling long-tail scenarios, an end-to-end system can cover more extreme cases than the original system, for example those that require common-sense reasoning. Second, the behavior of the autonomous driving system becomes more human-like: it behaves more like a human driver in scenarios that involve intense interaction with other road users, and this makes it easier to build trust between consumers and the system.


High ceiling, low floor: has the "endgame" of autonomous driving not yet arrived?


Although the technical advantages of end-to-end are significant and a number of carmakers and autonomous driving companies are actively adopting it, the industry is still divided over whether it is the so-called "final form".


Some are firm believers. Wang Panqu said, "I believe end-to-end must be the final form of autonomous driving. But end-to-end is only a technical framework; there are many options for how to implement it, and the industry has not reached a consensus."


Put simply, done well it achieves good results, and done badly it is worse than traditional solutions. For L5 driverless operation, end-to-end is the only solution; for L2 and L3, it is just one of the possible solutions. In practice, end-to-end also needs to be combined with other technical solutions.


"End-to-end provides a good technical path for the rapid and large-scale popularization of autonomous driving, and it remains to be seen whether it is the end game." Lou Tiancheng also has a similar view, that at present, whether it is L2 or L4 automatic driving has been realized, but the quality of the realization and how much scope to achieve it have different requirements and standards for technology.


For L2 autonomous driving, end-to-end is currently the preferred path; for L4, end-to-end can help it open up new areas quickly. However, L4 has higher safety requirements and must be more than 10 times safer than human drivers. So in addition to end-to-end, it also needs high-certainty instructions, such as traffic regulations and driving preferences, to be integrated with driving intentions and application scenarios.


Liu Yudong gave a more cautious judgment: "For the foreseeable future, end-to-end is the endgame of autonomous driving, but over a longer cycle of technological evolution there are many possibilities. Just as ChatGPT was a technology we could not have imagined three years ago, a new technical architecture could disrupt ChatGPT two or three years from now."


What is "best practice" when 100% end-to-end is not there yet?


Although it is not yet clear whether end-to-end is the final solution for autonomous driving, deploying it has clearly become the consensus of the intelligent driving industry. However, the industry remains divided over which end-to-end technical path to choose.


At present, Zero One Automobile is pursuing the end-to-end route based on multimodal large language models. Beyond its results on some public datasets, it also finished second among 143 international teams on the end-to-end autonomous driving track of this year's international autonomous driving challenge, held by Shanghai Artificial Intelligence Laboratory together with CVPR, using a pure-vision autonomous driving solution.


Wang Panqu believes that modular end-to-end amounts to a preliminary exploration: it can be done more quickly, and relatively mature solutions already exist in academia and industry. The end-to-end route based on multimodal large models, by contrast, has the potential to turn autonomous driving into a profitable business, because only a base model with strong generalization can deliver the knowledge injection and integration that autonomous driving needs.


In short, the strong generalization of large models will bring performance advantages to the whole end-to-end system and will make profitable, mass-produced advanced autonomous driving possible in the future. Moreover, the two end-to-end routes, one based on multimodal large models and the other on world models, can be reused in the future.


Liu Yudong said that in principle, One Model is closer to the form AGI takes in other fields, while the world model is mainly a tool for data generation, and it will take longer to see whether it can serve as an autonomous driving system. Over the next two years, there will be two main types of end-to-end solutions. One is modular end-to-end, typified by UniAD from Shanghai Artificial Intelligence Laboratory. The other is One Model end-to-end based mainly on multimodal large models, such as Wayve's LINGO-2 and Li Auto's recently launched DriveVLM.


He believes the world model is the rational end-to-end solution. With a world model, intelligent driving algorithms can understand the scene, make reasonable predictions about the future, and decide based on that information, which is closer to the logic of human thinking.


Zhu Zheng, co-founder and chief scientist of GigaAI, added that training One Model is very resource-intensive and time-consuming and places very high demands on the scale and quality of data. Using the model's predictive ability for scene perception and driving decisions is more consistent with human driving behavior and habits. According to him, GigaAI already has a basic end-to-end prototype system based on the world model, is carrying out joint validation with a carmaker, and expects to report progress soon.


In August last year, Pony.ai unified the three traditional modules of perception, prediction, and planning and control into a single One Model end-to-end autonomous driving model, which has been deployed on both its L4 robotaxis and its L2 assisted-driving passenger cars. In Lou Tiancheng's view, both modular end-to-end and One Model are at an early stage and have not yet been proven in mass-production delivery. He expects the end-to-end technical route to move from disagreement to consensus over the next one to two years.


"In the long run, the end-to-end end game will eventually move to one model."


Not long ago, Jiyue Auto CEO Xia Yiping also said publicly, "Nobody on the market is truly end-to-end; it is all marketing gimmicks." It is understood that the current end-to-end intelligent driving solution is also a "two-stage" technical architecture.


The "black box" label is a misunderstanding: end-to-end can be made to resemble a gray box or a white box


Many of the advantages of the end-to-end solution come from an architecture that integrates multiple modules into one, but this design also makes the system closer to a "black box" than the original, understandable "white box", and therefore less interpretable.


Lou Tiancheng believes that the lack of interpretability is an inherent defect of end-to-end systems, but whether it will limit the development of end-to-end autonomous driving depends on the situation. For L2, it does not hinder end-to-end applications: modular end-to-end, for example, still retains the main functional modules, and the intermediate output features can be further extracted into interpretable data.


For L4, the requirements for safety and certainty are much higher than for L2. It is therefore necessary to build rule-based instructions, such as traffic regulations and driving preferences, into the model to help the end-to-end autonomous driving model better understand driving intentions. At the same time, the model's capabilities need to be upgraded so that it can output its driving intentions, further improving interpretability.


In Zhu Zheng's view, end-to-end is indeed a black box at the product level and in its final R&D form, but from the perspective of engineers, product designers, and even users, it can be made to resemble a gray box or a white box.


First, modular joint end-to-end still distinguishes the three modules of perception, prediction, and planning in detail, and any planning result can be traced back to an intermediate module upstream. Second, One Model can output modular intermediate results. Labeling those results for intermediate supervision helps One Model converge better, and the intermediate results can also be shown to engineers or users. Third, the most important property of the world model is its predictive power, and its predictions can likewise be linked to the model's intermediate results.
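One common way to realize the "intermediate supervision" idea is to add weighted auxiliary losses on the intermediate outputs to the main planning loss. The sketch below is a minimal illustration under that assumption; the function name, the loss names, and the weights are hypothetical and are not described in the article.

```python
def total_loss(planning_loss, aux_losses, aux_weights):
    """Main planning loss plus weighted auxiliary losses on intermediate
    outputs (e.g. perception, prediction). Names and weights are illustrative."""
    return planning_loss + sum(
        aux_weights[name] * loss for name, loss in aux_losses.items()
    )

# Hypothetical loss values for one training step.
loss = total_loss(
    planning_loss=1.0,
    aux_losses={"perception": 0.5, "prediction": 0.25},
    aux_weights={"perception": 0.2, "prediction": 0.4},
)
print(round(loss, 3))  # 1.2
```

Because the intermediate heads are trained against labels, their outputs stay meaningful enough to display to engineers or users, which is what lets a single model behave more like a gray box.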


As long as what the developers understand about the system is presented in a form that outsiders can interpret, it is no longer a black box.


Wang Panqu also believes that raising the interpretability issue really reflects the question of public trust in the technology, that is, whether its performance has reached an acceptable standard. With progress in data-driven methods, algorithm design, large-model safety, and related technologies, end-to-end performance and reliability will take a very big leap in the next one to two years. Once performance has been fully verified through large-scale testing, interpretability will no longer be the key issue.


The "peak" of end-to-end deployment on vehicles is coming, and commercial vehicles will land faster


"Modular end-to-end scale-to-market is just within the last year, and end-to-end based on large language models will take an additional 1 to 2 years." Wang Panqu pointed out that L4 automatic driving of commercial vehicles must be faster than the landing speed of passenger cars, because the high-level automatic driving system that can be mass-produced is very difficult to pick the ground scene, and the commercial vehicle scene is simpler than the passenger car, and a single scene is easy to close the commercial loop, and it is also convenient to do the scene asymptotic.


Liu Yudong is more optimistic, expecting both modular end-to-end and One Model end-to-end to be rolled out more intensively next year. Weighing how aggressively the technology is being developed, the concentration of talent, the speed of iteration, and the difficulty of application, he added that end-to-end may truly land at about the same time for commercial vehicles and passenger cars, but passenger cars will see broader deployment, with commercial vehicles catching up gradually later.


"Before the end-to-end measurement, we must cross these several hurdles, the first is the preparation of the end-to-end computing power, the second is the iteration of the end-to-end algorithm, the third is the cloud data scale, the fourth is the computing power scale, and the fifth is the verification scheme."


In his view, Tesla and leading domestic OEMs and companies such as NIO, XPeng, Li Auto, and Huawei are already fully equipped in three respects: on-vehicle computing power, cloud data scale, and cloud computing power scale. From the end of this year to the first half of next year, the end-to-end algorithms of several leading carmakers will be deployed on vehicles at scale; from the second half of next year, the industry will see an explosion of mass-produced end-to-end cars.


With end-to-end arriving, does it mean "starting over"?


The development and adoption of end-to-end systems will undoubtedly bring a technological revolution to intelligent driving solutions as a whole. So does everything built before have to be redone for end-to-end?


Liu Yudong believes that existing autonomous driving technology will not be completely overturned, and that end-to-end will share some of its accumulated algorithms and software.


The first is perception: many end-to-end systems still process front-end camera information using BEV practices, such as the backbone or encoder. The second is planning and control, where some of the existing know-how can be migrated to an end-to-end system. The third is data infrastructure, which will be an important capability for companies doing end-to-end in the future; companies that do BEV solutions well also tend to have relatively strong data infrastructure.


He said that end-to-end is purely data-driven, with a multimodal large model at its core. If a smart driving company's previous technical solution contained a lot of rules, those rules will basically be discarded; if the previous solution was already largely model-driven, there is a high probability that this part of the code can be reused in some form.


It should be emphasized that the change in R&D model brought about by end-to-end algorithms is the focus of every OEM and autonomous driving company, and it is also where the pain is greatest.


Wang Panqu also mentioned that beyond the model itself, end-to-end requires more data work. The first task is to rebuild the closed-loop data system and improve its iteration efficiency. The second is end-to-end testing and verification: the sensor input of the entire simulation platform must be highly realistic, which is a very challenging technical problem. In terms of labor, however, the overall cost of an end-to-end intelligent driving system is lower than that of a non-end-to-end one, because end-to-end has only a few modules and a core team of 20 to 30 engineers should be enough.


This is good news for OEMs with mass production capabilities: because their cost of data acquisition is lower, the overall cost of their smart driving solutions will fall significantly further.


In terms of computing power investment, Lou Tiancheng said that in the short term, purchasing high-compute chips will indeed raise current costs. In the long run, however, once end-to-end technology matures, the upfront investment will gradually be amortized.


Pure end-to-end needs less computing power than a modular architecture, but training still costs at least 100 to 200 million yuan a year


"If you want the end-to-end model to achieve a better training degree, you need at least one to two hundred million yuan of computing power capital investment a year, and the passenger car track figure will certainly be more impressive."


According to Wang Panqu, the computing power that end-to-end requires falls into two parts: training and deployment. Deployment comes down to how many domain controllers have to be purchased; this cost is fixed, relatively low, and tied to the per-vehicle cost. The biggest cost is training, which can be handled in two ways: buying GPUs to build your own cluster, or working with a cloud service provider. For carmakers with large order volumes, building their own data centers is cost-effective; for carmakers that are smaller or still in the early stage of R&D, renting servers from a cloud service provider is the better choice.
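The build-versus-rent trade-off can be sketched as a simple break-even calculation. The example below assumes a one-off purchase cost for a self-built cluster plus a monthly operating cost, compared against a flat monthly cloud rental fee. The function name and every figure are hypothetical and do not come from the article.

```python
def break_even_months(cluster_capex, owned_monthly_opex, cloud_monthly_rent):
    """Months after which owning a cluster becomes cheaper than renting.
    Returns None if renting never costs more per month than owning."""
    monthly_saving = cloud_monthly_rent - owned_monthly_opex
    if monthly_saving <= 0:
        return None
    return cluster_capex / monthly_saving

# All figures are hypothetical, in millions of yuan.
print(break_even_months(cluster_capex=120, owned_monthly_opex=2, cloud_monthly_rent=6))  # 30.0
```

Under these made-up numbers a self-built cluster pays for itself after 30 months, which is why the choice depends on how long and how heavily a company expects to train.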


Previously, Lang Xianpeng, vice president of intelligent driving at Li Auto, publicly revealed that Li Auto currently spends 1 billion yuan a year on training compute and expects to spend 1 billion dollars a year in the future. "If you don't spend $1 billion a year on training, you may be eliminated from the autonomous driving competition of the future."


In terms of the scale of computing power, Lou Tiancheng believes that a few hundred high-compute GPUs are enough to train a simple end-to-end autonomous driving model. For companies that want to invest over the long term and ensure end-to-end quality, training clusters at autonomous driving companies are basically at the thousand-GPU level, and carmakers will invest more.


He said that a pure end-to-end system needs less computing power than a modular architecture. In mass production, however, there is often a bypass system alongside the main system, and its computing power requirements are generally comparable to those of the previous modular architecture.


However, Wang Panqu believes that as the capability of in-vehicle computing chips rises, computing power will not be an obstacle to putting end-to-end on vehicles in the future. Lou Tiancheng holds the same view, saying that in moving from the classical architecture to end-to-end, the total amount of code will be significantly reduced, and the computing resources consumed by the end-to-end neural network will not necessarily be significantly higher than those of the BEV model.


"The desire for more computing power comes more from an increase in the number of model parameters and model performance than from an end-to-end shift." In addition, he pointed out that from the perspective of end-to-end landing applications, relevant enterprises should think more about how to make full use of existing chip computing resources to improve utilization efficiency.

