作者:Derrick Harris,Matt Bornstein,Guido Appenzeller
Research in artificial intelligence is increasing at an exponential rate. It’s difficult for AI experts to keep up with everything new being published, and even harder for beginners to know where to start.
人工智能的研究正以指數(shù)級的速度增長。人工智能專家很難跟上所有新發(fā)布的內(nèi)容,初學(xué)者更難知道從哪里開始。
So, in this post, we’re sharing a curated list of resources we’ve relied on to get smarter about modern AI. We call it the “AI Canon” because these papers, blog posts, courses, and guides have had an outsized impact on the field over the past several years.
所以,在這篇文章中,我們分享了一個精選的資源列表,我們依靠這些資源來更聰明地了解現(xiàn)代AI。我們稱之為“AI佳能”,因為這些論文,博客文章,課程和指南在過去幾年中對該領(lǐng)域產(chǎn)生了巨大的影響。
We start with a gentle introduction to transformer and latent diffusion models, which are fueling the current AI wave. Next, we go deep on technical learning resources; practical guides to building with large language models (LLMs); and analysis of the AI market.
Finally, we include a reference list of landmark research results, starting with “Attention is All You Need” — the 2017 paper by Google that introduced the world to transformer models and ushered in the age of generative AI.
最后,我們包括一個具有里程碑意義的研究成果的參考列表,從“注意力是你所需要的一切”開始-谷歌2017年的論文,該論文向世界介紹了變壓器模型,并迎來了生成式人工智能的時代。
首先,我們將溫和地介紹變壓器和潛在擴散模型,這些模型正在推動當前的AI浪潮。接下來,我們深入技術(shù)學(xué)習(xí)資源;使用大型語言模型(LLM)構(gòu)建的實用指南;分析AI市場。
These articles require no specialized background and can help you get up to speed quickly on the most important parts of the modern AI wave.
這些文章不需要專業(yè)背景,可以幫助您快速了解現(xiàn)代AI浪潮中最重要的部分。
?Software 2.0: Andrej Karpathy was one of the first to clearly explain (in 2017!) why the new AI wave really matters. His argument is that AI is a new and powerful way to program computers.
As LLMs have improved rapidly, this thesis has proven prescient, and it gives a good mental model for how the AI market may progress.
隨著LLM的迅速發(fā)展,這篇論文被證明是有先見之明的,它為人工智能市場的發(fā)展提供了一個很好的心理模型。
軟件2.0:Andrej Karpathy是最早明確解釋的人之一(2017年!)為什么新的人工智能浪潮真的很重要他的論點是,人工智能是一種新的、強大的計算機編程方式。
?State of GPT: Also from Karpathy, this is a very approachable explanation of how ChatGPT / GPT models in general work, how to use them, and what directions R&D may take.
GPT狀態(tài):同樣來自Karpathy,這是一個非常平易近人的解釋如何ChatGPT / GPT模型在一般的工作,如何使用它們,以及什么方向的研發(fā)可能采取。
?What is ChatGPT doing … and why does it work?: Computer scientist and entrepreneur Stephen Wolfram gives a long but highly readable explanation, from first principles, of how modern AI models work. He follows the timeline from early neural nets to today’s LLMs and ChatGPT.
ChatGPT在做什么…為什么會有效:計算機科學(xué)家和企業(yè)家Stephen Wolfram從基本原理出發(fā),對現(xiàn)代人工智能模型的工作原理進行了冗長但可讀性很強的解釋。他遵循從早期神經(jīng)網(wǎng)絡(luò)到今天的LLM和ChatGPT的時間軸。
?Transformers, explained: This post by Dale Markowitz is a shorter, more direct answer to the question “what is an LLM, and how does it work?” This is a great way to ease into the topic and develop intuition for the technology. It was written about GPT-3 but still applies to newer models.
Transformers解釋說:這篇文章由戴爾馬科維茨是一個更短,更直接的回答問題“什么是法學(xué)碩士,它是如何工作的?””這是一個很好的方式來輕松進入主題,并發(fā)展對技術(shù)的直覺。它是關(guān)于GPT-3的,但仍然適用于較新的模型。
?How Stable Diffusion works: This is the computer vision analogue to the last post. Chris McCormick gives a layperson’s explanation of how Stable Diffusion works and develops intuition around text-to-image models generally. For an even gentler introduction, check out this comic from r/StableDiffusion.
穩(wěn)定擴散的工作原理:這是上一篇文章的計算機視覺模擬。Chris McCormick給出了一個外行的解釋,說明了穩(wěn)定擴散是如何工作的,并通常圍繞文本到圖像模型開發(fā)了直覺。對于更溫和的介紹,請查看r/StableDiffusion的這部漫畫。
These resources provide a base understanding of fundamental ideas in machine learning and AI, from the basics of deep learning to university-level courses from AI experts.
這些資源提供了對機器學(xué)習(xí)和人工智能基本思想的基本理解,從深度學(xué)習(xí)的基礎(chǔ)知識到人工智能專家的大學(xué)課程。
?Deep learning in a nutshell: core concepts: This four-part series from Nvidia walks through the basics of deep learning as practiced in 2015, and is a good resource for anyone just learning about AI.
深度學(xué)習(xí)簡介:核心理念:Nvidia的這個四部分系列介紹了2015年實踐的深度學(xué)習(xí)基礎(chǔ)知識,對于任何剛剛學(xué)習(xí)AI的人來說都是一個很好的資源。
?Practical deep learning for coders: Comprehensive, free course on the fundamentals of AI, explained through practical examples and code.
面向程序員的實用深度學(xué)習(xí):關(guān)于人工智能基礎(chǔ)的全面免費課程,通過實際示例和代碼進行解釋。
?Word2vec explained: Easy introduction to embeddings and tokens, which are building blocks of LLMs (and all language models).
Word2vec解釋說:簡單介紹嵌入和令牌,它們是LLM(和所有語言模型)的構(gòu)建塊。
?Yes you should understand backprop: More in-depth post on back-propagation if you want to understand the details. If you want even more, try the Stanford CS231n lecture on Youtube.
是的,你應(yīng)該理解backprop:如果你想了解更多的細節(jié),請閱讀關(guān)于反向傳播的更深入的文章。如果你想知道更多,可以試試Youtube上的斯坦福大學(xué)CS231n講座。
?Stanford CS229: Introduction to Machine Learning with Andrew Ng, covering the fundamentals of machine learning.
斯坦福大學(xué)CS229:與Andrew Ng一起介紹機器學(xué)習(xí),涵蓋機器學(xué)習(xí)的基礎(chǔ)知識。
?Stanford CS224N: NLP with Deep Learning with Chris Manning, covering NLP basics through the first generation of LLMs.
斯坦福大學(xué)CS224N:NLP與Chris Manning的深度學(xué)習(xí),通過第一代LLM涵蓋NLP基礎(chǔ)知識。
There are countless resources — some better than others — attempting to explain how LLMs work. Here are some of our favorites, targeting a wide range of readers/viewers.
有無數(shù)的資源-一些比其他的更好-試圖解釋LLM是如何工作的。以下是我們的一些最愛,針對廣泛的讀者/觀眾。
?The illustrated transformer: A more technical overview of the transformer architecture by Jay Alammar.
圖示的變壓器:Jay Alammar對變壓器架構(gòu)的技術(shù)概述。
?The annotated transformer: In-depth post if you want to understand transformers at a source code level. Requires some knowledge of PyTorch.
帶注釋的變壓器:如果你想在源代碼級別上理解transformer,這篇文章是一篇深入的文章。需要一些PyTorch的知識。
?Let’s build GPT: from scratch, in code, spelled out: For the engineers out there, Karpathy does a video walkthrough of how to build a GPT model.
讓我們構(gòu)建GPT:從頭開始,用代碼拼出來:對于那里的工程師,Karpathy做了一個如何構(gòu)建GPT模型的視頻演練。
?The illustrated Stable Diffusion: Introduction to latent diffusion models, the most common type of generative AI model for images.
圖示的穩(wěn)定擴散:介紹潛在擴散模型,這是最常見的圖像生成AI模型。
?RLHF: Reinforcement Learning from Human Feedback: Chip Huyen explains RLHF, which can make LLMs behave in more predictable and human-friendly ways. This is one of the most important but least well-understood aspects of systems like ChatGPT.
RLHF:從人類反饋中強化學(xué)習(xí)Chip Huyen解釋了RLHF,它可以使LLM以更可預(yù)測和人性化的方式運行。這是像ChatGPT這樣的系統(tǒng)最重要但最不容易理解的方面之一。
?Reinforcement learning from human feedback: Computer scientist and OpenAI cofounder John Shulman goes deeper in this great talk on the current state, progress and limitations of LLMs with RLHF.
來自人類反饋的強化學(xué)習(xí):計算機科學(xué)家和OpenAI聯(lián)合創(chuàng)始人John Shulman在這個關(guān)于RLHF LLM的當前狀態(tài),進展和局限性的精彩演講中進行了深入的探討。
?Stanford CS25: Transformers United, an online seminar on Transformers.
斯坦福大學(xué)CS25:變形金剛聯(lián)合會,一個關(guān)于變形金剛的在線研討會。
?Stanford CS324: Large Language Models with Percy Liang, Tatsu Hashimoto, and Chris Re, covering a wide range of technical and non-technical aspects of LLMs.
斯坦福大學(xué)CS324:與珀西Liang,Tatsu Hashimoto和Chris Re合作的大型語言模型,涵蓋了LLM的廣泛技術(shù)和非技術(shù)方面。
?Predictive learning, NIPS 2016: In this early talk, Yann LeCun makes a strong case for unsupervised learning as a critical element of AI model architectures at scale. Skip to 19:20 for the famous cake analogy, which is still one of the best mental models for modern AI.
預(yù)測學(xué)習(xí),NIPS 2016:在這個早期的演講中,Yann LeCun將無監(jiān)督學(xué)習(xí)作為大規(guī)模AI模型架構(gòu)的關(guān)鍵要素。跳到19:20來看著名的蛋糕類比,它仍然是現(xiàn)代人工智能最好的心智模型之一。
?AI for full-self driving at Tesla: Another classic Karpathy talk, this time covering the Tesla data collection engine. Starting at 8:35 is one of the great all-time AI rants, explaining why long-tailed problems (in this case stop sign detection) are so hard.
特斯拉的全自動駕駛AI:另一個經(jīng)典的Karpathy演講,這次涵蓋了特斯拉的數(shù)據(jù)收集引擎。從8:35開始是人工智能歷史上最偉大的咆哮之一,解釋了為什么長尾問題(在這種情況下停止標志檢測)如此困難。
?The scaling hypothesis: One of the most surprising aspects of LLMs is that scaling — adding more data and compute — just keeps increasing accuracy. GPT-3 was the first model to demonstrate this clearly, and Gwern’s post does a great job explaining the intuition behind it.
縮放假設(shè):LLM最令人驚訝的方面之一是擴展-添加更多的數(shù)據(jù)和計算-只是不斷提高準確性。GPT-3是第一個清楚地證明這一點的模型,Gwern的帖子很好地解釋了它背后的直覺。
?Chinchilla’s wild implications: Nominally an explainer of the important Chinchilla paper (see below), this post gets to the heart of the big question in LLM scaling: are we running out of data? This builds on the post above and gives a refreshed view on scaling laws.
龍貓的野生含義:名義上是重要的Chinchilla論文的解釋者(見下文),這篇文章觸及了LLM縮放中的大問題的核心:我們的數(shù)據(jù)用完了嗎?這是建立在上面的帖子上,并給出了一個關(guān)于縮放定律的更新視圖。
?A survey of large language models: Comprehensive breakdown of current LLMs, including development timeline, size, training strategies, training data, hardware, and more.
大型語言模型綜述:當前LLM的全面細分,包括開發(fā)時間軸,規(guī)模,培訓(xùn)策略,培訓(xùn)數(shù)據(jù),硬件等。
?Sparks of artificial general intelligence: Early experiments with GPT-4: Early analysis from Microsoft Research on the capabilities of GPT-4, the current most advanced LLM, relative to human intelligence.
通用人工智能的火花:GPT-4的早期實驗:微軟研究院對GPT-4的能力的早期分析,GPT-4是目前最先進的LLM,相對于人類智能。
?The AI revolution: How Auto-GPT unleashes a new era of automation and creativity: An introduction to Auto-GPT and AI agents in general. This technology is very early but important to understand — it uses internet access and self-generated sub-tasks in order to solve specific, complex problems or goals.
AI革命:Auto-GPT如何開啟自動化和創(chuàng)造力的新時代:Auto-GPT和AI代理的一般介紹。這項技術(shù)非常早期,但重要的是要了解-它使用互聯(lián)網(wǎng)訪問和自我生成的子任務(wù),以解決特定的,復(fù)雜的問題或目標。
?The Waluigi Effect: Nominally an explanation of the “Waluigi effect” (i.e., why “alter egos” emerge in LLM behavior), but interesting mostly for its deep dive on the theory of LLM prompting.
瓦盧吉效應(yīng):名義上是對“Waluigi效應(yīng)”的解釋(即,為什么“改變自我”出現(xiàn)在法學(xué)碩士行為中),但有趣的主要是它對法學(xué)碩士激勵理論的深入研究。
A new application stack is emerging with LLMs at the core. While there isn’t a lot of formal education available on this topic yet, we pulled out some of the most useful resources we’ve found.
一個新的應(yīng)用程序堆棧正在以LLM為核心出現(xiàn)。雖然還沒有很多關(guān)于這個主題的正規(guī)教育,但我們找到了一些最有用的資源。
?Build a GitHub support bot with GPT3, LangChain, and Python: One of the earliest public explanations of the modern LLM app stack. Some of the advice in here is dated, but in many ways it kicked off widespread adoption and experimentation of new AI apps.
使用GPT3、LangChain和Python構(gòu)建GitHub支持機器人:現(xiàn)代LLM應(yīng)用程序堆棧的最早公開解釋之一。這里的一些建議已經(jīng)過時,但在許多方面,它開啟了新AI應(yīng)用程序的廣泛采用和實驗。
?Building LLM applications for production: Chip Huyen discusses many of the key challenges in building LLM apps, how to address them, and what types of use cases make the most sense.
構(gòu)建用于生產(chǎn)的LLM應(yīng)用程序:Chip Huyen討論了構(gòu)建LLM應(yīng)用程序的許多關(guān)鍵挑戰(zhàn),如何解決這些挑戰(zhàn),以及什么類型的用例最有意義。
?Prompt Engineering Guide: For anyone writing LLM prompts — including app devs — this is the most comprehensive guide, with specific examples for a handful of popular models. For a lighter, more conversational treatment, try Brex’s prompt engineering guide.
提示工程指南:對于任何編寫LLM提示的人-包括應(yīng)用程序開發(fā)人員-這是最全面的指南,其中包含少數(shù)流行模型的具體示例。要想獲得更輕松、更有對話性的治療,請嘗試Brex的即時工程指南。
?Prompt injection: What’s the worst that can happen? Prompt injection is a potentially serious security vulnerability lurking for LLM apps, with no perfect solution yet. Simon Willison gives the definitive description of the problem in this post. Nearly everything Simon writes on AI is outstanding.
即時注射:最壞的結(jié)果是什么?提示注入是潛伏在LLM應(yīng)用程序中的一個潛在的嚴重安全漏洞,目前還沒有完美的解決方案。Simon Willison在這篇文章中給出了這個問題的明確描述。西蒙寫的關(guān)于AI的幾乎所有東西都很出色。
?OpenAI cookbook: For developers, this is the definitive collection of guides and code examples for working with the OpenAI API. It’s updated continually with new code examples.
OpenAI食譜:對于開發(fā)人員來說,這是使用OpenAI API的指南和代碼示例的權(quán)威集合。它不斷更新新的代碼示例。
?Pinecone learning center: Many LLM apps are based around a vector search paradigm. Pinecone’s learning center — despite being branded vendor content — offers some of the most useful instruction on how to build in this pattern.
松果學(xué)習(xí)中心:許多LLM應(yīng)用程序都基于矢量搜索范式。Pinecone的學(xué)習(xí)中心--盡管是品牌供應(yīng)商內(nèi)容--提供了一些關(guān)于如何構(gòu)建這種模式的最有用的指導(dǎo)。
?LangChain docs: As the default orchestration layer for LLM apps, LangChain connects to just about all other pieces of the stack. So their docs are a real reference for the full stack and how the pieces fit together.
LangChain文檔:作為LLM應(yīng)用程序的默認編排層,LangChain連接到堆棧的所有其他部分。因此,他們的文檔是完整堆棧以及各部分如何組合在一起的真實的參考。
?LLM Bootcamp: A practical course for building LLM-based applications with Charles Frye, Sergey Karayev, and Josh Tobin.
LLM Bootcamp:與Charles Frye,Sergey Karayev和Josh Tobin一起構(gòu)建基于LLM的應(yīng)用程序的實用課程。
?Hugging Face Transformers: Guide to using open-source LLMs in the Hugging Face transformers library.
擁抱臉變形金剛:在Hugging Face變壓器庫中使用開源LLM的指南。
?Chatbot Arena: An Elo-style ranking system of popular LLMs, led by a team at UC Berkeley. Users can also participate by comparing models head to head.
聊天機器人競技場:一個流行的法學(xué)碩士的Elo風(fēng)格的排名系統(tǒng),由加州大學(xué)伯克利分校的一個團隊領(lǐng)導(dǎo)。用戶還可以通過頭對頭比較模型來參與。
?Open LLM Leaderboard: A ranking by Hugging Face, comparing open source LLMs across a collection of standard benchmarks and tasks.
開放LLM排行榜:Hugging Face的排名,在一系列標準基準和任務(wù)中比較開源LLM。
We’ve all marveled at what generative AI can produce, but there are still a lot of questions about what it all means. Which products and companies will survive and thrive? What happens to artists? How should companies use it? How will it affect literally jobs and society at large? Here are some attempts at answering these questions.
我們都對生成式人工智能能產(chǎn)生什么感到驚訝,但關(guān)于這一切意味著什么,仍然有很多問題。哪些產(chǎn)品和公司將生存和發(fā)展?藝術(shù)家怎么了?企業(yè)應(yīng)該如何使用它?它將如何影響就業(yè)和整個社會?以下是一些回答這些問題的嘗試。
?Who owns the generative AI platform?: Our flagship assessment of where value is accruing, and might accrue, at the infrastructure, model, and application layers of generative AI.
誰擁有人工智能平臺?:我們對生成AI的基礎(chǔ)設(shè)施、模型和應(yīng)用層的價值正在積累和可能積累的旗艦評估。
?Navigating the high cost of AI compute: A detailed breakdown of why generative AI models require so many computing resources, and how to think about acquiring those resources (i.e., the right GPUs in the right quantity, at the right cost) in a high-demand market.
駕馭AI計算的高成本:詳細分析了為什么生成式AI模型需要如此多的計算資源,以及如何考慮獲取這些資源(即,在高需求的市場中,以合適的數(shù)量、合適的成本獲得合適的GPU)。
?Art isn’t dead, it’s just machine-generated: A look at how AI models were able to reshape creative fields — often assumed to be the last holdout against automation — much faster than fields such as software development.
藝術(shù)并沒有死,它只是機器生成的:看看人工智能模型如何能夠重塑創(chuàng)造性領(lǐng)域-通常被認為是最后一個反對自動化的領(lǐng)域-比軟件開發(fā)等領(lǐng)域快得多。
?The generative AI revolution in games: An in-depth analysis from our Games team at how the ability to easily create highly detailed graphics will change how game designers, studios, and the entire market function. This follow-up piece from our Games team looks specifically at the advent of AI-generated content vis à vis user-generated content.
游戲中的人工智能革命:我們的游戲團隊深入分析了輕松創(chuàng)建高細節(jié)圖形的能力將如何改變游戲設(shè)計師,工作室和整個市場的運作方式。我們的游戲團隊的這篇后續(xù)文章特別關(guān)注人工智能生成內(nèi)容維斯用戶生成內(nèi)容的出現(xiàn)。
?For B2B generative AI apps, is less more?: A prediction for how LLMs will evolve in the world of B2B enterprise applications, centered around the idea that summarizing information will ultimately be more valuable than producing text.
對于B2B生成式AI應(yīng)用程序來說,更少更多嗎?:預(yù)測LLM將如何在B2B企業(yè)應(yīng)用程序的世界中發(fā)展,圍繞著總結(jié)信息最終將比生成文本更有價值的想法。
?Financial services will embrace generative AI faster than you think: An argument that the financial services industry is poised to use generative AI for personalized consumer experiences, cost-efficient operations, better compliance, improved risk management, and dynamic forecasting and reporting.
金融服務(wù)將比你想象的更快地擁抱生成式AI:金融服務(wù)行業(yè)準備使用生成式人工智能來實現(xiàn)個性化的消費者體驗、具有成本效益的運營、更好的合規(guī)性、更好的風(fēng)險管理以及動態(tài)預(yù)測和報告。
?Generative AI: The next consumer platform: A look at opportunities for generative AI to impact the consumer market across a range of sectors from therapy to ecommerce.
生成AI:下一個消費平臺:看看生成式人工智能在從治療到電子商務(wù)的一系列領(lǐng)域影響消費者市場的機會。
?To make a real difference in health care, AI will need to learn like we do: AI is poised to irrevocably change how we look to prevent and treat illness. However, to truly transform drug discovery to care delivery, we should invest in creating an ecosystem of “specialist” AIs — that learn like our best physicians and drug developers do today.
為了在醫(yī)療保健領(lǐng)域發(fā)揮真實的的作用,人工智能需要像我們一樣學(xué)習(xí):人工智能將不可逆轉(zhuǎn)地改變我們預(yù)防和治療疾病的方式。然而,要真正將藥物發(fā)現(xiàn)轉(zhuǎn)變?yōu)獒t(yī)療服務(wù),我們應(yīng)該投資創(chuàng)建一個“專家”人工智能生態(tài)系統(tǒng)-像我們今天最好的醫(yī)生和藥物開發(fā)人員一樣學(xué)習(xí)。
?The new industrial revolution: Bio x AI: The next industrial revolution in human history will be biology powered by artificial intelligence.
新工業(yè)革命:Bio x AI:人類歷史上的下一次工業(yè)革命將是由人工智能驅(qū)動的生物學(xué)。
?On the opportunities and risks of foundation models: Stanford overview paper on Foundation Models. Long and opinionated, but this shaped the term.
關(guān)于基金會模式的機遇和風(fēng)險:斯坦福大學(xué)關(guān)于基金會模型的綜述論文。很長很固執(zhí),但這塑造了這個詞。
?State of AI Report: An annual roundup of everything going on in AI, including technology breakthroughs, industry development, politics/regulation, economic implications, safety, and predictions for the future.
AI狀態(tài)報告:年度綜述AI領(lǐng)域發(fā)生的一切,包括技術(shù)突破、行業(yè)發(fā)展、政治/監(jiān)管、經(jīng)濟影響、安全性和對未來的預(yù)測。
?GPTs are GPTs: An early look at the labor market impact potential of large language models: This paper from researchers at OpenAI, OpenResearch, and the University of of Pennsylvania predicts that “around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted.”
GPT是GPT:對大型語言模型的勞動力市場影響潛力的初步研究:這篇來自O(shè)penAI、OpenResearch和賓夕法尼亞大學(xué)的研究人員的論文預(yù)測,“大約80%的美國勞動力可能會有至少10%的工作任務(wù)受到引入LLM的影響,而大約19%的工人可能會看到至少50%的任務(wù)受到影響。
?Deep medicine: How artificial intelligence can make healthcare human again: Dr. Eric Topol reveals how artificial intelligence has the potential to free physicians from the time-consuming tasks that interfere with human connection. The doctor-patient relationship is restored. (a16z podcast)
深層醫(yī)學(xué):人工智能如何讓醫(yī)療保健再次成為人類:Eric Topol博士揭示了人工智能如何有潛力將醫(yī)生從干擾人類聯(lián)系的耗時任務(wù)中解放出來。醫(yī)患關(guān)系得到恢復(fù)。(a16z播客)
Most of the amazing AI products we see today are the result of no-less-amazing research, carried out by experts inside large companies and leading universities.
我們今天看到的大多數(shù)令人驚嘆的人工智能產(chǎn)品都是由大公司和一流大學(xué)的專家進行的驚人研究的結(jié)果。
Lately, we’ve also seen impressive work from individuals and the open source community taking popular projects into new directions, for example by creating automated agents or porting models onto smaller hardware footprints.
最近,我們也看到了來自個人和開源社區(qū)的令人印象深刻的工作,將流行的項目帶入新的方向,例如通過創(chuàng)建自動化代理或?qū)⒛P鸵浦驳捷^小的硬件足跡上。
Here’s a collection of many of these papers and projects, for folks who really want to dive deep into generative AI.
這里收集了許多這樣的論文和項目,適合那些真正想要深入研究生成式AI的人。
(For research papers and projects, we’ve also included links to the accompanying blog posts or websites, where available, which tend to explain things at a higher level. And we’ve included original publication years so you can track foundational research over time.)
(For除了研究論文和項目外,我們還提供了相應(yīng)的博客文章或網(wǎng)站的鏈接,這些鏈接往往會在更高的層次上解釋事情。我們還包括原始出版年份,以便您可以跟蹤基礎(chǔ)研究的時間。
New models
?Attention is all you need (2017): The original transformer work and research paper from Google Brain that started it all. (blog post)
你需要的只是注意力(2017)最初的變壓器工作和Google Brain的研究論文開始了這一切。(blog職位)
?BERT: pre-training of deep bidirectional transformers for language understanding (2018): One of the first publicly available LLMs, with many variants still in use today. (blog post)
BERT:語言理解的深度雙向轉(zhuǎn)換器的預(yù)訓(xùn)練(2018):第一個公開可用的LLM之一,今天仍在使用許多變體。(blog職位)
?Improving language understanding by generative pre-training (2018): The first paper from OpenAI covering the GPT architecture, which has become the dominant development path in LLMs. (blog post)
通過生成式預(yù)訓(xùn)練提高語言理解(2018):OpenAI的第一篇論文涉及GPT架構(gòu),該架構(gòu)已成為LLM的主要開發(fā)路徑。(blog職位)
?Language models are few-shot learners (2020): The OpenAI paper that describes GPT-3 and the decoder-only architecture of modern LLMs.
語言模型是少數(shù)學(xué)習(xí)者(2020):OpenAI論文描述了GPT-3和現(xiàn)代LLM的僅解碼器架構(gòu)。
?Training language models to follow instructions with human feedback (2022): OpenAI’s paper explaining InstructGPT, which utilizes humans in the loop to train models and, thus, better follow the instructions in prompts. This was one of the key unlocks that made LLMs accessible to consumers (e.g., via ChatGPT). (blog post)
訓(xùn)練語言模型以遵循人類反饋的指令(2022):OpenAI的論文解釋了InstructGPT,它利用循環(huán)中的人類來訓(xùn)練模型,從而更好地遵循提示中的說明。這是使LLM對消費者可訪問的關(guān)鍵解鎖之一(例如,通過ChatGPT)。(blog職位)
?LaMDA: language models for dialog applications (2022): A model form Google specifically designed for free-flowing dialog between a human and chatbot across a wide variety of topics. (blog post)
LaMDA:對話應(yīng)用程序的語言模型(2022):Google專為人類和聊天機器人之間的自由對話而設(shè)計的模型表單,涉及各種主題。(blog職位)
?PaLM: Scaling language modeling with pathways (2022): PaLM, from Google, utilized a new system for training LLMs across thousands of chips and demonstrated larger-than-expected improvements for certain tasks as model size scaled up. (blog post). See also the PaLM-2 technical report.
PaLM:Scaling language modeling with pathways(2022):來自谷歌的PaLM利用一個新系統(tǒng)在數(shù)千個芯片上訓(xùn)練LLM,并在模型規(guī)模擴大時,對某些任務(wù)的改進超過預(yù)期。(blog post)。另見PaLM-2技術(shù)報告。
?OPT: Open Pre-trained Transformer language models (2022): OPT is one of the top performing fully open source LLMs. The release for this 175-billion-parameter model comes with code and was trained on publicly available datasets. (blog post)
OPT:開放預(yù)訓(xùn)練的Transformer語言模型(2022):OPT是表現(xiàn)最好的完全開源LLM之一。這個1750億參數(shù)模型的發(fā)布附帶了代碼,并在公開可用的數(shù)據(jù)集上進行了訓(xùn)練。(blog職位)
?Training compute-optimal large language models (2022): The Chinchilla paper. It makes the case that most models are data limited, not compute limited, and changed the consensus on LLM scaling. (blog post)
訓(xùn)練計算最優(yōu)的大型語言模型(2022):龍貓紙它使得大多數(shù)模型都是數(shù)據(jù)有限的,而不是計算有限的,并且改變了對LLM縮放的共識。(blog職位)
?GPT-4 technical report (2023): The latest and greatest paper from OpenAI, known mostly for how little it reveals! (blog post). The GPT-4 system card sheds some light on how OpenAI treats hallucinations, privacy, security, and other issues.
GPT-4技術(shù)報告(2023):OpenAI最新最偉大的論文,主要是因為它揭示的很少!(blog post)。GPT-4系統(tǒng)卡揭示了OpenAI如何處理幻覺,隱私,安全和其他問題。
?LLaMA: Open and efficient foundation language models (2023): The model from Meta that (almost) started an open-source LLM revolution. Competitive with many of the best closed-source models but only opened up to researchers on a restricted license. (blog post)
LLaMA:開放和高效的基礎(chǔ)語言模型(2023):梅塔的模型(幾乎)開始了開源LLM革命。與許多最好的閉源模型競爭,但只向研究人員開放有限的許可證。(blog職位)
?Alpaca: A strong, replicable instruction-following model (2023): Out of Stanford, this model demonstrates the power of instruction tuning, especially in smaller open-source models, compared to pure scale.
Alpaca:一個強大的,可復(fù)制的指令遵循模型(2023):在斯坦福大學(xué)大學(xué),這個模型展示了指令調(diào)優(yōu)的力量,特別是在較小的開源模型中,與純規(guī)模相比。
Model improvements (e.g. fine-tuning, retrieval, attention)
模型改進(例如微調(diào)、檢索、注意)
?Deep reinforcement learning from human preferences (2017): Research on reinforcement learning in gaming and robotics contexts, that turned out to be a fantastic tool for LLMs.
從人類偏好進行深度強化學(xué)習(xí)(2017):在游戲和機器人環(huán)境中進行強化學(xué)習(xí)的研究,這對LLM來說是一個很好的工具。
?Retrieval-augmented generation for knowledge-intensive NLP tasks (2020): Developed by Facebook, RAG is one of the two main research paths for improving LLM accuracy via information retrieval. (blog post)
知識密集型NLP任務(wù)的檢索增強生成(2020年):由Facebook開發(fā)的RAG是通過信息檢索提高LLM準確性的兩條主要研究路徑之一。(blog職位)
?Improving language models by retrieving from trillions of tokens (2021): RETRO, for “Retrieval Enhanced TRansfOrmers,” is another approach — this one by DeepMind — to improve LLM accuracy by accessing information not included in their training data. (blog post)
通過從數(shù)萬億個令牌中檢索來改進語言模型(2021):RETRO,即“檢索增強的TRansfOrmers”,是另一種方法-這是DeepMind的一種方法-通過訪問未包含在其訓(xùn)練數(shù)據(jù)中的信息來提高LLM的準確性。(blog職位)
?LoRA: Low-rank adaptation of large language models (2021): This research out of Microsoft introduced a more efficient alternative to fine-tuning for training LLMs on new data. It’s now become a standard for community fine-tuning, especially for image models.
LoRA:大型語言模型的低秩適應(yīng)(2021):微軟的這項研究為在新數(shù)據(jù)上訓(xùn)練LLM引入了一種更有效的微調(diào)替代方案。它現(xiàn)在已經(jīng)成為社區(qū)微調(diào)的標準,特別是對于圖像模型。
?Constitutional AI (2022): The Anthropic team introduces the concept of reinforcement learning from AI Feedback (RLAIF). The main idea is that we can develop a harmless AI assistant with the supervision of other AIs.
憲法AI(2022年):Anthropic團隊從AI Feedback(RLAIF)引入了強化學(xué)習(xí)的概念。主要想法是,我們可以在其他AI的監(jiān)督下開發(fā)一個無害的AI助手。
?FlashAttention: Fast and memory-efficient exact attention with IO-awareness (2022): This research out of Stanford opened the door for state-of-the-art models to understand longer sequences of text (and higher-resolution images) without exorbitant training times and costs. (blog post)
FlashAttention:具有IO感知功能的快速高效的精確注意力(2022):斯坦福大學(xué)大學(xué)的這項研究為最先進的模型打開了大門,可以理解更長的文本序列(和更高分辨率的圖像),而無需過多的訓(xùn)練時間和成本。(blog職位)
?Hungry hungry hippos: Towards language modeling with state space models (2022): Again from Stanford, this paper describes one of the leading alternatives to attention in language modeling. This is a promising path to better scaling and training efficiency. (blog post)
饑餓的河馬:使用狀態(tài)空間模型進行語言建模(2022):同樣來自斯坦福大學(xué),本文描述了語言建模中注意力的主要替代方案之一。這是一條有希望實現(xiàn)更好的擴展和訓(xùn)練效率的途徑。(博客文章)
?Learning transferable visual models from natural language supervision (2021): Paper that introduces a base model — CLIP — that links textual descriptions to images. One of the first effective, large-scale uses of foundation models in computer vision. (blog post)
從自然語言監(jiān)督中學(xué)習(xí)可轉(zhuǎn)移的視覺模型(2021):論文介紹了一個基本模型- CLIP -鏈接文本描述的圖像。這是計算機視覺中基礎(chǔ)模型的首次有效、大規(guī)模使用之一。(blog職位)
?Zero-shot text-to-image generation (2021): This is the paper that introduced DALL-E, a model that combines the aforementioned CLIP and GPT-3 to automatically generate images based on text prompts. Its successor, DALL-E 2, would kick off the image-based generative AI boom in 2022. (blog post)
零拍攝文本到圖像生成(2021年):這篇論文介紹了DALL-E,這是一個結(jié)合了前面提到的CLIP和GPT-3的模型,可以根據(jù)文本提示自動生成圖像。它的繼任者DALL-E 2將在2022年開啟基于圖像的生成AI熱潮。(blog職位)
?High-resolution image synthesis with latent diffusion models (2021): The paper that described Stable Diffusion (after the launch and explosive open source growth).
使用潛在擴散模型的高分辨率圖像合成(2021):描述穩(wěn)定擴散(在發(fā)布和爆炸性開源增長之后)的論文。
?Photorealistic text-to-image diffusion models with deep language understanding (2022): Imagen was Google’s foray into AI image generation. More than a year after its announcement, the model has yet to be released publicly as of the publish date of this piece. (website)
具有深度語言理解的逼真文本到圖像擴散模型(2022):Imagen是Google進軍AI圖像生成領(lǐng)域的嘗試。在宣布一年多之后,該模型尚未公開發(fā)布,截至本文發(fā)布之日。(網(wǎng)址)
?DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation (2022): DreamBooth is a system, developed at Google, for training models to recognize user-submitted subjects and apply them to the context of a prompt (e.g. [USER] smiling at the Eiffel Tower). (website)
DreamBooth:微調(diào)文本到圖像的擴散模型,用于主題驅(qū)動的生成(2022):DreamBooth是一個由Google開發(fā)的系統(tǒng),用于訓(xùn)練模型識別用戶提交的主題并將其應(yīng)用于提示的上下文(例如:[用戶]微笑在埃菲爾鐵塔)。(網(wǎng)址)
?Adding conditional control to text-to-image diffusion models (2023): This paper from Stanford introduces ControlNet, a now very popular tool for exercising fine-grained control over image generation with latent diffusion models.
向文本到圖像擴散模型添加條件控制(2023):這篇來自斯坦福大學(xué)論文介紹了ControlNet,這是一種現(xiàn)在非常流行的工具,用于使用潛在擴散模型對圖像生成進行細粒度控制。
?A path towards autonomous machine intelligence (2022): A proposal from Meta AI lead and NYU professor Yann LeCun on how to build autonomous and intelligent agents that truly understand the world around them.
邁向自主機器智能的道路(2022年):梅塔AI負責人和紐約大學(xué)教授Yann LeCun提出了一項關(guān)于如何構(gòu)建真正了解周圍世界的自主智能代理的建議。
?ReAct: Synergizing reasoning and acting in language models (2022): A project out of Princeton and Google to test and improve the reasoning and planning abilities of LLMs. (blog post)
ReAct:Synergizing Reasoning and Acting in Language Models(2022)普林斯頓大學(xué)和谷歌的一個項目,旨在測試和提高法學(xué)碩士的推理和規(guī)劃能力。(博客文章)
?Generative agents: Interactive simulacra of human behavior (2023): Researchers at Stanford and Google used LLMs to power agents, in a setting akin to “The Sims,” whose interactions are emergent rather than programmed.
生成劑:人類行為的互動模擬(2023):斯坦福大學(xué)和谷歌的研究人員使用LLM為代理人提供動力,其設(shè)置類似于“西姆斯”,其交互是緊急的而不是編程的。
?Reflexion: an autonomous agent with dynamic memory and self-reflection (2023): Work from researchers at Northeastern University and MIT on teaching LLMs to solve problems more reliably by learning from their mistakes and past experiences.
Reflexion:an autonomous agent with dynamic memory and self-reflection(2023):東北大學(xué)和麻省理工學(xué)院的研究人員通過從錯誤和過去的經(jīng)驗中學(xué)習(xí),教授法學(xué)碩士更可靠地解決問題。
?Toolformer: Language models can teach themselves to use tools (2023): This project from Meta trained LLMs to use external tools (APIs, in this case, pointing to things like search engines and calculators) in order to improve accuracy without increasing model size.
Toolformer:語言模型可以教自己使用工具(2023):梅塔的這個項目訓(xùn)練LLM使用外部工具(API,在這種情況下,指向搜索引擎和計算器等),以便在不增加模型大小的情況下提高精度。
?Auto-GPT: An autonomous GPT-4 experiment: An open source experiment to expand on the capabilities of GPT-4 by giving it a collection of tools (internet access, file storage, etc.) and choosing which ones to use in order to solve a specific task.
Auto-GPT:自主GPT-4實驗:一個開源實驗,通過提供一系列工具(互聯(lián)網(wǎng)訪問、文件存儲等)來擴展GPT-4的功能。以及選擇使用哪些來解決特定任務(wù)。
?BabyAGI: This Python script utilizes GPT-4 and vector databases (to store context) in order to plan and executes a series of tasks that solve a broader objective.
BabyAGI:這個Python腳本利用GPT-4和向量數(shù)據(jù)庫(存儲上下文)來計劃和執(zhí)行一系列任務(wù),以解決更廣泛的目標。
Code generation
?Evaluating large language models trained on code (2021): This is OpenAI’s research paper for Codex, the code-generation model behind the GitHub Copilot product. (blog post)
評估在代碼上訓(xùn)練的大型語言模型(2021):這是OpenAI為Codex撰寫的研究論文,Codex是GitHub Copilot產(chǎn)品背后的代碼生成模型。(blog職位)
?Competition-level code generation with AlphaCode (2021): This research from DeepMind demonstrates a model capable of writing better code than human programmers. (blog post)
使用AlphaCode(2021)生成競賽級代碼:DeepMind的這項研究展示了一種能夠編寫比人類程序員更好的代碼的模型。(博客文章)
?CodeGen: An open large language model for code with multi-turn program synthesis (2022): CodeGen comes out of the AI research arm at Salesforce, and currently underpins the Replit Ghostwriter product for code generation. (blog post)
CodeGen:一個開放的大型語言模型,用于多輪程序合成的代碼(2022):CodeGen來自Salesforce的AI研究部門,目前支持Replit Ghostwriter產(chǎn)品用于代碼生成。(博客文章)
Video generation
?Make-A-Video: Text-to-video generation without text-video data (2022): A model from Meta that creates short videos from text prompts, but also adds motion to static photo inputs or creates variations of existing videos. (blog post)
制作視頻:無文本-視頻數(shù)據(jù)的文本-視頻生成(2022):梅塔的一個模型,從文本提示創(chuàng)建短視頻,但也向靜態(tài)照片輸入添加運動或創(chuàng)建現(xiàn)有視頻的變體。(blog職位)
?Imagen Video: High definition video generation with diffusion models (2022): Just what it sounds like: a version of Google’s image-based Imagen model optimized for producing short videos from text prompts. (website)
Imagen視頻:使用擴散模型生成高清視頻(2022年):就像它聽起來一樣:Google基于圖像的Imagen模型的一個版本,優(yōu)化用于從文本提示生成短視頻。(網(wǎng)址)
Human biology and medical data 人體生物學(xué)和醫(yī)學(xué)數(shù)據(jù)
?Strategies for pre-training graph neural networks (2020): This publication laid the groundwork for effective pre-training methods useful for applications across drug discovery, such as molecular property prediction and protein function prediction. (blog post)
預(yù)訓(xùn)練圖神經(jīng)網(wǎng)絡(luò)的策略(2020):該出版物為有效的預(yù)訓(xùn)練方法奠定了基礎(chǔ),這些方法可用于藥物發(fā)現(xiàn)的應(yīng)用,例如分子性質(zhì)預(yù)測和蛋白質(zhì)功能預(yù)測。(博客文章)
?Improved protein structure prediction using potentials from deep learning (2020): DeepMind’s protein-centric transformer model, AlphaFold, made it possible to predict protein structure from sequence — a true breakthrough which has already had far-reaching implications for understanding biological processes and developing new treatments for diseases. (blog post) (explainer)
利用深度學(xué)習(xí)的潛力改進蛋白質(zhì)結(jié)構(gòu)預(yù)測(2020):DeepMind以蛋白質(zhì)為中心的轉(zhuǎn)換器模型AlphaFold使得從序列預(yù)測蛋白質(zhì)結(jié)構(gòu)成為可能-這是一項真正的突破,已經(jīng)對理解生物過程和開發(fā)新的疾病治療方法產(chǎn)生了深遠的影響。(blog post)(解釋者)
?Large language models encode clinical knowledge (2022): Med-PaLM is a LLM capable of correctly answering US Medical License Exam style questions. The team has since published results on the performance of Med-PaLM2, which achieved a score on par with “expert” test takers. Other teams have performed similar experiments with ChatGPT and GPT-4. (video)
大型語言模型編碼臨床知識(2022):Med-PaLM是一個能夠正確回答美國醫(yī)學(xué)執(zhí)照考試風(fēng)格問題的LLM。此后,該團隊發(fā)表了Med-PaLM 2的性能結(jié)果,該結(jié)果與“專家”測試者的得分相當。其他團隊也用ChatGPT和GPT-4進行了類似的實驗。(視頻)
Audio generation
?Jukebox: A generative model for music (2020): OpenAI’s foray into music generation using transformers, capable of producing music, vocals, and lyrics with minimal training. (blog post)
Jukebox:A generative model for music(2020)OpenAI使用變形金剛進軍音樂生成領(lǐng)域,能夠以最少的培訓(xùn)制作音樂,人聲和歌詞。(blog職位)
?AudioLM: a language modeling approach to audio generation (2022): AudioLM is a Google project for generating multiple types of audio, including speech and instrumentation. (blog post)
AudioLM:音頻生成的語言建模方法(2022):AudioLM是一個Google項目,用于生成多種類型的音頻,包括語音和樂器。(blog職位)
?MusicLM: Generating nusic from text (2023): Current state of the art in AI-based music generation, showing higher quality and coherence than previous attempts. (blog post)
MusicLM:Generating nusic from text(2023):基于AI的音樂生成的最新技術(shù)水平,顯示出比以前的嘗試更高的質(zhì)量和一致性。(blog職位)
Multi-dimensional image generation
多維圖像生成
?NeRF: Representing scenes as neural radiance fields for view synthesis (2020): Research from a UC-Berkeley-led team on “synthesizing novel views of complex scenes” using 5D coordinates. (website)
NeRF:將場景表示為用于視圖合成的神經(jīng)輻射場(2020):加州大學(xué)伯克利分校領(lǐng)導(dǎo)的團隊使用5D坐標“合成復(fù)雜場景的新穎視圖”的研究。(網(wǎng)址)
?DreamFusion: Text-to-3D using 2D diffusion (2022): Work from researchers at Google and UC-Berkeley that builds on NeRF to generate 3D images from 2D inputs. (website)
DreamFusion:使用2D擴散的文本到3D(2022):Google和UC-Berkeley的研究人員的工作,建立在NeRF的基礎(chǔ)上,從2D輸入生成3D圖像。(網(wǎng)址)
原文地址:https://a16z.com/2023/05/25/ai-canon/