The community around Stable Diffusion, the open source AI project for text-to-image generation, has been buzzing. It’s gone from nonexistent a year ago to thousands of contributors, forks, and spinoffs. There’s even a GUI macOS app.
Lexica is a project to index Stable Diffusion prompts and images and make them searchable. Playing around with it, I found it pretty impressive. So much incredible possibility here. This tech will make the volume of content on the internet effectively infinite.
This interview was one of the best overviews and deep dives on the current state of AI and machine learning I’ve heard yet. Daniel was at Apple during its early work on machine learning in iOS, and Nat Friedman was CEO of GitHub during the development of its excellent Copilot product.
Nat on the previously-predicted tendency toward centralization in AI:
The centralization/decentralization thing is fascinating because I also bought the narrative that AI was going to be this rare case where this technology breakthrough was not going to diffuse through the industry and would be locked up within a few organizations. There were a few reasons why we thought this. One was this idea that maybe the know-how would be very rare, there’d be some technical secrets that wouldn’t escape. What we found instead is that every major breakthrough is incredibly simple and it’s like you could summarize on the back of an index card or ten lines of code, something like that. The ML community is really interconnected, so the secrets don’t seem to stay secret for very long, so that one’s out, at least for most organizations.
Daniel on the importance of the right interface for widening AI applications:
We’re in this new era where new user interfaces are possible and it’s somewhere in between the spectrum of a GUI and a voice or text user interface. I don’t think it’ll be text just because in the domain of images, sure, all mistakes are actually features, great, but the issue that you have is in real domains, like you mentioned legal, tax, where productive work is made, mistakes are bad. The issue with text is one observation we always had at Apple: unlike a GUI, the customer does not understand the boundaries of the system. So unless, to Nat’s point, if you have AGI and it’s smarter than a human, great. Up until that point, you need something that has this feature that the GUI has, which is amazing. The GUI only shows you buttons you can press on, it doesn’t have buttons that don’t work, usually.
Martin Gurri is one of the best minds we have for the current moment. Make sure to subscribe to his essays on the Mercatus Center’s “The Bridge.”
The American people appear to be caught in the grip of a psychotic episode. Most of us are still sheltering in place, obsessed with the risk of viral infection, primly waiting for someone to give us permission to shake hands with our friends again. Meanwhile, online and on television, we watch, as in a dream, crowds of our fellow citizens thronging into the streets of our cities, raging at the police and the established order generally, with some engaged in arson, looting, and violence.
On one side, a reflexive obedience to authority. On the other, a near-absolute repudiation of the rules of the system—for some, of any restraint whatever. The future will be determined by the uncertain relationship between these two extremes.
My friend and former colleague Kevin Stofan wrote the launch post for DataRobot’s latest product additions for spatial AI. Pretty amazing additions to their platform.
A discussion among physicians on how oncology is changing and will likely continue to evolve in the wake of the coronavirus. Testing, chemo, and other treatment steps currently considered standards of care will shift, and things like telemedicine will change the options doctors have in working with patients.
I’ve got a set of scans and a follow-up this week, so I’ll see how Mayo Clinic has adapted their approach in response to this crisis.
We’ve been doing some thinking on our team about how to systematically address (and repay) technical debt. With the web of interconnected dependencies and micro packages that exists now through tools like npm and yarn, no single person can track all the versions and relationships between modules. This post proposes a “Dependency Drift” metric to quantify how far out of date a codebase is on the latest updates to its dependencies:
Create a numeric metric that incorporates the volume of dependencies and the recency of each of them.
Devise a simple high-level A-F grading system from that number to communicate how current a project is with its dependencies. We’ll call this a drift score (a rough sketch of what this could look like follows this list).
Regularly recalculate and publish for open source projects.
Publish a command line tool to use in any continuous integration pipeline. In CI, policies can be set to fail CI if drift is too high. Your drift can be tracked and reported to help motivate the team and inform stakeholders.
Use badges in source control README files to show drift, right alongside the project’s Continuous Integration status.
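To make the idea concrete, here’s a minimal sketch of what a drift score and grade could look like, assuming semver-style version strings. The weights, thresholds, and package names are made up for illustration; they aren’t part of the actual proposal.

```python
# Toy drift score: penalize each dependency by how far behind the latest
# release it is, sum the penalties, and map the per-dependency average to A-F.
from dataclasses import dataclass

@dataclass
class Dependency:
    name: str
    installed: str  # semver string, e.g. "2.3.1"
    latest: str     # semver string, e.g. "4.0.0"

def semver(v: str) -> tuple[int, int, int]:
    major, minor, patch = (int(x) for x in v.split(".")[:3])
    return major, minor, patch

def drift_penalty(dep: Dependency) -> int:
    """Weight major-version lag most heavily, patch lag least."""
    im, imi, ip = semver(dep.installed)
    lm, lmi, lp = semver(dep.latest)
    return 10 * max(lm - im, 0) + 3 * max(lmi - imi, 0) + max(lp - ip, 0)

def drift_score(deps: list[Dependency]) -> int:
    return sum(drift_penalty(d) for d in deps)

def grade(score: int, n_deps: int) -> str:
    """Normalize by dependency count so large projects aren't penalized unfairly."""
    per_dep = score / max(n_deps, 1)
    for letter, ceiling in [("A", 1), ("B", 3), ("C", 6), ("D", 10)]:
        if per_dep <= ceiling:
            return letter
    return "F"

deps = [
    Dependency("left-pad", "1.1.0", "1.3.0"),
    Dependency("react", "16.8.0", "18.2.0"),
]
print(grade(drift_score(deps), len(deps)))  # -> "F" (react is two majors behind)
```

A real implementation would also have to handle pre-release tags, ranges, and transitive dependencies, which is where the hard work lives.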
A technical write-up on a Google chatbot called “Meena,” which they claim produces much more realistic back-and-forth responses:
Meena is an end-to-end, neural conversational model that learns to respond sensibly to a given conversational context. The training objective is to minimize perplexity, the uncertainty of predicting the next token (in this case, the next word in a conversation). At its heart lies the Evolved Transformer seq2seq architecture, a Transformer architecture discovered by evolutionary neural architecture search to improve perplexity.
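Since the training objective is described entirely in terms of perplexity, a toy example helps make the metric concrete. This is just the standard definition, nothing specific to Meena’s implementation:

```python
# Perplexity is the exponentiated average negative log-probability the model
# assigns to each token that actually appeared. Lower perplexity means the
# model is less "surprised" by the conversation.
import math

def perplexity(token_probs: list[float]) -> float:
    """token_probs[i] = probability the model gave to the actual token at position i."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# A model fairly confident about each next word...
print(perplexity([0.5, 0.4, 0.6, 0.5]))    # ~2.0
# ...versus one that is mostly guessing among many choices.
print(perplexity([0.05, 0.1, 0.02, 0.05])) # ~21
```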
John Gruber uses the iPad’s recent 10th birthday to reflect on missed opportunity and how much better a product it could be and could have been:
Ten years later, though, I don’t think the iPad has come close to living up to its potential. By the time the Mac turned 10, it had redefined multiple industries. In 1984 almost no graphic designers or illustrators were using computers for work. By 1994 almost all graphic designers and illustrators were using computers for work. The Mac was a revolution. The iPhone was a revolution. The iPad has been a spectacular success, and to tens of millions it is a beloved part of their daily lives, but it has, to date, fallen short of revolutionary.
I would agree with most of his criticisms, especially on the multitasking UI and the general impenetrability of the gesture interfaces. As a very “pro iPad” user, I would love to see the device come into its own as a platform distinctly different from macOS and desktop computers. It has amazing promise even outside of creativity (music, art) and consumption. With the right focus on business model support, business productivity applications could be so much better.
I don’t know what Lex Fridman is doing to recruit the guests he gets on his show (The Artificial Intelligence Podcast), but it’s one of the best technical podcasts out there.
This one is a good introduction to the work of legendary psychologist Daniel Kahneman (of Thinking, Fast and Slow fame).
This is a new notes app from Brett Terpstra (creator of nvALT) and Fletcher Penney (creator of MultiMarkdown). I used nvALT for years for note taking on my Mac. This new version looks like a slick reboot of that with some more power features. In private beta right now, but hopefully dropping soon.
Progress itself is understudied. By “progress,” we mean the combination of economic, technological, scientific, cultural, and organizational advancement that has transformed our lives and raised standards of living over the past couple of centuries. For a number of reasons, there is no broad-based intellectual movement focused on understanding the dynamics of progress, or targeting the deeper goal of speeding it up. We believe that it deserves a dedicated field of study. We suggest inaugurating the discipline of “Progress Studies.”
Patrick Collison and Tyler Cowen co-authored this piece for The Atlantic making the case for a new science to study how we create progress.
Looking backwards, it’s striking how unevenly distributed progress has been in the past. In antiquity, the ancient Greeks were discoverers of everything from the arch bridge to the spherical earth. By 1100, the successful pursuit of new knowledge was probably most concentrated in parts of China and the Middle East. Along the cultural dimension, the artists of Renaissance Florence enriched the heritage of all humankind, and in the process created the masterworks that are still the lifeblood of the local economy. The late 18th and early 19th century saw a burst of progress in Northern England, with the beginning of the Industrial Revolution. In each case, the discoveries that came to elevate standards of living for everyone arose in comparatively tiny geographic pockets of innovative effort. Present-day instances include places like Silicon Valley in software and Switzerland’s Basel region in life sciences.
George Hotz is the founder of Comma.ai, a machine-learning-based vehicle automation company. He is an outspoken personality in the field of AI and technology in general. He first gained recognition for being the first person to carrier-unlock an iPhone, and since then has done quite a few interesting things at the intersection of hardware and software.
This is an interesting interview with Been Kim from Google Brain on developing systems for seeing how trained machines make decisions. One of the major challenges with neural-network-based deep learning systems is that the decision chain used by the AI is a black box to humans. It’s difficult (or impossible) for even the creators to figure out what factors influenced a decision, and how the AI “weighted” the inputs. What Kim is developing is a “translation” framework for giving operators better insight into the decision chain of AI:
Kim and her colleagues at Google Brain recently developed a system called “Testing with Concept Activation Vectors” (TCAV), which she describes as a “translator for humans” that allows a user to ask a black box AI how much a specific, high-level concept has played into its reasoning. For example, if a machine-learning system has been trained to identify zebras in images, a person could use TCAV to determine how much weight the system gives to the concept of “stripes” when making a decision.
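The description above maps onto a surprisingly small amount of code. Here’s a rough sketch of the core idea as I understand it (simplified, not Google’s implementation): fit a linear classifier between a layer’s activations for “concept” examples and for random examples, take its normal vector as the concept activation vector, then score how often nudging activations in that direction would increase the class prediction. How you extract activations and gradients from your model is framework-specific and left out here.

```python
# Simplified TCAV-style sketch (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear separator between concept (e.g. "stripes") and random
    activations; its unit normal is the concept activation vector (CAV)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(class_gradients: np.ndarray, cav: np.ndarray) -> float:
    """class_gradients: gradient of the class logit (e.g. "zebra") with respect
    to the layer activations, one row per example. The score is the fraction of
    examples whose prediction would increase if pushed in the CAV direction."""
    directional_derivs = class_gradients @ cav
    return float((directional_derivs > 0).mean())
```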
I enjoyed this interview with robotics professor Rodney Brooks on EconTalk. The popular conversation around AI and automation is so charged that it’s good to hear a perspective that brings some reason into the discussion. The collective conversation on AI, driverless vehicles, and other forms of automation leans toward either “it’ll be here tomorrow” or “we’ll never have any automation.” I think there’s too much optimism in the former view, and too much pessimism in the latter.
Brooks (who has spent his entire career on robotics and intelligence, currently at MIT) brings some reason to the subject — that the truth is somewhere in between. He puts it best when talking about how your average “expert” views the future of technology development:
“We tend to overestimate impacts in the short term, and underestimate them in the long term.”
This week was Amazon’s annual re:Invent conference, where they release n + 10 new products for AWS (where n is the number of products launched at last year’s event). It’s mind-boggling how many new things they can ship each year.
SageMaker was launched last year as a platform for automating machine learning pipelines. One of the missing pieces was the ability to build training datasets with your own custom data. That’s the intent with Ground Truth. It supports building your dataset in S3 (like a group of images), creating a labeling task, and distributing it to a team to annotate to train a model. It integrates with Mechanical Turk, with Amazon’s network of third-party labeling vendors, or with your own private team. This is awesome for anyone with massive datasets but no easy-to-use system to build the training info.
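As a rough idea of what the setup looks like, here’s a hedged sketch that builds a Ground Truth-style input manifest (JSON Lines of “source-ref” entries) from images already sitting in S3. The bucket and prefix names are hypothetical, and the labeling job itself would then be created in the console or via the SageMaker API.

```python
# Build an input manifest for a labeling job from images in an S3 prefix.
import json
import boto3

s3 = boto3.client("s3")
bucket, prefix = "my-training-data", "raw-images/"  # hypothetical names

lines = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].lower().endswith((".jpg", ".png")):
            lines.append(json.dumps({"source-ref": f"s3://{bucket}/{obj['Key']}"}))

# One JSON object per line, the format Ground Truth expects for image datasets.
s3.put_object(
    Bucket=bucket,
    Key="manifests/input.manifest",
    Body="\n".join(lines).encode("utf-8"),
)
```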
This, combined with their Rekognition product, opens up some interesting possibilities for image recognition use cases I’d like to test out.
The cascading effect of a world with no human drivers is my favorite “what if” to consider with the boom of electric, autonomous car development. Benedict Evans has a great analysis postulating several tangential effects:
However, it’s also useful, and perhaps more challenging, to think about the second and third order consequences of these two technology changes. Moving to electric means much more than replacing the gas tank with a battery, and moving to autonomy means much more than ending accidents. Quite what those consequences would be is much harder to predict: as the saying goes, it was easy to predict mass car ownership but hard to predict Walmart, and the broader consequences of the move to electric and autonomy will come in some very widely-spread industries, in complex interlocked ways.
Siddhartha Mukherjee looks at the potential for AI in medicine, specifically as a diagnostic tool. Combine processing and machine learning with sensors everywhere, and things get interesting:
Thrun blithely envisages a world in which we’re constantly under diagnostic surveillance. Our cell phones would analyze shifting speech patterns to diagnose Alzheimer’s. A steering wheel would pick up incipient Parkinson’s through small hesitations and tremors. A bathtub would perform sequential scans as you bathe, via harmless ultrasound or magnetic resonance, to determine whether there’s a new mass in an ovary that requires investigation. Big Data would watch, record, and evaluate you: we would shuttle from the grasp of one algorithm to the next. To enter Thrun’s world of bathtubs and steering wheels is to enter a hall of diagnostic mirrors, each urging more tests.
This piece is one of the best explanations of neural networks I’ve read.
If you follow the Apple universe, you’ve surely heard the frustration of professional Mac users who’ve felt abandoned as Apple neglected its pro hardware for three years. They’re resurrecting the lineup now with a redesigned Mac Pro. The craziest bit about this story is that Apple is coming out of its shell to talk about a new product months before launch, to a handful of select journalists.
Trying out a new thing here to document 3 links that caught my interest over the past week. Sometimes they might be related, sometimes not. It’ll be an experiment to journal the things I was reading at the time, for posterity.
Good piece from Ben Thompson comparing the current developmental stage of machine learning and AI with the formative years of information theory, when Claude Shannon and Alan Turing made their initial discoveries. They figured out how to take mathematical logic concepts (Boolean logic) and merge them with physical circuits — the birth of the modern computer. With AI we’re on the brink of similar breakthroughs. Thompson does well here to make clear the distinction between Artificial General Intelligence (what most people think of when they hear the term, things like Skynet) and Narrow Intelligence (which is all we have currently, AIs that can replicate human thinking in a narrow problem set).
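As a toy illustration of that Shannon insight (my own example, not from Thompson’s piece): Boolean logic alone is enough to build arithmetic, which is why it maps so cleanly onto physical circuits. A half adder sums two bits using nothing but XOR and AND gates.

```python
# Half adder: arithmetic built from two Boolean gates.
def half_adder(a: int, b: int) -> tuple[int, int]:
    total = a ^ b  # XOR gate produces the sum bit
    carry = a & b  # AND gate produces the carry bit
    return total, carry

for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))  # e.g. 1 1 -> (0, 1), i.e. binary 10
```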
Apple announced their new APFS file system at last year’s WWDC, and this week launched it as part of the iOS 10.3 update. Their HFS+ file system is now nearly 20 years old, but file systems aren’t something that you change lightly. They’re the core data storage and retrieval engine for computers, and massively complex. APFS is engineered with encryption as a first-class feature and also includes enhancements for SSD-based storage. The most amazing thing to me about this story is the guts it takes to make a seismic change like this to millions of devices in one swoop. It’s the sort of change that is 100% invisible to the average iPhone owner if it works, and could brick millions of phones if it doesn’t. Working at a software company that builds mission-critical software, I know it takes serious planning, testing, and skill to deploy risky changes like this and move a platform forward. Kudos to Apple for pulling off such a monumental and thankless change.
I’ve read Fred Wilson’s AVC blog for some time, but only through post links that make the rounds. Recently I discovered his archive of “MBA Mondays” articles covering tons of business topics. He’s got pieces on budgeting, cash flow, equity, M&A, unit economics — tons of great stuff from someone learning and practicing all of this in reality. Much more digestible than textbook business school material. I’m gradually making my way through the archive from the beginning and really enjoying it.
Great post from Benedict Evans on the state of voice computing in 2017. On wider answer domains and creating the uncanny valley:
This tends to point to the conclusion that for most companies, for voice to work really well you need a narrow and predictable domain. You need to know what the user might ask and the user needs to know what they can ask.
This has been the annoyance with voice UIs. For me, Siri was the first commonplace voice interface I tried for day-to-day things. The dissonance between “you can say a few things” and “ask me anything” has been the issue with Siri. Apple set false expectations for the technology that ended up creating a letdown. Evans makes a good point on the combination of selecting the right problem and narrowing the domain:
This was the structural problem with Siri - no matter how well the voice recognition part worked, there were still only 20 things that you could ask, yet Apple managed to give people the impression that you could ask anything, so you were bound to ask something that wasn’t on the list and get a computerized shrug. Conversely, Amazon’s Alexa seems to have done a much better job at communicating what you can and cannot ask. Other narrow domains (hotel rooms, music, maps) also seem to work well, again, because you know what you can ask. You have to pick a field where it doesn’t matter that you can’t scale.
With the expansion of this tech in Google Now, Alexa, Siri, and others, the problem becomes “what can I ask?” rather than the technical conversion of speech to text and text to command. “Ask me anything” is a non-starter, because right now you know the failure rate on any given question will be high. This is what happened with Siri for many users; it only takes a few failures on what we perceive as simple questions to switch us off entirely. I gave up on Siri years ago, and I wonder how hard it’ll be for Apple to reframe the perception of the technology and restore that confidence.