On June 2nd, the New York Times declared in an opinion piece that It’s the End of Programming as We Know It. (And I Feel Fine.). Perhaps this struck fear into your heart, gentle reader, but this is far from the first time seasoned developers have heard such proclamations. ChatGPT and Copilot are remarkably compelling, and the Times article’s author makes a nuanced point that has merit, so it’s worth assessing what these tools are and how they impact the work of a software developer.
Let’s see whether there’s a fire under all the smoke.
tl;dr (courtesy of GPT-3.5): The article discusses the impact of AI tools like ChatGPT and Copilot on software development. While these tools have strengths in generating content and summarizing data, they also have weaknesses such as hallucinating and erratic behavior. There are also legal concerns surrounding their use, particularly regarding training data and copyright infringement. The article suggests using these tools for filler content, prototyping, and summarizing data, but always vetting the results. The author argues that programming is not dead, as these tools cannot replace the need for skilled programmers to ensure safe, efficient, and maintainable code.
What is AI?
As one Twitter user quipped, artificial intelligence is “a poor choice of words made in 1956”. The term itself—artificial intelligence—is an anthropomorphism that biases us to ascribe characteristics and behavior that these programs do not possess. Even so, we shouldn’t dismiss the capabilities of these programs outright simply because they lack “real” intelligence—sometimes called AGI.
In a recent interview, sci-fi writer Ted Chiang suggested that a more accurate label is applied statistics: programs that compute results based on the correlations observed in the data set on which they trained. In plain speak, these new AI programs are really good at word association. They’ve scanned enough novels, cookbooks, and blog entries to recognize that buffalo is more likely to be followed by wing than tennis, but that doesn’t mean they comprehend what a buffalo wing is. Likewise, they’ve trained on enough pictures of people to know that hair doesn’t spontaneously turn into feathers or static; hair tends to keep looking like hair.
What is ChatGPT?
ChatGPT is a chatbot created by a company named OpenAI. ChatGPT is a generative, transformer-based program—the G&T in GPT—that can provide significantly more plausible-sounding responses than many previous methodologies. The quality of its responses catapulted modern AI’s capabilities into the mainstream after its public launch in November 2022. With an initial release built atop the GPT-3.5 model, OpenAI has subsequently released an update to ChatGPT based on their GPT-4 model, which offers even more impressive results.
What is Copilot?
GitHub Copilot is a programming-oriented, generative AI system built atop OpenAI’s GPT-3 model. In addition to being trained on natural language examples, this AI-powered assistant used code from public GitHub repos for its training. Launched in June 2021, software developers can use it to auto-complete code or fill gaps based on the comments they provide.
What about Adobe Firefly, DALL-E, Midjourney, and Stable Diffusion?
Adobe Firefly, DALL-E, Midjourney, and Stable Diffusion are generative AI systems—starting to notice a pattern here?—aimed at producing images from text prompts. With public launches dating as far back as January 2021, these systems have become increasingly capable over the last few years (this Reddit thread shows Midjourney’s improvement). We’ve already crossed a point where—for some of the better images coming out of these systems—an average person cannot tell whether a photorealistic image is authentic or AI-generated.
What are their weaknesses?
While these tools are remarkable, a notable weakness is their tendency to hallucinate: to confidently respond with misinformation.
- Two New York lawyers recently landed in hot water after ChatGPT drafted a document that cited six nonexistent cases.
- The hands generated by these tools have become the stuff of memes, with many of the results serving as nightmare fuel.
- A university professor failed his entire class because ChatGPT falsely asserted that it wrote his students’ papers.
- A peer-reviewed study from 2021 found that for some scenarios, roughly 40% of the code generated by Copilot was insecure.
Maintaining context is another area of weakness. ChatGPT keeps a rolling record of what has been said in the conversation—both prompts and responses—to produce results that fit within the context of what has been said. After a while, ChatGPT will eventually “forget” earlier parts of the conversation, with subsequent responses seeming detached. Longer prompts or responses can make this happen faster.
In the worst case, this can lead to another problem: a tendency toward erratic behavior. If an earlier response was particularly long, the bot might “forget” everything prior and assume its response represents the previous conversation, leading to an unexpected turn if that assumption is wrong. In one particularly extreme example, the Bing chatbot tried to convince a New York Times writer to leave his marriage.
Finally, these bots are subject to prompt injection in which malicious actors use various techniques to trick the bot into producing responses contrary to the rules by which it is supposed to abide. A typical example is DAN—short for “Do Anything Now”—in which ChatGPT is instructed to simulate a bot that would do anything the user requested.
What are their strengths?
As generative programs, these tools excel at generating new content in the vein of the content on which they were trained, be it code with Copilot, chat-like responses with ChatGPT, or images with the tools listed above. They also excel at summarizing data, allowing users to quickly glean important details from data that might have otherwise taken hours or days to review.
Because these are automated systems, they are especially effective at rapidly operating on or producing large quantities of content. Their nondeterministic nature also means that they can generate a wide variety of content, but they may struggle when asked to do something for which they don’t have examples in their training data.
Are there risks to using them?
As mentioned earlier, a primary risk is the possibility that a user may be misled by a hallucination. In addition to the two lawyers who are currently facing the possibility of sanctions due to their inclusion of false data generated by ChatGPT , a structural engineer also noted how ChatGPT can generate incorrect results that seem plausible to anyone operating outside their field of expertise.
There are also legal questions that have yet to be fully resolved. While this list is far from exhaustive, some potential avenues for concern might depend on:
- The data used for training: Several of these tools trained on data they allegedly lacked permission to use, hence why GitHub, Microsoft, and OpenAI are facing multiple lawsuits.
- Whether it reproduced someone else’s work: Some of these tools can—without warning or intent on your part—generate content that includes logos, characters, or the works of others.
- Whether confidentiality matters: Many of these tools require that you send data to a third-party cloud service, some of which have terms that permit them to use that data to train their system. Anyone under an NDA or other confidentiality agreement will want to be certain that the data they are sending is safe to send.
What are ideal use cases?
Setting aside the risks mentioned above, the following may be the sorts of uses where these tools can shine:
- Use them for filler content: For things like stock photography or bulk copy, these tools can create results that may be sufficient with minimal additional work.
- Use them to prototype: Rather than coming empty-handed to your first meeting with an artist or writer, a few minutes with an AI tool may be enough to produce a mood board or portfolio that jump-starts the conversation, saving untold hours.
- Use them to lay the groundwork as a content creator: Rather than spending hours sketching ideas, use these tools to rapidly zero in on a style or shape for the work you want, then massage the most promising results in a text, code, or image editor.
- Use them to summarize content: While they shouldn’t be trusted to capture everything or to be 100% factual, these tools can help you get up to speed with a large body of text more quickly.
- Use them for translation: With the same caveats as the previous point, these tools generally provide translations that capture more nuance than was previously possible.
- Use them to find needles in haystacks: Again, while they can’t be trusted to find everything, if you throw a large body of text at ChatGPT and ask it to find something you’re looking for, it can sometimes have surprisingly good results.
An important note: always, always, always vet the results. The results these programs produce can be erratic or unexpected. Any work they generate should be verified by a human before use.
How is this technology being used today?
These technologies have generally seen a slow rollout in commercial use due to the legal, technical, and ethical issues surrounding their use. Even so, a few examples are already starting to take shape.
- Ubisoft recently announced Ghostwriter, an AI-powered tool that can help their writers generate “barks”—background dialog used by the NPCs that fill large video games—en masse.
- Blizzard is rolling out Blizzard Diffusion to its teams, a tool for rapidly generating concept art for games like Diablo IV or World of Warcraft.
- Adobe Firefly was recently introduced as a beta feature in Photoshop, available to any of its users. The demo video for its launch is worth a watch.
- Google Bard and the ChatGPT-powered Microsoft Bing chatbot are provided as alternatives to the traditional search engine experience. Experts suggest these tools could supplant search engines in a few years, which the New York Times reports is “Code Red” for Google.
- In late 2022, CNET Money published dozens of articles generated by AI-powered tools. They buried that information in the fine print, leading to a public outcry once the activity was discovered.
Is my job at risk?
For some fields, the answer is yes. There are two primary concerns:
- Efficiency gains mean fewer employees are needed. If AI tooling can hasten your work, your company may not need as many people doing the work you do.
- AI is reaching a “good enough” point in some fields. If your work can be verified by a layperson who lacks your expertise, they may be able to get good enough results from a supervised AI.
The first is simply an AI-driven twist on a story as old as time. But the second one is of more concern because it affects commercial artists, photographers, writers, paralegals, and, yes, even some software developers. If, for instance, you’re a coder whose bread and butter is producing blog templates for clients, you may find that your clients can get by with an AI-based tool very soon.
Is programming dead?
Allow me to offer an emphatic NO. While some low-hanging tasks may be affected, these current systems are inherently incapable of displacing the need for programming.
Reason 1: Results are unverifiable by laypeople
As mentioned earlier, one peer-reviewed study indicated that as much as 40% of the code generated by Copilot is insecure. These tools are specifically designed to produce varied responses, so no matter how good they become there will always be a risk that they exhibit undesirable behavior. As such, their results must always be vetted. And lest you think you can let Copilot do the heavy lifting and then hire novice developers to clean up its mistakes, a different study found that participants had more difficulty correcting errors in Copilot’s code than their own. Yet another found that participants were less successful at completing tasks when using Copilot. Copilot, and others like it, show potential as exploratory tools or accelerators for programmers, but a layperson cannot vet whether the resulting code is safe, efficient, or maintainable, so an experienced programmer will necessarily need to be involved.
Reason 2: The use of natural languages
These tools gained traction because they respond to natural languages like English, French, or Chinese, putting their power within reach of billions. But the fact that they operate on natural languages is precisely why they can’t kill programming: natural languages are, by nature, ambiguous, lacking in specificity and formal definition.
For a wide variety of use cases, that’s fine. These tools can churn out dozens or hundreds of variations as a non-expert iterates on the prompt until they get an acceptable result. That’s precisely why they’re so good at the verifiable tasks listed earlier.
For other use cases, however, an ambiguously-interpreted result is useless. Try explaining to an IRS auditor that it’s the AI’s fault for misunderstanding your intent. Every programmer I’ve talked to who has tried these tools has had a moment where Copilot or ChatGPT wasn’t getting things quite right, so they kept getting more and more specific—more and more technical—until they had no choice but to express their intent in a programming language, at which point the tool failed at its intended purpose. That situation is precisely why programming languages exist and why programmers will always be needed. No matter how good these tools get at understanding natural language, the only way a person can unambiguously express their intent is with a programming language.
Where do we go from here?
Sit tight. It’s going to be a bumpy next few years as the legal questions get sorted out. I expect some early adopters will come to regret their hastiness later when they’re forced to scrub AI-generated content from their site or app after the courts start holding businesses accountable for content that infringes on the rights of others. That said, some tools can be self-hosted and trained on your own data, potentially allowing you or your company to get a leg up while steering clear of the thorniest legal concerns.