Can AI compare and contrast the chemistry of two elements?

AI is becoming a credible way to write essays, which merits thought if we assess students using essays.

Oxford’s Chemistry degree typically asks third-year students to write an essay which compares and contrasts two elements. This blog post is my attempt to get a particular AI - ChatGPT - to answer this question, and an analysis of its answer. Can an AI do a task we demand of a chemist nearing graduation?

The prompt and the parameters

To prompt the AI you need to give it some text, as an instruction. Too short a prompt resulted in an essay which felt like an article from Encyclopaedia Britannica, so I eventually developed a prompt which gave clear technical details about what to include.

Write an essay comparing the chemistry of oxygen and sulphur. The essay should discuss atomic properties (including ionisation energies and radii), the various elemental forms of oxygen and sulphur, common oxidation states, the strong pi bonding of oxygen and the weak pi bonding of sulphur, and typical reactivity.

This level of specification was necessary to get a good-enough answer out of the AI, and it’s worth noting that this phrasing demands a reasonably high level of scientific insight. The spelling of sulphur (IUPAC: sulfur) did not seem to make a big difference, but the specification of pi bonding proved fiddly - there were ways of phrasing the prompt which led the AI to incorrectly state that sulphur formed stronger pi bonds than oxygen.

There is scope to tweak some of the parameters. I changed only one of them from its default setting, increasing the maximum length to 3567 so that the response was long enough to go into some depth. This is still a short answer, so it’s never going to be a fully fair comparison with a student answer (especially as diagrams are such an important part of chemistry exam technique). It’s still worth looking at, though.

The AI output

The elements oxygen and sulphur are both non-metals found at the top of Group 16 in the periodic table; they are known as the chalcogens. Oxygen and sulphur, while sharing many similarities in chemical properties, also possess some distinct differences.

Atomic properties of oxygen and sulphur are quite similar. Both elements have the same number of valence electrons, six, and both have a shared inner electron shell consisting of two electrons.

Though oxygen is slightly lighter than sulphur, their atomic radii are almost the same. In terms of ionisation energies, oxygen has a higher first ionisation energy than sulphur, and a lower second ionisation energy. In its elemental form, oxygen is a colourless, odourless gas at room temperature, while sulphur is a yellow solid. Oxygen can also exist as ozone (O3), a metalloid allotrope; and as diatomic oxygen (O2), a non-metal allotrope. Sulphur can also exist as two other non-metal allotropes - polymeric sulphur (S8), and mononuclear sulphur (S4).

Oxygen typically exists in oxidation states of -2, -1, 0, +1, and +2, while the oxidation states of sulphur range from -2 to +6. Oxygen has a higher tendency to accept electrons than sulphur, and is thus more likely to form compounds with higher oxidation states.

The strong pi bond of oxygen and the weak pi bond of sulphur affect their reactivity and chemical properties. Oxygen has a higher electronegativity than sulphur, allowing it to form strong double and triple bonds with other elements. This strong bonding makes oxygen very reactive, which is why it is essential for combustion and respiration. On the other hand, sulphur's weak pi bond means that it has a lower electronegativity, and forms weaker single and double bonds. This makes sulphur less reactive than oxygen.

In conclusion, oxygen and sulphur are two elements that share many similarities in their atomic properties and oxidation states, but also possess some distinct differences. Oxygen's strong pi bonding gives it a higher electronegativity and reactivity, while sulphur's weak pi bonding gives it a lower electronegativity and reactivity. This makes oxygen more reactive than sulphur, making it an essential part of combustion and respiration.

It’s not a great essay, but nor is it truly awful. I’ll start by discussing the good bits, then move onto the weaker ones.

The good bits

The answer makes the central comparison consistently, which is to its credit. Similarly, the introduction and conclusion give a clear structure to the flow of the writing. Overall, I feel these pieces of structure are a particularly successful part of the output.

The basic location of the elements in the periodic table is correct, as are the formulae of some of these elements’ allotropes (I’ve never heard of S4 [and surely S4 isn’t ‘mononuclear’], but perhaps it’s out there). The range of oxidation states is good factual content, and the scope for oxygen to draw out the higher oxidation states of its bonding partner might be a plausible reading of “more likely to form compounds with higher oxidation states”.

The reactivity paragraph has some correct content. Oxygen’s electronegativity is higher than sulphur’s. The idea that the strong bonds it forms with other elements drive its reactivity is a well-perceived point - I’d happily tick this if I saw it in a student’s essay.

The bad bits: errors

There are some important conceptual confusions, which might be themed as mistakes about causation. I don’t agree with “This strong bonding makes oxygen very reactive, which is why it is essential for combustion and respiration.” because being essential and being reactive seem like different ideas within this essay. Similarly, “On the other hand, sulphur's weak pi bond means that it has a lower electronegativity” can be criticised on its appeal to causation: weak pi bonding is not an obvious cause of low electronegativity.

There are also phrases which are ambiguous, even if they don’t stray into being causally incorrect. I feel that the phrase “Oxygen has a higher electronegativity than sulphur, allowing it to form strong double and triple bonds with other elements” doesn’t quite distinguish the role of electronegativity. Electronegativity is arguably why the bonds are strong, but it’s certainly not why the bonds are triple.

The bad bits: style

The brevity of the output is an important limitation here, and overall the text feels like it has quite a lot of waffle when so much of it is being used to lay out the (strong) overall structure. I wonder if this could be addressed by setting it smaller sub-essays on specific sub-topics and manually collating the outputs into one big essay.
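One way to picture this splitting approach is a small Python sketch. It is only an illustration, not something I ran for this post: the function names are hypothetical, the topic list is lifted from the prompt above, and the step of actually sending each sub-prompt to the AI is deliberately left out.

```python
# Illustrative sketch only: all names here are hypothetical.
# The topics are taken from the single long prompt used earlier in this post.
TOPICS = [
    "atomic properties (including ionisation energies and radii)",
    "the various elemental forms of oxygen and sulphur",
    "common oxidation states",
    "the strong pi bonding of oxygen and the weak pi bonding of sulphur",
    "typical reactivity",
]


def build_sub_prompts(topics):
    """Turn each topic into its own narrowly focused prompt."""
    return [
        "Write a section of an essay comparing the chemistry of oxygen "
        f"and sulphur. The section should discuss only: {topic}."
        for topic in topics
    ]


def collate(sections):
    """Join the separately generated sections into one essay,
    dropping any empty responses and trimming stray whitespace."""
    cleaned = [section.strip() for section in sections if section.strip()]
    return "\n\n".join(cleaned)


# One sub-prompt per topic; each would be sent to the AI separately
# (that step is not shown) and the responses passed to collate().
sub_prompts = build_sub_prompts(TOPICS)
```

The collating step is kept deliberately dumb: deciding the order of sections, removing repetition between them and writing an introduction and conclusion would still fall to the person doing the stitching.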

The focus of the essay is weak, but weak in surprisingly human ways. Some of it reads as a shopping list of facts, which I have certainly seen students do. The odd, repeated focus on combustion and respiration perhaps reflects something about which sources the AI is accessing - but these kinds of school-level facts can often appear in students’ essays when they’re looking for points to make.

Significance is the biggest issue for me: the essay doesn’t quite get to the heart of the comparison between these elements. Perhaps this is just my judgement, but I would hope to see a student talk more about the atomic properties and the way these develop into the range of oxidation states. Despite my attempts to set up the prompt in this way, the output didn’t feel like a substantial insight into this chemistry.

Conclusion

A student submitting this essay would get a bad mark, and likely fail. But a longer answer - or well-prompted sub-answers stitched together - seems like it might approach the level of a plausible submission quite rapidly. Co-piloting strategies - where a student’s judgement is used to edit AI content - would likely result in a well-structured presentation of mostly-correct material in ways which could well rival a non-AI answer. We probably aren’t there yet for such a technical question, but maybe we will be one day.

It’s all a lot to think about, from an assessment perspective! It won’t greatly affect Oxford’s finals - where we ask students this question - because of the use of exam conditions. It may be that other assessment formats are endangered by AI, but whatever the drawbacks of pen-and-paper exams, they give minimal scope to carry in a laptop.

I heard the head of Google Education once responding to the criticism that students just google the answers to questions now. She said “if students can google the answers, perhaps you are asking the wrong questions”. I’m not sure I completely agree with her, but it’s worth considering this perspective in good faith.

Is AI a tool of the future, which we will have to embrace in assessments sooner or later (like we once did for the SpellCheck or the calculator)? Or does it so threaten independent thought that we should ban it and punish its users (like we do sometimes for English-French dictionaries and graphical calculators)?

Michael O'Neill
Assessment