The notion of measuring "AI-generated text" as a fixed percentage of an academic submission is fundamentally flawed. This metric implies a homogeneous substance, akin to measuring the alcohol content in a beverage. However, my recent survey suggests that judgments about academic integrity in AI use are far from homogeneous. The survey asked educators to evaluate the ethical implications of using AI for twelve different tasks in writing an academic paper, ranging from researching to brainstorming to editing to actually writing full sections.
The findings revealed significant variance in responses. While many respondents were comfortable with AI aiding in brainstorming ideas, they expressed reservations or outright disapproval of AI writing entire paragraphs or papers. This disparity underscores a critical issue: there is no consensus in the academic profession on what constitutes acceptable AI assistance in learning. More strikingly, within each individual's responses, there was considerable variation in how different AI uses were assessed.
Consider the implications of a tool like Turnitin reporting "50% AI-generated" content. What does this figure actually represent? It lacks context about how the AI-generated content was incorporated. For instance, a paper could be largely original, with only minor edits made by AI at the end, and still show a high percentage of AI contribution. Conversely, a student might contribute minimally to an essentially AI-written paper, making slight modifications to reduce the AI-detected percentage. The reported percentages in these two scenarios could be similar, or could even rank the papers in the wrong order, yet their ethical implications diverge sharply.
The pursuit of better detection technology misses the point. The issue is not with the detection capabilities but with the construct itself. The very idea of "AI-generated text" as a unified concept is problematic. A depression inventory measures various symptoms that converge on the underlying construct of depression; the many uses of AI in academic work do not converge on any single construct, so our methods for evaluating them must recognize their diverse and context-dependent nature. The current approach, which treats all AI contributions as equivalent, is akin to judging a book's genre by counting its words. I wish Turnitin and other commercial "AI detectors" would show just a little more integrity and stop selling us snake oil. They must know that their claims are bogus, because AI-generated text is not a valid construct to be measured.
Instead of focusing obsessively on detecting AI-generated content, we need to shift our perspective. We should expect and require students to use AI as part of their learning process. The challenge then becomes developing assignments that measure not only content knowledge but also the meta-AI skills and competencies necessary to navigate and leverage these tools effectively. This approach acknowledges the complexity of AI's applications and encourages responsible use, promoting a learning environment that respects both the potential and the limitations of artificial intelligence.