GPT-4 scored in the top 1% (relative to humans) on a creativity test.
"Dr. Erik Guzik, an assistant clinical professor in UM's College of Business, and his partners used the Torrance Tests of Creative Thinking, a well-known tool used for decades to assess human creativity."
"The researchers submitted eight responses generated by ChatGPT, the application powered by the GPT-4 artificial intelligence engine. They also submitted answers from a control group of 24 UM students taking Guzik's entrepreneurship and personal finance classes. These scores were compared with 2,700 college students nationally who took the TTCT in 2016. All submissions were scored by Scholastic Testing Service, which didn't know AI was involved."
"The results placed ChatGPT in elite company for creativity. The AI application was in the top percentile for fluency -- the ability to generate a large volume of ideas -- and for originality -- the ability to come up with new ideas. The AI slipped a bit -- to the 97th percentile -- for flexibility, the ability to generate different types and categories of ideas."
The Torrance Tests of Creative Thinking is basically a test of "divergent" thinking. Normally when you take a test, it's a "convergent" test, meaning there's a specific, correct answer that students are expected to "converge" on. If the question is "what's 2 + 2?", everyone is supposed to converge on 4. With a "divergent thinking" test, there's no "correct" answer, and the more "divergent" the answers given, the better.
In the case of the TTCT, there's a series of tasks, classified as "verbal tasks using verbal stimuli", "verbal tasks using non-verbal stimuli", and "non-verbal tasks". In the "verbal tasks using verbal stimuli" category are such tasks as "unusual uses" (name all the uses you can think of for tin cans and books), "impossibilities" (list as many impossible things as you can), "consequences" (list out consequences to improbable situations), "just suppose" (list out consequences after a new or unknown variable is injected into a situation), "situations" (given problems, think of as many solutions as possible), "common problems" (given situations, think of as many problems as possible that could arise in those situations), "improvement" (given common objects, list as many ways as you can to improve each object), "the Mother Hubbard problem" (Mother Hubbard has 12 children and each child needs ...), "imaginative stories" (write the most interesting and exciting story you can think of at this exact moment), and "cow jumping" (think of all possible things which might have happened when the cow jumped over the moon).
In the "verbal tasks using non-verbal stimuli" category, we have such tasks as "ask and guess" (ask as many questions as you can about a picture which cannot be answered by looking at the picture), "product improvement" (given a toy, think of as many improvements as you can which would make it more fun), and "unusual uses" (think of the most unusual uses of a toy, other than as a toy).
In the "non-verbal tasks" category, we have such tasks as "incomplete figures" (add lines to a figure), "picture construction" (given a simple shape, construct a picture of which that shape is an integral part), "circles and squares" (given a page full of circles, make objects that have circles as a major part of them, then given a page full of squares, do the same thing), and "creative design" (given circles, strips, scissors, and glue, construct creative designs -- somehow I doubt GPT-4 was given this one).
The submissions are scored for fluency (the total number of responses, after throwing out any deemed uninterpretable, meaningless, or irrelevant), flexibility (the number of different categories the relevant responses fall into), originality (the statistical rarity of the responses), and elaboration (the amount of detail in the responses).
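To make those four scores concrete, here is a toy sketch in Python. It is not the actual TTCT scoring procedure, which is proprietary and done by trained scorers. The function name `score_task`, the category labels, the norm-group frequencies, the 5% rarity cutoff, and the word-count stand-in for elaboration are all assumptions made up for illustration.

```python
def score_task(responses, categories, norm_frequency, rarity_threshold=0.05):
    """Toy divergent-thinking scorer (illustrative only, not the real TTCT).

    responses:       list of answers given for one task
    categories:      dict mapping each relevant answer to a category label;
                     answers missing from this dict are treated as irrelevant
    norm_frequency:  dict mapping an answer to the fraction of a hypothetical
                     norm group that gave it (missing means nobody gave it)
    """
    # Drop duplicates while keeping order, then discard irrelevant answers.
    unique = list(dict.fromkeys(responses))
    relevant = [r for r in unique if r in categories]

    # Fluency: how many relevant answers were given.
    fluency = len(relevant)
    # Flexibility: how many distinct categories those answers cover.
    flexibility = len({categories[r] for r in relevant})
    # Originality: how many answers are statistically rare in the norm group.
    originality = sum(
        1 for r in relevant if norm_frequency.get(r, 0.0) < rarity_threshold
    )
    # Elaboration: word count as a crude, assumed proxy for amount of detail.
    elaboration = sum(len(r.split()) for r in relevant)

    return {
        "fluency": fluency,
        "flexibility": flexibility,
        "originality": originality,
        "elaboration": elaboration,
    }


# Made-up "unusual uses for a tin can" answers; "asdf" is an irrelevant one.
responses = ["pencil holder", "flower pot", "drum", "asdf", "telephone with string"]
categories = {
    "pencil holder": "container",
    "flower pot": "container",
    "drum": "music",
    "telephone with string": "communication",
}
norm_frequency = {"pencil holder": 0.40, "flower pot": 0.25, "drum": 0.03}

scores = score_task(responses, categories, norm_frequency)
print(scores)
# {'fluency': 4, 'flexibility': 3, 'originality': 2, 'elaboration': 8}
```

Note how the two "container" answers both add to fluency but only count once toward flexibility, which is why a long list of similar ideas scores lower on flexibility than a shorter, more varied one.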
"Guzik said the TTCT is protected proprietary material, so ChatGPT couldn't 'cheat' by accessing information about the test on the internet or in a public database."
"Guzik said he asked ChatGPT what it would indicate if it performed well on the TTCT." "ChatGPT told us we may not fully understand human creativity, which I believe is correct. It also suggested we may need more sophisticated assessment tools that can differentiate between human and AI-generated ideas."
UM Research: AI tests into top 1% for original creative thinking
#solidstatelife #ai #nlp #llms #genai #gpt #creativity #ttct