Imagine taking a college exam, and, instead
of handing in a blue book and getting a grade from a professor a few weeks
later, clicking the “send” button when you are done and receiving a grade back
instantly, your essay scored by a software program. And then, instead of being done with that
exam, imagine that the system would immediately let you rewrite the test to try
to improve your grade.
EdX, the nonprofit
enterprise founded by Harvard and the Massachusetts
Institute of Technology to offer courses on
the Internet, has just introduced such a system and will make its automated
software available free on the Web to any institution that wants to use it. The
software uses artificial intelligence to grade student essays and short written
answers, freeing professors for other tasks.
The new service will bring the educational
consortium into a growing conflict over the role of automation in education.
Although automated grading systems for multiple-choice and true-false tests are
now widespread, the use of artificial intelligence technology to grade essay
answers has not yet received widespread endorsement by educators and has many
critics.
Anant Agarwal, an electrical engineer who is
president of EdX, predicted that the instant-grading software would be a useful
pedagogical tool, enabling students to take tests and write essays over and
over and improve the quality of their answers. He said the technology would
offer distinct advantages over the traditional classroom system, where students
often wait days or weeks for grades.
“There is a huge value in learning with
instant feedback,” Dr. Agarwal said. “Students are telling us they learn much
better with instant feedback.”
But skeptics say the automated system is no
match for live teachers. One longtime critic, Les Perelman, has drawn national
attention several times for putting together nonsense essays that have fooled
software grading programs into giving high marks. He has also been highly critical of studies that purport to show that the software
compares well to human graders.
“My first and greatest objection to the
research is that they did not have any valid statistical test comparing the
software directly to human graders,” said Mr. Perelman, a retired director of
writing and a current researcher at M.I.T.
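The kind of comparison he describes is not hard to run once machine scores and human scores exist for the same essays. What follows is only a minimal sketch in Python, with made-up scores and with quadratic weighted kappa, an agreement statistic common in essay-scoring research, standing in for whatever test a given study might actually use:

```python
# A minimal sketch of one standard machine-vs-human comparison:
# quadratic weighted kappa. The scores below are invented for
# illustration; this is not Mr. Perelman's proposed test, only an
# example of the kind of direct statistical comparison he describes.
from sklearn.metrics import cohen_kappa_score

human_scores   = [4, 3, 5, 2, 4, 3, 1, 5, 2, 4]  # hypothetical human grader
machine_scores = [4, 3, 4, 2, 5, 3, 2, 5, 2, 3]  # hypothetical software

# 1.0 means perfect agreement; 0.0 means chance-level agreement.
kappa = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")
print(f"quadratic weighted kappa: {kappa:.2f}")
```

A thorough study would also report the same statistic between two human graders on the same essays, giving a baseline against which the software's agreement can be judged; that is the head-to-head comparison Mr. Perelman says is missing.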
He is among a group of educators who last
month began circulating a petition opposing automated assessment software. The
group, which calls itself Professionals Against Machine Scoring of
Student Essays in High-Stakes Assessment, has collected nearly 2,000 signatures,
including some from luminaries like Noam Chomsky.
“Let’s face the realities of automatic essay
scoring,” the group’s statement reads in part. “Computers cannot ‘read.’ They
cannot measure the essentials of effective written communication: accuracy,
reasoning, adequacy of evidence, good sense, ethical stance, convincing
argument, meaningful organization, clarity, and veracity, among others.”
But EdX expects its software to be adopted
widely by schools and universities. EdX offers free online classes from
Harvard, M.I.T. and the University of California, Berkeley; this fall, it will
add classes from Wellesley, Georgetown and the University of Texas. In all, 12
universities participate in EdX, which offers certificates for course
completion and has said that it plans to continue to expand next year,
including adding international schools.
The EdX assessment tool requires human
teachers, or graders, to first grade 100 essays or essay questions. The system
then uses a variety of machine-learning techniques to train itself to be able
to grade any number of essays or answers automatically and almost
instantaneously.
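EdX has not published the details of its model, so the following is only a rough sketch of that train-then-grade loop. Python with the scikit-learn library is assumed, and the TF-IDF word features, ridge regression and tiny training set are illustrative stand-ins for whatever the real system uses:

```python
# Rough sketch of the loop described above: humans grade a seed set
# once, a model trains on it, then new answers are scored instantly.
# All features, models and data here are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Stand-in for the ~100 instructor-graded essays the system needs first.
training_essays = [
    "The experiment controlled for temperature and repeated each trial.",
    "Stuff happened and it was interesting.",
    "The author supports her thesis with three documented case studies.",
    "I liked the book because it was good.",
]
training_scores = [5.0, 2.0, 5.0, 1.0]  # instructor-assigned numeric scores

# Turn each essay into word-level features, then fit a simple regressor.
grader = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
grader.fit(training_essays, training_scores)

# Once trained, any number of new answers can be scored almost instantly.
print(grader.predict(["The study cites careful evidence for its claims."]))
```

The real system would use far richer features and far more training data; the point here is only the shape of the loop, in which humans grade a seed set once and the model scores everything after.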
The software will assign a grade depending on
the scoring system created by the teacher, whether it is a letter grade or
numerical rank. It will also provide general feedback, like telling a student
whether an answer was on topic or not.
Dr. Agarwal said he believed that the
software was nearing the capability of human grading.
“This is machine learning and there is a long
way to go, but it’s good enough and the upside is huge,” he said. “We found
that the quality of the grading is similar to the variation you find from
instructor to instructor.”
EdX is not the first to use automated
assessment technology, which dates to early mainframe computers in the 1960s.
There is now a range of companies offering commercial programs to grade written
test answers, and four states — Louisiana, North Dakota, Utah and West Virginia
— are using some form of the technology in secondary schools. A fifth, Indiana,
has experimented with it. In some cases the software is used as a “second
reader,” to check the reliability of the human graders.
But the EdX consortium's growing influence in setting standards is likely to give the technology a boost. On
Tuesday, Stanford announced that it would work with EdX to develop a joint
educational system that will incorporate the automated assessment technology.
Two start-ups, Coursera and Udacity,
recently founded by Stanford faculty members to create “massive open online
courses,” or MOOCs, are also committed to automated assessment systems because
of the value of instant feedback.
“It allows students to get immediate feedback
on their work, so that learning turns into a game, with students naturally
gravitating toward resubmitting the work until they get it right,” said Daphne
Koller, a computer scientist and a founder of Coursera.
Last year the Hewlett Foundation, a
grant-making organization set up by one of the Hewlett-Packard founders and his
wife, sponsored two $100,000 prizes aimed at improving software that grades
essays and short answers. More than 150 teams entered each category. A winner of
one of the Hewlett contests, Vik Paruchuri, was hired by EdX to help design its
assessment software.
“One of our focuses is to help kids learn how
to think critically,” said Victor Vuchic, a program officer at the Hewlett
Foundation. “It’s probably impossible to do that with multiple-choice tests.
The challenge is that this requires human graders, and so they cost a lot more
and they take a lot more time.”
Mark D. Shermis, a professor at the
University of Akron in Ohio, supervised the Hewlett Foundation’s contest on
automated essay scoring and wrote a paper about the experiment. In his view, the
technology — though imperfect — has a place in educational settings.
With increasingly large classes, it is
impossible for most teachers to give students meaningful feedback on writing
assignments, he said. Plus, he noted, critics of the technology have tended to
come from the nation’s best universities, where the level of pedagogy is much
better than at most schools.
“Often they come from very prestigious
institutions where, in fact, they do a much better job of providing feedback
than a machine ever could,” Dr. Shermis said. “There seems to be a lack of
appreciation of what is actually going on in the real world.”
Source: New York Times