Sunday, October 27, 2013

A Mile Wide, A Mile Deep, Part 3: Assessment

Preface

Somewhere down the longitudinal trail a brave historian is going to ask the question:  How could a generally literate society not stand up and protest the destruction of real learning in its public K-12 schools by a U.S. Secretary of Education observing learning through a straw while his chain is being yanked by Bill Gates; by a cabal of socially irresponsible testing corporations; by a generation of for-profit psychometricians operating in corporate bubbles; and by states’ political infrastructure putting ideology ahead of a nation’s next generations of citizens?

The companion, more localized question is:  How could our collegiate schools of education, and virtually entire local K-12 public education establishments, express that ignorance, or be sycophantic toward alleged reform pivoting on a naïve form of testing?

If you believe that improving U.S. public K-12 education goes beyond alleged standardized multiple-choice testing, and are a fan of Malcolm Gladwell (The Tipping Point), take solace in some recent developments.

Straws in the Wind

Unknown to many Americans, the U.S. emulated the UK in pushing standardized testing, notably because of Margaret Thatcher’s advocacy in the 1980s.  Other countries then emulated those U.S. public primary and secondary education strategies.  Specifically, the use of standardized testing has been prevalent in Israel and in other parts of Europe, and long enough for its efficacy to be assessed.

Domestically, California has now voted to drop most K-12 standardized testing.  New York State has now stated it will reduce its use of that testing, and critics in that State are advocating even broader cuts.  Given that CA and NY are generally where trends begin, perhaps we are seeing the approach of that “tipping point”?

Internationally, Scotland discontinued that testing in 2003.  Israel has now discontinued the testing.  Wales recently rescinded most standardized testing, and the reasons are notable: 

“What do Welsh teachers use instead of the tests? With government guidance, teachers come up with their own assessments and report the results to parents, local education authorities, and the Welsh government each year. Freed from the need to prepare students for narrow tests, secondary school teachers employ out-of-school experiences, in-depth research, and presentations, emphasizing applied learning in secondary school and underscoring the importance of play in early childhood education.
Brian Lightman, head teacher at a secondary school outside Cardiff, Wales, helped pilot some of the new approaches and is impressed with the results. ‘Our students now are so much more independent and capable of organizing and analyzing what they're doing, and they're able to improve as a result of that,’ he said. ‘They are very different in the way they go about their learning.’”

Only the U.S. appears still fully in the grip of something close to mass hysteria – or perverse dedication at our state levels to extreme conservative school ideologies executed pretty much without critical thought.  Unfortunately, this tunnel vision extends all the way to Arne Duncan, and one step beyond to the gross hypocrisy of President Obama.  Mr. Obama, in virtually every speech touching public education, churns out the right words about the need for K-12 understanding and the limitations of present testing obsessions, but then blesses the actions of Duncan and the U.S. Department of Education doubling-down on testing imposition.

With the above bits and pieces suggesting an emerging challenge to the standardized testing orgy, why is it so deeply entrenched?  By the ignorance and political extremism of current Republican state education bureaucracies and legislators?  Or by the combination of naïveté and cowardice of too many of our public schools in America’s “Pleasantvilles,” lacking intellectually and managerially competent administration and better training than is being turned out by our collegiate schools of education?  Or by generations of parents, products of the same systems, lacking criteria other than local ego, splendor of physical plant, and sports obsessions to guide local schools?  Or, notably, by the frequent election to local school boards of members lacking the competence or experience to provide oversight, or in some cases of those with other agendas?

Award-winning New York principal Carol Burris, in a recent piece in The Washington Post’s “The Answer Sheet,” offered another perspective that addresses the top-down issues:

"What is equally disconcerting is that these reforms are being pursued with little or no evidentiary grounding. There is, for instance, zero sound research that demonstrates that if you raise a student’s score into the new proficiency range, the chances of the student successfully completing college increases. New York’s new cut scores are an attempt to benchmark state scores to the proficiency rates attached to the National Assessment of Educational Progress, or, NAEP. Yet the connections between NAEP scores and college performance are so spurious that researchers have yet to claim that NAEP scores have any predictive value at all when it comes to college and career readiness."

"The bottom line is that there are tremendous financial interests driving the agenda about our schools — from test makers, to publishers, to data management corporations — all making tremendous profits from the chaotic change. When the scores drop, they prosper. When the tests change, they prosper. When schools scramble to buy materials to raise scores, they prosper. There are curriculum developers earning millions to created scripted lessons to turn teachers into deliverers of modules in alignment with the Common Core (or to replace teachers with computer software carefully designed for such alignment).  This is all to be enforced by their principals, who must attend calibration events run by network teams.”

Obfuscation and Myths

Normally this would be the place to launch a spirited defense of those challenging standardized testing.  If the audience is educationally literate that isn’t an issue.  The limitations of alleged standardized multiple-choice tests have been well documented for decades.  What is troubling is that multiple empirical studies of their limitations were prominent in the U.S., by respected academic institutions, before NCLB was launched by the Bush Administration.  Even more studies and critique were available before the bureaucratically-driven debacle of RTTT was launched with billions of dollars in bribes to state governments.  If our Congressional Republicans wanted to plant a scandal-bomb under the White House, it might better be shaped to open and reveal RTTT’s waste rather than Benghazi or ACA.

A second bit of misdirection is the classic tactic of attacking the critic, in the case of present reform, with the usually smirking question:  Why are you against testing; don’t we have to have some way of holding our tax-supported public schools accountable?  One would suspect that those asking are smart enough to know that the answer is out there, and has been in place for most of the tenure of public education.  Rejecting the present dominance of that testing has nothing to do with the need to assess.  Of course assessment of many types and testing of many flavors are needed, and they have been the process material of public K-12 excellence for over a century.

A third topic that may not be visible in these debates is the checkered history of the hero/villain of present reform, the ubiquitous multiple-choice test.  To the surprise of even many educators, the multiple-choice test format will mark its 100th birthday next year; precursors existed at the beginning of that century.

The multiple-choice test was created by Frederick J. Kelly, a byproduct of his doctoral dissertation at Emporia State University (formerly Kansas State Teachers’ College).  Allegedly he was motivated by the desire to eliminate the subjectivity of teachers’ judgments at the time and to acquire “uniform results.”  The approach was perceived as “…the assembly-line model of dependability and standardization.” 

Kelly was circumspect about the testing model.  Attributed to him is a quote about the model that today would generate a Twitter firestorm of political correctness.  Kelly said:  “This is a test of lower order thinking for the lower orders.”  The testing was commissioned by the Army in WWI to evaluate recruits, but the story, not unexpectedly, is not quite that simple:

“Most of us have experienced a multiple-choice test. Our children undergo them, you've certainly taken them, your parents probably did, and for some, even their grandparents had to endure them. All of us have given them the power to decide our destiny. But what most of us do not know is that multiple-choice tests resulted from an attempt to legitimize the field of psychology, with a dash of xenophobia and scientific racism. Stephen Jay Gould spells out the dark past of these tests in his aptly titled book The Mismeasure of Man. This highly recommended read reveals all the gory details of IQ testing. Gould explains that the development of IQ testing was used to identify feeble-mindedness in ‘unwanted’ groups (usually determined by race or country of origin).
Multiple-choice tests had their origin in World War I, when Dr. Robert Yerkes, President of the American Psychological Association (APA), convinced the Army to commission them to test the intelligence of recruits. The Army's goal was to improve the efficiency of evaluating men by moving away from time-consuming written and oral examinations. Yerkes' motives were to make psychiatry a more scientific field and move it away from its affiliation with philosophy.
A total of 1.7 million recruits were tested, giving the multiple-choice test an air of legitimacy, but in the end, the Army found no value in the results. Yerkes omitted that part of the story when he sold this idea to educational testing outfits. The validity of the test was not questioned. The rest is an unfortunate history.”

The ironic conclusion to Professor Kelly’s odyssey:

“A few years later, as President of the University of Idaho, Kelly disowned the idea, pointing out that it was an appropriate method to test only a tiny portion of what is actually taught and should be abandoned. The industrialists and the mass educators revolted and he was fired.”

The story gives further reflective meaning to the old saw, no good deed goes unpunished in American society.

The last piece of the puzzle about our testing trajectory has not been well aired: the role that the psychometric subset of psychology has played in creating the present mess.  The field is focused on the construction and validation of measurement instruments, including tests and personality instruments.  Its launch as a discipline is attributed to Sir Francis Galton (1822-1911), and it developed with some distinction from the latter part of the 19th century through the present.  The field is not a household word, and even surveying its history is well beyond this post.  There are two key points, however, that merit comment, at least at the level of a chapter title.

In this century psychometric modeling and math were greatly extended, and are applicable to present standardized testing design through item analysis.  Item analysis is a class of analysis that broke out of relative obscurity when our testing companies and their vended tests were identified as the source of large-scale or unexpected shifts in results of state testing.  There is little question that item analysis is valid as a mechanism for identifying test items that discriminate among human responses and can create gradients and clustering.

Point one is that the models could be applicable to any measure of a human population that displays discrete gradients of performance on some set of attributes.  The models say nothing about the construct validity of the property being measured.  In sum, as sophisticated as the techniques for deriving present test components are, the results have no intrinsic claim to measuring understanding.  Thus, to the extent that much of present testing cannot be linked to clear statements of how test scores explain high order thinking and understanding, using test results as the definition of whether testing is of value is pure tautology and not a basis for claiming reform.
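For readers who want to see what item analysis actually computes, here is a minimal sketch in Python.  The response data are invented for illustration, and the function names (item_difficulty, point_biserial) are my own, not any testing company's method.  It shows two classic item statistics: difficulty (the share of test takers answering correctly) and discrimination (how strongly success on one item tracks the score on the remaining items).

```python
from statistics import mean, pstdev

# Hypothetical data: one row per student, one column per item,
# 1 = answered correctly, 0 = answered incorrectly.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

def item_difficulty(rows, item):
    """Proportion of test takers who answered the item correctly."""
    return mean(row[item] for row in rows)

def point_biserial(rows, item):
    """Correlation between one item and the total score on the other items."""
    item_scores = [row[item] for row in rows]
    rest_scores = [sum(row) - row[item] for row in rows]
    sd_item = pstdev(item_scores)
    sd_rest = pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        return 0.0  # no variation, so the item cannot discriminate
    mean_item = mean(item_scores)
    mean_rest = mean(rest_scores)
    covariance = mean(
        (i - mean_item) * (r - mean_rest)
        for i, r in zip(item_scores, rest_scores)
    )
    return covariance / (sd_item * sd_rest)

for item in range(len(responses[0])):
    print(item, round(item_difficulty(responses, item), 2),
          round(point_biserial(responses, item), 2))
```

Note what the numbers can and cannot say.  They rank items by how well they separate high scorers from low scorers; nothing in the arithmetic knows whether the items concern algebra, reading, or trivia, which is exactly the construct validity gap described above.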

Secondly, there is even greater harm in the role our testing companies have assumed, with some hubris, in designating what is knowledge, without transparency.  Massively pervasive standardized testing, driving traditional attention to critical thought out of classrooms, de facto defines a nation's first 12 years of formative knowledge.  Psychometric input deserves its standing as expertise on selective test creation.  But the creators and keepers of knowledge have been excluded, public education disavowed interest in content over a half century ago, becoming classroom mechanics, and by default the nation's knowledge is now being devised by amateurs in all except selective disciplines.  That would not appear to project a bright future for our national intellect.

So, How Assess?

Somewhat cynically, I suspect the shadow version of this question is, how assess our students’ performances without working too hard?  If that is the basis for much of public education’s slavish acceptance of present standardized testing, we have indeed evolved a pretty sick public education system.

If instead the basis is that our education community simply knows no better, then a recent article on educator training by New York Times writer Bill Keller (“An Industry of Mediocrity”) merits your reading.

A third possibility is that our public school systems have been so intimidated (or bought off) by Federal initiatives or by state controls that their school boards and administrators are too fearful to actually operate on the basis of communities’ desires for local education control.  The answer then is at the ballot box, if a community’s school board representation by free election hasn’t already been rigged by incumbents, one of the key sources of local public school corruption of democratic process.  The clues aren’t hard to identify; try a ballot with three candidates, vote for three.  Democratic process in action, or election fraud?

Assessment that evolves from teaching that recognizes and emphasizes understanding and learning is hardly a mystery.  Here is simply a topic list of some proposed assessment methods:
  • Classic Socratic questioning
  • Mastery learning
  • Project application of constructs
  • Student progress reports (à la Gardner or Boyer)
  • Performances
  • Authentic assessment (usually with PBL)
  • Embed standardized tests as pragmatism
  • SCALE (Stanford Center for Assessment, Learning and Equity)
  • Old fashioned quizzes
  • Product related outputs
  • Process related outputs
  • Writing, essays!
  • Use authentic audiences for demonstration of performance
  • Role-based (in PBL)
  • The flipped classroom engaging parents
  • Use Bloom’s and Marzano’s taxonomies
  • Dynamic testing (integrated with teaching)
  • Indirect assessment (with formative and summative assessment)
  • Interactive analysis
  • Mathematical thinking
      - Fault finding and fixing
      - Plausible estimation
      - Creating measures
      - Convincing and proving
      - Reasoning from evidence
  • Conceptual diagnostic tests
  • Attitude surveys
  • Concept mapping
  • Exhibitions
  • Portfolios
  • Self- and peer-evaluation
  • Gaming outcomes
  • Simulations assessing performance
  • Artificial intelligence (expert systems)
  • Longitudinal performance tracking

In a recent communication, master educator and author Dr. Marion Brady (who was inventing education before most of you were born) proposed an assessment philosophy that he commented may not be ready for prime time.  I believe it is “just in time” if you will pardon my reversion to a prior profession.  Marion’s take:

“The reform cart is in front of the horse. Its initial assumption is faulty. The aim isn’t to teach the core subjects well, but to rear smart kids. If I’m right, then the first step in a proper reform effort is creating tests. Tests first, not last—tests that evaluate what Einstein said should be our first priority—the ability to imagine alternative futures and deal with the problems those futures create.

That done, tell teachers to have at it. If it’s thought that standards are needed, let teachers write them, but keep them in electronic form so they can continuously evolve as professional dialogue expands expertise.
.
.
.
My evaluation-related assumptions: (1) Evaluation tasks should require kids to apply what they know in a not-previously studied situation; (2) the best tasks are concrete rather than abstract, real-world rather than theoretical, ‘supra-disciplinary’ rather than tied to a single school-subject; (3) there’s no good reason for a test to be timed; (4) a good task requires no security measures, no honor code, no anti-plagiarizing strategy, no vigilant watching for evidence of cheating. The response to a good task will be so idiosyncratic that any teacher in charge of a reasonably-sized class for more than a very few weeks will know who wrote what.

Many years ago, when I first read Alfred North Whitehead’s 1916 Presidential Address to the Mathematical Association of England, I was mystified by his insistence that ‘no educational system is possible unless every question directly asked of a pupil at any examination is either framed or modified by the actual teacher of that pupil in that subject.’

It took me many more years to see the wisdom in that requirement. Now, I can see no acceptable alternative.”

Bottom Lines

As a way of summing up, consider this quote from a high school student featured on the George Lucas Educational Foundation site, Edutopia:

“And yet, in the world of education, the "next big thing" is merit pay for teachers and boosting test scores. Do our policymakers not understand that the world is going through a revolution in the way we live, interact and learn?

Our education system is stuck in paralysis. We have tried doing the same thing over and over again with the expectation of a different result. This is insanity at its finest. The way we educate is based on the tenets of the Industrial Revolution -- conformity and standardization.

For instance, creativity is virtually extinguished as a child goes through his or her schooling. In their 1998 book Breakpoint and Beyond, George Land and Beth Jarman refer to a study in which 1,500 kindergartners between three and five years old were given a divergent thinking test. Divergent thinking tests don't measure creativity, but rather one's propensity for creativity. The test asks questions such as ‘How many ways could you use this paperclip?’ or ‘How many ways could you improve this toy fire truck?’ -- questions designed to encourage creative thought rather than elicit right-or-wrong answers. Ninety-eight percent of kindergarteners tested at genius level. The kids were tested every few years. By the end of post-secondary education, only two percent of students tested at genius level.

So, if you're trying to produce compliant, dead-brained, formulaic workers, our system is doing exactly what it was designed for. (I should add ‘grade-obsessed’ to that cadre of properties.) But in a society where innovation is simply everything, it is a cultural and moral failure to encourage this compliance.”

Even when all denial is purged and all preconceptions and pretensions are deflated, there remains the belief that some ‘magic sauce’ will transform a U.S. public K-12 education system that lost its will to excel and its capacity for servant leadership some decades ago.  When confronted with examples of Finland’s comparable systems, or Singapore’s, or Shanghai’s, and their successes relative to the U.S., the prototypical domestic response is some form of “but they’re smaller, or more homogeneous, or more socialistic.”  NYT writer and author Tom Friedman, a savvy observer of our society, recently visited Shanghai in search of the ‘magic sauce’; there isn’t any, but there is a master-class lesson in K-12 education.

Part 4

The next blog goes down into the trenches.  The challenge:  Speculate why too many of our public schools, unwatched outside their bubbles and unheralded by the media, are not able to either conceive of or implement the kind of self-reform that might have prevented the quagmire of present test-based alleged reform.  There is reason to expect that the answers will raise hackles, and will not fit into the neat confines of politically correct caveats.
