The last post of Edunationredux offered a partial critique of the Obama/Duncan scheme to rate America's colleges and universities. Prior national critique reflected an almost "you gotta be kidding" ambience, illuminating the perceived chasm between what Arne Duncan and the US Department of Education are proposing and anything resembling intelligent social science applied to the measurement task. Today's post extends that critique, exploring the real measurement chores needed to create valid and reliable ratings of America's colleges and universities.
That chasm between the proposal and reality is so great that it raises major questions: what conceptual malaise and leadership degradation have occurred in that Department, who is steering this measurement debacle, and what resources are executing the work? Is the proposal chain-rattling just to get the attention of higher education leadership? If the intent is actually to carry the scheme through, is this another Federal agency that has lost steerage and mismatched the resources needed to conduct competent education work?
Post Critique, Critique
One tiny slip in the pronouncement of a functionary in the Department of Education may have given away the naïveté and slanted thinking footing the current proposal: one of the factors allegedly being considered was how to treat "improvement" as a variable, and presumably as a simple metric. The statement implies that the designers of this scheme may see the assessment of our colleges and universities as occupying the same conceptual space as improving test scores in a public school system. There are likely a few community college-scale institutions, close to being simple extensions of high school level performance, where this may be applicable, but anyone knowledgeable about the functions within a major university would deservedly see this as bizarre.
A last retrospective issue is further scrutiny of the misguided proposal to use beginning salaries of graduating students as a basis for institutional assessment. This component of the proposal has some serious logic issues. Aside from the nearly impossible chore of equilibrating the professional destinations of students across institutions to create one valid metric (or even multiple metrics), and the cognitive error of equating quality with the profession sought, a peek at the distributions of those starting salaries poses an even more daunting issue. Starting salaries are not normally distributed; they are skewed to the high end. The overwhelming body of starting salaries is so tightly bunched, the distribution so leptokurtic, that little if any discrimination among institutions could be detected from most salaries.
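To make the distributional point concrete, here is a minimal simulation sketch in Python, with wholly invented numbers rather than Department data: institutions whose "true" salary effects differ only modestly, sitting on top of a heavily right-skewed salary base, leave only a sliver of the total salary variance attributable to the institution at all.

```python
# Illustrative sketch (invented numbers, not the Department's data or method):
# simulate starting salaries for hypothetical institutions whose "true" effects
# differ only slightly, overlaid on a right-skewed salary distribution, then ask
# how much of the variance is attributable to the institution at all.
import numpy as np

rng = np.random.default_rng(0)
n_institutions, grads_per_school = 200, 500

# Small institutional effects relative to a skewed (lognormal) salary base.
school_effect = rng.normal(0, 1_000, n_institutions)   # assumed small
salaries = np.array([
    35_000 + effect + rng.lognormal(mean=9.5, sigma=0.6, size=grads_per_school)
    for effect in school_effect
])

grand_mean = salaries.mean()
between_var = grads_per_school * ((salaries.mean(axis=1) - grand_mean) ** 2).sum()
total_var = ((salaries - grand_mean) ** 2).sum()
print(f"Share of salary variance between institutions: {between_var / total_var:.3%}")
# With institutional effects this small relative to the skewed spread, the share
# is tiny -- most of what a salary ranking would "measure" is noise and field mix.
```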
A pretty cynical outcome of using the proposed salary metric(s), aside from all other faults, is that success in that venue would come from maximizing an institution's output of petroleum engineers and wiping out the education of all PreK-12 teachers. If the underlying intent of this scheme is some social engineering to equalize higher education opportunity and social and economic standing, its extreme liberal designers need to go back to the drawing board, or better, acquire some higher education.
Fair Challenge
The classic, and legitimate, challenge to last post's critique of what's proposed -- that it is a loser -- is to provide a more effective system for assessing our institutions. The remainder of this post takes a stab at that challenge.
Dimensions
The starting point in this quest is identical to every legitimate research effort since the Enlightenment: What is the goal, what hypotheses are to be tested, what question or questions are being posed for answers? What is the universe from which measurements are sought? What are the variables or factors requiring measurement, and what are their functional relationships to the criterion question(s)? What are the properties of the variables, in this instance the measurements wanted, i.e., nominal, ordinal, interval, or ratio? What are the hypothesized or measurable distributions of the measurements sought? How do the error terms intrinsic to all variables fall out, intra-institutional variance versus inter-institutional variance, driving the comparisons of institutions or institutional subsets sought? What are the weights of contributing variables in forming, then informing about, the differential effectiveness or qualities of institutions being assessed? And critically, with a finite set of candidates for positioning, how may the units in the universe need to be stratified or clustered to minimize confounding of results attributable to basically different higher education systems being appraised?
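As a purely illustrative sketch of that design discipline, one might declare each candidate variable with its measurement level, unit of analysis, and hypothesized weight before any data are touched; the factor names and weights below are hypothetical, not a proposed instrument.

```python
# A minimal sketch of "doing the design work first": every candidate variable is
# declared with its measurement level, its unit of analysis, and a hypothesized
# weight, before any data are collected. Names and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class DesignVariable:
    name: str
    level: str                   # "nominal" | "ordinal" | "interval" | "ratio"
    unit_of_analysis: str        # e.g., institution, school, department, degree track
    hypothesized_weight: float   # to be re-estimated against data, not assumed

design = [
    DesignVariable("six_year_graduation_rate", "ratio",    "degree track", 0.25),
    DesignVariable("learning_gain_assessment", "interval", "degree track", 0.30),
    DesignVariable("net_cost_per_degree",      "ratio",    "institution",  0.20),
    DesignVariable("equitable_access_index",   "ordinal",  "institution",  0.25),
]

assert abs(sum(v.hypothesized_weight for v in design) - 1.0) < 1e-9
for v in design:
    print(f"{v.name:28s} {v.level:9s} unit={v.unit_of_analysis}")
```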
Given a US universe of 4,140 institutions of higher education, with internal partitioning that may multiply the actual units of analysis by orders of magnitude, and with hypothetically complex variable sets driving the criterion effect, the project is not the simplistic vision of the US Department of Education, revolving around already extant data, but what is now colloquially termed "big data": "...an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations." The mission here, assigning performance ratings to America's colleges and universities, is arguably the very definition of the analysis challenge described.
Department of Education thinking is apparently to measure some amalgam of institutional functional performance and contribution to social goals. Both become subdivided into constituent goals that complicate what is proposed and currently measured: for performance, institutional graduation rates overall versus by students' degree tracks, as well as longitudinally by how the degree is finally achieved and the time involved; the learning effectiveness of what's been acquired along the way (made more complex when apportioned among multiple disciplines and degree tracks); the complexity of devising the true costs of the education delivered, plus the cogent issue of the productivity of all of the assets and operations incurred to produce a graduate; and, close to the most salient first use of any assessment, whether the results actually improve the choice processes of prospective students seeking higher education. Also ignored in the Department's rhetoric is the longitudinal complexity of what prior learning is worth at exit from the institution versus its worth at the various career stages the graduate experiences.
Measurement Factor Complications
The performance of our institutions in creating equitable student access may be slightly easier to assess in principle, but introduces major problems in execution: a large multivariate causal set of determinants of which schools get screened, preceding the issue of differential institutional compliance with equitable admissions, is problematic; acceptance of those who might be discriminated against also rests on the failures or successes of our public K-12 systems, long before an institution's actions affecting equity kick in; and a major barrier to measurement at the level of the individual student/family is driven by confidentiality considerations.
A pre-collegiate case in point, a family relation of this writer, is a collegiate freshman at a major university majoring in an engineering specialty. Partially because of 9-12 work in an effective science high school, this soon-to-be second semester freshman will be moving into second semester sophomore level academic work with perfect "A" grades, primed by the prior high school work. Adding to the analysis challenge of assessing institutional performance, then, are the assets and deficits that precede and influence acceptance. Remedial work that impedes, or prior learning that permits, accelerated collegiate work becomes another complication in assessing an institution's end-game contribution.
Another set of factors in judging performance is the subjectivity of collegiate grading protocols, which vary among institutions, among schools, among departments, and even among individual faculty. Without some national, standardized achievement testing, by specific discipline or academic track, the comparative use of even grades and grade point averages as measures of institutional performance adds complexity to any rating scheme.
The prior Edunationredux blog also unfolded another major constraint, comparison of institutions based on the proper unit of analysis as well as assurance of comparability, rendering the simplistic measurement chore implied in the Obama/Duncan thinking the height of amateurism.
Still another factor ignored in the current conceptualization is the role played by geographic and location factors, perhaps even highly specific ones related to the population and cultural composition surrounding a student's residence, in influencing institutional outcomes.
But there is another gut issue that will, at present -- and in the absence of never-executed benchmark research on our colleges/universities -- blindside and hamstring the proposal. That is the core pattern of variance of any variable or factor used as a basis of measurement. In virtually all diversified and complex systems (precisely what every major college/university is), there is a leveling of outputs based on de facto competition. In common sense terms, there may be more variation of performance within an organization than among similar organizations when an attempt is made to sum or average overall experience. The practical significance: with a small bit of coaching, human experts on higher education can likely identify the better or worse extremities of "high performing" and "low performing" colleges/universities. The in-the-middle thousands may blur because their performances tend to regress to each stratum's universe mean. Consider that in the last half century no credible college or university has been put out of business because its outputs were wholly without merit, or its graduates could not acquire employment.
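A hedged simulation sketch of that "blurred middle" follows; the numbers are invented, but they show how modest between-institution differences, combined with large within-institution spread across programs, make mid-pack rankings unstable from one sampling of the same institutions to the next.

```python
# Illustration of the "blurred middle" (all numbers invented): if institutional
# means differ only modestly while within-institution spread is large, mid-pack
# rankings are unstable -- re-sampling the same institutions reshuffles ranks.
import numpy as np

rng = np.random.default_rng(1)
n_schools, n_programs = 100, 40
true_means = rng.normal(50, 2, n_schools)   # modest between-school differences
within_sd = 10                              # large within-school (program) spread

def observed_ranks():
    # Each school's score is the mean over a fresh sample of its programs.
    scores = rng.normal(true_means[:, None], within_sd, (n_schools, n_programs)).mean(axis=1)
    return scores.argsort().argsort()       # rank of each school (0 = lowest)

r1, r2 = observed_ranks(), observed_ranks()
middle = (true_means > np.percentile(true_means, 25)) & (true_means < np.percentile(true_means, 75))
print("Mean rank shift, middle half of schools:  ", np.abs(r1 - r2)[middle].mean())
print("Mean rank shift, top and bottom quarters: ", np.abs(r1 - r2)[~middle].mean())
```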
Rank Versus Supply Real Information for Choice
The commercially hyped collegiate rating schemes -- U.S. News, Forbes, Princeton, et al. -- have been widely criticized for their simplistic foundations and for the reality that they provide minimal discrimination of a complex product. But they, along with such counterproductive ratings as "best party school," are still allegedly used as input to a critical life decision, an American tragedy. That prompts the leading question: Is the Obama/Duncan strategy embodied in the proposed rankings one of the worst decisions of this administration, matching or exceeding even the core ignorance of present punitive-based testing in public K-12? Would far better choices have been, for example, the long view, with strategic research to field a legitimate, comprehensive rating scheme for our institutions' multidimensional areas of performance -- call it the 'value-rating' model -- or a non-punitive and affirmative alternative, the 'value-choice' model, whose mission is providing comprehensive, valid, and comparable information on all public higher education institutions, letting users supply their own criteria for choosing a school?
Both example approaches start from the same research roots: a priori judgments of the factors considered central to the quality and equity of the higher education delivered, irrespective of whether those factors are presently quantified; next, the development work is executed to convert those multidimensional factors, by algorithm or by scaling techniques, into quantitative metrics. At this point the approaches bifurcate, value-choice becoming the task of creating easily accessible and universal databases, placing them in "the cloud" readily available online, searchable via criteria pertinent to the individual collegiate prospect, or, in another possible form, serving as the material for simulation to derive optimal choices for a student. The rest of our real world is inundated with clever "apps," available for even the ubiquitous smartphone. Publicly accessible digitally, online, the system offers at low or no cost the structured information to personally search possible school choices. The values or experiences sought from a candidate school remain the choices of the potential student and parents, not predetermined by big brother.
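A minimal sketch of the value-choice idea, using invented records and field names rather than any existing Federal database, shows how the system supplies comparable, structured information while the student supplies the criteria:

```python
# A minimal sketch, on invented data, of the "value-choice" idea: the database
# supplies comparable, structured information; the student supplies the criteria.
# Field names and records are hypothetical, not an existing Federal database.
from dataclasses import dataclass

@dataclass
class Institution:
    name: str
    net_price: int               # annual, USD
    six_year_grad_rate: float    # 0..1
    programs: tuple
    state: str

CATALOG = [
    Institution("Example State U", 14_000, 0.71, ("engineering", "education"), "OH"),
    Institution("Example Tech",    22_000, 0.83, ("engineering",),             "PA"),
    Institution("Example College",  9_500, 0.58, ("education", "nursing"),     "OH"),
]

def search(catalog, *, max_price, min_grad_rate, program, states=None):
    """Return institutions matching criteria chosen by the student, not by a rater."""
    return [
        i for i in catalog
        if i.net_price <= max_price
        and i.six_year_grad_rate >= min_grad_rate
        and program in i.programs
        and (states is None or i.state in states)
    ]

for match in search(CATALOG, max_price=20_000, min_grad_rate=0.6, program="engineering"):
    print(match.name)
```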
The second approach -- value-rating -- does carry out the intent of the Obama/Duncan vision, ordinal rating of institutions, but based on the constituent properties of collegiate value delivery noted for the first approach. What changes, and what additional research is needed? One model for the second approach might be structured as follows: the starting point is a quota sample from America's colleges/universities serving as the development base, the sample reflecting meaningful categorizations of our institutions; for factors presumed causal for quality and equitable delivery by an institution, break out the programs or tracks that constitute legitimate units of analysis; use a "human expert model" of decision making to create criterion positioning of the sample organizations, for each unit of analysis, on the various factors; then test the goodness of fit between the metrics devised and the expert positioning across all factors and units of analysis, mathematically determining the salience and weighting of the factors that fit expert prediction. Lastly, the metrics proving predictive are tested on a second, comparable sample of our institutions for verification.
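A hedged sketch of that fit-and-verify step, on simulated data with invented factor names, might look like this: fit the weights that best reproduce expert criterion ratings on the development sample, then check goodness of fit on the second, comparable sample.

```python
# A hedged sketch of the fit-and-verify step described above, on simulated data:
# fit weights that best reproduce expert criterion ratings on a development
# sample, then check goodness of fit on a second, comparable verification sample.
# The factor names and "true" weights are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
factors = ["grad_rate", "learning_gain", "net_cost", "equity_index"]
true_weights = np.array([0.3, 0.4, 0.1, 0.2])        # unknown in practice

def make_sample(n):
    X = rng.normal(size=(n, len(factors)))            # standardized factor metrics
    expert = X @ true_weights + rng.normal(0, 0.3, n)  # expert ratings, with noise
    return X, expert

X_dev, y_dev = make_sample(120)    # development (quota) sample
X_ver, y_ver = make_sample(120)    # second sample, held out for verification

w, *_ = np.linalg.lstsq(X_dev, y_dev, rcond=None)     # least-squares factor weights

def r_squared(X, y, w):
    resid = y - X @ w
    return 1 - resid.var() / y.var()

print("Fitted weights:", dict(zip(factors, np.round(w, 2))))
print("Fit on development sample:  R^2 =", round(r_squared(X_dev, y_dev, w), 2))
print("Fit on verification sample: R^2 =", round(r_squared(X_ver, y_ver, w), 2))
```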
The raw data to start building either of the above approaches are already out there, in the mass of college/university data banked on institutions' web sites and made available in detail by a plethora of both public and private sector organizations. Most of our institutions are working with their own game plans, but the composite of data generated could be a starting point, for example, for building a universal higher education database serving the value-choice approach. A tragedy of our present society is that a Bill Gates, instead of funding programs designed to beat on our public schools with testing, apparently lacked the perspicacity to draw on even his own suite of digital experiences to fund and guide the assembly of a suitable national higher education database.
Can the value-rating model actually be executed? It is arguable that it already has been in part: the logic employed by Tom Peters and his associates in creating the corporate effort In Search of Excellence is an early precursor to that approach; it stopped short of seeking to quantify the determinants of excellence, but the core idea was successful. Using the power of that same Federal funding to our colleges/universities as an incentive would engage our universities in the needed research. That is a far better use of the incentive than seeking to intimidate our institutions into change by rankings linked to punitive reductions in funding. Lastly, it develops metrics that are defined by the real measurement challenge, and not by what was developed for other purposes or is simply convenient.
Conclusion
Historically, toward the end of the last century, one of the Presidential Commissions on Higher Education offered the White House and our higher education community very practical recommendations. They encompassed reducing higher education costs, reforming the funding of tuition and other costs of a degree, and cooperation among all of our post-secondary schools to adopt a common set of parameters giving America's families uniform ways to assess collegiate choice. Both our college/university leadership and our political system quickly rejected all three sets of well-reasoned recommendations. Clearly, moving either of the above approaches, or anything resembling them, to a productive destination would require some new mindsets, among our higher education institutions and in Federal education leadership's sensitivity to genuine national needs over liberal dreaming.
The counterpoint is that some of our colleges and universities, presumably "reading the room," have already initiated innovative changes in their collegiate instruction. As reported in Saturday's New York Times, changes are occurring in B-schools' MBA programs -- to emulate the rapidity of change and experimentation of Silicon Valley -- and in basic collegiate science courses, moving from lecture modes to high student involvement and problem solving. Long valid patterns of diffusion of innovation will change higher education, even as the critically deficient Obama/Duncan rating scheme stumbles out of the starting gate. Perhaps merely the threat of that Federal 'Franken data' has stimulated collegiate action? Incredibly cynical albeit clever if true; but if accurate, the rest of the program should be given a quick burial.
On real inspection the proposed Department of Education rating scheme, regardless of intentions, simply reeks of ignorance and flawed understanding of complex academic organizational behavior, of advanced learning, and of the most basic principles of inquiry and social science explanation. Their scheme could, analogically, be compared to trying to build a quantum computer using some AA batteries, a photo transistor, a couple of resistors and capacitors, and some wire scrounged from the ties used on garbage bags. The present scheme, even if Machiavellian, as well as mirroring the mental set that any solution has to be punitive, is wholly unworthy of a Federal education function critical to our nation, and is a condemnation of the current resources managing that agency.
Epilog
The next issues of Edunationredux will move into challenges and opportunities throughout US higher education that might be areas for measured change, along with possible innovations. First out of the chute will be the footings for more productive higher education experiences -- bridging the chasm between our K-12, especially 9-12, school outputs and the incoming requirements for collegiate success -- allowing passage through collegiate work with greater learning effect, in shorter periods of time, and therefore with less investment.