Sunday, December 28, 2014

Assessing US Higher Education: Information, Intimidation, Ignorance, or Insanity?

The last post of Edunationredux offered a partial critique of the Obama/Duncan scheme to rate America's colleges and universities. Prior national critique reflected an almost "you gotta be kidding" ambience, illuminating the perceived chasm between what Arne Duncan and the US Department of Education are proposing and anything resembling intelligent social science applied to the measurement task. Today's post extends that critique, exploring the real measurement chores needed to create valid and reliable ratings of America's colleges and universities.

That chasm between the proposal and reality is so great it raises major questions: what conceptual malaise and leadership degradation have occurred in that Department? Who is steering this measurement debacle, and what resources are executing the work? Is the proposal mere chain-rattling to get the attention of higher education leadership? If the intent is actually to carry the scheme through, is this another Federal agency that has lost steerage and mismatched the resources needed to conduct competent education work?

Post-Critique Critique

One tiny slip in the pronouncement of a functionary in the Department of Education may have given away the naïveté and slanted thinking footing the current proposal: one of the factors allegedly being considered was how to treat "improvement" as a variable, and presumably as a simple metric. The statement implies that the designers of this scheme may see the assessment of our colleges and universities as occupying the same conceptual space as improving test scores in a public school system. There are likely a few community college-scale institutions, close to being simple extensions of high school-level performance, where this may be applicable, but anyone knowledgeable about the functions within a major university would rightly see it as bizarre.

A last retrospective issue is further scrutiny of the misguided proposal to use the beginning salaries of graduating students as a basis for institutional assessment. This component of the proposal has serious logic issues. Aside from the nearly impossible chore of equilibrating the professional destinations of students across institutions to create one valid metric (or even multiple metrics), and the cognitive error of relating quality to the profession sought, a peek at the distributions of those starting salaries poses an even more daunting issue. Starting salaries are not distributed normally; they are skewed to the high end. The overwhelming body of starting salaries is so tightly bunched (the distribution is leptokurtic) that little if any discrimination among institutions could be detected from most salaries.
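A minimal simulation sketch of that distributional point, using invented salary figures (the dollar amounts and the 5% high-pay tail are assumptions, not data): when most starting salaries bunch in a narrow band with a long right tail, only a sliver of the total variance is attributable to the institution, so institution-level salary averages discriminate very little.

```python
# Hypothetical illustration only: tightly bunched starting salaries with a
# long right tail leave little variance attributable to the institution.
import numpy as np

rng = np.random.default_rng(0)
n_schools, grads_per_school = 200, 500

# Assume small true differences between institutions
school_effect = rng.normal(0, 1_000, n_schools)

# Most graduates land in a narrow band; a small high-paid tail
# (the petroleum-engineer effect) skews the distribution to the right.
base = rng.normal(42_000, 3_000, (n_schools, grads_per_school))
tail = rng.pareto(3, (n_schools, grads_per_school)) * 15_000
is_tail = rng.random((n_schools, grads_per_school)) < 0.05
salaries = base + np.where(is_tail, tail, 0.0) + school_effect[:, None]

school_means = salaries.mean(axis=1)
share_between = school_means.var() / salaries.var()
print(f"share of salary variance lying between institutions: {share_between:.1%}")
# With these assumed numbers the share is only a few percent; the rest is
# within-institution spread that a one-number school rating cannot capture.
```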

A pretty cynical outcome of using the proposed salary metric(s), aside from all its other faults, is that success in that venue would come from maximizing an institution's output of petroleum engineers and wiping out the education of all PreK-12 teachers. If the underlying intent of this scheme is social engineering to equalize higher education opportunity, and social and economic states, its extreme liberal designers need to go back to the drawing board, or better, acquire some higher education.

Fair Challenge

The classic, and legitimate, challenge to the last post's critique of what's proposed -- that it is a loser -- is to provide a more effective system for assessing our institutions. The remainder of this post takes a stab at that challenge.

Dimensions

The starting point in this quest is identical to that of every legitimate research effort since the Enlightenment:
  • What is the goal: what hypotheses are to be tested, what question or questions are being posed for answers?
  • What is the universe from which measurements are sought?
  • What are the variables or factors requiring measurement, and what are their functional relationships to the criterion question(s)?
  • What are the properties of the variables, i.e., of the measurements wanted: nominal, ordinal, interval, cardinal?
  • What are the hypothesized or measurable distributions of the measurements sought?
  • How do the error terms intrinsic to all variables fall out -- intra-institutional variance versus inter-institutional variance -- since that drives the comparisons of institutions or institutional subsets sought?
  • What are the weights of the contributing variables in forming, then informing about, the differential effectiveness or qualities of the institutions being assessed?
  • Critically, with a finite set of candidates for positioning, how may the units in the universe need to be stratified or clustered to minimize confounding of results attributable to basically different higher education systems being appraised?

Given a US universe of 4,140 institutions of higher education, with internal partitioning that may multiply the actual units of analysis by orders of magnitude, and with hypothetically complex variable sets driving the criterion effect, the project is not the simplistic vision of the US Department of Education, revolving around already extant data, but what is now colloquially termed "big data": "...an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations." The mission here, assigning performance ratings to America's colleges and universities, is arguably the very definition of the analysis challenge described.

Department of Education thinking is apparently to measure some amalgam of institutional functional performance and contribution to social goals. Both become subdivided into constituent goals that complicate what is proposed and currently measured. For performance: institutional graduation rates overall versus by students' degree tracks, as well as longitudinally by how the process is finally achieved and the time involved; the learning effectiveness of what has been acquired along the way (made more complex when apportioned among multiple disciplines and degree tracks); the complexity of devising the true costs of the education delivered, plus the cogent issue of the productivity of all of the assets and operations incurred to produce a graduate; and, close to the most salient first use of any assessment, whether the results actually improve the choice processes of prospects seeking higher education. Also ignored in the Department's rhetoric is the longitudinal complexity of worth: what learning is worth at exit from the institution versus at the various career stages the graduate experiences.

Measurement Factor Complications

The performance of our institutions in creating equitable student access may be slightly easier to assess in principle, but it introduces major problems in execution. A large multivariate causal set of determinants of which schools get screened, preceding the issue of differential institutional compliance with equitable admissions, is problematic; the acceptance of those who might be discriminated against also rests on the failures or successes of our public K-12 systems, long before an institution's action effecting equity kicks in; and a major barrier to measurement at the level of the individual student and family is driven by confidentiality considerations.

A pre-collegiate case in point, a familial relation of this writer, is a freshman at a major university majoring in an engineering specialty. Partly because of the 9-12 work at an effective science high school, this soon-to-be second-semester freshman will be moving into second-semester sophomore-level academic work with perfect "A" grades, primed by the prior high school work. Adding to the challenge of assessing institutional performance, then, are the assets and deficits that precede and impact acceptance. Remedial work that impedes, or prior learning that permits accelerated collegiate work, becomes another complication in assessing a college's end-game contribution.

Another set of factors in judging performance is the subjectivity of collegiate grading protocols, which vary among institutions, among schools, among departments, and even among individual faculty. Without some national, standardized achievement testing by specific discipline or academic track, even the comparative use of grades and grade point averages as measures of institutional performance adds complexity to any rating scheme.

The prior Edunationredux post also unfolded another major constraint -- comparing institutions on the proper unit of analysis while assuring comparability -- which renders the simplistic measurement chore implied in the Obama/Duncan thinking the height of amateurism.

Still another factor ignored in the current conceptualization is the role played by geographic and location factors -- perhaps even highly specific ones related to the population and cultural composition surrounding a student's residential assignment -- in influencing institutional outcomes.

But there is another gut issue that will, at present -- and in the absence of never-executed benchmark research on our colleges/universities -- blindside and hamstring the proposal. That is the core pattern of variance of any variable or factor used as a basis of measurement. In virtually all diversified and complex systems (precisely what every major college/university is) there is a leveling of outputs based on de facto competition. In common-sense terms, there may be more variation of performance within an organization than among similar organizations when an attempt is made to sum or average overall experience. The practical significance: with a small bit of coaching, human experts on higher education can likely identify the extremities of "high performing" and "low performing" colleges/universities. The in-the-middle thousands may blur, because their performances tend to regress to each stratum's mean. Consider that in the last half century no credible college or university has been put out of business because its outputs were wholly without merit, or its graduates could not acquire employment.
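A minimal sketch of that variance argument, with made-up numbers rather than real institutional data: when program-level performance varies far more within an institution than institutional averages vary across institutions, adjacent positions in the resulting ranking differ by less than the noise in each institution's own average, so the middle of the list blurs.

```python
# Hypothetical illustration only: large within-institution variance across
# programs swamps small between-institution differences, so mid-ranked
# schools are statistically indistinguishable from their neighbors.
import numpy as np

rng = np.random.default_rng(1)
n_schools, n_programs = 1_000, 150

true_quality = rng.normal(0, 1.0, n_schools)      # assumed small real differences
program_scores = true_quality[:, None] + rng.normal(0, 5.0, (n_schools, n_programs))

school_avg = program_scores.mean(axis=1)
ranked = np.sort(school_avg)[::-1]                # the published "ranking"

# Noise in each school's average vs. the gap between adjacent ranks
std_error = program_scores.std(axis=1, ddof=1).mean() / np.sqrt(n_programs)
median_gap = np.median(np.abs(np.diff(ranked)))
print(f"typical gap between adjacent ranks: {median_gap:.3f}")
print(f"standard error of one school's average: {std_error:.3f}")
# The standard error dwarfs the gap, so most adjacent rank orderings are noise.
```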

Ranking Versus Supplying Real Information for Choice

The commercially hyped collegiate rating schemes -- U.S. News, Forbes, the Princeton Review, et al. -- have been widely criticized for their simplistic foundations and for the reality that they provide minimal discrimination of a complex product. Yet they, along with such counterproductive ratings as "best party school," are still allegedly used as input to a critical life decision, an American tragedy. That prompts the leading question: is the Obama/Duncan strategy embodied in the proposed rankings one of the worst decisions of this administration, matching or exceeding even the core ignorance of present punitive-based testing in public K-12? Would far better choices have been, for example, taking the long view, with strategic research to field a legitimate, comprehensive rating scheme for our institutions' multidimensional areas of performance -- call it the "value-rating" model -- or a non-punitive and affirmative alternative, a "value-choice" model, whose mission is to provide comprehensive, valid, and comparable information on all public higher education institutions, letting users supply their own criteria for choosing a school?

Both approaches start with the same research roots: a priori judgments of the factors considered central to the quality and equity of the higher education delivered, irrespective of whether those factors are presently quantified; then the development work to convert those multidimensional factors, by algorithm or by scaling techniques, into digital metrics. At this point the approaches bifurcate. Value-choice becomes a matter of creating easily accessible, universal databases, placed in "the cloud" and readily available online, searchable via criteria pertinent to the individual collegiate prospect, or, in another possible form, serving as the material for simulations that derive optimal choices for a student. The rest of our real world is inundated with clever "apps," available even on the ubiquitous smartphone. Publicly accessible online, such a system offers, at low or no cost, the structured information needed to personally search possible school choices. The values or experiences sought from a candidate school remain the elections of the prospective student and parents, not something predetermined by big brother.
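A minimal sketch of the value-choice idea, with hypothetical institutions, field names, and figures invented for illustration: the database merely exposes comparable facts, and the prospective student, not a federal rater, supplies the criteria.

```python
# Hypothetical sketch of a "value-choice" query. Every record, field name,
# and threshold here is invented for illustration; no official data source
# or API is implied.
from dataclasses import dataclass

@dataclass
class Program:
    institution: str
    degree_track: str
    avg_net_price: int         # dollars per year
    four_year_grad_rate: float
    pell_share: float

PROGRAMS = [
    Program("State U. (hypothetical)", "Mechanical Engineering", 14_500, 0.61, 0.31),
    Program("Private College (hypothetical)", "Elementary Education", 27_000, 0.74, 0.22),
    Program("Regional Campus (hypothetical)", "Nursing", 9_800, 0.48, 0.45),
]

def value_choice(programs, max_net_price=float("inf"), min_grad_rate=0.0):
    """Return programs meeting the user's own thresholds -- no imposed ranking."""
    return [p for p in programs
            if p.avg_net_price <= max_net_price
            and p.four_year_grad_rate >= min_grad_rate]

# The student's criteria, not a federal composite score:
for p in value_choice(PROGRAMS, max_net_price=15_000, min_grad_rate=0.5):
    print(p.institution, "-", p.degree_track)
```

The design choice the sketch illustrates: the unit of record is the program or track rather than the whole institution, and the thresholds and weights remain the user's.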

The second approach -- value-rating -- does carry out the intent of the Obama/Duncan vision, an ordinal rating of institutions, but one based on the constituent properties of collegiate value delivery noted for the first approach. What changes, and what additional research is needed? One model might be structured as follows: the starting point is a quota sample of America's colleges/universities serving as the development base, the sample reflecting meaningful categorizations of our institutions; for the factors presumed causal for quality and equitable delivery by an institution, break out the programs or tracks that constitute legitimate units of analysis; use a "human expert model" of decision making to create criterion positioning of the sample organizations, by unit of analysis, on the various factors; then test the goodness of fit between the metrics devised and the expert positioning across all factors and units of analysis, mathematically determining the salience and weighting of the factors that fit expert prediction. Lastly, the metrics proving predictive are tested on a second, comparable sample of our institutions for verification.
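A minimal sketch of the fitting and verification steps, with simulated numbers standing in for both the devised metrics and the expert panel's positioning (nothing here reflects real institutional data): factor weights are estimated against expert ratings on the development sample, then the fit is checked on a second, held-out sample.

```python
# Hypothetical sketch: estimate factor weights against expert ratings on a
# development sample, then verify predictive fit on a held-out sample.
# All metrics and "expert" ratings below are simulated.
import numpy as np

rng = np.random.default_rng(2)
n_units, n_factors = 400, 6        # units of analysis (programs/tracks), candidate factors

metrics = rng.normal(size=(n_units, n_factors))
true_weights = np.array([1.5, 0.0, 0.8, 0.0, 0.4, 0.0])   # only some factors matter
expert_rating = metrics @ true_weights + rng.normal(0, 0.5, n_units)

dev, ver = slice(0, 200), slice(200, 400)     # development vs. verification samples

# Least-squares estimate of factor weights on the development sample
weights, *_ = np.linalg.lstsq(metrics[dev], expert_rating[dev], rcond=None)

# Goodness of fit on the held-out verification sample
pred = metrics[ver] @ weights
ss_res = ((expert_rating[ver] - pred) ** 2).sum()
ss_tot = ((expert_rating[ver] - expert_rating[ver].mean()) ** 2).sum()
print("estimated factor weights:", np.round(weights, 2))
print(f"verification R^2: {1 - ss_res / ss_tot:.2f}")
```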

The raw data to start building either of the above approaches are already out there, in the mass of college/university data banked on institutions' web sites and made available in detail by a plethora of both public and private sector organizations. Most of our institutions are working to their own game plans, but the composite of data generated could be a starting point, for example, for building a universal higher education database serving the value-choice approach. A tragedy of our present society is that a Bill Gates, instead of funding programs designed to beat on our public schools with testing, apparently lacked the perspicacity to apply even his own suite of digital experiences to funding and guiding the assembly of a suitable national higher education database.

Can the value-rating model actually be executed? It is arguable that it already has been, in part: the logic employed by Tom Peters and his associates in creating the corporate study In Search of Excellence is an early precursor to the approach; it stopped short of seeking to quantify the determinants of excellence, but the core idea was successful. The power of that same Federal funding to our colleges/universities could serve as an incentive to engage our universities in the needed research -- a far better use of the incentive than seeking to intimidate our institutions into change through rankings linked to punitive reductions in funding. Lastly, the metrics developed would be defined by the real measurement challenge, and not by what was developed for other purposes or happens to be convenient.

Conclusion

Historically, toward the end of the last century, one of the Presidential Commissions on Higher Education offered the White House and our higher education community very practical recommendations. They encompassed reducing higher education costs, reforming the funding of tuition and other costs of a degree, and cooperation among all of our post-secondary schools to adopt a common set of parameters giving America's families uniform ways to assess collegiate choice. Both our college/university leaderships and our political system quickly rejected all three sets of well-reasoned recommendations. Clearly, moving either of the above approaches, or anything resembling them, to a productive destination would require some new mindsets among our higher education institutions, and in Federal education leadership a sensitivity to genuine national needs over liberal dreaming.

The counterpoint is that some of our colleges and universities, presumably "reading the room," have already initiated innovative changes in their collegiate instruction. As reported in Saturday's New York Times, changes are occurring in B-schools' MBA programs -- to emulate the rapidity of change and experimentation of Silicon Valley -- and in basic collegiate science courses, moving from lecture modes to high student involvement and problem solving. Long-valid patterns of diffusion of innovation will change higher education, even as the critically deficient Obama/Duncan rating scheme stumbles out of the starting gate. Perhaps merely the threat of that Federal "Franken-data" has stimulated collegiate action? Incredibly cynical, albeit clever, if true; but if accurate, the rest of the program should be given a quick burial.

On real inspection the proposed Department of Education rating scheme, regardless of intentions, simply reeks of ignorance and flawed understanding of complex academic organizational behavior, of advanced learning, and of the most basic principles of inquiry and social science explanation. The scheme could, by analogy, be compared to trying to build a quantum computer using some AA batteries, a phototransistor, a couple of resistors and capacitors, and some wire scrounged from garbage bag ties. The present scheme, even if Machiavellian, and mirroring as it does the mental set that any solution has to be punitive, is wholly unworthy of a Federal education function critical to our nation, and is a condemnation of the current resources managing that agency.


Epilog

The next issues of Educationredux will move into challenges and opportunities throughout US higher education that might be areas for measured change, along with possible innovations. First out of the chute will be the footings for more productive higher education experiences -- bridging the chasm between our K-12, especially 9-12, school outputs and the incoming requirements for collegiate success -- allowing passage through collegiate work with greater learning effect, in shorter periods of time, and therefore with less investment.

Sunday, December 21, 2014

US Higher Education: The Light Versus Enlightenment?

The Obama Administration, fronted by Secretary of Education Arne Duncan, having virtually emasculated the chances for intelligent reform of US public schools by dogmatically and despotically backing test-based, alleged "corporate reform" -- from its inception a cultural-throwback view of our schools' issues, dedicated to "test and punish" -- is now switching venues. Spoiler alert: our system of US higher education may need to erect real battlements around its academic enclaves to fend off a horde of metric trolls.

For readers who have been preoccupied with trying to survive Black Friday and the need to gift those dear, what the US Department of Education is proposing to launch, allegedly in 2015, is a rating scheme for America’s 4,140 colleges and universities.  Here are the available details about that intent:

The Rating Scheme

On the table thus far from Duncan and company:
  • Schools would be rated as "high performers," "low performers," or "in the middle."  (Note: the critique invited by the overwhelming sophistication of this scheme is an immediate temptation, but assessment will wait for the whole story.)
  • The reasoning, and the justification for Federal intervention, is allegedly the assessment of institutions whose students receive federal student aid.
  • Allegedly being considered:  Which metrics; how to give credit for improvement; meaning and span of “in the middle;” a single composite rating, or multiple ratings for an institution?
  • Factors in the scheme: accessibility -- the number of Pell Grant students, family contributions to tuition, and the share of students whose parents did not attend college (what these have to do with educational performance seems a mystery); affordability -- average net price, and average net price for families by income level; outcomes -- graduation rates, transfer rates, grad school attendance, loan repayment, and "labor market success" (the latter apparently meaning graduates' beginning earnings, though a better index in today's economy might be the time taken to acquire initial employment, or the salary discount from the target profession's norm accepted to acquire any job).
The last item to drop: ratings will allegedly be calculated separately for institutions segmented into homogeneous clusters. An immediate observation is that even the simple factors noted -- apparently there not because they are the most salient measures, but because they happen to be available as data -- are not at all simple once probed. As established in prior work by our institutions themselves, even something as seemingly straightforward as an institution's average net price varies in ways with real import, depending on how costs are staged or offset and what services are delivered. The weighting problem behind any single composite rating is sketched below.
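A minimal sketch of why even a "simple" composite is not simple, with invented schools and scores: the same three factor families the Department names (accessibility, affordability, outcomes) produce different winners depending entirely on the weights chosen, and nothing in the proposal says where those weights would come from.

```python
# Hypothetical illustration: a single composite rating is highly sensitive
# to the (necessarily arbitrary) weights on accessibility, affordability,
# and outcomes. Schools and scores are invented.
schools = {
    "School A (hypothetical)": {"access": 0.9, "afford": 0.4, "outcome": 0.6},
    "School B (hypothetical)": {"access": 0.5, "afford": 0.9, "outcome": 0.5},
    "School C (hypothetical)": {"access": 0.3, "afford": 0.6, "outcome": 0.9},
}

def composite(m, w_access, w_afford, w_outcome):
    return w_access * m["access"] + w_afford * m["afford"] + w_outcome * m["outcome"]

for weights in [(0.6, 0.2, 0.2), (0.2, 0.2, 0.6)]:
    ranked = sorted(schools, key=lambda s: composite(schools[s], *weights), reverse=True)
    print(weights, "->", ranked)
# Weighting access heavily ranks School A first; weighting outcomes heavily
# ranks School C first -- same data, different "high performer."
```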

Many of the leaders of our colleges and universities have already weighed in on the cogency of these proposals. Not unexpectedly, most of the comments, while critical of the proposed mechanisms, have been constrained or politically correct. Our top-tier colleges and universities are virtually unanimously led by smart people; it is a reasonable proposition that were the faculty and research smarts of our best 100, or even a dozen, excellent institutions let loose on the validity and reliability of this proposal, little but fly ash would remain.

Educationredux readers will have to be content with a quick pass at the issues embedded in this scheme; Christmas would intervene were the whole enchilada attempted in one sitting. Titles for the issues perceived include: the purpose of the ratings, and the core relevance of the proposed rankings; using "what's out there" versus researching and designing metrics that are specific and valid; and the troubled path this proposal will encounter if its creators ever comprehend and apply the concept of "unit of analysis" that foots all science.

Purpose

The alleged purpose of the ratings is what: education quality, social equality, turning out the right human resources, deeply informed candidates -- and all with a vague ordinal depiction of our colleges/universities? The scheme as outlined so far is a patchwork of opportunistic measures, actually a crude multidimensional conceptualization, yet it purports to offer information suitable for real-world discrimination in life-modifying choices. This scheme's scope beggars the well-developed work in marketing on multidimensional scaling of single brands. To even consider simple ordinal positioning or ranking, i.e., comparative assignment of institutions of the complexity we have, ranges from magical thinking to a fool's errand.

Using New Graduates' Earnings

This item gobsmacks even common sense, and raises the question of the core competence of those developing this scheme. The determinants of beginning graduate salaries are complex, are a function of the subject-matter specialization marketed, and are variably impacted by transitory demand versus supply of workplace candidates. Beginning salaries are related to mid-career earnings, but not perfectly; will this Duncanian dysfunctional rating factor wait for promulgation until the next 20 years' experience of those graduates is logged? Lastly, but critically, those salaries may contribute nothing to assessing the worth, to our economy or society, of either the graduates or their preparation for practice.

How many more of the finance droids that brought the US the financial meltdown does our nation really need? How many CEOs can the system support? Versus how many more really good K-12 teachers and post-secondary instructors does this nation need? The proposed ratings scheme flips the world upside down. It also says that Mr. Duncan, who has never graced a real classroom, never had an education about education, and has questionably matured beyond being an extreme liberal visitor to "Alice in (education) Wonderland," needs to find a new quixotic pursuit. Perhaps he could link arms with Bill Gates and extinguish two misdirected blowtorches destroying rational US public education.

Unit of Analysis and Those Clusters

The readiness of this concept for prime time is already questionable simply based on the above issues.  The notion of creating a compensatory fix for inequities, by assigning metrics to clusters of institutions judged to be comparable, may constitute the most unreasonable part of the scheme among a litany of the unreasonable.  There are two issues:  What is the proper “unit of analysis” for assembling metrics; and what happens to the set when that unit becomes a valid one?

Saying you are going to rate a higher education institution on a few metrics is roughly the equivalent of saying you are going to assign one measure of assessment to, say, all the products in an Amazon warehouse. A newly minted college graduate may walk through a common commencement exercise, but the education represented issued from some specific track within that academic labyrinth. Each track could be considered the proper unit of analysis, to be aggregated by a scheme and weighting whose difficulty would metaphorically mirror putting a human on Mars. Going up another level of aggregation may still permit valid metrics, but the reality of that analysis challenge doesn't assuage much. Here is one example of the challenge faced in trying to decide how to assess one institution -- one intimately familiar, but also representative of many in the US -- Indiana University.

Indiana University (IU) has two main campuses, Bloomington and Indianapolis, with different academic environments, plus six regional campuses. The Bloomington campus has 14 separate schools plus the College of Arts and Sciences. All 15 major units have multiple departments, multiple faculties, and heterogeneous curricula (and, in some units, differential tuition) -- features that factually determine the quality of a degree -- with 180 majors in 157 departments representing 330 degree programs. The other campuses have a variable presence of the same units, and where a campus is a joint IU-Purdue campus, there may be additional departments representing engineering, nursing, et al.

So the question is: what is the effective and defensible unit of analysis? If it is the substantive track the student takes, and if even our 629 public four-year institutions have an approximation of the above internal structure, the analysis chore for that subset masses up to over 200,000 unique entities to be judged (629 institutions times roughly 330 degree programs apiece is on the order of 207,000).

But perhaps the most elemental critique of this Obama/Duncan odyssey is a classic used in virtually every operations research course ever offered, what is termed “the drunkard’s search.”  Referenced by philosopher Abraham Kaplan (author of a text used extensively in higher education research courses, The Conduct of Inquiry), it is his observation that:  “Much effort…in behavioral science itself, is vitiated, in my opinion, by the principle of the drunkard’s search:  There is the story of a drunkard, searching under a lamp for his house key, which he dropped some distance away.   Asked why he didn’t look where he dropped it, he replied ‘It’s lighter here!’”

Lastly, a challenge to the creators of this scheme to actually employ some of the science of measurement that has accumulated, from the eras when Descartes, Laplace, Pascal, Fermat, et al., roamed the historical halls of academe, through contemporary expertise: will the team developing this scheme even attempt the most rudimentary pretest of its metrics -- putting a test run of their results up against the expert judgements of a panel of our best and brightest, to see whether their metrics can replicate the arguably informed and sophisticated professional judgements of quality across a cross section of institutions? The prudent advice: don't hold your breath for the pretest.

Tentative Conclusions

American post-secondary institutions -- especially the two-year and four-year variety that lack quality accreditation, are isolated academically from primary campuses, or lack the internal controls on faculty quality embedded in mainstream institutions -- are most in need of assessment of the quality of their outputs. But a material fraction of our almost 2,500 four-year public and private colleges/universities probably does more work internally on maintaining learning quality than the US Department of Education does to police its own cognitive integrity.

All of America’s colleges and universities, however, may be candidates for inspection for symptoms of “Baumol’s cost disease,” referencing failure to aggressively seek functional productivity increases over decades.  And some of the mainstream campuses we all relate to may have components that have decayed, or are still fielding bricks-and-mortar excesses.  But what appears very clear is, this scheme by the Obama Administration is not a viable cure for any part of America's post-secondary education assessment needs; it comes closer to being another dose of Federal snake-oil.