Evaluating E-learning

Right now I’m supposed to be writing an article about evaluating e-learning.  It is the only assignment in a seminar course on e-learning evaluation.  The spring term ends on June 18th, and with barely over two weeks before my deadline, all I’ve produced is an outline with which I am incredibly dissatisfied.  I could blame writer’s block.  I could claim that my brain hasn’t yet returned from Disney World, where I celebrated my birthday last week.  I could point fingers at my 60+ hour-a-week work schedule that I’ve only just recently pared back.  I could easily name many other distractions.  The simple truth that I am forced to admit to myself is that I haven’t consistently wanted to think long and hard about evaluating e-learning.  At least not until this week.

As the instructor of an online course and a student who is attempting to complete the final stages of a doctoral degree from a distance, I feel that I have a unique perspective on e-learning and its evaluation.  Certainly, my perspective differs from that of my classmates; I am the only student in the e-learning evaluation seminar for whom the seminar itself is an e-learning experience.  I want my article to reflect what I have learned as I’ve participated in e-learning experiences, both formal and informal, and in both of my roles, instructor and student.

At the same time, my professor is expecting an academic article and I’m struggling with how best to weave my personal narrative into the evaluation framework that Dr. Williams has provided for us.  The e-learning evaluation framework given by David Williams and Charles Graham in their soon-to-be-published article hangs on the following (rather generic) questions:

  1. What is the context/background?
  2. Who are the stakeholders?
  3. What is the evaluand?
  4. What are the criteria for judging the evaluand?
  5. What questions will answer how well the evaluand meets the criteria?
  6. What methods should be used to answer the questions?
  7. What do you get when you collect and analyze the data?
  8. How does ‘what is’ compare to ‘what should be’?
  9. What recommendations does the study yield?
  10. How well was the evaluation conducted?

As I review these questions yet again, I realize that this framework matches the logic that I used to design and evaluate the online course that I taught.  At the same time, this methodology does not quite reflect the informal process that I, as a student, use to evaluate my e-learning experiences.  I know that I have found some of my e-learning experiences more valuable than others, which indicates that I evaluated those experiences on some level, but how?

Some of the questions from the framework seem to apply to both perspectives.  I can describe the context for each of my varied e-learning experiences, and in all my informal evaluations of e-learning there has only ever been one stakeholder: me.  (Even now, in the evaluating e-learning seminar, I am relatively unconcerned with the learning experiences of my classmates.  I hope they are learning, but my evaluation of the experience is independent of what they feel they are gaining from the experience.  I am somewhat concerned about how Dr. Williams will evaluate me, but my evaluation of the seminar will not be affected by whether or not Dr. Williams feels that it has been a success.)  I have never explicitly stated my criteria for evaluating e-learning experiences as a student, and I’m not sure that I could articulate each criterion now, or even whether I’ve applied the same criteria to each experience.  I have definitely never framed evaluation questions based on my criteria, contemplated methods of data collection, or analyzed data that I’ve collected during my e-learning experiences.  Still, I know that as a student in each of my e-learning experiences I’ve come to conclusions about how ‘what is’ compares to ‘what should be’ and have used these conclusions to inform the design of e-learning opportunities for the courses that I teach.  I’ve done this without ever meta-evaluating the process that resulted in my conclusions.

I find myself wondering: (1) how do most students evaluate e-learning experiences? and (2) if students were to consciously apply the Williams-Graham framework, would their evaluations of their e-learning experiences change?

What do you think?  What process do you use to evaluate e-learning experiences as either an instructor or a student?

Posted in about me, Coursework, evaluation, Graduate Work | 3 Comments

Dissertation Time

Tomorrow I have a meeting with my dissertation chair.  I am supposed to come to the meeting prepared with a timeline for completing my dissertation prospectus.  The timeline is meant to help me manage my time as well as provide a means for my professor to hold me accountable for the work that I should be doing.  I figure that I might as well make the timeline public.  Regular readers, feel free to help hold me accountable!

First, I know that my graduation goal is April 2010 and that I need to have my data collected during the Fall 2009 semester in order to meet this deadline.  Ideally, I should complete my prospectus this summer.  However, professors like to take vacations during the summer, which means that I will not be able to gather my committee in one place until the Fall.  As a result, I need to plan on defending my prospectus at the very beginning of the Fall 2009 semester.  The first day of Fall classes is August 31st.  I need to schedule my defense no later than September 4, 2009.

My committee will need time to read and review my prospectus prior to my defense.  Really, 2 weeks is plenty of time for them to review one prospectus.  However, they will be very busy at the beginning of the semester, so I plan to give them 4 weeks.  My prospectus must be completed by August 7, 2009.

I have approximately 12 weeks to complete my prospectus.  

I could say that I will work on the introduction for 4 weeks, then the literature review for 4 weeks, and then finish with the methodology for the final 4 weeks.  This approach seems artificial to me.  I think a better, more organic approach would be to allow the prospectus to emerge from my reading.  I think I should start by intensely focusing on the literature.  I also need to follow up on the contacts that I made at the SITE conference.  To be realistic, I need to take into account that I will be on vacation with my husband for at least two weeks this summer.  As a result, I am proposing the following timeline:


  • May 21 – Prepare reading list from references of articles already read, and the recommendations of those in the field.  Begin reading the articles/books and take notes on each reading.  Also, begin a concept map connecting the themes from all of the readings.  The purpose of my reading between today and June 4 will be to better understand the terminology.
  • June 4 – Prepare an outline based on my synthesis of my readings to date.  Frame broad research questions.  Edit (most likely augment) the original reading list.  The purpose of my reading between June 5 and June 11 will be to identify the types of research studies that have been conducted previously.
  • June 11 – Revise the outline to include specific research questions.  Identify the types of study most appropriate for the research questions.  The purpose of my reading between June 11 and June 18 will be to identify specific research methodologies that are appropriate for my study.
  • June 18 – Revise the outline to include a description of my proposed research methodology.  Contact any available committee members for feedback at this point.  The purpose of my reading between June 18 and June 24 will be to search for possible inspiration from tangential areas of research.
  • June 24 – Revise outline to reflect any feedback received from the committee and any inspiration from reading.   Honestly, I’ll be attending a family reunion in late June/early July, so I’m not likely to do much academic reading during the weeks that immediately follow June 24.  Still, I will take my articles with me for the plane rides.  My purpose for reading between June 24 and July 9 is to re-read the literature with fresh eyes, looking for insights I may have missed previously.
  • July 9 – Prepare my first prose draft of the literature review of my prospectus.   Provide copies of the draft to peers and members of my committee for feedback.
  • July 16 – Prepare my first prose draft of the methodology section of my prospectus.  Provide copies of the draft to peers and members of my committee for feedback.
  • July 23 – Prepare my first prose draft of the introduction section of my prospectus.  Provide copies of the draft to peers and members of my committee for feedback.  Between July 24 and August 6th, my purpose for reading will be to discover solutions to issues raised during the feedback process.
  • July 30 – Prepare a list of constructive criticisms offered by peers and committee members.  
  • August 6 – Revise all sections of my draft, incorporating feedback from my peers and committee members.
  • August 7 – Provide all committee members with a copy of the prospectus in preparation for the defense on September 4.
  • September 4 (or before) – Defend the prospectus.  I welcome feedback.

Am I being realistic?  Any suggestions?


Posted in about me, Dissertation, Graduate Work | 3 Comments

Assessing Affective Characteristics in Schools

Another book summary in partial fulfillment of my independent reading assignment for graduate school.

Brief Review

I was assigned to read Assessing Affective Characteristics in Schools by Lorin Anderson and Sid Bourke.  I found the text to be less technical than Summated Rating Scale Construction, but often more detailed in its advice.  (This shouldn’t be particularly surprising, since Anderson and Bourke used far more pages than Paul Spector.)  Anderson and Bourke also dedicated far more pages to convincing the reader of the necessity of assessing affective characteristics than Spector did trying to convince the reader of the necessity of constructing summated rating scales.  Over the past few years, I’ve become increasingly convinced of the importance of affective characteristics in learning, particularly in the role of motivation.  As a result, I sometimes felt that Anderson and Bourke were preaching to the choir, and wished I could read a less evangelical version of the text that would simply tell me what I needed to know to get the job done.  

Summary of Content

In the first chapter, Anderson and Bourke define the terms that make up their title. They enumerate five features that they claim define affective characteristics, specifically, that affective characteristics are typical ways of feeling that are directed toward some target with some intensity. Anderson and Bourke define assessment as “the gathering of information about a human characteristic for a stated purpose.” The authors choose to focus on affective characteristics of students in the context of school settings. According to Anderson and Bourke, affective characteristics have value as means to ends and as ends in themselves. In the latter sections of the first chapter, Anderson and Bourke address common beliefs that sometimes impede the assessment of affective characteristics in schools. According to Anderson and Bourke, affective characteristics can and should be assessed in school settings.

Chapter two of Assessing Affective Characteristics in Schools focuses further on definitions, detailing the importance of clearly defining the specific affective characteristic or characteristics that one intends to assess. Anderson and Bourke also point out the importance of carefully defining the target to which the affective characteristic is directed. Conceptual definitions provide an understanding of abstract meaning while operational definitions specify behaviors that allow observers to make inferences about affective characteristics. The authors believe that conceptual and operational definitions must be closely aligned in order to provide useful information about a particular affective characteristic. The chapter provides a description of two major approaches for developing operational definitions of affective characteristics: the mapping sentence approach and the domain-referenced approach. Whether one is creating a new assessment instrument or selecting a previously created assessment instrument, one should begin with a precise definition of the affective characteristic in question.

The third chapter discusses the two major methods for collecting data about human characteristics: the observational method and the self-report method. Both methods have strengths and weaknesses. The observational method is limited by the observer’s powers of observation as well as their powers of interpretation. The self-report method is limited by the respondent’s memory and/or integrity as well as the questioner’s ability to ask the right questions. Some studies have shown that observational and self-report methods that claim to assess the same characteristic provide dissimilar results. Anderson and Bourke believe that, at least in the context of schools, self-report methods are generally superior. However, the authors also state that they do not intend the chapter to be interpreted as a complete rejection of observational methods.

Good affective scales must have communication value, objectivity, validity, reliability, and interpretability. A questionnaire has communication value if the respondent can easily understand what the questionnaire is asking them. A scale has objectivity when it has minimized scorer or coder bias. An instrument has validity when it actually measures what it purports to measure. Scales are considered reliable when they have internal consistency, stability, and equivalence. Internal consistency is often measured by Cronbach’s alpha, stability may be measured using test-retest results, and equivalence may involve a comparison of multiple measures of the same affective characteristic. Questionnaires are considered to have interpretability when the results are reported in such a way that the primary audience of the data can understand the results. Anderson and Bourke describe a number of common practices in the assessment of affective characteristics, including the use of several varieties of Likert scales.

Anderson and Bourke provide advice for either selecting or designing assessment instruments for affective characteristics. When possible, they recommend selecting an existing instrument over designing one. They enumerate several potential sources for locating existing assessment instruments:

  • electronic databases,
  • commercial publishing houses,
  • professional associations,
  • research institutes and laboratories, and
  • compendiums.

They also provide a list of six steps for designing a new instrument:

  • preparing a blueprint,
  • writing the items,
  • writing directions,
  • having the draft instrument reviewed,
  • pilot testing the instrument, and
  • readying the instrument for administration.

However, whether an individual will select an existing instrument or design a new one, Anderson and Bourke emphasize that the first steps are to determine the purpose of the assessment, identify the target population, and define the affective characteristics and targets. The authors list four common categories of purposes for affective assessment:

  • enhancing student learning,
  • improving the quality of educational programs,
  • evaluating the quality of educational programs, and
  • conforming to administrative or legislative mandates.

Data analysis is the main focus of chapter six. The authors provide a list of steps for developing and analyzing scale scores:

  • coding,
  • entering and checking data,
  • dealing with missing data,
  • recoding items as necessary,
  • checking scale validity and reliability, and
  • creating and reporting scale scores.

Anderson and Bourke address the importance of good data and provide advice for error checking, such as dual coding, as well as methods for dealing with small amounts of missing data. The authors also discuss using factor analysis to address empirical validity in multiscale instruments.

The authors describe the process of interpreting assessment data for affective characteristics in chapter seven. They suggest using absolute and/or relative comparisons to assist in the interpretation of the data. Absolute comparisons require the identification of a neutral point and the creation of a neutral range as well as a range above the neutral range and a range below the neutral range. Relative comparisons may involve a normative sample, or they may involve comparisons between known groups whose scale scores are expected to differ. Interpretations will depend on the comparison method used.

Anderson and Bourke use chapter eight to argue the importance of affective assessment in finding solutions to common education problems, including student motivation, the design of effective learning environments, and character building.

Posted in Books, Coursework, Graduate Work

Summated Rating Scale Construction: An Introduction

A summary of Summated Rating Scale Construction: An Introduction by Paul E. Spector.  This summary is provided in partial fulfillment of the requirements for my independent reading course this semester.  

Brief Review

Spector uses the Work Locus of Control Survey throughout this work to exemplify the process of constructing summated rating scales. I found it more useful to consider how the advice given applies to the instrument that Dr. Graham has developed to assess pre-service teachers’ assessment of their own Technological Pedagogical Content Knowledge. Also, since Stata, not SPSS, is my preferred statistical package (and because this text was published in 1992) I found the information on computer software irrelevant or obsolete. Still, I think the text helped me to better understand information that I had previously read in survey methodology texts.

Summary of Content

One of the defining characteristics of a summated rating scale is the presence of multiple items. Multiple items provide reliability and precision. Additionally, the individual items that comprise a summated rating scale must be measured using a continuum and written so that there is no single correct answer. Individuals responding to a summated rating scale must answer each item with its own rating.

The process of developing a summated rating scale is iterative. The first step involves defining the construct. Only after construct definition can a researcher hope to design and then pilot a scale. Once a scale has been piloted, the next step is to administer the instrument and conduct a thorough item analysis. The results of the analysis may lead the researcher to refine his or her original construct definition. Once the researcher is satisfied with the construct definition, he or she may begin to validate and norm the assessment.

Three common types of response choices are agreement, evaluation, and frequency. According to Spector, the optimum number of response choices for an item ranges between five and nine. Negatively worded items should be reverse-scored before the data are analyzed. The formula for reversing an item is R = (H + L) – I, where H is the highest possible response value, L is the lowest possible response value, I is the response to the item, and R is the score for the reversed item.
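
The reversal formula can be sketched in a few lines of Python. This is only an illustration: the function name `reverse_score` and the six-point scale in the example are assumptions made for this sketch, not something taken from Spector’s text.

```python
def reverse_score(response, low, high):
    """Reverse-score one response using R = (H + L) - I.

    low and high are the smallest and largest values a respondent can
    choose; response is the value actually chosen.
    """
    if not low <= response <= high:
        raise ValueError("response must lie between low and high")
    return (high + low) - response


# Hypothetical six-point agreement scale scored 1 through 6.
reversed_answers = [reverse_score(r, low=1, high=6) for r in [1, 2, 6]]
```

On a scale like this, the strongest disagreement with a negatively worded item ends up with the same score as the strongest agreement with a positively worded one, which is what lets the items be summed.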

Spector shares several rules of thumb for item writing:

  1. Items should express single ideas.
  2. Some items should be worded positively, others negatively.
  3. Items should avoid the use of colloquialisms, expressions, and jargon.
  4. Item-writers should remember the reading level of the target audience for the scale.

A main purpose of item analysis is to determine which items contribute to the internal consistency of the instrument. Coefficient alpha is a common measure of internal consistency, and 0.70 is a minimum target. Coefficient alpha is used in tandem with item-remainder coefficients to identify potentially troublesome items. One strategy for selecting items for inclusion is to decide on a number, for example, m, and then select the m items with the highest item-remainder coefficients. Alternatively, you can set an item-remainder coefficient criterion and include all items that meet the set criterion. A researcher may consider other, external criteria, such as social desirability, when selecting items. The Spearman-Brown prophecy formula can provide a useful estimate of the number of items needed to reach a target level of internal consistency.
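
To make these statistics concrete, here is a rough Python sketch using only the standard library. It relies on the usual textbook formulas for coefficient alpha, the item-remainder correlation, and the Spearman-Brown formula solved for scale length. The function names and the response data are hypothetical, and this is not Spector’s own procedure or software.

```python
from statistics import mean, variance


def coefficient_alpha(items):
    """Coefficient alpha; items holds one list of scores per item,
    with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def item_remainder(items):
    """Correlate each item with the sum of all the other items."""
    totals = [sum(scores) for scores in zip(*items)]
    coefficients = []
    for col in items:
        remainder = [t - s for t, s in zip(totals, col)]
        coefficients.append(pearson(col, remainder))
    return coefficients


def select_top_items(items, m):
    """Indices of the m items with the highest item-remainder coefficients."""
    coefficients = item_remainder(items)
    ranked = sorted(range(len(items)), key=lambda i: coefficients[i], reverse=True)
    return sorted(ranked[:m])


def spearman_brown_length_factor(current_alpha, target_alpha):
    """Factor by which the number of items must grow to reach target_alpha."""
    return (target_alpha * (1 - current_alpha)) / (current_alpha * (1 - target_alpha))


# Hypothetical responses: three items, five respondents each.
responses = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 1, 5],
    [5, 4, 2, 2, 4],
]
alpha = coefficient_alpha(responses)
remainders = item_remainder(responses)
```

A length factor of 2 from `spearman_brown_length_factor` would mean the scale needs roughly twice as many comparable items to reach the target alpha.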

There are many different ways to study the validity of an instrument. Criterion-related validity includes concurrent, predictive, and known-groups validity. Each of these criterion-related validity techniques involves a comparison between the scores from the summated rating scale in question and a set of other variables. In concurrent validity studies, the scale scores are collected at the same time, from the same individuals, as the other variables. In predictive validity, the scale scores are collected and then used to predict the value of a variable in the future. In known-groups validity, the researcher tests one or more hypotheses about differences between the scores of two or more groups.

Convergent and divergent validity studies are based on the principle that measures of the same construct will correlate strongly while measures of different constructs will correlate less strongly. Researchers use the Multitrait-Multimethod Matrix (MTMM) in order to explore convergent and divergent validity.

Factor analysis is another tool that researchers use to explore the validity of instruments. Exploratory factor analysis helps to determine the number of constructs that might describe a particular data set. Confirmatory factor analysis can help determine if a set of constructs in a theoretical framework fits the empirical data.

Spector suggests that researchers validate instruments by collecting as many different types of evidence as possible. Spector also addresses the importance of determining the reliability of the instrument, not only internally, but across time, as in test-retest reliability. Additionally, Spector points out that instruments should be normed with samples from the appropriate target population, not simply with samples of convenience found on college campuses. When calculating norms, mean and standard deviation are of primary importance, as is the overall shape of the distribution.

Finally, since scale construction is a recursive, iterative process, it is never-ending. The goal is not perfection, but to get a scale that behaves consistently within its own theoretical framework.

Posted in Books, Coursework, Graduate Work | 2 Comments


I sat down next to one of our IT guys at a meeting last week. “Is that an iPhone?” I asked.

“Yes. . .do you want to play with it?”

“No. My mother-in-law gave me an old iPod Touch over the weekend.”

“It’s great, isn’t it? It was like a gateway drug for my wife.”

I laughed.

“I’ve always been a gadget guy,” he continued, “but I’ve never loved a gadget the way I love this one. It has changed my life.”

I’ve wanted an iPhone for a long time, but never bought one. Now that I’m holding an iPod Touch in my hands, I’m wondering why I didn’t get one of these before. I’m amazed at this tiny handheld computer; I feel like I’m living in a science fiction movie.

I’ve been experimenting with a host of free apps and researching a number of paid apps that I think I need. So far my only paid download has been a 99-cent ebook collection called Classics (I strongly recommend it). I’m looking for a good app for reading PDFs and other documents. I’ve seen several in the App Store, but I’m not sure which one will best meet my needs. I’m mostly concerned with how the app transfers files to the iPod and haven’t been able to find the information I need in the reviews, so I’ve been waiting.

In the meantime, I’m not sure if the iPod Touch will be my “gateway drug” to the iPhone, and I’m waiting to see if it will change my life, but either way, it’s pretty cool.

Posted in about me, Technology, Tools

Guidelines for Blogging at Conferences?

The following is a slightly altered version of a post I made on the AEA Technology Forum.  The original post was a reply in a thread about whether AEA should consider writing guidelines for bloggers at the AEA conference, and if so, what kind of guidelines they should be.

I haven’t seen any good guidelines for conference blogging, but I’ll admit that I haven’t really looked for any either. Thinking as a presenter, I wouldn’t want the blogging activities of audience members to interfere with my ability to present my ideas. For example, if anyone were to attempt to do a live vlog post during my presentation, I would find it distracting, annoying, and simply rude. Also, I wouldn’t want my presentation recorded in an audio or video format without my consent.  However, I think it would be fair to expect that many audience members will prefer to take notes in a digital format and I think they have the right to take written notes for their personal use during any conference session they attend.

Public blogging differs from simple note-taking. If I am thinking as a researcher, my main concern with blogging during conference sessions is the proper attribution of ideas.  If a researcher presents research at a conference, she has consented to share those ideas, at least with other conference attendees.  Is she also consenting to share those ideas with the public at large? I believe that she is, or at least should be, but others may not agree. No matter how widely I share my ideas, I still want credit for them and I believe that most researchers feel similarly.

As a blogger, I want the freedom to write about what interests me.  I want to be able to post my reflections as I synthesize ideas from various sources.  I hope that others will provide feedback on my ideas, helping me to sharpen my thinking and clarify my language.  I want as few barriers to this reflective process as possible.

Synthesizing these three perspectives, I think that a good set of standards for conference blogging would address at least the following:

  • Possible disruptions to on-going presentations
  • Assumptions about consent to disseminate ideas
  • Procedures or policies on attributing sources

What else should be included?

Posted in Conference, edublogosphere | 3 Comments

The 100th Post Anniversary Extravaganza

Auckland Anniversary Fireworks 2009
Image by Chris Gin via Flickr

On March 16, 2008, I clicked “Publish” for the first time and declared myself “(No Longer) Alone in a Library”.  Though my posts make no reference to it, a year ago, I was just out of the hospital, sad and scared, and needed not to be alone.  The first comments, kind and encouraging, appeared within two weeks; I am forever grateful.

One year, 100 posts, 152 comments, and approximately 5,695 visits later, I find it convenient to pause and reflect.  The graph below captures some of the story behind these posts, comments and visits.  I posted most actively when I participated in the comment challenge that took place in May of last year.  I made an effort to connect to other participants in the challenge through commenting and my fellow participants repaid me handsomely with their comments.   The frequency of my posts fell off dramatically after the challenge, but began to rise again in September, as I attempted to document my participation in two courses that I was auditing.   I believe that my participation in these courses triggered the second comment spike seen on the graph.  Near the end of November, I nearly disappeared from blogging, reemerging only last month.

A year in the life of (No Longer) Alone in a Library


There are stories that the graph can’t tell.  Through blogging, particularly the comment challenge, I “met” some interesting and intelligent people that I’m sure I never would have met otherwise.  Most notable among these are two of my most frequent commenters, Sarah and Ines.  Who would have thought I’d correspond with an English midwife living “down under” or a multilingual teacher of Portuguese in Lisbon?  Both of these women provide me with additional lenses through which to examine my work.

Since starting this blog, I’ve moved across two mountain ranges, countless rivers and approximately 2,000 miles.  Still, I’ve been able to keep in touch with friends, colleagues and classmates through this blog and other technologies.  Two valued friends/colleagues/classmates (they both fit in all three categories) are Andrea and SaraJoy.  I don’t get to talk with them nearly as often as I’d like, but I appreciate that they take time to read and comment on my ideas.

Recently, (No Longer) Alone in a Library has brought me an unexpected opportunity.  Because I took the time to post a reflection on the 2008 American Evaluation Association Conference in Denver, I have been invited to participate with other AEA members in a discussion about emerging technologies, particularly blogging.  Several opportunities have been dropped in my lap in the past year, and while this is the first that can be directly traced to my blog, I’m certain that what I’ve learned through blogging is a big reason so many opportunities have come my way.

One final observation, before I close for the night . . . I haven’t empirically tested the data, but I feel fairly safe in concluding that the non-spam comments left on my blog are made by people who have established some sort of a connection with me.  Some of these connections go back to real-world relationships, some to common organizational ties, and some of the connections are made when something I wrote resonates with something inside the reader.  Without the connection, nothing special ever happens.

Posted in about me, Conference, edublogosphere