Saturday, May 26, 2007

I think I just self-plagiarized all over myself...

Plagiarism is something that I think about a lot for a couple of reasons. One of them goes back to my first non-Tech honor code experience. During my first year at UVa, the student newspaper catapulted a physics prof to national attention for rosenwinkelling over 100 students in his class for plagiarism. Lou Bloomfield taught a very popular non-mathematical intro physics class called "How Things Work." The "final exam" for the class is essentially a paper that should describe how something works. Motivated by accusations from one of the students in his class, Lou wrote a simple text comparison program to investigate all the papers submitted electronically over the years. In the end, a bunch of students were asked to leave the school or had their diplomas revoked. That particular case was pretty cut and dried because, in most instances, large chunks of the papers had been copied verbatim.
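For a sense of how little code a "simple text comparison program" needs, here is a minimal sketch. It is not Bloomfield's actual program: the approach (counting word sequences that two papers have in common), the six-word window, and the names shingles, shared_phrases, and flag_pairs are all assumptions made for illustration.

```python
from itertools import combinations


def shingles(text, n=6):
    """Return the set of all n-word sequences in text (lowercased).

    Punctuation is left attached to words; a real tool would normalize it.
    """
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(a, b, n=6):
    """Count the distinct n-word sequences that appear in both texts."""
    return len(shingles(a, n) & shingles(b, n))


def flag_pairs(papers, n=6, threshold=1):
    """Compare every pair of papers and list the pairs that overlap.

    papers: dict mapping a (hypothetical) paper name to its text.
    Returns a list of (name_a, name_b, shared_count) tuples.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(papers.items()), 2):
        count = shared_phrases(text_a, text_b, n)
        if count >= threshold:
            flagged.append((name_a, name_b, count))
    return flagged
```

Any pair that shares even a handful of six-word runs is worth a human look, since independent writers rarely produce the same six words in a row by accident. Note that the work grows with the square of the number of papers, because every pair is compared.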

That brings me to another reason why I think about plagiarism: "Self-Plagiarism." Many of my talks and presentations and technical writeups are copied from each other. In other words, once I've found a way to describe a piece of equipment or a physical process, I simply recycle that piece of text without attribution. I mean, I wrote the damn thing in the first place, right? That should be okay, right? Well, if you google "self plagiarism," one of the first links that comes up is this website by Miguel Roig written for the Office of Research Integrity at the US Department of Health and Human Services. Three of the things that Roig lists under self-plagiarism are "redundant or dual publications," "salami slicing," and "text recycling."

An example of dual publication is when the same work or paper is published in two different journals. I personally have never seen this and have trouble believing people try to pull this off. Ok, never mind: my cubicle buddy just informed me that theorists (snicker) do this all the time. Salami slicing, or to quote Roig, "the segmenting of a large study into two or more publications," is considered "unacceptable scientific practice." Really? If I understand this correctly, then we do this all the time. Without getting too technical, let me explain: we measured quantities I'll call "A1" and "A2" as a function of another variable "v." One paper we published was essentially "(A1-A2)/v." We then published another paper that was essentially "(A1-A2)v^3."...And then three more papers were published that were literally different linear combinations of A1 and A2. I'm almost almost almost not kidding. The physics of these "derived" quantities is related, but different. Even though all the data came from one experiment taken over a single time interval, is this still self-plagiarism and unacceptable?

Here's another situation that Roig talks about (called data augmentation): "when a researcher publishes a study and subsequently collects additional data, which typically end up strengthening the original effect, and publishes the combined results as a new study." Guess what, we've done exactly this as well (see first two links)! Again without going into the details, we count the number of electrons that hit the detectors after they bounce off the target. Being a "counting" experiment, the relative statistical uncertainty scales as the inverse of the square root of the number of electrons counted. We took data in three chunks over two years. Our preliminary results were published after the first year and our final combined results were published after the experiment ended. This appears to be an almost perfect example of, in Roig's words, "old data that has been merely augmented with additional data points and that is subsequently presented as a new study." Roig calls this practice a "serious ethical breach."
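The counting-statistics scaling can be written out explicitly. This is only the textbook Poisson argument, and the three equal-sized chunks of N_1 counts each are an assumption made for illustration, not the experiment's actual numbers:

```latex
\sigma_N = \sqrt{N}
\quad\Longrightarrow\quad
\frac{\sigma_N}{N} = \frac{1}{\sqrt{N}},
\qquad
\frac{\sigma_{3N_1}/(3N_1)}{\sigma_{N_1}/N_1} = \frac{1}{\sqrt{3}} \approx 0.58
```

Under that assumption, combining all three chunks shrinks the relative statistical uncertainty to a bit under 60% of what the first chunk alone gives. That is why more data can answer a question that the preliminary data could not.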

Finally, he gets to the question that I originally had about text recycling: "a writer’s reuse of portions of text that have appeared previously in other works." Roig gives examples when this is acceptable and when it is "borderline or unacceptable." As you can probably guess by now, we've done it. I'll spare you the details.

How do I reconcile these things? Well, first of all, the things that I described are fairly common practice in the field I work in: nuclear physics. This is what I call the "cultural differences" defense. Roig makes many good arguments for why self-plagiarizing is bad in the "biomedical and social sciences" arena...but can analogous arguments be made for other fields? I don't know, but maybe I've been "cultured" to believe that what we do is okay. When I think about text recycling, I feel it's no different from using the same figure depicting an experimental apparatus over and over again. Should you have to make a unique diagram for each new publication if the experimental apparatus is the same? I would say no, but then what's the difference between that diagram and the text used to describe that diagram? And what about our salami slicing? Well, as my cubicle buddy argued, all of those articles were published in a journal that has a limit of ~4ish pages per article. There is no way that we could cover ~20 pages of physics results in ~4 pages. This is what I call the "It's not our fault" defense.

Finally, the trickiest one is data augmentation. In the example I used for what we did, the two papers had a different emphasis. Our first year data was a "new" result in the sense that no one had measured it before and it could have been "zero." The fact that the result was not "zero" was a significant finding itself: it was consistent with what we call the Standard Model of Physics. In our second paper, we were interested in seeing if there was any small deviation in the quantity that we were measuring from the theoretical value. This question required more data so that we could achieve the desired statistical precision. (By the way, there wasn't a statistically significant deviation.) Because the scientific questions were different, I claim the two papers really stand on their own. This is what I call the "No, no they're really two different things (hands waving)" defense. My last defense, and maybe the most relevant one, is the general idea that at no point did we ever try to "deceive" the reader, which is the standard that Roig repeats throughout his document. But this leads to the question of whether an author's "intent" is relevant to the determination of whether plagiarism has occurred. The answer probably depends on what kind of plagiarism is meant by "plagiarism."

Let me attempt to clarify using the ideas of Erik Campbell: "hard" plagiarism is the copying of portions of text verbatim. In his very amusing article at the Virginia Quarterly Review, Campbell reflects on his run-in with "accidental" hard plagiarism in poetry. He also presents the idea of "soft" plagiarism: "pilfering another’s ideas." This turns out to be a very murky subject because one has to walk a careful line between "creative influence" and "stealing ideas." How does one draw the line when discussing an artistic endeavor?

Take the case of Bryony Lavery's Tony-nominated play "Frozen" as outlined in Malcolm Gladwell's New Yorker article. The play is about a killer, the victim's mother, and a doctor who is studying the killer's mental state to understand his motivation. As Gladwell recalls, the doctor is based on a real life person named Dorothy Lewis whom he had written about in a New Yorker article years earlier. The play's author, Lavery, adapted many of the scenes for her play directly from events described in the original article. In some cases the dialogue consisted of verbatim quotes cited in the article. None of these things were attributed to Gladwell or to the real life doctor Lewis by Lavery. Gladwell goes back and forth about it and ponders how different things that are the result of a creative process, particularly musical ones, are related to each other. Is the relationship one of "cut and paste" or one of transformation and change? Eventually he chides the "plagiarism fundamentalists" for "[pretending that] chains of influence and evolution do not exist, and that a writer’s words have a virgin birth and an eternal life."

Meghan O'Rourke at Slate goes into more detail about how originality and creativity are related to plagiarism. Her article is relevant to the case of Florence Deeks and H.G. Wells, which is recounted in Jonathon Keats' review of A.B. McKillop's book "The Spinster and the Prophet." Whereas, in the Lavery case, Gladwell argues that the two works share a "parent-child" relationship, this one is more of a sibling rivalry: a single path bifurcates into two different competing trails. The controversy surrounds H.G. Wells' famous book "The Outline of History." McKillop argues that although Wells and Deeks appear to have come up with the idea of writing a "history of everything from the beginning" independently, Wells' book clearly borrows heavily from Deeks' book. However, for Keats, hard plagiarism takes a back seat to soft plagiarism. He argues that Wells' book provides evidence for the important and original idea that "the progress of society" is to be measured against the yardstick of democracy. On the other hand, Deeks had written a feminist tome which presented evidence for a different idea but similarly "deeply original for its time", namely that "civilization (as opposed to barbarity) is feminine" and that "peace and prosperity were characteristic of female leadership."

In all of the aforementioned literary examples, care is taken to distinguish between questions of plagiarism, which in my opinion are resolved in the court of public opinion, and questions of copyright infringement, which is a legal issue. Along these lines, Tim Wu at Slate produces a thought-provoking article discussing the legal battle between Dan Brown ("The Da Vinci Code") and Richard Leigh, "a self-appointed grail expert." Essentially the historical and religious claims that Brown presents as fiction are the ones that Leigh and his coauthors present as non-fiction in a book called "Holy Blood, Holy Grail." Wu addresses the following interesting questions: (1) "Can one writer freely borrow someone else's wacky historical speculations?" (2) "When an author offers up a speculation like 'space aliens killed JFK,' does it really make sense to call that a fact?" (3) "How can dueling authors ever have a meaningful public discussion of who Mary Magdalene was, if, for example, one side claims exclusive ownership of the theory that she was a lowly prostitute?" The precedent for this case exists in American law and Wu summarizes the reasoning succinctly: "If the author calls it a fact, you can steal it."

Finally, here are some things that I'll save for another post by me or some interested party: (1) The many pieces of software that exist to uncover hard plagiarism, not the least of which is Google itself: Paul Collins at Slate discusses the impact that Google Book Search will have on old and new cases of literary hard plagiarism. (2) Recent high profile cases of the two historians Stephen Ambrose and Doris Kearns Goodwin. (3) How the question of plagiarism is approached in a journalistic context. (4) John Fogerty's long and strange legal battle with Fantasy Records.


dave hiller said...

I don't have too much to add to this well-written post, except to say that we had to take an ethics class to qualify for NIH funding, one day of which was on plagiarism (mostly of the self- variety). I don't remember much of it, to be honest. I will point out that once you publish an article in a journal, you don't own exclusive rights to the material, which is significant for these sorts of questions.

Also, while there are many possibilities, I approve of this definition for the verb "to rosenwinkel".

Alan Rosenwinkel said...

Not to be confused with the verb "to rosenwinkle": To misspell a word so severely that a spell checker has no suggestions.

jaideep said...

and not to be confused with the verb "to rosenwinkal": to perform an act of physical violence in an ungrateful manner. The rosenwinkalee is never at fault and, quite often, has provided aid to the rosenwinkaler, usually immediately before the rosenwinkaling is administered.

on a different note, i was amused by (1) this discovery of plagiarism in PowerPoint presentations and (2) this advice from an older mathematician to younger mathematicians about "Publish[ing] the Same Result Several Times"

...and finally, as Dave pointed out, it is true that there are potentially some copyright issues b/c one usually must surrender those rights to the publisher of the journal. from that point of view, text and diagram recycling is probably illegal, but is it unethical?

Alan Rosenwinkel said...

To me there are two distinct questions when submitting a paper for publication:

1) Is it someone else's work? If yes, it's unethical and plagiarism. If it's your own original work or your own recycled work, it's okay.

2) Is it a new, publishable idea? If it is, then so be it. Who cares if some of what is written is recycled, as long as it is used to support a new idea. It's not like the old days when space in printed journals was at a premium. For the most part, nowadays you're just adding a link to a web page.

