
Does the chunk argument trump the plagiarism allegations?






Photo by Marc Nozell via Flickr



One of the hottest news stories this week has been Melania Trump’s allegedly plagiarized speech. Why allegedly? Because although the address Donald Trump’s wife gave at the 2016 Republican National Convention bears marked similarities to Michelle Obama’s speech at the 2008 Democratic National Convention, there is not much in it that would actually constitute stealing in the linguistic sense.







In the last 30 years, corpus research (the study of language through samples of ‘real-world’ text) has shown that language is highly formulaic, i.e. it consists largely of recurring strings of words, otherwise known as “chunks”. What makes them chunks is the fact that they are stored in and retrieved from memory as ‘wholes’ rather than generated word by word at the moment of language production.





Public speeches are a prime example of formulaicity in language in that they consist of conventionalised routines; some of these are very fixed, highly probable combinations whose content can be predicted by the hearer. A few years ago, when I was invited to give a talk at a conference, I started my session with a slide showing the following lines:




thank you for _______ me here today...


a topic I am particularly _______ in


have enjoyed a fruitful _______


hope our relationship continues to _______



and got the audience to complete them. Here’s what they came up with, as I’m sure you did too:






thank you for having me here today...


a topic I am particularly interested in


have enjoyed a fruitful co-operation


hope our relationship continues to grow
















Chunks in different genres





The formulaic nature of language was first brought to the fore in a seminal paper by Australian linguist Andrew Pawley and his colleague Frances Syder, who pointed out that competent language users have hundreds of thousands of ready-made phrases at their disposal (Pawley and Syder 1983). Some linguists have argued that up to 80% of English text consists of recurring sequences (Altenberg 1998). More conservative estimates suggest that 50-60% of discourse is formulaic (Erman and Warren 2000).





The figure, of course, depends on the genre. There are fewer chunks in creative writing and fiction, and more in news reports. Similarly, in spoken language there are fewer chunks in storytellers' narratives but a higher prevalence (probably approaching the more liberal estimate of 80%) in the speech of auctioneers, TV sports announcers and other ‘smooth talkers’ (Kuiper 2004, cited in Schmitt 2010). This is because speakers rely on chunks to produce fluent speech under time pressure. In addition, chunks perform a number of interactional and social functions and are used to accomplish various transactions. Your exchange with a shop assistant, for example, is likely to be very formulaic and predictable:




                                




Image by Julian Lim on Flickr [CC BY 2.0]


Excuse me, do you work here?


Can I help you?


I’m just looking around.


Have you got ... in [size]?


I’m looking for a ...


How much is ...?


Where is the fitting room?





Academic discourse also
relies heavily on chunks. Analyses of academic corpora show that academic writing is
made up of a substantial number of recurring word combinations:


On the other hand 


At the same time 


In the present study 


In terms of 


As shown in figure 


It was found that


 (from Biber, Conrad, & Cortes, 2004)
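Lists like this are typically compiled by counting how often every sequence of n words recurs in a corpus. The Python sketch below is a minimal illustration of that counting step only, not Biber, Conrad and Cortes's actual procedure: the toy corpus is invented, tokenization is a simple lowercase regex, and the `recurring_bundles` helper and its `min_count` threshold are hypothetical choices. Real studies work with corpora of millions of words and much stricter frequency cutoffs.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-word sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurring_bundles(text, n=4, min_count=2):
    """Count n-grams and keep those that recur at least min_count times.

    Hypothetical helper for illustration: lowercases the text, keeps only
    letters and apostrophes as tokens, and returns (bundle, count) pairs
    sorted from most to least frequent.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(ngrams(tokens, n))
    return [(" ".join(g), c) for g, c in counts.most_common() if c >= min_count]


# Toy corpus, invented for illustration only.
corpus = (
    "In the present study we test the model. On the other hand, "
    "the data are limited. In the present study the results differ. "
    "On the other hand, the sample is small."
)

for bundle, count in recurring_bundles(corpus, n=4):
    print(count, bundle)
```

On this toy text the function returns “in the present study” and “on the other hand”, each occurring twice, plus the overlapping fragment “the other hand the”, which recurs only because both occurrences of “on the other hand” happen to be followed by “the”.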








































  


Let me just say this



Although they are usually not composed in real time, political speeches are a shining example of a genre laden with formulaic language. Not only do they contain a high number of (grossly overused) recurrent combinations, but they also employ similar rhetoric and generally follow the same format. In fact, they are so similar that their content can be distilled into an algorithm. That is exactly what a group of researchers at the University of Massachusetts recently did. They ran 4,000 political speech segments through text analysis software and came up with an algorithm that can generate convincing political speeches. To do this, they built a model based on n-grams, which estimates the probability of a word appearing given the n-1 words immediately before it. Models of this kind are commonly used in computational linguistics, and the idea of n-grams has been made familiar to a wider audience by the Google Books Ngram Viewer. Put simply, they taught a robot to write speeches that mimic the formulaic, cliché-ridden speeches of politicians.
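To make the idea concrete, here is a toy bigram (n = 2) model in Python. It is a minimal sketch and not the UMass researchers' system: the training text is a few invented speech-like phrases, tokens are split on whitespace, and the function names `train_bigrams`, `next_word_probs` and `generate` are hypothetical. The model estimates P(next word | previous word) as the count of that word pair divided by how often the previous word is followed by anything, then generates text by repeatedly sampling a next word in proportion to those counts.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Map each word to a Counter of the words observed immediately after it."""
    tokens = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def next_word_probs(follows, word):
    """Maximum-likelihood estimate of P(next | word) = count(word, next) / count(word, *)."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(follows, start, length=10, seed=0):
    """Sample up to `length` words, choosing each next word in proportion to its bigram count."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    words = [start]
    for _ in range(length - 1):
        counts = follows.get(words[-1])
        if not counts:  # dead end: the last word was never followed by anything
            break
        choices, weights = zip(*counts.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)


# Tiny invented training text in a speech-like register (illustrative only).
training = (
    "this is our moment . this is our time . "
    "this is our chance to answer that call . "
    "we have come so far . we have seen so much ."
)

model = train_bigrams(training)
print(next_word_probs(model, "our"))
print(generate(model, "this", length=12, seed=1))
```

In this training text, “our” is followed by “moment”, “time” and “chance” once each, so `next_word_probs(model, "our")` gives each a probability of 1/3. Every pair of adjacent words in the generated output is a pair seen in training, which is why such models reproduce familiar chunks so readily. A larger n and more training data make the output more fluent, at the cost of copying longer stretches of the training text verbatim.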





Obama’s much-applauded 2008 victory speech is no exception. Here is part of its closing section:


America, we have come so far. We have seen
so much. But there is so much more to do. So tonight, let us ask ourselves – if
our children should live to see the next century; […] This is our chance to answer that call. This
is our moment. This is our time – to put our people back to work and open doors
of opportunity for our kids; to restore prosperity and promote the cause of
peace; to reclaim the American Dream and reaffirm that fundamental truth – that
out of many, we are one; that while we breathe, we hope, and where we are met
with cynicism, and doubt, and those who tell us that we can’t, we will respond
with that timeless creed that sums up the spirit of a people...







Photo by cfishy on Flickr [CC BY 2.0]





Apart from clichés such as “This is our moment. This is our time” and “while we breathe, we hope”, it contains collocations (one kind of formulaic sequence) such as “promote the cause of peace” and “reaffirm the truth”, as well as a number of predictable strings:





America, we have come so _____. We have seen so _____. But there is so much more to _____.







Let’s now look at the excerpt from Melania’s speech that came under criticism:





From a young age, my parents impressed
on me the values
that you work hard for what you want in life,
that your word is your bond and you do what you say and keep your promise,
that you treat people with respect.
They taught and showed me values and morals
in their daily lives. That is a lesson that I continue to pass along to our
son. And we need to pass those lessons on to the many
generations to follow. 




                                     


A quick corpus search will show that “impress on (upon)” is commonly used with PARENTS and FATHER as the subject, and that the things usually impressed on someone are THE IMPORTANCE / NEED / VALUE of something. The Longman Dictionary gives the following example:





Father impressed on me the value of hard work.





So if I were teaching “impress on smb”, I’d probably give this as an example of how the verb is used. I’m also sure you’ll find nothing illicit about “work hard” or “keep a promise”. On the contrary, I’m certain you would correct your students if they said *worked hardly or used *held instead of keep in “keep a promise”.





If you look at “treat people with respect”, which was supposedly copied from Michelle Obama’s “treat people with dignity and respect”, you will see that dignity and respect are two of the most likely collocates here. Here is how Netspeak, a tool that helps you find a missing word in a sequence, suggests “treat people with” should be completed:











So, to answer the question in the title of this post: was Melania Trump’s speech lifted from Michelle Obama’s, or have the accusations of plagiarism been largely trumped up? If Melania’s faux pas does constitute plagiarism, then her speech is no more plagiarized than any academic paper containing “Recent research has shown that” or “The results are consistent with data obtained in...”.







References







Altenberg, B. (1998). On the phraseology of
spoken English: the evidence of recurrent word-combinations. In A. P. Cowie
(Ed.), Phraseology: theory, analysis and application (pp. 101–122). Oxford:
Oxford University Press. 





Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Harlow: Pearson.





Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29–62.





Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: nativelike selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–225). London; New York: Longman. Available online at http://www.uni-mainz.de/FB/Philologie-II/fb1414/lampert/download/so2008/PawleySyder.pdf



















Schmitt, N. (2010). Researching vocabulary.
Basingstoke, England: Palgrave Macmillan.














