You may have noticed it: Plagiarism in TechNet Wiki is becoming a problem, and it is getting more attention now. This article presents a first empirical analysis, gives insight into some mechanisms behind plagiarism, explains the current rules, shows alternatives to plagiarism, and summarizes the critical points. Last but not least, it proposes first steps to fix the problem.

If you are interested in the most recent results concerning pirated content you should have a look at this article:


First empirical data

Until now, no in-depth investigation of plagiarism in TechNet Wiki has been done, so some empirical data might be useful. Although this is not a long-term study, it gives interesting insights and allows some rules of thumb to be deduced.

Between June 7th and July 5th 2013, a search for plagiarisms was conducted in 7 “search sessions”. The analysis counted only those articles as plagiarisms that have an overwhelming percentage of stolen content. This approach was chosen to avoid discussions about cases of doubt or minor mistakes by members who are not acquainted with citation standards.
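What “an overwhelming percentage” means was judged by hand in this analysis. As a rough illustration only, the share of copied content could be estimated with a sketch like the following. It is not the method used for the figures in this article; the function names, the 5-word window and the 0.8 threshold are assumptions made up for the example.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word sequences ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < n:
        # Very short texts yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_share(article, source, n=5):
    """Fraction (0.0 to 1.0) of the article's shingles that also occur in source."""
    article_shingles = shingles(article, n)
    if not article_shingles:
        return 0.0
    common = article_shingles & shingles(source, n)
    return len(common) / len(article_shingles)


def looks_like_heavy_plagiarism(article, source, threshold=0.8, n=5):
    """Assumed rule: flag the article if at least `threshold` of it is copied."""
    return copied_share(article, source, n) >= threshold
```

A high value is only a hint that an article deserves a closer look by a human reviewer. Legitimate quotations and shared boilerplate also raise the score.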

The search looked at (a) most new articles of the last month, (b) selected googled articles with just one revision, (c) some articles by members who appear in one of the Wiki leaderboards, and (d) some other articles by members who had already published a plagiarism. The analysis was restricted to articles written in English.

Let’s have a look at the observations and some first deductions (“rules of thumb”).

Frequency

Observations:

  • The search found 66 plagiarisms written by 13 members.
  • 31 plagiarisms were written within the last month.
Rules of thumb:
  • The frequency of heavy plagiarism is similar to that of spam.
  • It is quite easy to detect plagiarism - if you are willing to do it.

Awareness of the community

Observation:

  • I noticed only two other people, Gokan Ozcifci and XAML guy, who had detected plagiarisms (three in total). XAML guy recently mentioned one case in the top contributor blog post.
Rule of thumb:
  • The community is not aware of plagiarism. That’s why plagiarisms are rarely detected.

“Second offenders”

Observation: The table shows, for each anonymized plagiarist (NN 1 – NN 13), the number of plagiarized articles and some core profile data. The analysis did not check all articles of all listed members, so for some members the plagiarism count is only a lower bound. These cases are indicated by the “≥” sign.

The figure shows for each anonymized plagiarist, the number of plagiarized articles and some core profile data.

  • 69% of the members with plagiarisms created more than one plagiarism.
  • 53% of the members with plagiarisms created more than four plagiarisms.
Rule of thumb:
  • A plagiarism is rarely an isolated mistake.

Plagiarized sources per article

Observation: The table shows how many sources are used in a plagiarized article:

The figure shows how many sources are used in a plagiarized article

  • The most popular pattern is the exact (or nearly exact) copy of one article. It accounts for 52% of all plagiarisms.
Rule of thumb:
  • Most plagiarisms are straightforward: they plagiarize a single source.

Plagiarized sources - Where does the plagiarized content come from?

Observation:

The figure shows where the plagiarized content comes from.

Explanation of the columns:

  • IBM: Official IBM documentation at *.ibm.com.
  • Books: Printed books with copyright.
  • Microsoft: MSDN Library, KB articles or any Microsoft site with content published by MS Corporation.
  • Other: All other internet resources – typically blog posts.
Rule of thumb:
  • Most plagiarized articles use blog posts.

Popular plagiarism techniques

Observation: A popular plagiarism algorithm goes like this:

  1. Google one or more good texts and copy them
  2. Optionally try to "legalize" or obfuscate the plagiarism:
    1. Add some quickly googled links
    2. Add simple figures, such as a Windows property window or a command-line screenshot
    3. Post a link to the original source at the very end of the article or as a comment.

During the analysis, we found some interesting samples and refinements:

  • Strategy Copy and forget

    Copy a complete article as it is and create exactly one revision.

    Example: The article “SharePoint 2013: What is SkyDrive Pro” has already been deleted from TechNet Wiki. It was a 1:1 copy, including formatting, of this smart blog post.

  • Strategy “Buzzword MSDN article” 

    Find a promising Microsoft-related buzzword like “Visual Studio 2012”, take the complete article content from a Microsoft source, and link the article prominently, for example on the Wiki: Development Portal.

    Example: Visual Studio 2012

  • Strategy Copy with footnote

    Copy a complete article and add “pinged back from <url>”, “Content taken from <url>” or “From: <url>” at the very end of the article or in a posted comment.

    Member NN 12 used this technique in at least 22 of 94 articles. (This was only a quick check with Google. Because the overlap with other plagiarized articles already detected was not checked, these cases were not added to my statistics.)

    Example: Migrate RADIUS config...

  • Strategy (Tiny) enhancement

    Copy a complete article and use algorithm steps 2.1 to 2.3.

    Do not mention the author, do not change a personal intro, make only slight text changes and avoid writing a new text paragraph.

    Example: How To View the MAC Address...

  • Strategy The big mashup

    Copy as many sources as you can find into your article.

    A more structured approach: Use a link list on a topic as a starting point and replace all links with subsets of the referenced articles.

    Example: Windows trust migration...
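The “Copy with footnote” strategy leaves a recognizable trace at the end of an article. As a hedged sketch, a reviewer could pre-filter candidates with a small script like the one below. The patterns are simply the three phrases quoted above; the function name and the `tail_lines` parameter are assumptions for illustration, and no such tool was used in this analysis.

```python
import re

# Assumed patterns, based only on the footnote phrases quoted in this article.
FOOTNOTE_PATTERN = re.compile(
    r"(pinged back from|content taken from|from:)\s*<?https?://\S+",
    re.IGNORECASE,
)


def has_source_footnote(article_text, tail_lines=3):
    """Return True if one of the last non-empty lines credits a URL footnote-style."""
    lines = [line.strip() for line in article_text.splitlines() if line.strip()]
    return any(FOOTNOTE_PATTERN.search(line) for line in lines[-tail_lines:])
```

A match does not prove plagiarism. It only marks an article whose body should be compared with the linked source by hand.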

My personal summary:

Plagiarism is a problem that occurs at a frequency similar to spam. All detected cases are serious, because the complete or nearly complete article is plagiarized. In most cases a plagiarism is not an isolated mistake. It is even possible to identify plagiarism strategies.


Analysis

Microsoft’s plagiarism rules

These documents handle plagiarism:

  • Wiki: Code of Conduct:

    “Respect content creators. Do not copy content from another author (such as a blog post) unless you have permission. If you do have permission, mention it at the top of the article and include a link to the original source and author.”

  • Wiki: How to Contribute Content to TechNet Wiki:

    (1)“Do not just copy/paste from TechNet/MSDN or other websites, or blogs, or other sources of material that you did not create. If you do this and then save this without any editing, rewriting and improvement by you, you are plagiarizing another's work. If you are working on material that started on a blog or another website, it is important to link to the original material at the top of the article.”

    (2) “Copy/pasting and then saving under your name is plagiarism. We frown on that here. Violations of copyright will be deleted.”

The Terms of Use and the Wiki: Code of Conduct exclude all cases of plagiarism, whether a whole article or parts of it are copied. In other words, there is zero tolerance for plagiarism.

Anyone who reproduces copyrighted material can be prosecuted. Altering (“enhancing”) the content makes no difference as long as the result is substantially similar to the original.

You may argue that a plagiarist may add valuable enhancements to a plagiarized article. Please note that there are always (!) alternatives to plagiarism that are nicer, more trustworthy and more respectful towards the original author.


Alternatives to plagiarism

Let’s have a look at some real life scenarios:

You found a fantastic article and want to share it with the TechNet Wiki Community. Don’t copy the article. Instead, add a link to an existing Wiki article. Or even better: Add a commented link to an existing Wiki article and explain why the link is worth following.

You want to give an overview of a broad topic. Instead of creating a mashup article which plagiarizes (subsets of) a lot of articles, you should create a landing page or an article with a commented link list.

You think a non-TechNet Wiki article (a blog post, an MSDN Library article, …) contains a mistake that should be corrected. In the case of a blog post you can post a comment. MSDN Library also supports comments and feedback. You can also contact the author.

You think you can enhance an article. Write your own article and link to the original you want to enhance instead of copying it. In some cases you may find that your enhancements do not justify another article. In this case you may decide to post a comment instead, or even to discard the idea of a new article. This reduces your own effort and that of your readers.

You write an article about a sophisticated topic and need an introduction which supplies the reader with the necessary background information to follow the rest of your article. Instead of copying original sources you should mention the prerequisites a reader of your article should know and link to background articles.

None of these techniques requires copying other articles. By the way: This may remind you of the DRY principle – don’t repeat yourself.

What about citation? Citation is OK, but keep in mind: Citation is like a spice – it is not the whole meal.


The problems with plagiarism

  • Legal problems

    Plagiarism and “enhanced” plagiarized articles can have legal consequences for the plagiarists and Microsoft:

    (1) According to the Copyright Act, anyone who reproduces copyrighted material can be prosecuted. Altering (“enhancing”) the content makes no difference as long as the result is substantially similar to the original.

    (2) Imagine we only add source references to plagiarisms: In case of a legal complaint it could be argued that Microsoft can easily detect those plagiarisms and has neglected its duties.

    Maybe it is a good idea to contact Microsoft’s legal department (if adding source references to plagiarisms is preferred instead of deletions).

    There is no alternative to a zero tolerance policy for plagiarism.

  • Adding source references (instead of deletion) is the wrong remedy

    Let’s play an intellectual game: Imagine for a moment that TechNet Wiki contains no plagiarized content (e.g. because we have zero tolerance for plagiarism, community members check new articles, or members are plagiarism-aware).

  • Reputation gets lost

    A community, magazine or organization which accepts and tolerates plagiarism loses its reputation. This behavior also damages the non-plagiarized top quality content.

  • Points and achievements lose their value

    An article like the sample article can be copied and published within minutes.

  • Plagiarism of MSDN content is stimulated

    Even in parts of the Community Council there is a slight tendency to think that copying/plagiarizing MSDN content is not too bad.

    If a plagiarist is smart, he will focus on MSDN content: The content has a high quality. He can create lots of articles on the fly.

  • We lose our compass

    If the frequency of articles with source references increases, it creates the wrong impressions: “Plagiarism is OK as long as no one complains.” and “Copying content is an accepted article creation technique”.

If we don't delete plagiarisms and add source references instead, we silently accept plagiarism. But even the lax “How to Contribute” article states: “Copy/pasting and then saving under your name is plagiarism. We frown on that here.”

Taken to extremes, this means “Anything goes” and “Plagiarized articles have a right of continuance”.

If we follow this track, we are lost: What’s right, what’s wrong? Where is the threshold for plagiarism?

  • Plagiarism is done with intent

    No one writes a plagiarism without knowing it.

  • We ignore our roots

    TechNet Wiki was inspired by Wikipedia. Can you imagine that plagiarism is accepted at Wikipedia? Wikipedia goes even further: Even summarized information should have a proof and a reference.

  • We ignore common standards

    Maybe not everyone is acquainted with (scientific) citation rules. However, plagiarism has never been an accepted behavior.

  • Top Contributor Award may lose its reputation

    Plagiarism starts affecting the Top Contributor Awards. Examples:

    • June 23rd 2013: Longest Article reward:

      “This week's largest document …looks like it's been copied directly from another source …”

    • June 6th 2013: Most Active Contributors & New Articles created

      “sayedissahassan in second place, and leading most new articles for the last month, but most seem just copied in, and may have to be removed :/”

      By the way: 26 of his 41 (now 53) articles were checked, and they were all plagiarisms.


What to do next?

To make a long story short:

  • Plagiarism in TechNet Wiki is a real problem.
  • Simply adding source references means accepting plagiarism and can lead to severe legal problems.
  • Plagiarism compromises the reputation of TechNet Wiki.
  • Points and achievements lose their value.
  • TechNet Wiki cannot focus on quantity and growth alone. Quality and quality assurance have been neglected.

Suggestions:

  • The Community Council and the community should discuss this topic.
  • If we take the US Copyright Act and common sense as a guideline, there is no alternative to a zero tolerance policy for plagiarism.
  • The Community Council should add a new focus area: either “Plagiarism” or “Quality Assurance”.
  • The community should be aware of plagiarism.

It is time to act. It is up to you and the Community Council. Accepting plagiarism is not a long-term option.


Epilogue

October 2013: If you are interested in what happened after this article was written, you may want to read this article:


See Also

Community rules concerning plagiarism

Search for already detected plagiarisms by tag

Wiki Ninja Blog Posts concerning plagiarism

This section was moved to the article Wiki: Pirated Content Portal.

Plagiarism Charts