In July 2013 the article Plagiarism at TechNet Wiki - First Empirical Data Analysis presented a first empirical analysis, gave insights into the mechanisms behind plagiarism, and proposed first steps to fix the problem. Three months later TechNet Wiki is on the right track: the Community Council extended the focus area "Spam & Article Deletion" to "Spam, Plagiarism & Article Deletion" and is working on a strong framework.

Improvements should be measured. This article gives facts and figures about plagiarism at TechNet Wiki for September 2013.





Approach

Data

Every community member can report pirated content - see Wiki: How to Report a TechNet Wiki Page for details. Pirated content is tagged with Pirated Content and Candidate for Deletion. The Community Council checks the tagged articles at short notice and deletes them if necessary. All pirated articles which are tagged but not yet deleted can be found using this tag search link: Pirated Content.
  • From June 2013 until the end of September 2013 I collected information about pirated articles which I detected myself (this was the typical case) or found via tag search.
  • The focus was on articles written in English.
  • TechNet Wiki currently contains 13,667 articles. At least 7,165 are written in English (and tagged with en-US). It's not possible to check all of them.
    I focused on new articles. I used a combination of the following search strategies:
    • Selecting articles at random.
    • Inspecting articles of members who already published pirated content.
    • Checking articles which have only a few revisions.

Limitations 

This approach works fine - but it's not perfect:
  • Pirated content which was detected and deleted before June 2013 is not covered.
  • In isolated cases I may have missed pirated content detected by other community members.
  • Only an insignificant number of non-English articles has been checked so far.
  • Only a subset of approximately 1,300 English articles has been checked so far.
I.e. all numbers presented here are lower bounds, and the actual numbers are probably higher.
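
To put the size of the checked subset into perspective, the figures quoted above can be turned into a coverage percentage. The following is only a small arithmetic sketch; the function name `coverage` is made up for illustration. Because the count of English articles is itself stated as "at least 7,165", the share of English articles checked is at most about 18%.

```python
# Figures quoted in the Data and Limitations sections of this article.
TOTAL_ARTICLES = 13_667    # all articles, any language
ENGLISH_ARTICLES = 7_165   # tagged en-US (stated as "at least")
CHECKED_ARTICLES = 1_300   # approximate size of the checked subset


def coverage(checked: int, population: int) -> float:
    """Return the checked share of a population as a percentage."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * checked / population


if __name__ == "__main__":
    english = coverage(CHECKED_ARTICLES, ENGLISH_ARTICLES)
    overall = coverage(CHECKED_ARTICLES, TOTAL_ARTICLES)
    print(f"Checked share of English articles: {english:.1f}%")
    print(f"Checked share of all articles:     {overall:.1f}%")
```

Since fewer than one in five English articles was inspected, the detected counts can only understate the real amount of pirated content.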



Charts

Pirated Contents per Month


Facts: Figure 1a shows how many articles with pirated content were created (and eventually detected) per month. Let's have a look at the time axis first: TechNet Wiki officially started on April 14th 2010. The first articles were posted in February 2010. Figure 1b is a zoomed figure which shows the numbers for the last twelve months only.

  
Figure 1a: All data   Figure 1b: Last year

The figures above show absolute numbers. TechNet Wiki is a growing community, so it makes sense to look at relative values too. Figures 2a and 2b show how many articles (in any language) were created per month. Once again the second figure is a zoomed version which shows the numbers for the last twelve months only.

 
Figure 2a: All data   Figure 2b: Last year

Figures 3a and 3b show the percentage of pirated articles in relation to the total number of articles created per month, i.e. Figure 3 is the "quotient" of Figures 1 and 2. This approach gives only a lower bound of the percentage because the denominator is the number of all articles (and not the smaller number of all English articles).
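
The lower-bound argument can be illustrated with a few lines of Python. This is a sketch only: the function name `pirated_share` and all counts below are hypothetical and are not taken from the data behind the figures.

```python
def pirated_share(pirated: int, created: int) -> float:
    """Percentage of a month's new articles that were detected as pirated."""
    if created <= 0:
        raise ValueError("created must be positive")
    return 100.0 * pirated / created


# Hypothetical counts for one month, for illustration only.
pirated_detected = 30   # detected pirated articles (all English)
created_all = 500       # new articles in any language
created_english = 300   # new English articles, a subset of created_all

# Quotient as described in the text: all articles in the denominator.
share_all = pirated_share(pirated_detected, created_all)
# Tighter value that would require per-language article counts.
share_english = pirated_share(pirated_detected, created_english)

# A larger denominator can only lower the quotient, so share_all is a lower bound.
assert share_all <= share_english
```

With these made-up numbers the all-language quotient is 6% while the English-only quotient would be 10%, which is why the plotted percentages should be read as conservative.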

  
Figure 3a: All data   Figure 3b: Last year

Interpretation:
  • Until about March 2012 pirated content was no real problem: the share of new pirated articles was below 2%, and in absolute numbers fewer than 10 (detected) articles per month.
  • In the last 1.5 years pirated content became a real problem: up to 12% of the new articles of a month were pirated.
  • The rate of production is not constant over time. Three main peaks lasting 1 to 3 months can be identified.
  • In the last 2 months the percentage has been below 2%. We will have to watch carefully whether this is a trend caused by the community's plagiarism awareness (or just an indication that we have not checked all articles).

Active Pirates per Month


Facts: Figures 4a and 4b show the number of (detected) active pirates per month. "Active pirate" means that a member has published at least one article with pirated content within a month. 
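
The definition of an "active pirate" amounts to counting distinct authors per month. A minimal sketch, assuming the detected articles are available as (author, month) pairs; the record layout, the author names and the function name are hypothetical:

```python
from collections import defaultdict

# Hypothetical records of detected pirated articles: (author, "YYYY-MM" of creation).
records = [
    ("author_a", "2013-05"),
    ("author_a", "2013-05"),
    ("author_b", "2013-05"),
    ("author_a", "2013-06"),
]


def active_pirates_per_month(records):
    """Map each month to the number of distinct authors with at least one pirated article."""
    authors_by_month = defaultdict(set)
    for author, month in records:
        authors_by_month[month].add(author)  # a set counts each author once per month
    return {month: len(authors) for month, authors in sorted(authors_by_month.items())}


counts = active_pirates_per_month(records)
```

An author who publishes several pirated articles in the same month is counted once for that month, and again in every other month in which they are active.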

  
Figure 4a: All data   Figure 4b: Last year

Interpretation:
  • Until March 2012 the number of active pirates was no real problem: at most 1 (detected) pirate was active per month.
  • Since April 2012 about 4 active pirates per month have been typical.
  • In peak months up to 7 (detected) pirates were active.
  • The absolute number of active pirates per month may seem small in comparison to the total number of contributors of TechNet Wiki. However, they have a significant effect on the statistics. See also the next section.

Average Productivity of Pirates


Facts: Figures 5a and 5b show the average number of pirated articles per active pirate per month. I.e. the figures show the "quotient" of figures 1 and 4.
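
The quotient can be sketched as follows. The monthly counts are hypothetical and the function name `average_productivity` is made up for illustration; months without any active pirate are skipped to avoid dividing by zero.

```python
def average_productivity(pirated_per_month, active_per_month):
    """Pirated articles per active pirate for each month with at least one active pirate."""
    result = {}
    for month, pirated in pirated_per_month.items():
        active = active_per_month.get(month, 0)
        if active > 0:  # no active pirates means the quotient is undefined
            result[month] = pirated / active
    return result


# Hypothetical monthly counts, for illustration only.
pirated_per_month = {"2013-05": 40, "2013-06": 12, "2013-07": 0}
active_per_month = {"2013-05": 4, "2013-06": 3, "2013-07": 0}

productivity = average_productivity(pirated_per_month, active_per_month)
```

Being an average, this value hides how unevenly the articles may be spread across individual pirates within a month.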

  
Figure 5a: All data   Figure 5b: Last year

Interpretation:
  • Plagiarism is not a one-time mistake. The "average plagiarist" typically produces between 5 and 25 plagiarized articles per month. If we look at individual pirates it's even worse. Pirated content can be created within minutes. It's not unusual that a single plagiarist creates 20 plagiarized articles per day.
  • Within the last 1.5 years there has been a tendency toward excessive plagiarism: the productivity peaks of pirates are growing. Until March 2012 the average productivity of an active pirate was below 5 articles per month. Since then new peaks have been reached: 10, 17 and 25.

What Happened to Detected Pirated Content?


Facts: Pirated content can be in one of these states:
  • to be checked by the Community Council
    The article was tagged as pirated content but was not yet checked by the Community Council.
  • fixed
    The author removed the plagiarized sections and fixed the article.
  • admindeleted, adminlocked
    The Community Council uses the tags admindeleted and adminlocked in case of a verified plagiarism. This is typically an intermediate state and the article will be deleted very soon afterwards. However, in rare cases (and for reasons I don't know) articles can stay in this state for a longer time.
  • deleted
    The article is deleted and no longer accessible in TechNet Wiki.
Since June 2013, 485 plagiarized articles have been detected. Figure 6 shows their states on September 30th 2013.

Figure 6: States of detected articles
Interpretation:
Let's have a look at the distribution of the article states:
  • to be checked by the Community Council (37%)
    The high number of articles that still have to be checked by the Community Council is not typical. In general the Community Council checks articles at short notice. However, a peak can occur if a larger amount of pirated content is detected in a short time, or if a vacation delays the check of articles.
  • fixed (< 1%)
    In most cases pirated articles were completely stolen. However, in some rare cases articles were fixed.
  • admindeleted, adminlocked (3%)
    This is typically an intermediary state and can be neglected.
  • deleted (59%)
    Rule of thumb: Nearly all detected articles were actually pirated and finally deleted.
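
A distribution like the one above can be derived by tallying the state of each detected article. The sketch below assumes one state label per article; the sample list is hypothetical and far smaller than the 485 real cases, and the function name is made up for illustration.

```python
from collections import Counter


def state_distribution(states):
    """Return the percentage of articles in each state."""
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {state: 100.0 * n / total for state, n in counts.items()}


# Hypothetical sample of article states, for illustration only.
sample_states = ["deleted"] * 6 + ["to be checked"] * 3 + ["admindeleted"] * 1

distribution = state_distribution(sample_states)
```

Re-running such a tally at a later date would show how many of the articles still waiting for a check have moved to the deleted state.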



Summary

The charts show some remarkable results:
  • In some months up to 12% of all created articles were pirated.
  • There was (and still is) a tendency to excessive plagiarism: The average productivity of active plagiarists is increasing.
The charts also confirmed major findings of Plagiarism at TechNet Wiki - First Empirical Data Analysis from July 2013:
  • Plagiarism is not a singular mistake, and it's not done unintentionally.
  • Plagiarism is a problem of a magnitude similar to spam:
    The first analysis mentioned 66 cases. By the end of September 2013, 485 cases had been detected, and only a subset of all English articles has been checked so far.
At least one of the old findings has to be qualified:
  • Nearly all newly detected cases were stolen from one source (or at most 2 sources). Mashups from multiple sources were less frequent than initially expected.
One major point we have to work on is the awareness of the community. Only a small group of members reported pirated contents. I would like to thank (in alphabetical order)
As always it's hard to predict the future. I think the Community Council has the right mind-set and rules to cope with plagiarism at TechNet Wiki. However, if we want to succeed, your support is needed:
  1. You can easily do a plagiarism check using Google within seconds.
  2. In case you are not sure if an article is OK you can use Types of Articles Not Appropriate for TechNet Wiki as a guideline.
  3. Follow the instructions of Wiki: How to Report a TechNet Wiki Page.
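
One common way to do such a quick check is to search for a distinctive sentence of the article as an exact phrase (in double quotes). The helper below is a sketch under that assumption; the function name is hypothetical, and it only builds the search URL to open in a browser rather than querying Google automatically.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(sentence: str) -> str:
    """Build a Google search URL that looks for the sentence as an exact phrase."""
    phrase = " ".join(sentence.split())  # collapse line breaks and repeated spaces
    phrase = phrase.replace('"', "")     # drop embedded quotes so the phrase stays in one piece
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


if __name__ == "__main__":
    # Paste a distinctive sentence from the article under suspicion here.
    print(exact_phrase_search_url("paste a distinctive sentence from the article here"))
```

If the search returns an older page with the same wording, the article is a candidate for reporting as described in step 3.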
The plagiarism charts should be updated within the next one to three months. With additional results we could check
  • if the framework and rules of the Community Council work,
  • if we reach some kind of consolidation: The number of detected old plagiarisms is only slowly growing, and we can focus on checking new articles (which is much less work).



See Also

