Grading is BS
<p>[QUOTE="dd27, post: 2624261, member: 80416"]No answer yet from PCGS, but let me explain why a formal inter-rater reliability study is important.</p><p><br /></p><p>First one needs to determine what one means by "accurate". Is accuracy some sort of pre-determined '<a href="http://idioms.thefreedictionary.com/gold+standard" target="_blank" class="externalLink ProxyLink" data-proxy-href="http://idioms.thefreedictionary.com/gold+standard" rel="nofollow">gold standard</a>'?<font size="3"><span style="color: #00b359">[1]</span></font><span style="color: #000000"> Or is the "percent agreement" between independent raters?</span></p><p><span style="color: #000000"><br /></span></p><p>Of course, there is no definitive <i>gold standard</i> when it comes to grading coins. As others have pointed out, it is subjective to some extent, even with guidelines or standards; it is more art than science.</p><p><br /></p><p>Along those lines, one should specify which guidelines one uses, e.g., <a href="http://www.us-coin-values-advisor.com/grading-coins.html#70Point" target="_blank" class="externalLink ProxyLink" data-proxy-href="http://www.us-coin-values-advisor.com/grading-coins.html#70Point" rel="nofollow">ANA Grading Standards</a> <font size="3"><span style="color: #00b359">[2]</span></font>, or the <a href="http://www.pcgs.com/grades/" target="_blank" class="externalLink ProxyLink" data-proxy-href="http://www.pcgs.com/grades/" rel="nofollow">PCGS grading standards</a>.</p><p><br /></p><p>One could develop an acceptable <i>gold standard</i>, by, for example, soliciting nominations and votes from numismatic organizations, coin clubs, coin dealers, the TPGs, and coin collectors, regarding the best coin graders (who are not affiliated with any of the major TPGs). A group of 15 such experts could independently grade coins in groups of 3, i.e., three graders per coin, and then either average the grades, or the three graders could discuss their grades and agree on a consensus grade. This grade would then become the gold standard for that particular coin, which would then be submitted to each of the TPGs under study.</p><p><br /></p><p>Such a research project would then require a statistical analysis of accuracy by comparing TPGs' grades <i>vis-a-vis</i> the gold standard grade.</p><p><br /></p><p>Or one could evaluate the degree of congruence (agreement) between two (or more) independent raters.</p><p><br /></p><p>Either way determining how to measure accuracy is not easy.</p><p><br /></p><p>For example, one would first have to determine the appropriate type of statistical analysis, interrater reliability (IRR) or interrater agreement (IRA).</p><p><br /></p><p>As LeBreton & Senter (2008, pp. 816-817) explain:</p><p><br /></p><blockquote><p>"IRR [interrater reliability] refers to the relative consistency in ratings provided by multiple judges of multiple targets. Estimates of IRR are used to address whether judges rank order targets in a manner that is relatively consistent with other judges. The concern here is not with the equivalence of scores but rather with the equivalence of relative rankings. In contrast, IRA [interrater agreement] refers to the absolute consensus in scores furnished by multiple judges for one or more targets. 
Or one could evaluate the degree of congruence (agreement) between two or more independent raters.

Either way, determining how to measure accuracy is not easy.

For example, one would first have to determine the appropriate type of statistical analysis: interrater reliability (IRR) or interrater agreement (IRA).

As LeBreton & Senter (2008, pp. 816-817) explain:

"IRR [interrater reliability] refers to the relative consistency in ratings provided by multiple judges of multiple targets. Estimates of IRR are used to address whether judges rank order targets in a manner that is relatively consistent with other judges. The concern here is not with the equivalence of scores but rather with the equivalence of relative rankings. In contrast, IRA [interrater agreement] refers to the absolute consensus in scores furnished by multiple judges for one or more targets. Estimates of IRA are used to address whether scores furnished by judges are interchangeable or equivalent in terms of their absolute value. The concepts of IRR and IRA both address questions concerning whether or not ratings furnished by one judge are 'similar' to ratings furnished by one or more other judges.

These concepts simply differ in how they go about defining inter-rater similarity. Agreement emphasizes the interchangeability or the absolute consensus between judges and is typically indexed via some estimate of within-group rating dispersion. Reliability emphasizes the relative consistency or the rank order similarity between judges and is typically indexed via some form of a correlation coefficient. Both IRR and IRA are perfectly reasonable approaches to estimating rater similarity; however, they are designed to answer different research questions. Consequently, researchers need to make sure their estimates match their research questions." [3]

Or, as Gisev, Bell, & Chen (2013, p. 330) note:

"Interrater agreement indices assess the extent to which the responses of 2 or more independent raters are concordant. Interrater reliability indices assess the extent to which raters consistently distinguish between different responses. A number of indices exist, and some common examples include Kappa, the Kendall coefficient of concordance, Bland-Altman plots, and the intraclass correlation coefficient. Guidance on the selection of an appropriate index is provided. In conclusion, selection of an appropriate index to evaluate interrater agreement or interrater reliability is dependent on a number of factors including the context in which the study is being undertaken, the type of variable under consideration, and the number of raters making assessments." [4]

To complicate matters, even if one simply wants to calculate the extent of agreement between independent raters, in some cases one might nonetheless use an IRR analysis, because the 70-point grading scale would be treated as analogous to a continuous variable even though it is technically an ordinal variable. (See "Categorical and Continuous Variables", near the bottom of the page, at https://statistics.laerd.com/statistical-guides/types-of-variable.php.)
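To show why that choice of index matters, here is another small Python sketch with invented grades from two hypothetical graders, one of whom is consistently two points tighter than the other. Spearman's rank correlation (a reliability-style index) and Cohen's kappa with quadratic weights (an agreement-style index suited to ordinal data) are taken from SciPy and scikit-learn; whether those particular indices would be appropriate for a real grading study is exactly the sort of question the literature cited above addresses.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Invented grades from two hypothetical graders for ten coins (70-point scale).
# Grader B runs consistently two points "tighter" than grader A.
grader_a = np.array([63, 64, 65, 58, 62, 66, 61, 64, 67, 60])
grader_b = grader_a - 2

# IRR-style index: do the two graders rank the coins the same way?
rho, _ = spearmanr(grader_a, grader_b)

# IRA-style indices: do they assign the same (or nearly the same) grade?
exact_agreement = np.mean(grader_a == grader_b)
# Quadratic weighting treats near-misses on the ordinal scale as partial
# agreement rather than all-or-nothing disagreement.
kappa_w = cohen_kappa_score(grader_a, grader_b, weights="quadratic")

print(f"Spearman rho (reliability): {rho:.2f}")              # 1.00: identical rank order
print(f"Exact agreement (IRA):      {exact_agreement:.0%}")  # 0%: never the same grade
print(f"Weighted kappa:             {kappa_w:.2f}")
```

The toy numbers make the point of the quoted passages: the two graders are perfectly consistent in how they rank the coins, yet they never assign the same grade, so "reliable" and "in agreement" are not interchangeable claims.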
My point in bringing in this academic stuff is to highlight the complexities involved in establishing a reliable accuracy statistic. Even if a company wants to conduct an internal evaluation of grader consistency and accuracy, it requires careful planning, knowledge of research methodology (or program evaluation methodology, which is similar), and statistical analysis.

The best guide to the statistical analysis is the Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters, 4th ed. [5] (https://www.amazon.com/Handbook-Inter-Rater-Reliability-Definitive-Measuring/dp/0970806280/).

Footnotes

1. Gold standard: "...a well-established and widely accepted model or paradigm of excellence by which similar things are judged or measured." Farlex Dictionary of Idioms, http://idioms.thefreedictionary.com/gold+standard

2. Bressett, K. E., & Bowers, Q. D. (2006). The official American Numismatic Association grading standards for United States coins (6th ed.). Atlanta, GA: Whitman Publishing. [ISBN-13: 978-0794819934]

3. LeBreton, J. M., & Senter, J. L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods, 11(4), 815-852.

4. Gisev, N., Bell, J. S., & Chen, T. F. (2013). Interrater agreement and interrater reliability: Key concepts, approaches, and applications. Research in Social and Administrative Pharmacy, 9(3), 330-338.

5. Gwet, K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters (4th ed.). Gaithersburg, MD: Advanced Analytics. [ISBN: 978-0970806284]
[QUOTE="dd27, post: 2624261, member: 80416"]No answer yet from PCGS, but let me explain why a formal inter-rater reliability study is important. First one needs to determine what one means by "accurate". Is accuracy some sort of pre-determined '[URL='http://idioms.thefreedictionary.com/gold+standard']gold standard[/URL]'?[SIZE=3][COLOR=#00b359][1][/COLOR][/SIZE][COLOR=#000000] Or is the "percent agreement" between independent raters? [/COLOR] Of course, there is no definitive [I]gold standard[/I] when it comes to grading coins. As others have pointed out, it is subjective to some extent, even with guidelines or standards; it is more art than science. Along those lines, one should specify which guidelines one uses, e.g., [URL='http://www.us-coin-values-advisor.com/grading-coins.html#70Point']ANA Grading Standards[/URL] [SIZE=3][COLOR=#00b359][2][/COLOR][/SIZE], or the [URL='http://www.pcgs.com/grades/']PCGS grading standards[/URL]. One could develop an acceptable [I]gold standard[/I], by, for example, soliciting nominations and votes from numismatic organizations, coin clubs, coin dealers, the TPGs, and coin collectors, regarding the best coin graders (who are not affiliated with any of the major TPGs). A group of 15 such experts could independently grade coins in groups of 3, i.e., three graders per coin, and then either average the grades, or the three graders could discuss their grades and agree on a consensus grade. This grade would then become the gold standard for that particular coin, which would then be submitted to each of the TPGs under study. Such a research project would then require a statistical analysis of accuracy by comparing TPGs' grades [I]vis-a-vis[/I] the gold standard grade. Or one could evaluate the degree of congruence (agreement) between two (or more) independent raters. Either way determining how to measure accuracy is not easy. For example, one would first have to determine the appropriate type of statistical analysis, interrater reliability (IRR) or interrater agreement (IRA). As LeBreton & Senter (2008, pp. 816-817) explain: [INDENT]"IRR [interrater reliability] refers to the relative consistency in ratings provided by multiple judges of multiple targets. Estimates of IRR are used to address whether judges rank order targets in a manner that is relatively consistent with other judges. The concern here is not with the equivalence of scores but rather with the equivalence of relative rankings. In contrast, IRA [interrater agreement] refers to the absolute consensus in scores furnished by multiple judges for one or more targets. Estimates of IRA are used to address whether scores furnished by judges are interchangeable or equivalent in terms of their absolute value.The concepts of IRR and IRA both address questions concerning whether or not ratings furnished by one judge are ‘‘similar’’ to ratings furnished by one or more other judges. These concepts simply differ in how they go about defining inter-rater similarity. Agreement emphasizes the interchangeability or the absolute consensus between judges and is typically indexed via some estimate of within-group rating dispersion. Reliability emphasizes the relative consistency or the rank order similarity between judges and is typically indexed via some form of a correlation coefficient. Both IRR and IRA are perfectly reasonable approaches to estimating rater similarity; however, they are designed to answer different research questions. 
Consequently, researchers need to make sure their estimates match their research questions." [SIZE=3][COLOR=#00b359][3] [/COLOR][/SIZE][/INDENT] [SIZE=4][COLOR=#000000]Or, as Gisev, Bell, & Chen (2013, p. 330) note:[/COLOR][/SIZE] [INDENT]"Interrater agreement indices assess the extent to which the responses of 2 or more independent raters are concordant. Interrater reliability indices assess the extent to which raters consistently distinguish between different responses. A number of indices exist, and some common examples include Kappa, the Kendall coefficient of concordance, Bland-Altman plots, and the intraclass correlation coefficient. Guidance on the selection of an appropriate index is provided. In conclusion, selection of an appropriate index to evaluate interrater agreement or interrater reliability is dependent on a number of factors including the context in which the study is being undertaken, the type of variable under consideration, and the number of raters making assessments." [4][/INDENT] To complicate matters, in some cases even if one simply wants to calculate the extent of agreement between independent raters, one might nonetheless use an IRR analysis because the 70-point grading scale would be considered analogous to a [I]continuous[/I] variable, even though it is technically an [I]ordinal[/I] variable. (See [I]Categorical and Continuous Variables[/I], near the bottom of the page, at [URL='https://statistics.laerd.com/statistical-guides/types-of-variable.php']Types of Variables[/URL].) My point in bringing in this academic stuff is to highlight the complexities involved in establishing a reliable accuracy statistic. Even if a company wants to conduct an internal evaluation of grader consistency and accuracy, it requires careful planning, knowledge of research methodology (or [I]program evaluation[/I] methodology, which is similar), and statistical analysis. The best guide to the statistical analysis is the [URL='https://www.amazon.com/Handbook-Inter-Rater-Reliability-Definitive-Measuring/dp/0970806280/']Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters (4th Ed.)[/URL]. [SIZE=3][COLOR=#00b359][5][/COLOR][/SIZE] [U][SIZE=3][COLOR=#00b359]Footnotes[/COLOR][/SIZE][/U] [SIZE=3][COLOR=#00b359]1. [/COLOR][COLOR=#000000][I]gold standard - [/I][/COLOR]"...a well-established and widely accepted model or paradigm of excellence by which similar things are judged or measured." [URL='http://idioms.thefreedictionary.com/gold+standard']Farlex Dictionary of Idioms[/URL][/SIZE] [SIZE=3][COLOR=#00b359]2.[/COLOR] Bressett, K. E. & Bowers, Q. D. (2006). [I]The official American Numismatic Association grading standards for United States coins[/I] (6th Ed.). Atlanta, GA: Whitman Publishing. [ISBN-13: 978-0794819934] [COLOR=#00b359]3.[/COLOR] LeBreton, J. M., & Senter, J. L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. [I]Organizational Research Methods, 11[/I](4), 815-852. [COLOR=#00b359]4. [/COLOR][COLOR=#000000]Gisev, N., Bell, J. S., & Chen, T. F. (2013). Interrater agreement and interrater reliability: key concepts, approaches, and applications. [I]Research in Social and Administrative Pharmacy, 9[/I](3), 330-338. [/COLOR] [COLOR=#00b359]5.[/COLOR][COLOR=#000000] Gwet, K. L. (2014). [I]Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters (4th Ed.)[/I]. Gaithersburg, MD: Advanced Analytics [ISBN: 978-0970806284][/COLOR][/SIZE][/QUOTE]
Your name or email address:
Do you already have an account?
No, create an account now.
Yes, my password is:
Forgot your password?
Stay logged in
Coin Talk
Home
Forums
>
Coin Forums
>
Coin Chat
>
Grading is BS
>
Home
Home
Quick Links
Search Forums
Recent Activity
Recent Posts
Forums
Forums
Quick Links
Search Forums
Recent Posts
Competitions
Competitions
Quick Links
Competition Index
Rules, Terms & Conditions
Gallery
Gallery
Quick Links
Search Media
New Media
Showcase
Showcase
Quick Links
Search Items
Most Active Members
New Items
Directory
Directory
Quick Links
Directory Home
New Listings
Members
Members
Quick Links
Notable Members
Current Visitors
Recent Activity
New Profile Posts
Sponsors
Menu
Search
Search titles only
Posted by Member:
Separate names with a comma.
Newer Than:
Search this thread only
Search this forum only
Display results as threads
Useful Searches
Recent Posts
More...