Revisiting Numi: Testing The Latest GPT-4 Update

Discussion in 'US Coins Forum' started by Dansco_Dude, Apr 21, 2024.

  1. Dansco_Dude

    Dansco_Dude Well-Known Member

    Hey everyone. As you may remember from my previous posts, in late 2023 I developed Numi, an AI-powered chatbot that leverages the advanced capabilities of OpenAI's GPT-4 vision model to assist coin collectors in identifying and grading their coins. It's been fascinating seeing the exponential growth of Artificial Intelligence, so I created Numi to test AI's abilities to tackle one of the biggest barriers to new collectors. Throughout Numi's development, I became more and more convinced that AI is going to fundamentally change the future of the hobby.

    Testing Numi With OpenAI's Latest GPT-4 Update

    In December 2023, I paused development on Numi as I had maxed out on the AI's capabilities and the results were not accurate enough to justify further development. Following OpenAI's recent April 2024 update to their GPT-4 model, which powers Numi's AI capabilities, I conducted another series of tests on Numi's grading accuracy.

    I then ran statistical analyses to assess the impact on Numi's performance and compared its grading accuracy between the December 2023 and April 2024 test results.

    [​IMG]

    Determining the Optimal # of Photos for Accurate Grading

    A key aspect of my analysis focused on identifying the optimal number of coin photos users should upload to achieve the most accurate grading results. In December 2023, my tests indicated that uploading 10 photos yielded the best accuracy across all coin grades. This aligned with my hypothesis that more data = better. However, after the GPT-4 update in April 2024, that number had changed, with just 4 photos now providing the most precise grading outcomes.
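    For anyone curious how a comparison like this could be run, here is a minimal sketch, not Numi's actual code. It assumes each test is recorded as (number of photos, predicted grade, expert grade) on the 1-70 scale, groups the tests by photo count, and picks the count with the lowest average absolute error. The function name `best_photo_count` and the sample records are made up for illustration; they are not the real test data.

```python
from collections import defaultdict


def best_photo_count(results):
    """Return (best_count, avg_error_by_count) for an iterable of
    (photo_count, predicted_grade, expert_grade) records."""
    errors = defaultdict(list)
    for photo_count, predicted, expert in results:
        # Absolute gap between the predicted grade and the expert grade.
        errors[photo_count].append(abs(predicted - expert))
    # Average the gaps separately for each photo count.
    avg_error = {n: sum(e) / len(e) for n, e in errors.items()}
    # The "optimal" count is the one with the smallest average error.
    best = min(avg_error, key=avg_error.get)
    return best, avg_error


# Hypothetical records, invented purely to show the mechanics.
sample = [
    (2, 30, 40), (2, 60, 66),
    (4, 38, 40), (4, 64, 66),
    (10, 35, 40), (10, 63, 66),
]
best, avg_error = best_photo_count(sample)
print(best, avg_error)
```

    With real data you would want many coins per photo count and a spread of grades, since a handful of tests per group can easily flip which count looks best.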

    [​IMG]

    Just How Much Did Numi Improve?

    To measure Numi's accuracy and any improvements, I calculated the Mean Absolute Deviation (MAD), a metric that represents the average absolute difference between Numi's predicted grades and the actual, expert-assigned grades. In December 2023, Numi's MAD was 5.39, indicating that, on average, its predictions deviated by approximately 5 grade points from the actual coin's grade. By April 2024, following the GPT-4 update, Numi's MAD had decreased to 3.64, a substantial 32.47% reduction in average grading error.
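    The arithmetic behind those numbers can be sketched as follows. This is an illustrative sketch rather than the script used for the tests: the function names and the four sample grades are assumptions, and only the 5.39 and 3.64 figures come from the results above.

```python
def mean_absolute_deviation(predicted, actual):
    """Average absolute gap between predicted and expert grades, in grade points."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Hypothetical grades for four coins, only to show how MAD is computed.
predicted = [40, 58, 12, 65]
expert = [45, 63, 10, 66]
print(mean_absolute_deviation(predicted, expert))  # 3.25

# MAD values reported for December 2023 and April 2024.
print(f"{percent_reduction(5.39, 3.64):.2f}%")  # 32.47%
```

    Note that the 32.47% figure is the relative drop in average error, (5.39 - 3.64) / 5.39, so a lower MAD means better grading.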

    I suspected that Numi would be more accurate given the updates, but I was not expecting this much of a change. While the GPT-4 vision model still struggles immensely with medium-graded coins (around XF-40), Numi performed exceptionally well for very low- and very high-graded coins, with the biggest improvements seen for very low-graded coins.

    The Future of AI in Numismatics

    After seeing these results, I am even more convinced that Artificial Intelligence will revolutionize the field of coin collecting. As models like GPT-4 continue to improve, AI tools will become increasingly valuable for collectors seeking to expand their knowledge and make informed decisions about their collections. While Numi itself will most likely not end up being the go-to tool for collectors in the future, it serves as powerful evidence of where the hobby is heading.
     
    Last edited: Apr 21, 2024
    JoshuaP, Spark1951 and Mr.Q like this.
  3. Mr.Q

    Mr.Q Well-Known Member

    Enjoyed the read, good luck.
     
    Dansco_Dude likes this.
  4. ldhair

    ldhair Clean Supporter

    It will never be of value to the hobby. It will not be trusted.
     
    Burton Strauss III likes this.
  5. cwart

    cwart Senior Member

    I'm not in total disagreement with the statement, but can't the same be said for the TPGs? It only takes enough people trusting it for it to gain a foothold.
     
    -jeffB and Burton Strauss III like this.
  6. samclemens3991

    samclemens3991 Well-Known Member

    Trust is a funny thing. I wonder how this might come into fashion for grading 69/70 coins. I have spent the last 6 weeks studying slabbed 69/70 silver State Quarters. I have yet to find a single coin I would call a 70. It is obvious I am being too strict, but what does this say about my confidence in the current graders? Maybe AI will be more consistent. James
     
  7. Dansco_Dude

    Dansco_Dude Well-Known Member

    An interesting use case would be the TPGs using AI to quickly process ASE monster boxes. All of them are going to be either PR 69 or PR 70.
     
    samclemens3991 and Cheech9712 like this.