Looking for synonyms ...

Discussion in 'Ancient Coins' started by Roerbakmix, Feb 17, 2020.

  1. Roerbakmix

    Roerbakmix Well-Known Member

    So some of you probably know by now that I've been trying to explain, or understand, the hammered prices of coins sold at auctions. As a data-scientist (buzz-word; I'm a clinical epidemiologist), I've been coding and tweaking data on various coin types (aurei, several denarii and several medieval coins). This involves a bit of automated data-mining.

    In this post, I'm focusing on the data-mining. I've coded an algorithm that 'looks' in the data for the following variables:
    1. The estimated price [variable name: estimate]
    2. The hammered price [hammer]
    3. The assigned grade [grade]
    4. The auction house [auction_house]
    5. Is there toning (e.g. 'cabinet toning') [toning]
    6. Any super-enthusiastic notes (e.g. "superb") [positive_remarks]
    7. Is the coin rare (e.g. "extremely rare") [rarity]
    8. Is the coin damaged (e.g. broken) [damaged]
    9. Is the coin slabbed or graded (e.g. NGC) [slabbed]
    10. Are there problems related to minting (e.g. rusty die) [minting_problems]
    11. Are there remarkable minting 'benefits' ("well struck") [minting_pros]
    12. Is the coin die-matched [die_match]
    13. Is there a provenance (e.g. "from the x-collection") [provenance]
    14. Is this coin a plate coin [plate_coin]
    So, so far, 14 variables. Now I'm looking for your input. Except for the estimate, hammered price and auction house, these variables are based on different strings of text. For example, the variable "toning" is based on the following strings:
    • bronzepatina
    • cabinet
    • patina
    • tone
    • toned
    • toning
    • tönung
    ...the variable "positive_remarks" on the following:
    • attractive
    • wonderful
    • interesting
    • important type
    • fascinating
    • attractively
    • eye appeal
    • splendid
    • Strong portrait
    • desirable
    • pleasing
    • superb
    • premium quality
    ... and the variable "rarity" is based on these:
    • rare
    • rarest
    • scarce
    • unrecorded
    • selten
    • unpublished
    • unique
    • key coin
    • mule
    • uncommon
    • finest known
    • Rarity
    • seltene
    • recorded
    • Unrecorded
    So, a description containing the sentence "Good VF, toned. Attractive style. Rare." will result in:
    • Toning = true
    • Positive_remarks = true
    • Rarity = true
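A minimal sketch of this kind of keyword flagging, assuming plain case-insensitive substring matching. The actual algorithm isn't shown in this post, so the names `KEYWORDS` and `flag_description` are made up for illustration, and the lists are abbreviated:

```python
# Abbreviated keyword lists copied from this post; the dict layout is an assumption.
KEYWORDS = {
    "toning": ["bronzepatina", "cabinet", "patina", "tone", "toned", "toning", "tönung"],
    "positive_remarks": ["attractive", "wonderful", "splendid", "superb", "eye appeal"],
    "rarity": ["rare", "rarest", "scarce", "unrecorded", "selten", "unique"],
}

def flag_description(description, keywords=KEYWORDS):
    """Return {variable: True/False} using case-insensitive substring matching."""
    text = description.lower()
    return {
        variable: any(term.lower() in text for term in terms)
        for variable, terms in keywords.items()
    }

flags = flag_description("Good VF, toned. Attractive style. Rare.")
print(flags)  # {'toning': True, 'positive_remarks': True, 'rarity': True}
```

Note that substring matching is permissive: "tone" would also match inside "stone", so word-boundary matching might be worth considering for the shorter strings.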
    You might agree (or not) that these variables say something about the value of a coin (you can actually test this). Now what I'm looking for is the following: I need some input on these variables.

    So, here they are:
    Damage: (mild damage)
    • prüfhieb
    • test cuts
    • test cutss
    • edge cut
    • edge cuts
    • banker
    • banker-mark
    • banker mark
    • banker's mark
    • bankers' marks
    • countermark
    • damage
    • damaged
    • metal flaw
    • graffiti
    • cleaning marks
    • weakness
    • scrapes
    • kratzer
    • corrosion
    • korrosion
    • porous
    • grainy
    • rough
    • flan flaw
    • hairlines
    • damaged
    • damaged
    • scraped
    • flan-crack
    • bent
    • korrosionsspuren
    • schrötlingsrisse
    • roughness
    • porosity
    • deposits
    • flaw
    • scratches
    • scratch
    • chop-mark
    • circulated
    • smoothed
    • cleaned
    • hairline
    • Schrötlingsriss
    • chipped
    • creased
    • korrodiert
    • corroded
    • Schrötlingsriß
    • chip
    • crimped
    • overcleaned
    Damage: (severe damage)
    • broken
    • broke
    • crack
    • repaired
    • repair
    • crystallized
    • hole
    • holed
    • bronzepest
    • Altered surfaces
    • tooled
    • Artificial toning
    • Retoned
    • pierced
    • Durchbrüche
    • fragment
    • fragments
    • GEBROCHENES
    • GEKLEBTES
    • Abbruch
    Minting_problems
    • unregelmäßiger schrötling
    • worn reverse die
    • worn obverse die
    • stempelausbruch
    • doppelschl
    • irregular flan
    • prägeschwächen
    • worn die
    • stempelfehler
    • wavy flan
    • schrötlingsfehler
    • knapper schrötling
    • off-center
    • off center
    • flat strike
    • weak strike
    • dezentriert
    • belagreste
    • die crack
    • die rust
    • die clash
    • low relief
    • die wear
    • flow lines
    • gewellt
    • rusted
    • blundered
    Minting_pros
    • large flan
    • wide flan
    • broad flan
    • breiter schrötling
    • luster
    • lustrous
    • iridescent
    • stempelglanz
    • hohes relief
    • well struck
    • good strike
    • well-centered
    • well centered
    • perfect strike
    • as struck
    • full strike
    • Lustre
    Provenance
    • provenance
    • collection
    • sammlung
    • erworben
    • from the inventory
    • from a private
    • provenienzen
    • Pedigree
    • purchased
    Die_match
    • same obv. die
    • same rev. die
    • same dies
    • stempelgleich
    • die match
    • same die
    Plate_coin
    • plate coin
    • this coin published
    • this coin illustrated
    • this coin cited
    • this coin, illustrated
    • this coin
    Slabbed
    • ngc
    • slabbed
    • certified
    • encapsulated
    • Slabbed
    • PNG
    • PCGS
    • PNG
    • holder
     
  3. DonnaML

    DonnaML Well-Known Member

    I'm not at all qualified to comment on these. But I'm curious about one minor point: if merely being cleaned results in a classification of an ancient coin as being "damaged," then isn't just about every ancient coin in the world that's for sale or has ever been sold "damaged" unless it just came out of the ground? You're not talking about U.S. coins here, after all. Or are you assuming that if a dealer even bothers to mention that a coin has been cleaned, that implies that the coin was damaged in the process?
     
  4. fretboard

    fretboard Defender of Old Coinage!

    QAS=quite a system! :shame:
     
  5. Endeavor

    Endeavor Well-Known Member

    Which auction house or online market are you mining?
     
  6. Ed Snible

    Ed Snible Well-Known Member

    It is interesting to find you here talking about your automated tools to scrape auction descriptions the same week that @Suarez is posting about the difficulties he is having with his artisanal coin scraping software.

    I went to my collection database and the second coin had the condition notes "minor chipping to the edges - typical". "Chipping" isn't one of your words. We could all pitch in and find words, but perhaps something like WordNet is better for finding synonyms.
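Whether a word like "chipping" is missed depends on how the strings are matched, which hasn't been stated in this thread. A small sketch of the difference, assuming two hypothetical matchers (`matches_whole_word` and `matches_substring` are illustrative names, and the term list is abbreviated from the damage list above):

```python
import re

DAMAGE_TERMS = ["chip", "chipped", "scratch", "scratches"]  # abbreviated

def matches_whole_word(text, terms):
    """True if any term appears as a complete word or phrase in the text."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )

def matches_substring(text, terms):
    """True if any term appears anywhere in the text, even inside a longer word."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)

note = "minor chipping to the edges - typical"
print(matches_whole_word(note, DAMAGE_TERMS))  # False: "chipping" is not the word "chip"
print(matches_substring(note, DAMAGE_TERMS))   # True: "chip" occurs inside "chipping"
```

With whole-word matching every inflected form has to be listed separately; with substring matching the stem "chip" already covers "chipping", at the cost of more accidental hits.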

    Consider publishing your tool as open source on GitHub so that people working on tools like CoinProject and Coryssa can use it to get more machine-readable data for analysis.

    I see you have a few foreign words but not many. Look at http://www.muenzen-hardelt.de/dic/diction1.html for more ideas.
     
  7. dougsmit

    dougsmit Member Supporter

    Since half of your words are synonyms and many will have several that apply at the same time and to varying degrees, how will you handle searches? Are German and English the only relevant languages? I feel you are trying to digitize an analog subject. That is a hard task.
     
  8. Roerbakmix

    Roerbakmix Well-Known Member

    Thanks all for the comments.

    This is of course correct. However, it's my understanding (and limited experience) that the comment "cleaned" or "overcleaned" in ancient coinage, and especially in the higher-end, decreases the value.

    I use publicly available data from sixbid. At this moment, I'm not focusing on auction houses, although this certainly is a possibility.

    Certainly interesting, and I've looked at coryssa before. Right now, I'm using data from sixbid, because it presents the data in the same, standardized manner which makes text-mining easier.

    I am not sure I understand your question. But to give an example: suppose I'm constructing a dataset on Hadrian denarii. This one will be included as well. The variables will be as follows:
    upload_2020-2-18_9-40-8.png
    1. estimate: 1711
    2. hammer: 11976
    3. grade: extremely fine
    4. auction_house: Leu Winterthur
    5. toning: TRUE ("toned")
    6. positive_remarks: TRUE ("beautifully"; "wonderful"; "exquisite")
    7. rarity: TRUE ("very rare")
    8. damaged: FALSE
    9. slabbed: FALSE
    10. minting_problems: FALSE
    11. minting_pros: FALSE
    12. die_match: FALSE
    13. provenance: TRUE ("collections")
    14. plate_coin: FALSE
    So, synonyms are not an issue, as long as the synonyms are unique to the variable.
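Whether the strings really are unique to one variable can be checked mechanically. A rough sketch, assuming substring matching and using a few strings from the lists in the first post (`find_overlaps` is a hypothetical helper, not part of the actual algorithm):

```python
from itertools import permutations

# A few strings from the lists in the first post, lowercased and abbreviated.
KEYWORDS = {
    "toning": ["tone", "toned", "toning"],
    "damaged": ["crack", "artificial toning", "retoned", "flaw"],
    "minting_problems": ["die crack", "die rust", "worn die"],
}

def find_overlaps(keywords):
    """List (var_a, term_a, var_b, term_b) where term_a occurs inside term_b."""
    overlaps = []
    for var_a, var_b in permutations(keywords, 2):
        for term_a in keywords[var_a]:
            for term_b in keywords[var_b]:
                if term_a.lower() in term_b.lower():
                    overlaps.append((var_a, term_a, var_b, term_b))
    return overlaps

overlaps = find_overlaps(KEYWORDS)
for var_a, term_a, var_b, term_b in overlaps:
    print(f"'{term_a}' ({var_a}) also fires on '{term_b}' ({var_b})")
```

On these lists, a description saying "die crack" would set both minting_problems and damaged, and "artificial toning" would set both damaged and toning, so some strings may need an exclusion rule or longest-match priority.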

    Also, I should probably mention that I have no real intention of using the algorithm apart from personal use (e.g. identifying an auction house for a specific coin).
     
  9. pprp

    pprp Well-Known Member

    You are basically missing contextual information. Provenance should be FALSE for the example you gave; why should anyone care about a 2013 provenance? I am not commenting on the former collector, but there were some negative posts about his auction house and tactics earlier on CT.
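One possible way to add a bit of that context, sketched under heavy assumptions: pull four-digit years out of the description and only count a provenance when some year is older than a cutoff. The function names, the cutoff of 1970 and the example descriptions below are all invented for illustration, and years in a description can also come from catalogue references rather than provenance.

```python
import re

def provenance_years(description):
    """Return all four-digit years from 1700 to 2099 found in the description."""
    return [int(year) for year in re.findall(r"\b(1[7-9]\d{2}|20\d{2})\b", description)]

def has_old_provenance(description, cutoff_year=1970):
    """True if any mentioned year is earlier than cutoff_year (an arbitrary assumption)."""
    return any(year < cutoff_year for year in provenance_years(description))

# Invented example descriptions, not taken from any auction.
recent = "From a private collection, purchased 2013."
older = "From a private collection, acquired in 1925."
print(has_old_provenance(recent))  # False
print(has_old_provenance(older))   # True
```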
     
  10. Ed Snible

    Ed Snible Well-Known Member

    I wasn't suggesting you look at Coryssa for input. I understand what you are trying to do. (I wrote a processor for numisbids.com, which is similar, to generate output for CoinProject.com.)

    I am pointing out that there are a lot of input formats, and a lot of data to gather. I am sure the folks here will help you find synonyms. It would be great if your output can be encoded in the nomisma.org format promoted by the ANS. It would be good for numismatics as a field of study if the data science folks were on the same page.

    http://nomisma.org/ontology has a proposal for the fields used to describe a coin including wear, corrosion, countermark, secondaryTreatment, etc. They have been working on it a long time. It would be a useful result if you find their system great or full of holes for what you are trying to do.
     
  11. almostgem

    almostgem Junior Member

    One addition to the damage category, at least for trade dollars, would be "chop marked"?
    Of course if you're only interested in ancients, this would have no bearing I would guess.
     