Human Distribution Model for Processing Herbarium Specimen Sheets
Simple, Fast & Effective
SilverArchive is not an OCR program like HerbIS but an advanced routing system for specimen sheets that gets labels processed quickly and efficiently. This engine is being tested and will be used to process millions of specimen sheets. To find out more about the server rather than the engine, please go to www.helpingscience.org.
Here are some of the features of the engine:
- Ability to load in thousands of specimen sheets from separate herbaria
- Ability to chop labels out of specimen sheets using a simple click and drag interface
- Ability to chop fields from labels while tagging them with DwC type, handwritten/typed, and language
- Advanced routing system of fields and verification system
- No one person will receive the same field twice
- A user can select which fields they wish to work on: Names, Dates, Taxa, Numbers, Random, Blobs
- A field requires X matching entries of the same variant to be considered vouched for (e.g., Carex: 3, Carx: 1, Earex: 1. Carex has a count of 3 for that variant, so it is assumed to be the value, since 3 random people typed in the same word)
- We cross-reference data with IPNI and soon plan to work with Tony Rees's Taxamatch for better species matching on possible misspellings. (Good for handwritten taxa)
- Fields that are located in the same country, state or county are routed to a person signed in from that location first for better name recognition.
- Fields that are handwritten are generally sent to older people who wish to process this type of data. (Older people are able to read handwritten images better than the younger generation)
- Fields in different languages are sent to people who are from that country or who are able to type characters from a keyboard in that language.
- Fields are blocked in the queue by higher-level fields, like Family, that are still waiting to be vouched for. This lets us give the user a better lookup list/picklist for Genus and then for Species, making it easier for the user to tag fields accurately
- Points are awarded for every field that is vouched for correctly
- FIFO (First In First Out), along with other optimizations, to get the labels that have been in the system the longest processed as fast as possible.
- Users can set their preference for the herbaria they wish to process, or default to All
- The vouching system is based on the 3 (or X) copies of a field sent out to random people who entered the same value. This number is set by the client to work with their pricing model.
- We have an incident system in place for when the computer detects issues that need administrative attention from either SilverBiology or the Herbarium
- Incidents are handled depending on the type:
- Hacking attempts, which include people trying to beat the system
- Issues arising from aged fields or labels that sit too long in the system
- Other issues, like too many different answers for a field, no labels on a specimen sheet, passed too many times, image too light/dark, can't read, etc.
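The vouching rule from the feature list (a field is accepted once X people have typed the same value) can be illustrated with a short sketch. This is only a minimal illustration, not the engine's actual code: the function names `vouch` and `too_many_variants`, the default threshold of 3, and the `max_variants` limit are all assumptions made for the example.

```python
from collections import Counter


def vouch(entries, required=3):
    """Return the vouched value, or None if no variant has enough matches yet.

    `entries` holds the values typed in by different users for one field.
    `required` is the hypothetical client-configured threshold (the "X").
    """
    cleaned = [e.strip() for e in entries if e.strip()]
    if not cleaned:
        return None
    # Count how many users typed each variant and take the most common one.
    value, count = Counter(cleaned).most_common(1)[0]
    return value if count >= required else None


def too_many_variants(entries, max_variants=4):
    """Flag a field for an incident when the answers disagree too widely.

    `max_variants` is an assumed limit, not a documented setting.
    """
    variants = {e.strip() for e in entries if e.strip()}
    return len(variants) > max_variants


# The example from the feature list: Carex: 3, Carx: 1, Earex: 1
answers = ["Carex", "Carx", "Carex", "Earex", "Carex"]
print(vouch(answers))              # Carex
print(vouch(["Carex", "Carx"]))    # None, still waiting for more entries
```

A field that returns None would simply stay in the queue and be routed to more users until a variant reaches the threshold or an incident is raised.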
Points
- Points are awarded to people for correct values
- They can be redeemed for items in our store, much like SkyMiles
- Clients can follow certain users to see how student workers are performing
- Points can be given to groups so that members can pool them in a community environment
Groups
- Users can join groups to share or donate points earned
- Some groups are Charitable organizations
- Some groups are for herbaria to process their sheets
- Some groups are for funding other botany projects looking for grant money
- Private / Public groups
Contest System
- Encourages users to take part in ladder competitions (e.g., Top 50 per week, most points per group)
- Has an X-in-a-row bonus reward system to encourage accuracy
- Reward system for the amount of time spent processing data (e.g., records for processing 1k, 10k, 50k, 100k, etc.)
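The X-in-a-row bonus mentioned above could work along the following lines. This is a sketch under stated assumptions: the function name `award_points` and the values for `base_points`, `streak_length`, and `bonus_points` are invented for illustration and do not reflect the actual reward amounts.

```python
def award_points(results, base_points=1, streak_length=5, bonus_points=2):
    """Total the points for a sequence of vouched (True) or rejected (False) entries.

    Every correct entry earns `base_points`. Each time the user reaches
    another `streak_length` correct entries in a row, `bonus_points` is added.
    A wrong entry resets the streak. All three values are hypothetical.
    """
    total = 0
    streak = 0
    for correct in results:
        if correct:
            total += base_points
            streak += 1
            if streak % streak_length == 0:
                total += bonus_points
        else:
            streak = 0
    return total


print(award_points([True] * 5))                    # 7: five base points plus one bonus
print(award_points([True] * 4 + [False, True]))    # 5: the streak was broken before the bonus
```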
Store
- Here people can purchase things with their points
- We use an international PayPal system for all transactions
Export
- We have a download system where clients (herbaria) can download the XML/JSON/CSV of their processed data
- Label and field images and positions can be downloaded for HerbIS or other future uses
- We plan to offer compatibility with Specify 6 and other formats as needed for loading into collection databases
- View digital specimens with data alongside or overlaid
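As a rough illustration of the JSON and CSV downloads described above, the sketch below serializes a list of processed records using Python's standard library. The function name `export_records` and the sample record are hypothetical; the keys use Darwin Core term names (catalogNumber, family, genus) only as an assumed layout, and the real export schema may differ.

```python
import csv
import io
import json


def export_records(records, fmt="json"):
    """Serialize processed records (a list of dicts) as JSON or CSV text."""
    if fmt == "json":
        return json.dumps(records, indent=2, sort_keys=True)
    if fmt == "csv":
        # Use the union of all keys so records with missing fields still export.
        fieldnames = sorted({key for record in records for key in record})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)  # missing keys are written as empty cells
        return buffer.getvalue()
    raise ValueError("unsupported format: " + fmt)


# Hypothetical sample data, not taken from a real herbarium.
sample = [
    {"catalogNumber": "EX-0001", "family": "Cyperaceae", "genus": "Carex"},
    {"catalogNumber": "EX-0002", "family": "Cyperaceae"},
]
print(export_records(sample, "csv"))
```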
Label Chopping
- A test that users must pass to qualify for tagging specimen labels with the correct fields. (This will be the bottleneck of the system and will require the most educated users)
- Auto-learning algorithm that will help find similar layouts and help auto-complete the mapping of a label with the 5 most accurate matches.
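The matching algorithm itself is not described here, so the following is only one possible way to rank known layouts against a partly tagged label. It assumes each layout is a mapping from field name to a pixel box `(x, y, width, height)` and scores overlap with intersection-over-union. The names `iou`, `layout_similarity`, and `best_matches` are hypothetical, as is the choice of metric.

```python
import heapq


def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def layout_similarity(partial, template):
    """Average overlap over the fields the user has tagged so far."""
    if not partial:
        return 0.0
    score = sum(iou(box, template[name])
                for name, box in partial.items() if name in template)
    return score / len(partial)


def best_matches(partial, known_layouts, k=5):
    """Return the names of the k known layouts most similar to the partial mapping."""
    return heapq.nlargest(
        k, known_layouts,
        key=lambda name: layout_similarity(partial, known_layouts[name]))


# Hypothetical layouts: field name -> (x, y, width, height) in pixels.
known = {
    "template_a": {"family": (10, 10, 200, 40), "genus": (10, 60, 200, 40)},
    "template_b": {"family": (10, 200, 200, 40), "genus": (10, 260, 200, 40)},
}
tagged_so_far = {"family": (12, 11, 200, 40)}
print(best_matches(tagged_so_far, known))  # ['template_a', 'template_b']
```

The remaining fields of the top-ranked layouts could then be offered to the user as suggested boxes to confirm or adjust.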
Hacking
- We have a series of tests in place to keep bad actors from corrupting the data.
Administrative Side of things:
- Monitor the age of all specimen sheets
- Look at the complete history of a specimen sheet and all its fields and labels
- Look at a user's history
- Handle incidents
- Pull any specimen sheet/ label/ field and process it
- Manage accounting system
Note: All points are allocated from the Price Per Sheet, and the points awarded never exceed the total allocated for that sheet.
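How the per-sheet total is divided is not spelled out above. As a sketch of one way to guarantee the cap, the example below splits a sheet's point budget evenly across every entry that can be rewarded and rounds down, so the sum can never exceed the budget. The function name `allocate_points` and all the numbers are assumptions for illustration.

```python
def allocate_points(sheet_budget, field_count, vouches_per_field=3):
    """Split a sheet's point budget across every rewarded entry.

    Returns (points_per_entry, unallocated_remainder). Integer division
    rounds down, so points_per_entry * slots never exceeds sheet_budget.
    """
    slots = field_count * vouches_per_field
    if slots <= 0:
        return 0, sheet_budget
    per_entry = sheet_budget // slots
    remainder = sheet_budget - per_entry * slots
    return per_entry, remainder


# A hypothetical sheet worth 100 points, with 8 fields and 3 vouches each.
print(allocate_points(100, 8, 3))  # (4, 4): 24 entries at 4 points, 4 points left over
```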
Server Location:
SilverArchive will be located in the Louisiana Technology Park in Baton Rouge. NTG is the region’s only Tier IV datacenter, designed to maintain 99.999% uptime. The NTG hosting infrastructure provides the Fortune 500-class solutions necessary to maintain your servers securely and efficiently 365 x 24 x 7. The robust network platforms are engineered to avoid any single point of failure in connectivity, power, or environmental conditioning. Also, because the data centers have several layers of restricted physical and network access, you can rest assured that only authorized individuals will be allowed to work with or around your server.
Members of the Tech Park receive highly subsidized services from NTG.
- Only Total Service Provider data center in Louisiana, Arkansas or Mississippi
- Industry-leading Sun, Dell, EMC2, Cisco Systems and Oracle database hardware and software
- Continuous (24/7/365) operations to 99.999% reliability
- Multiple fiber optic telecommunications routes using multiple suppliers
- Redundant hardware and software installations to provide "fail-over" protection in the event of equipment failure
- Continuous Real Time Monitoring (24/7/365) from a State-of-the-Art Network Operations Center (NOC)
- Multiple DS3 and OC3 links to major Internet backbone system, expandable to OC96