
Basilica vs Tesseract OCR: What are the differences?

Developers describe Basilica as "Word2Vec For Anything": an API that embeds high-dimensional data like images and text. You send an image, and you get back a vector of floats. On the other hand, Tesseract OCR is described as the "Tesseract Open Source OCR Engine". Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. Since 2006 it has been developed by Google.

Basilica and Tesseract OCR can be categorized as "Image Analysis API" tools.

Tesseract OCR is an open source tool with 28.2K GitHub stars and 5.38K GitHub forks.

Decisions about Basilica and Tesseract OCR
Vladyslav Holubiev
Sr. Director of Technology at Shelf

AWS Rekognition has an OCR feature but can recognize only up to 50 words per image, which is a deal-breaker for us. (see my tweet).

Also, we discovered fantastic speed and quality improvements in the 4.x versions of Tesseract. Meanwhile, the quality of AWS Rekognition's OCR remains mediocre in comparison.

We run Tesseract serverlessly in AWS Lambda via the aws-lambda-tesseract library, which we open-sourced.
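For readers who want a feel for what such a setup involves, here is a minimal sketch of calling the Tesseract command-line tool from a Lambda-style function. It is not the aws-lambda-tesseract library's API. It assumes a `tesseract` 4.x binary is available on `PATH`, and the names `build_tesseract_command`, `ocr_image`, `handler`, and the `image_path` event key are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=3):
    """Build the argument list for a Tesseract CLI call that prints text to stdout."""
    # Passing "stdout" as the output base makes the CLI print the text
    # instead of writing it to a file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_image(image_path, lang="eng", psm=3):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits with an error
    )
    return result.stdout


def handler(event, context):
    """Hypothetical Lambda-style entry point; the event shape is an assumption."""
    return {"text": ocr_image(event["image_path"])}
```

Keeping the command construction in its own function makes it easy to check the arguments without having Tesseract installed.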

Pros of Basilica
    • None listed yet

Pros of Tesseract OCR
    • Building training set is easy (5 upvotes)
    • Very lightweight library (2 upvotes)


    Cons of Basilica
      • None listed yet

    Cons of Tesseract OCR
      • Works best with white background and black text (1 upvote)
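One common way to work around this limitation is to binarize the image before OCR so that the text ends up black on a white background. The following is a rough sketch in plain Python, assuming the image is already available as rows of grayscale values from 0 to 255. The function names and the fixed threshold of 128 are illustrative assumptions, not part of Tesseract.

```python
def binarize(pixels, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to pure black (0) and white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


def ensure_dark_text_on_light(pixels):
    """Invert a binarized image when most pixels are black, i.e. the background is dark."""
    flat = [value for row in pixels for value in row]
    dark_pixels = sum(1 for value in flat if value == 0)
    if dark_pixels > len(flat) / 2:
        return [[255 - value for value in row] for row in pixels]
    return pixels


def preprocess(pixels, threshold=128):
    """Binarize, then make sure the result is dark text on a light background."""
    return ensure_dark_text_on_light(binarize(pixels, threshold))
```

The majority-of-pixels heuristic assumes that text covers less than half of the image, which usually holds for scanned documents but may not hold for tightly cropped words.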


      Basilica has no public GitHub repository.

      What is Basilica?

      An API that embeds high-dimensional data like images and text. You send an image, and you get back a vector of floats.
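Because the result is a plain vector of floats, two inputs can be compared by measuring how closely their vectors point in the same direction. Here is a minimal sketch, assuming the embeddings arrive as equal-length Python lists. The `cosine_similarity` helper and the sample vectors are illustrative only and are not part of Basilica's API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up embeddings standing in for two images.
image_a = [0.12, 0.80, 0.35]
image_b = [0.10, 0.75, 0.40]
similarity = cosine_similarity(image_a, image_b)
```

Values near 1 suggest similar inputs, while values near 0 suggest unrelated ones.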

      What is Tesseract OCR?

      Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. Since 2006 it has been developed by Google.


      What companies use Basilica?
      What companies use Tesseract OCR?
        No companies found

        What are some alternatives to Basilica and Tesseract OCR?
        Shrine
        Shrine implements a plugin system analogous to Roda’s and Sequel’s. Shrine ships with over 25 plugins, which together provide a great arsenal of features. Where CarrierWave and other file upload libraries favor complex class-level DSLs, Shrine favors a simple instance-level interface.
        Google Drive
        Keep photos, stories, designs, drawings, recordings, videos, and more. Your first 15 GB of storage are free with a Google Account. Your files in Drive can be reached from any smartphone, tablet, or computer.
        Cloudflare
        Cloudflare speeds up and protects millions of websites, APIs, SaaS services, and other properties connected to the Internet.
        Dropbox
        Harness the power of Dropbox. Connect to an account, upload, download, search, and more.
        Amazon CloudFront
        Amazon CloudFront can be used to deliver your entire website, including dynamic, static, streaming, and interactive content using a global network of edge locations. Requests for your content are automatically routed to the nearest edge location, so content is delivered with the best possible performance.