The primary task in e-Publishing is the conversion of paper, images, or other physical items into digital format. We help you decide on the most appropriate digital format and provide high-quality data conversion services at a reasonable price.
1. Image Scanning & Conversion Services:
We use image scanners to convert physical books and magazines into digital media such as images, electronic text, or electronic books (e-books).
Digital books can be distributed and reproduced easily and can be read on-screen. Our processes help you turn book pages into a digital text format such as ASCII. We also reformat text so that it can feed a search and retrieval system or be reused in other applications.
- We can convert paper input into Portable Document Format (PDF), Tagged Image File Format (TIFF), JPEG (JPG), etc.
- We specialize in converting raw images into text through optical character recognition (OCR) and in assisting with data cleanup.
- Conversion of content into ePub is one of our areas of specialization.
- We can also load data into SQL databases such as MySQL and output it in a format of your choice, e.g., XML, text, or HTML.
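As an illustration of the load-and-export step in the last bullet, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for whichever SQL database is used; the `pages` table, its columns, and the `<document>`/`<page>` element names are hypothetical and chosen only for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_pages(conn, pages):
    """Store (page_no, body) pairs in a hypothetical `pages` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (page_no INTEGER PRIMARY KEY, body TEXT)"
    )
    conn.executemany("INSERT INTO pages (page_no, body) VALUES (?, ?)", pages)
    conn.commit()


def export_xml(conn):
    """Return the stored pages as an XML string, ordered by page number."""
    root = ET.Element("document")
    query = "SELECT page_no, body FROM pages ORDER BY page_no"
    for page_no, body in conn.execute(query):
        page = ET.SubElement(root, "page", number=str(page_no))
        page.text = body
    return ET.tostring(root, encoding="unicode")


# SQLite in memory stands in for the production database (assumption).
conn = sqlite3.connect(":memory:")
load_pages(conn, [(2, "Second page."), (1, "First page.")])
xml_output = export_xml(conn)
print(xml_output)
```

The same query loop could write plain text or HTML instead; only the serialization inside `export_xml` would change.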
We can also output imaged text into:
- .pos files: store the x and y coordinates of each term extracted from the text. This identifies the exact location of a keyword so that it can be highlighted during display.
- .txt files: load all extracted data into plain text format.
- .pdf files: convert the data into searchable Portable Document Format.
- .alto files: load the OCRed data into an open XML standard format with layout information.
- .html files: convert the extracted data into HTML for web display.
- .xml files: load the data into XML and validate it.
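To show how per-word coordinates support keyword highlighting, the sketch below indexes word positions and looks up a keyword. The layout of a .pos file is not specified here, so the tab-separated format used in the example (word, x, y, width, height per line) is purely an assumption, as are the function names.

```python
from collections import defaultdict


def parse_positions(pos_text):
    """Index lowercase words to their (x, y, width, height) boxes.

    Assumes one tab-separated record per line: word, x, y, width, height.
    This layout is hypothetical, not a documented .pos specification.
    """
    index = defaultdict(list)
    for line in pos_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, x, y, width, height = line.split("\t")
        index[word.lower()].append((int(x), int(y), int(width), int(height)))
    return index


def find_keyword(index, keyword):
    """Return every box where the keyword occurs (case-insensitive)."""
    return index.get(keyword.lower(), [])


sample = (
    "Digital\t120\t80\t95\t22\n"
    "library\t225\t80\t88\t22\n"
    "digital\t120\t410\t90\t22\n"
)
index = parse_positions(sample)
boxes = find_keyword(index, "digital")
print(boxes)
```

A viewer could then draw a highlight rectangle over the page image for each returned box.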
2. METS/ALTO Generation:
ALTO (Analyzed Layout and Text Object) is an open XML standard that describes OCRed text and provides layout information for printed documents. It is often used with the METS standard.
An ALTO file consists of three major sections under the root <alto> element:
- <Description> contains metadata about the ALTO file itself and information on how the file was created.
- <Layout> contains the content information and is subdivided into <Page> elements.
- <Styles> contains the text and paragraph styles:
  - <TextStyle> has font descriptions
  - <ParagraphStyle> has paragraph descriptions, e.g. alignment information
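The structure described above can be explored with a short Python sketch using the standard library. The sample document is invented for illustration, and the namespace URI shown is the one for ALTO version 3; files produced under other versions of the schema declare a different namespace, so `ALTO_NS` would need adjusting.

```python
import xml.etree.ElementTree as ET

# Namespace for ALTO v3 (assumption: the input follows version 3 of the schema).
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Invented sample; real files carry far more metadata and content.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
  </Description>
  <Styles>
    <TextStyle ID="TS1" FONTFAMILY="Times" FONTSIZE="10"/>
    <ParagraphStyle ID="PS1" ALIGN="Left"/>
  </Styles>
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="TB1" STYLEREFS="TS1 PS1">
          <TextLine HPOS="120" VPOS="80" WIDTH="300" HEIGHT="22">
            <String CONTENT="Digital" HPOS="120" VPOS="80" WIDTH="95" HEIGHT="22"/>
            <SP HPOS="215" VPOS="80" WIDTH="10"/>
            <String CONTENT="library" HPOS="225" VPOS="80" WIDTH="88" HEIGHT="22"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def section_names(root):
    """Return the local names of the root's direct children, in document order."""
    return [child.tag.split("}", 1)[-1] for child in root]


def extract_words(root):
    """Collect each String element's text and bounding box."""
    words = []
    for string in root.iterfind(".//alto:String", ALTO_NS):
        words.append({
            "content": string.get("CONTENT"),
            "hpos": float(string.get("HPOS")),
            "vpos": float(string.get("VPOS")),
            "width": float(string.get("WIDTH")),
            "height": float(string.get("HEIGHT")),
        })
    return words


root = ET.fromstring(SAMPLE_ALTO)
sections = section_names(root)
words = extract_words(root)
print(sections)
print([w["content"] for w in words])
```

Each `String` element carries both the recognized word and its position on the page, which is what makes ALTO suitable for search-hit highlighting.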
METS (Metadata Encoding and Transmission Standard) is a metadata standard for encoding descriptive, administrative, and structural metadata of objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium (W3C).
3. Digital Library System:
This is a semi-automated process that utilizes OCR and layout-recognition technology. Both manual and automated tasks undergo manual review before the data is exported into XML to create web-accessible derivative images.
OCR technology has improved considerably in recent years. It can recover valuable information and make it reusable. Data entry through Optical Character Recognition (OCR) can be faster, more accurate, and more efficient than manual keystroke data entry. In many text conversion projects, OCR provides an alternative to keyboarding, or it can be part of a mixed data entry/OCR process.
Process Followed:
- Scanned copies in formats such as TIFF, GIF, and JPEG (JPG) are converted into text files using OCR and visually checked for accuracy.
- Alongside the OCR process, our Subject Matter Experts (SMEs) go through the documents and identify keywords, metadata, and other client-specified elements, marking them on the printed copies.
- The marked and OCRed documents are tagged in markup formats such as HTML and XML.
- After a final quality check, the files are uploaded to a secure FTP location.
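The tagging step in the process above can be sketched as follows. This is a simplified illustration only: the `<page>`, `<text>`, and `<keyword>` element names and the `tag_page` function are hypothetical, and the keyword list stands in for the elements identified by the SMEs.

```python
import re
import xml.etree.ElementTree as ET


def tag_page(page_no, ocr_text, keywords):
    """Wrap OCR text in XML and record which of the given keywords occur in it."""
    page = ET.Element("page", number=str(page_no))
    ET.SubElement(page, "text").text = ocr_text
    found = ET.SubElement(page, "keywords")
    for keyword in keywords:
        # Whole-word, case-insensitive match against the OCR text.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, ocr_text, flags=re.IGNORECASE):
            ET.SubElement(found, "keyword").text = keyword
    return page


page = tag_page(
    1,
    "The Digital Library stores scanned newspapers.",
    ["digital library", "microfilm"],
)
tagged_xml = ET.tostring(page, encoding="unicode")
print(tagged_xml)
```

In practice the element names would follow whatever schema the client specifies, and the result would be validated before upload.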