VeryPDF PaperTools COM/SDK — Feature Overview and Use Cases
Key features
- Image cleanup: Deskew, Despeckle, Black Border Removal, Black Lines Removal (horizontal/vertical).
- Binarization: Dynamic thresholding, fixed/auto threshold, and dither options.
- Layout analysis: Detects areas (Text, Inverted Text, Noise, Images, Tables, Lines) and supports sub-classification rules.
- OCR with positions: OCR to text with X/Y/Width/Height coordinates.
- Input/output formats: BMP, JPEG, GIF, PNG, TIFF, MNG, ICO, PCX, TGA, WMF, WBMP, JBG, J2K.
- Interfaces & languages: COM/ActiveX, .NET assembly, C/C++, Java (JNI); usable from C#, VB, VB.NET, Python, PHP, Ruby, JavaScript, etc.
- Product variants: Command-line shell, API/SDK, COM/ActiveX.
- Cross-platform support: Windows, Linux (CentOS/SuSE/RedHat), Mac OS X.
- Licensing & support: Server/Developer licenses and optional paid support tiers.
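The binarization bullet above distinguishes fixed from automatic thresholds. As a rough illustration of the difference, here is a minimal pure-Python sketch that uses Otsu's method as a stand-in for "auto threshold". This is an assumption for illustration only: the source does not say which algorithm PaperTools uses, and the function names `otsu_threshold` and `binarize` are hypothetical, not part of the SDK.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Pick a threshold in 0..255 that maximizes between-class variance.

    Pixels <= threshold form the dark class, the rest the light class.
    Illustrative stand-in for an "auto threshold"; not the PaperTools algorithm.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break     # no light pixels left
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map grayscale values to 0 (dark) or 1 (light) using a given threshold."""
    return [1 if p > threshold else 0 for p in pixels]


# Fixed threshold: the caller picks the cut-off.
# Auto threshold: the cut-off is derived from the image histogram.
gray = [10] * 50 + [200] * 50
fixed = binarize(gray, 128)
auto = binarize(gray, otsu_threshold(gray))
```

On a cleanly bimodal image like `gray`, both approaches give the same result; the automatic variant matters when scan brightness varies from page to page.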
Typical use cases
- Automated preprocessing of scanned documents before OCR (deskew, despeckle, border/line removal).
- Converting scanned image PDFs into searchable text with positional data for downstream extraction.
- Extracting and classifying page regions (text, tables, images) for document conversion and archival workflows.
- Form and table cleanup to improve data extraction accuracy (remove form lines, detect table structure).
- Batch image processing integrated into document ingestion pipelines (server-side automation).
- Embedding image-preprocessing capabilities into .NET or legacy apps (Access, FoxPro, Delphi) via COM/.NET.
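The batch-ingestion use case above can be sketched as a small driver that walks a folder, picks up raster files, and hands each one to a processing callback. This is a sketch under stated assumptions: the extension set is a guess at common file suffixes for some of the listed formats, and `find_scans`, `process_batch`, and the `process` callback are hypothetical names. The callback is where a real call into the SDK or command-line tool would go.

```python
from pathlib import Path
from typing import Callable, Dict, Iterator, Tuple

# Assumed file suffixes for a subset of the supported formats (not from the vendor docs).
RASTER_EXTENSIONS = {
    ".bmp", ".jpg", ".jpeg", ".gif", ".png", ".tif", ".tiff", ".pcx", ".tga",
}


def find_scans(root: Path) -> Iterator[Path]:
    """Yield raster image files under root, recursively, in sorted order."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in RASTER_EXTENSIONS:
            yield path


def process_batch(
    root: Path, process: Callable[[Path], None]
) -> Tuple[int, Dict[Path, str]]:
    """Run process on every scan; collect failures instead of aborting the batch."""
    done = 0
    failures: Dict[Path, str] = {}
    for path in find_scans(root):
        try:
            process(path)  # hypothetical hook: cleanup, layout analysis, OCR
            done += 1
        except Exception as exc:
            failures[path] = str(exc)
    return done, failures
```

Collecting failures per file, rather than stopping at the first error, tends to suit unattended server-side runs where one damaged scan should not block the rest of the queue.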
When to choose PaperTools
- You need robust document-layout analysis and image cleanup pre-OCR.
- Your workflow requires a COM/.NET SDK that integrates with legacy Windows applications.
- You must support many raster image formats and perform headless, server-side batch processing.
Limitations / considerations
- Focused on scanned-image processing. PDF-specific features such as annotations and forms are better handled by other VeryPDF SDKs (e.g., PDF Extractor SDK).
- Licensing is commercial (server/developer tiers); evaluate pricing for large-scale deployments.
- For advanced extraction (AI table parsing, downstream data normalization), you may need to combine it with other tools or custom logic.
Quick integration notes
- Use the COM/ActiveX or .NET assembly to call functions from C#, VB.NET, Python (via COM bridge), or native C/C++.
- Preprocess images (deskew/despeckle/border removal) → run Layout Analysis → OCR to get text with coordinates → apply extraction rules or templates.
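The last step of that pipeline, applying extraction templates to OCR output with coordinates, can be sketched as follows. This assumes the OCR stage has already produced words with X/Y/Width/Height values. The types `Box` and `Word` and the function `apply_template` are hypothetical and do not reflect the actual PaperTools object model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixels (origin at the top-left)."""
    x: int
    y: int
    width: int
    height: int

    def contains_center_of(self, other: "Box") -> bool:
        """True if the center point of other lies inside this box."""
        cx = other.x + other.width / 2
        cy = other.y + other.height / 2
        return (self.x <= cx < self.x + self.width
                and self.y <= cy < self.y + self.height)


@dataclass(frozen=True)
class Word:
    """One OCR result: recognized text plus its position on the page."""
    text: str
    box: Box


def apply_template(words: List[Word], template: Dict[str, Box]) -> Dict[str, str]:
    """For each named zone, join the words whose centers fall inside it.

    Words are ordered top-to-bottom, then left-to-right. This is a
    simplification: it assumes words on the same line share the same y value.
    """
    result: Dict[str, str] = {}
    for field, zone in template.items():
        hits = [w for w in words if zone.contains_center_of(w.box)]
        hits.sort(key=lambda w: (w.box.y, w.box.x))
        result[field] = " ".join(w.text for w in hits)
    return result


# Example data standing in for OCR output (made up for illustration).
words = [
    Word("Invoice", Box(10, 10, 60, 12)),
    Word("12345", Box(80, 10, 50, 12)),
    Word("Total", Box(10, 100, 40, 12)),
]
template = {
    "header": Box(0, 0, 200, 30),
    "invoice_no": Box(70, 0, 100, 30),
}
fields = apply_template(words, template)
```

Matching on word centers rather than full containment tolerates small misalignments that remain after deskewing.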
Sources: VeryPDF product pages and knowledge-base documentation (VeryPDF PaperTools COM/SDK).