Documentation ¶
Overview ¶
Package vergilevhasi includes an OCR module that provides digit recognition for VKN extraction.
The implementation is pure Go and works without any of the following:
- ONNX Runtime
- TensorFlow Lite
- Tesseract
- Any other external tools
It uses:
- Pure Go image processing
- Built-in PDF text/image extraction from the pdfcpu library
- Code128 barcode scanning with the gozxing library
- Feature-based digit recognition with a trained classifier
Usage:
parser, _ := vergilevhasi.NewOCRParser()
defer parser.Close()
vkn, err := parser.ExtractVKNFromImage("image.png")
// or from PDF bytes:
// vkn, err := parser.ExtractVKNFromPDFBytes(pdfData)
Package vergilevhasi provides tools for parsing Turkish tax plate (Vergi Levhası) PDF documents.
This library extracts structured data from tax plate PDFs issued by the Turkish Revenue Administration (Gelir İdaresi Başkanlığı - GİB).
Basic Usage ¶
parser := vergilevhasi.NewParser()
result, err := parser.ParseFile("vergi-levhasi.pdf")
if err != nil {
	log.Fatal(err)
}
fmt.Printf("VKN: %s\n", result.VergiKimlikNo)
Extracted Fields ¶
The parser extracts the following information:
- Adı Soyadı (Full Name) - for individuals
- Ticaret Ünvanı (Trade Name) - for companies
- İş Yeri Adresi (Business Address)
- Vergi Türü (Tax Types)
- Faaliyet Kodları (Activity Codes - NACE codes)
- Vergi Dairesi (Tax Office)
- Vergi Kimlik No (Tax ID Number - VKN)
- TC Kimlik No (Turkish ID Number - TCKN) - for individuals
- İşe Başlama Tarihi (Business Start Date)
- Geçmiş Matrahlar (Historical Tax Bases)
OCR Support ¶
For PDFs where the VKN is embedded as a barcode image rather than text, use the OCR parser:
parser, _ := vergilevhasi.NewOCRParser()
defer parser.Close()
vkn, err := parser.ExtractVKNFromImage("vergi-levhasi.png")
// or from PDF bytes:
// vkn, err := parser.ExtractVKNFromPDFBytes(pdfData)
Index ¶
- type DigitClassifier
- type DigitFeatureWeights
- type DigitFeatures
- type Faaliyet
- type Matrah
- type OCRParser
- func (p *OCRParser) Close() error
- func (p *OCRParser) ExtractVKNFromImage(imagePath string) (string, error)
- func (p *OCRParser) ExtractVKNFromImageBytes(imgData []byte) (string, error)
- func (p *OCRParser) ExtractVKNFromImageData(img image.Image) (string, error)
- func (p *OCRParser) ExtractVKNFromPDFBytes(pdfData []byte) (string, error)
- func (p *OCRParser) ExtractVKNFromPDFReaderWithImage(reader io.Reader) (string, error)
- func (p *OCRParser) ExtractVKNFromPDFWithImage(data []byte) (string, error)
- func (p *OCRParser) SaveDebugImage(img image.Image, filename string) error
- func (p *OCRParser) SetOCRDebug(debug bool)
- type Parser
- type VergiLevhasi
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type DigitClassifier ¶ added in v1.1.1
type DigitClassifier struct {
	// contains filtered or unexported fields
}
DigitClassifier recognizes digits using feature extraction
func NewDigitClassifier ¶ added in v1.1.1
func NewDigitClassifier() *DigitClassifier
NewDigitClassifier creates a classifier with pre-trained weights
type DigitFeatureWeights ¶ added in v1.1.1
type DigitFeatureWeights struct {
	// contains filtered or unexported fields
}
DigitFeatureWeights contains weights for matching a specific digit
type DigitFeatures ¶ added in v1.1.1
type DigitFeatures struct {
	// contains filtered or unexported fields
}
DigitFeatures contains extracted features from a digit image
type Matrah ¶
type Matrah struct {
	Yil   int     `json:"yil"`
	Donem string  `json:"donem,omitempty"`
	Tutar float64 `json:"tutar,omitempty"`
	Tur   string  `json:"tur,omitempty"`
}
Matrah represents historical tax base information
type OCRParser ¶ added in v1.1.1
type OCRParser struct {
	*Parser
	// contains filtered or unexported fields
}
OCRParser provides OCR capabilities for VKN extraction. It has no external dependencies and works in pure Go.
func NewOCRParser ¶ added in v1.1.1
NewOCRParser creates a new OCR parser with zero dependencies
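func (*OCRParser) Close ¶ added in v1.1.1
func (p *OCRParser) Close() error
Close releases resources (a no-op in the pure Go implementation)
func (*OCRParser) ExtractVKNFromImage ¶ added in v1.1.1
func (p *OCRParser) ExtractVKNFromImage(imagePath string) (string, error)
ExtractVKNFromImage extracts the VKN from an image file
func (*OCRParser) ExtractVKNFromImageBytes ¶ added in v1.1.1
func (p *OCRParser) ExtractVKNFromImageBytes(imgData []byte) (string, error)
ExtractVKNFromImageBytes extracts the VKN from image bytes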
func (*OCRParser) ExtractVKNFromImageData ¶ added in v1.1.1
func (p *OCRParser) ExtractVKNFromImageData(img image.Image) (string, error)
ExtractVKNFromImageData extracts the VKN from an image.Image
func (*OCRParser) ExtractVKNFromPDFBytes ¶ added in v1.1.1
func (p *OCRParser) ExtractVKNFromPDFBytes(pdfData []byte) (string, error)
ExtractVKNFromPDFBytes extracts the VKN from PDF bytes by extracting embedded images. It uses pdfcpu for image extraction (pure Go, no external dependencies).
func (*OCRParser) ExtractVKNFromPDFReaderWithImage ¶ added in v1.1.1
func (p *OCRParser) ExtractVKNFromPDFReaderWithImage(reader io.Reader) (string, error)
ExtractVKNFromPDFReaderWithImage extracts the VKN from a PDF reader by extracting embedded images. It uses pdfcpu for image extraction (pure Go, no external dependencies).
func (*OCRParser) ExtractVKNFromPDFWithImage ¶ added in v1.1.1
func (p *OCRParser) ExtractVKNFromPDFWithImage(data []byte) (string, error)
ExtractVKNFromPDFWithImage extracts the VKN from a PDF by extracting embedded images and scanning barcodes. It uses pdfcpu for image extraction (pure Go, no external dependencies).
func (*OCRParser) SaveDebugImage ¶ added in v1.1.1
func (p *OCRParser) SaveDebugImage(img image.Image, filename string) error
SaveDebugImage saves an image for debugging
func (*OCRParser) SetOCRDebug ¶ added in v1.1.1
func (p *OCRParser) SetOCRDebug(debug bool)
SetOCRDebug enables debug output
type Parser ¶
type Parser struct {
	// contains filtered or unexported fields
}
Parser is responsible for parsing Turkish tax plate PDFs
func (*Parser) Parse ¶
func (p *Parser) Parse(reader io.ReadSeeker) (*VergiLevhasi, error)
Parse parses a tax plate PDF from an io.ReadSeeker and returns structured data
type VergiLevhasi ¶
type VergiLevhasi struct {
	// Adı Soyadı (Full Name) - for individuals, can be empty
	AdiSoyadi string `json:"adi_soyadi"`
	// Ticaret Ünvanı (Trade Name) - for companies, can be empty
	TicaretUnvani string `json:"ticaret_unvani"`
	// İş Yeri Adresi (Business Address)
	IsYeriAdresi string `json:"is_yeri_adresi,omitempty"`
	// Vergi Türü (Tax Type)
	VergiTuru []string `json:"vergi_turu,omitempty"`
	// Faaliyet Kodları ve Adları (Activity Codes and Names)
	FaaliyetKodlari []Faaliyet `json:"faaliyet_kodlari,omitempty"`
	// Vergi Dairesi (Tax Office)
	VergiDairesi string `json:"vergi_dairesi,omitempty"`
	// Vergi Kimlik No (Tax ID Number)
	VergiKimlikNo string `json:"vergi_kimlik_no,omitempty"`
	// TC Kimlik No (Turkish ID Number) - for individuals
	TCKimlikNo string `json:"tc_kimlik_no,omitempty"`
	// İşe Başlama Tarihi (Business Start Date)
	IseBaslamaTarihi *time.Time `json:"ise_baslama_tarihi,omitempty"`
	// Geçmiş Matrahlar (Historical Tax Bases)
	GecmisMatra []Matrah `json:"gecmis_matrahlar,omitempty"`
	// Raw text extracted from PDF
	RawText string `json:"-"`
}
VergiLevhasi represents the structured data extracted from a Turkish tax plate PDF