This project provides a multi-functional document analyzer for text extraction, summarization, and question generation. It accepts PDFs, images, audio, Excel, Word, and PowerPoint files. It relies on transformers, Tesseract OCR, PyMuPDF, and other machine learning and document-processing libraries, and it handles multilingual content, especially Arabic and English, for both summarization and question generation.
Features
Text Extraction: Extract text from PDFs, images, audio files, and Excel, Word, and PowerPoint documents.
Summarization: Summarize extracted text in either Arabic or English.
Question Generation: Generate comprehension questions based on the extracted text.
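
The features above suggest a pipeline that first picks an extractor based on the input file, then chooses Arabic or English for the later steps. The sketch below is one possible way to organize that, not the project's actual code. The names `KIND_BY_EXTENSION`, `detect_kind`, `extract_pdf`, `extract_image`, `extract_text`, and `guess_language` are hypothetical, the extension list is an assumption, and only the PDF and image paths are shown. The language check is a simple character-count heuristic chosen for illustration.

```python
from pathlib import Path

# Hypothetical mapping from file extension to input kind; the project's
# real list of supported extensions may differ.
KIND_BY_EXTENSION = {
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
}


def detect_kind(path):
    """Return the input kind for a path, based only on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in KIND_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {path}")
    return KIND_BY_EXTENSION[suffix]


def extract_pdf(path):
    """Concatenate the plain text of every page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    doc = fitz.open(path)
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


def extract_image(path, languages="ara+eng"):
    """Run Tesseract OCR on an image.

    Assumes the Tesseract binary and the Arabic and English language
    data are installed on the machine.
    """
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=languages)


# Dispatch table from input kind to extractor function.
EXTRACTORS = {"pdf": extract_pdf, "image": extract_image}


def extract_text(path):
    """Pick an extractor from the file extension and run it."""
    return EXTRACTORS[detect_kind(path)](path)


def guess_language(text):
    """Return "ar" if more than half of the letters are in the basic
    Arabic Unicode block (U+0600 to U+06FF), otherwise "en"."""
    letters = [ch for ch in text if ch.isalpha()]
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06ff")
    return "ar" if letters and arabic * 2 > len(letters) else "en"
```

A call such as `guess_language(extract_text("report.pdf"))` would then tell the summarization and question generation steps which language to work in. The optional libraries are imported inside the extractor functions, so the dispatch and language logic can be used even when PyMuPDF or Tesseract is not installed.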