Docx Handler
- class fuzzy_table_extractor.doc_handlers.DocxHandler(file_path: pathlib.Path)
DocxHandler is a handler for Microsoft Word documents. It is meant to work with the newer .docx format; if a .doc file is provided, it is converted to .docx first.
- property dictionary: pandas.DataFrame
All cell pairs in the document
- property document: docx.api.Document
Opens the document, creating a .docx file first if necessary
- Returns
Word document object
- Return type
Document
- property docx_tables: List[docx.table.Table]
List of tables in the docx document
- property tables: List[pandas.DataFrame]
List of all tables (as dataframes) in the document
- property words: List[str]
List of all words in the document
List of all words in document
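
The properties above could be used together roughly as follows. This is a minimal sketch, not part of the library: `load_handler` and `summarize` are hypothetical helper names, and the only library details assumed are the constructor signature and the property names listed on this page.

```python
from pathlib import Path
from typing import Any, Dict


def load_handler(file_path: Path) -> Any:
    """Build a DocxHandler for the given file (hypothetical helper)."""
    # Imported lazily so that summarize() can be used and tested
    # without fuzzy_table_extractor being installed.
    from fuzzy_table_extractor.doc_handlers import DocxHandler

    return DocxHandler(file_path)


def summarize(handler: Any) -> Dict[str, int]:
    """Count the items exposed by a DocxHandler-like object.

    Relies only on the documented properties: tables (list of
    DataFrames), docx_tables (list of docx tables) and words
    (list of strings).
    """
    return {
        "tables": len(handler.tables),
        "docx_tables": len(handler.docx_tables),
        "words": len(handler.words),
    }
```

With the package installed, `summarize(load_handler(Path("report.docx")))` would return the three counts for that file; `report.docx` is a placeholder path.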