Docx Handler

class fuzzy_table_extractor.doc_handlers.DocxHandler(file_path: pathlib.Path)

The DocxHandler is a handler for Microsoft Word documents. It is intended for the newer .docx format; if a legacy .doc file is provided, it is converted to .docx first.
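The reference does not say how the .doc to .docx conversion is performed. As a hedged sketch, one common approach is LibreOffice in headless mode (`soffice --headless --convert-to docx --outdir <dir> <file>`). The helper `build_conversion_command` below is hypothetical and not part of fuzzy_table_extractor. It only decides whether conversion is needed and assembles the command; it does not run it.

```python
from pathlib import Path
from typing import List, Optional


def build_conversion_command(file_path: Path) -> Optional[List[str]]:
    """Return a LibreOffice command that converts a legacy .doc file to .docx.

    Hypothetical helper: DocxHandler's real conversion mechanism is not
    documented here. Returns None when the file is already .docx.
    """
    suffix = file_path.suffix.lower()
    if suffix == ".docx":
        return None  # already in the newer format, open it directly
    if suffix != ".doc":
        raise ValueError(f"Unsupported file type: {file_path.suffix}")
    # soffice writes <stem>.docx into the --outdir directory
    return [
        "soffice",
        "--headless",
        "--convert-to",
        "docx",
        "--outdir",
        str(file_path.parent),
        str(file_path),
    ]
```

If LibreOffice is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`; the converted file then sits next to the original with a .docx suffix.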

property dictionary: pandas.DataFrame

All cell couples in the document
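The reference does not define "cell couple". Assuming it means a pair of horizontally adjacent cells in a table row (for example a label cell followed by its value cell, as in form-like Word documents), the pairing can be sketched in plain Python. `cell_couples` is a hypothetical name, and the real property returns a pandas.DataFrame whose column layout is not documented here.

```python
from typing import List, Tuple

Table = List[List[str]]  # a table as rows of cell texts


def cell_couples(tables: List[Table]) -> List[Tuple[str, str]]:
    """Collect (left, right) pairs of horizontally adjacent cells.

    Hypothetical interpretation of "cell couples", not the library's code.
    """
    couples: List[Tuple[str, str]] = []
    for table in tables:
        for row in table:
            # pair each cell with its right-hand neighbour
            for left, right in zip(row, row[1:]):
                couples.append((left.strip(), right.strip()))
    return couples
```

For a row with three cells `["a", "b", "c"]` this yields two couples, `("a", "b")` and `("b", "c")`, so a middle cell can act as both a value and a label.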

property document: docx.api.Document

Opens the document, first creating a .docx file from a .doc input if necessary

Returns

Word document object

Return type

Document

property docx_tables: List[docx.table.Table]

List of tables in the .docx document

property tables: List[pandas.DataFrame]

List of all tables in the document, as DataFrames
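How each table becomes a DataFrame is not specified. The sketch below assumes, hypothetically, that the first row of a table holds the column headers, and turns the remaining rows into a list of dicts using only the standard library; `table_to_records` is an illustrative name, not part of fuzzy_table_extractor.

```python
from typing import Dict, List


def table_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn a table (rows of cell texts) into a list of row dicts.

    Assumes, hypothetically, that the first row holds the column headers.
    """
    if not rows:
        return []
    header, *body = rows
    records: List[Dict[str, str]] = []
    for row in body:
        # pad short rows so every header gets a value; extra cells are dropped by zip
        padded = row + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records
```

Passing the result to `pandas.DataFrame(records)` gives one column per header. Duplicate header texts would overwrite each other in the dicts, so a real implementation has to decide how to handle them.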

property words: List[str]

List of all words in the document
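The reference does not say what counts as a word or which parts of the document are scanned. A minimal sketch, assuming words are whitespace-separated tokens taken from both paragraph text and table-cell text; `collect_words` is a hypothetical helper.

```python
from typing import Iterable, List


def collect_words(paragraphs: Iterable[str], tables: Iterable[List[List[str]]]) -> List[str]:
    """Split paragraph and table-cell text into words.

    Hypothetical: any run of non-whitespace characters counts as a word,
    so punctuation stays attached (e.g. "Total:").
    """
    words: List[str] = []
    for text in paragraphs:
        words.extend(text.split())  # str.split() with no args collapses whitespace runs
    for table in tables:
        for row in table:
            for cell in row:
                words.extend(cell.split())
    return words
```

A list like this is a natural input for fuzzy matching, since each token can be compared against a search term independently.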