Fast serialize Excel file to XML with apache poi

Excel is a human readable and writable format, and XML is an important machine language. We need an efficient bridge between those two technologies.

Since 2007, the Excel files are a zip file containing XML data. Those XML files must be serialized into a minimal XML containing the cells data inside tags with the column headers as names. The resulting XML will look like that :

<?xml version="1.0" encoding="UTF-8"?>
<pre:root xmlns:pre="http://uri/">
	<pre:feuil1>
		<pre:ligne>
			<pre:header1>A</pre:header1>
			<pre:header2>B</pre:header2>
		</pre:ligne>
		<pre:ligne>
			<pre:header1>A</pre:header1>
			<pre:header2>B</pre:header2>
		</pre:ligne>
	</pre:feuil1>
	<pre:feuil2>
	</pre:feuil2>
	<pre:feuil3>
	</pre:feuil3>
</pre:root>

Here is the code to do it.

Read Excel file

In that post, I only treat the case of the recent XLSX format. I will use Apache POI and I want it to be able to manage very big files, so I have to use an event API.

The POI streaming API only treats the old XLS format. For XLSX, you have to use the SAX API which only helps you providing access to a stream of the XML of the sheets. The reading of the tags must be coded into your SAX handler and the given example is very useful but not complete.

The code in order to open a file and trigger the SAX reading is :

OPCPackage p = OPCPackage.open(xls.getPath(), PackageAccess.READ);
XSSFReader xssfReader = new XSSFReader(p);
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
while (iter.hasNext()) {
    InputStream stream = iter.next();
    InputSource sheetSource = new InputSource(stream);
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    SAXParser saxParser = saxFactory.newSAXParser();
    XMLReader sheetParser = saxParser.getXMLReader();
    sheetParser.setContentHandler(this);
    sheetParser.parse(sheetSource);
    stream.close();
}

Write XML

As I said, I want that serializer to be fast. So I will not use a XML API in order to write the XML output. With some lines of codes, I write the content as a XML containing a namespace for reusability.

Here is the code :

public void writeXmlStart(Writer writer) throws IOException {
        writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        writeTag(writer, "root");
        writer.write(" xmlns:");
        writer.write(PREFIX);
        writer.write("=\"");
        writer.write("http://uri/");
        writer.write('"');
        writeSuperior(writer);
    }

    public void writeSheet(Writer writer, String normalizedName) throws IOException {
        writeTag(writer, normalizedName);
        writeSuperior(writer);
    }

    public void writeLine(Writer writer, StringWriter rowStr) throws IOException {
        writer.write('\t');
        writeTag(writer, LIGNE);
        writeSuperior(writer);
        writer.write(rowStr.toString());
        writer.write('\t');
        writeClosingTag(writer, LIGNE);
    }

    public void writeCell(Writer writer, String cellToString, String colHeader) throws IOException {
        writer.write("\t\t");
        writeTag(writer, colHeader);
        writer.write('>');
        String escaped = StringEscapeUtils.escapeXml10(cellToString);
        writer.write(escaped);
        writeClosingTag(writer, colHeader);
    }

    public void writeSuperior(Writer writer) throws IOException {
        writer.write(">\n");
    }

    public void writeClosingTag(Writer writer, String name) throws IOException {
        writer.write("</");
        writeQName(writer, name);
        writeSuperior(writer);
    }

    public void writeTag(Writer writer, String name) throws IOException {
        writer.write('<');
        writeQName(writer, name);
    }

    public void writeQName(Writer writer, String name) throws IOException {
        writer.write(PREFIX);
        writer.write(':');
        writer.write(name);
    }

Normalizing sheet names and column headers

In order to avoid issues with XML tag names, I normalize the sheet names and the column headers I found. It removes accents, normalizes spaces, and omits any characters which is not a letter, ‘_’ or a digit.

Here is the normalizing code :

public static String normalize(final String input) {
        final StringBuilder validName = new StringBuilder();
        final String spaceNormalized = StringUtils.normalizeSpace(input);
        final String trimed = Normalizer.normalize(spaceNormalized, Normalizer.Form.NFD);
        if (trimed.isEmpty()) {
            return "_";
        }
        final char firstChar = trimed.charAt(0);
        if (Character.isLetter(firstChar) || firstChar == '_') {
            validName.append(Character.toLowerCase(firstChar));
        } else if (Character.isDigit(firstChar)) {
            validName.append("_");
            validName.append(firstChar);
        }
        for (int i = 1; i < trimed.length(); i++) {
            final char charAt = trimed.charAt(i);
            if (Character.isWhitespace(charAt)) {
                validName.append("_");
            } else if (Character.isDigit(charAt) || charAt == '_') {
                validName.append(charAt);
            } else if (Character.isLetter(charAt)) {
                validName.append(Character.toLowerCase(charAt));
            }
        }
        return validName.toString();
    }

Quick overview of XSLX format

The contents of the cells are inside the tag <sheetData> of the XML files of the sheets.
In the files I use, I saw 3 different structures :

    • one with attribute t="inlineStr" :
<row r="1">
    <c r="A1" s="2" t="inlineStr">
        <is>
            <t>First column header</t>
        </is>
    </c>
</row>
    • a structure with the value in clear, with attribute t="n" :
<row r="2">
   <c r="A2" s="3" t="n">
       <v>18.0</v>
   </c>
</row>
    • and a last case, where it uses the “shared strings”. In that case the value is the index of the String in that list of shared strings :
<row r="2" spans="1:34" ht="38.25">
    <c r="A2" s="3" t="s">
        <v>29</v>
    </c>
</row>

I did not see any tag <inlineStr>, as it is expected in example from Apache POI. And in the case of an attribute t="inlineStr", the value is in a tag <t> instead of <v>, as expected in example.

Whole code

You can download the whole code here :
FastXlsx2XmlSerializer

Categories: Excel Tags: Tags: , ,

Web Service : fine control of object marshalling

When returning persisted objects in a web service, the marshaller will browse all the object tree. 2 problems can occur :

  • The object tree is huge and the resulting XML is too large.
  • Some objects have not been fetched from the database and it throws an exception.

In order to avoid that, it is possible to have a control on object marshalling by providing a custom :

  • AccessorFactory returning a custom Accessor.
  • AnnotationReader returning JAXB annotations depending on the context.

Custom AccessorFactory returning a custom Accessor

When instanciating the JAXBContext, enable the @XmlAccessorFactory annotation :

HashMap<String, Object> props = new HashMap<String, Object>();
props.put(JAXBRIContext.XMLACCESSORFACTORY_SUPPORT, true);
JAXBContext jaxbContext = JAXBContext.newInstance(new Class[] { clazz }, props);

Then you can add annotation :

@XmlAccessorFactory(value = CustomAccessorFactory.class)

Implement you CustomAccessorFactory which will return a CustomReflectionAccessor :

public class CustomAccessorFactory implements AccessorFactory {
    @Override
    public Accessor createFieldAccessor(final Class bean, final Field field, final boolean readOnly) throws JAXBException {
        return new CustomReflectionAccessor(field);
    }
    @Override
    public Accessor createPropertyAccessor(final Class bean, final Method getter, final Method setter) throws JAXBException {
        return new CustomReflectionAccessor(getter, setter);
    }
}

In the CustomReflectionAccessor you will implement get() method that way :

    @Override
    public ValueT get(final BeanT bean) throws AccessorException {
        final ValueT valueT = doGet(bean);
        if (!Hibernate.isInitialized(valueT)) {
            return null;
        }
        return valueT;
    }

There are some traps which can be avoided using the whole code :CustomReflectionAccessor

Custom AnnotationReader

It is also possible (and the two techniques can be combined) to provide a custom AnnotationReader which will return the JAXB annotations depending on the context.

HashMap<String, Object> props = new HashMap<String, Object>();
props.put(JAXBRIContext.ANNOTATION_READER, new CustomAnnotationReader());
JAXBContext jaxbContext = JAXBContext.newInstance(new Class[] { clazz }, props);

Your CustomAnnotationReader should implement AbstractInlineAnnotationReaderImpl.

Categories: Web Services Tags: Tags: , ,

Hibernate : get the modified properties and their old values

Imagine your IHM sends you a big POJO object which is already persisted. Are you going to let Hibernate flush the whole state of the entity without even knowing what has been changed ?

If you can do that, you are lucky. In real applications, it is often necessary to trigger some specific treatments if a property has been changed.

Actually, Hibernate does it all the time, so it has all the needed methods (publicly visibles) in order to know exactly the changes made to your pojo.

Here is the code in order to use those methods.

Get EntityPersister from SessionFactory

You must have a method getSessionFactory() which gives you the Hibernate SessionFactory. Then write a method which will return the entityPersister of your Pojo :

protected EntityPersister getEntityPersister() {
    SessionFactoryImplementor sessionFactoryImplementor = (SessionFactoryImplementor) getSessionFactory();
    return sessionFactoryImplementor.getEntityPersister(POJO.class.getName());
}

Get the databaseSnapshot of your object

With the entityPersister and your session casted as SessionImplementor, you can get the databaseSnapshot, which is an array of Objects, the persisted values of the properties of your pojo. You must have a getId() method on your pojo in order to get its primary key value.

EntityPersister entityPersister = getEntityPersister();
SessionImplementor sessionImplementor = (SessionImplementor) getSession();
PersistenceContext persistenceContext = sessionImplementor.getPersistenceContext();
Object[] databaseSnapshot = persistenceContext.getDatabaseSnapshot((Serializable) pojo.getId(), entityPersister);

Get the indexes of modified properties

entityPersister has a method findModified() which gives you an array of the index of the modified properties :

public int[] findPropertiesModified(POJO pojo) {
    EntityPersister entityPersister = getEntityPersister();
    SessionImplementor sessionImplementor = (SessionImplementor) getSession();
    PersistenceContext persistenceContext = sessionImplementor.getPersistenceContext();
    Object[] databaseSnapshot = persistenceContext.getDatabaseSnapshot((Serializable) pojo.getId(), entityPersister);
    Object[] propertyValues = entityPersister.getPropertyValues(pojo, getSession().getEntityMode());
    return entityPersister.findModified(databaseSnapshot, propertyValues, pojo, sessionImplementor);
}

Get the persisted value

The persisted value of the property is databaseSnapshot[index].

Get the name of the modified property

You can retrieve the name of the modified property from its index with the EntityMetamodel of your entity.

EntityMetamodel metamodel = entityPersister.getEntityMetamodel();
String[] properties = metamodel.getPropertyNames();

The name of the property is properties[index].

Categories: Hibernate Tags: Tags: ,

Write a ResultSet to Excel with Apache POI

We often face the need to export data in a human-readable and human-writable way. Export to Excel seems natural, giving access to all his tools of filtering, etc…

I had the occasion to do it a generic way, I share it here. It does not cover all cases of data found in database, up to you to complete it and maybe share your improvements.

Dump headers

First step is to dump the column headers, found in ResultSetMetaData. The cells will be styled in BOLD font.

public HSSFWorkbook dump(ResultSet resultSet) throws SQLException {
    HSSFWorkbook workbook = new HSSFWorkbook();
    HSSFFont boldFont = workbook.createFont();
    boldFont.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);
    HSSFSheet sheet = workbook.createSheet("sheet");
    HSSFRow titleRow = sheet.createRow(0);
    ResultSetMetaData metaData = resultSet.getMetaData();
    int columnCount = metaData.getColumnCount();
    for (int colIndex = 0; colIndex &lt; columnCount; colIndex++) {
        String title = metaData.getColumnLabel(colIndex + 1);
        HSSFCell cell = HSSFCellUtil.createCell(titleRow, colIndex, title);
        HSSFCellStyle style = workbook.createCellStyle();
        style.setFont(boldFont);
        cell.setCellStyle(style);
    }
    return workbook;
}

Dump data

Then the resultSet is browsed (a call to beforeFirst() resets the cursor), and different cases of Object retrieved are covered. That does not cover all different object types the driver can return.

    private void dumpData(ResultSet resultSet, HSSFSheet sheet, int columnCount) throws SQLException {
        int currentRow = 1;
        resultSet.beforeFirst();
        while (resultSet.next()) {
            HSSFRow row = sheet.createRow(currentRow++);
            for (int colIndex = 0; colIndex < columnCount; colIndex++) {
                Object value = resultSet.getObject(colIndex + 1);
                final HSSFCell cell = row.createCell(colIndex);
                if (value == null) {
                    cell.setCellValue("");
                } else {
                    if (value instanceof Calendar) {
                        cell.setCellValue((Calendar) value);
                    } else if (value instanceof Date) {
                        cell.setCellValue((Date) value);
                    } else if (value instanceof String) {
                        cell.setCellValue((String) value);
                    } else if (value instanceof Boolean) {
                        cell.setCellValue((Boolean) value);
                    } else if (value instanceof Double) {
                        cell.setCellValue((Double) value);
                    } else if (value instanceof BigDecimal) {
                        cell.setCellValue(((BigDecimal) value).doubleValue());
                    }
                }
            }
        }
        for (int i = 0; i < columnCount; i++) {
            sheet.autoSizeColumn(i);
        }
    }

(Optional) Add cell formatting

If you want to format the cells, you can parameter a map between JDBC formats and Excel formatting. The JDBC formats will be set with regular expressions. Such a map can be initialized with a Spring file, like this :

<util:map map-class="java.util.LinkedHashMap">
    <entry key="NUMBER\(\d+,2\)" value="0.00" />
    <entry key="NUMBER\(\d+,0\)" value="0" />
    <entry key="NUMBER.*" value="0.###" />
    <entry key="DECIMAL\(\d+,2\)" value="0.00" />
    <entry key="DECIMAL\(\d+,0\)" value="0" />
    <entry key="DECIMAL.*" value="0.###" />
    <entry key=".*CHAR.*" value="text" />
    <entry key="DATE.*" value="dd/MM/yyyy" />
    <entry key="TIMESTAMP.*" value="dd/MM/yyyy" />
</util:map>

Then in the code, add the methods :


    private LinkedHashMap<String, String> mapSqlTypeExcelFormat;

    private HSSFCellStyle getDataStyle(HSSFWorkbook workbook, ResultSetMetaData metaData, int colIndex, HSSFDataFormat dataFormat) throws SQLException {
        HSSFCellStyle dataStyle = workbook.createCellStyle();
        String columnType = metaData.getColumnTypeName(colIndex + 1);
        columnType += "(" + metaData.getPrecision(colIndex + 1);
        columnType += "," + metaData.getScale(colIndex + 1) + ")";
        String excelFormat = getExcelFormat(columnType);
        final short format = dataFormat.getFormat(excelFormat);
        dataStyle.setDataFormat(format);
        return dataStyle;
    }

    private String getExcelFormat(String columnType) {
        for (Map.Entry<String, String> entry : mapSqlTypeExcelFormat.entrySet()) {
            if (Pattern.matches(entry.getKey(), columnType)) {
                return entry.getValue();
            }
        }
        return "text";
    }

Whole code

Java Class :
ResultSetToExcel

JUnit Test Class :
ResultSetToExcelTest

Categories: Excel Tags: Tags: , ,