JDBC Batching with prepared statement SQLServer 2019

  • Question

  • Hello, 
    I've been writing an application for importing and exporting databases. 
    So far I've been getting decent performance, except with Microsoft SQL Server.
    Given how slow it is running, I suspect batching isn't actually being applied.

                performBindingOnPreparedStatement(con, prepStatement, datatypes, currentLineData);
                prepStatement.addBatch();
                
                if (i % batchSize == 0) {
                    // flush the accumulated batch and commit it
                    int[] executedBatches = prepStatement.executeBatch();
                    con.commit();
                    
                    if (LOG.isTraceEnabled()) {
                        LOG.trace(Arrays.toString(executedBatches));
                    }
                    
                    importedRecords += executedBatches.length;
                }

    It's pretty straightforward.
    I suspected issues with the BLOB/CLOB handling, but the table it's so slow on doesn't contain any CLOBs or BLOBs.

    Now to my table: it has 54 columns, eight of which are varchar(max).
    It's rather wide, but I don't fully use those columns anyway.
    The JDBC driver is mssql-jdbc-7.4.1.jre8.jar and the server is SQL Server 2019.
    The instance has enough resources and statements are generally quite fast on it.
    It's not under heavy load or anything, since it's used only for development.

    On Oracle I was seeing times of about 8 minutes for ~4 million entries.
    Postgres took about 16 minutes.
    With SQL Server I'm 5 minutes in and I don't even have 20K entries done. That's the reason I suspect there is no actual batching happening.
    I've not yet tried using:

    Statement stmt = con.createStatement();
    stmt.addBatch(sqlString);

    I will try that solution later.
    I would appreciate any help, since I'm not really any further along after 5 hours of analyzing and trying different approaches.

    Best Regards



    • Edited by Cibot Tuesday, July 28, 2020 6:58 AM formatting
    Tuesday, July 28, 2020 6:56 AM

All replies

  • I tried to get the 

    Statement stmt = con.createStatement();
    stmt.addBatch(sqlString);

    approach to work; unfortunately it's not that easy, as I'd have to somehow build a valid sqlString myself.
    The values contain strings spanning multiple lines, which makes it impossible to simply substitute the ? placeholders in the query I generate.

    For now I've made the batches much smaller, just 100 statements. It fails when values contain line breaks, but otherwise it works fine. It's still slow, but MUCH faster than the PreparedStatement approach: after 3 minutes I already have 42K entries inserted.


    For my PreparedStatement approach, I generate a SQL string like the following:

    INSERT INTO TBLNAME (TBLCOLUMNS1, .... TBLCOLUMNSX) VALUES ( ?, ..... ? )

    with ? being the placeholders.
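
    For context, here is a minimal sketch of how such a generated statement might be prepared, bound, and batched; the two-column table and the parsedCsvLines list are illustrative placeholders, since the real code builds the column list and bindings dynamically:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    class GeneratedInsertSketch {
        static void insertAll(Connection con, List<String[]> parsedCsvLines) throws SQLException {
            String sql = "INSERT INTO TBLNAME (TBLCOLUMNS1, TBLCOLUMNS2) VALUES (?, ?)";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (String[] line : parsedCsvLines) {
                    ps.setString(1, line[0]);   // bind each ? placeholder from the parsed line
                    ps.setString(2, line[1]);
                    ps.addBatch();
                }
                ps.executeBatch();              // send the accumulated batch in one call
                con.commit();                   // assumes auto-commit is off, as in the loop above
            }
        }
    }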



    • Edited by Cibot Tuesday, July 28, 2020 11:28 AM
    Tuesday, July 28, 2020 11:18 AM
  • Try appending ";useBulkCopyForBatchInsert=true" to the connection string so that the bulk copy API is used for batch inserts.
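
    For reference, the same behavior can also be enabled programmatically through the driver's SQLServerDataSource; a minimal sketch, assuming that setter is available in this driver version (server, database, and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.SQLException;

    import com.microsoft.sqlserver.jdbc.SQLServerDataSource;

    class BulkCopyBatchInsertConfig {
        static Connection open() throws SQLException {
            SQLServerDataSource ds = new SQLServerDataSource();
            ds.setServerName("server");         // placeholder
            ds.setPortNumber(1433);
            ds.setDatabaseName("test");         // placeholder
            ds.setUser("user");                 // placeholder
            ds.setPassword("password");         // placeholder
            // route addBatch()/executeBatch() on prepared INSERTs through the bulk copy API
            ds.setUseBulkCopyForBatchInsert(true);
            return ds.getConnection();
        }
    }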


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Tuesday, July 28, 2020 11:56 AM
  • My JDBC String is the following: jdbc:sqlserver://server:1433;databaseName=test;useBulkCopyForBatchInsert=true

    So I had already seen that parameter, but it didn't seem to make any difference. 


    • Edited by Cibot Tuesday, July 28, 2020 12:01 PM
    Tuesday, July 28, 2020 12:01 PM
  • I would expect the specification to be honored with prepared statements and addBatch with your driver version. A trace of RPC requests (Profiler or Extended Events) will show whether bulk insert is actually used. If not, you could try the latest GA driver version (currently 8.2).

    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Tuesday, July 28, 2020 12:16 PM
  • Good point. 
    I'm not exactly a database administrator, so I'm not sure how to determine whether something is batched or not.
    What I can say is that there is a separate 

    RPC:Starting

    RPC:Completed

    pair for each statement. 
    After each statement there seems to be one additional statement executed, which looks like something similar to a batching mechanism. 

    Basically what happens:

    declare @p1 int
    set @p1=0
    exec sp_prepexec @p1 output ... (does some binding on columns and table data) select @p1

    which returns a handle number (for example 4820), followed by

    exec sp_execute 4820, (table data) 

    After that is done, it starts declaring again. This mechanism seems to kick in when consecutive statements are similar, i.e. when only one or two fields differ. So sometimes between 1 and 4 statements get 'batched' this way. 

    Also, using the latest driver unfortunately didn't change the behaviour at all. 

    If you can guide me on what to do, that would be wonderful. 
    It would be really odd if the mssql-jdbc driver did not support batching with prepared statements. 





    • Edited by Cibot Tuesday, July 28, 2020 2:32 PM
    Tuesday, July 28, 2020 2:24 PM
  • So I just compared it to the approach using 

    Statement stmt = con.createStatement();
    stmt.addBatch("Insert ...");

    And it looks much more like batching:

    RPC:Starting

    SQL:BatchStarting

    SQL:StmtStarting

    SQL:StmtCompleted

    SQL:StmtStarting

    .......

    I guess it just doesn't support it? But I still don't like the idea of it not supporting that... That's such a basic feature. 

    Tuesday, July 28, 2020 3:29 PM
  • Sorry, I naively assumed the batched inserts would be done transparently using SQLServerBulkCopy with the connection string spec. For full disclosure, I only have a cursory knowledge of Java.

    For maximum insert performance, you could use the SQLServerBulkCopy class directly instead of traditional prepared statements. This streams data such that performance is orders of magnitude faster than singleton or batched insert statements. I ran a quick test of this technique with the code below and the latest JDBC driver and observed 4M rows inserted in about 15 seconds.

    I used a ResultSet for the source data in the example but one can also provide source data with other writeToServer overloads (SQLServerBulkCSVFileRecord or ISQLServerBulkData), depending on what is most convenient for the task at hand.

    public static void sqlBulkCopyExample() throws SQLException, Exception {

        String destinationTable = "dbo.TargetTable";
        ResultSet sourceData;

        try (Connection sourceConnection = DriverManager.getConnection(connectionString);
                Connection destinationConnection = DriverManager.getConnection(connectionString);
                Statement stmt = sourceConnection.createStatement();
                SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection)) {

            // generate 4M rows of 10 integer columns as the bulk copy source
            sourceData = stmt.executeQuery("WITH\r\n" +
                    "     t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))\r\n" +
                    "    ,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)\r\n" +
                    "    ,t1g AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a CROSS JOIN t1k AS b CROSS JOIN t1k AS c)\r\n" +
                    "SELECT num,num,num,num,num,num,num,num,num,num\r\n" +
                    "FROM t1g\r\n" +
                    "WHERE num <= 4000000;");

            bulkCopy.setDestinationTableName(destinationTable);

            bulkCopy.writeToServer(sourceData);

        } catch (SQLException e) {
            e.printStackTrace();
        }

    }


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com



    • Edited by Dan GuzmanMVP Thursday, July 30, 2020 10:31 AM changed ISqlBulkRecord to ISqlBulkData
    Wednesday, July 29, 2020 12:03 PM
    Thank you for your time.
    Unfortunately this won't work.
    I'm actually using a CSV file for my data, so SQLServerBulkCSVFileRecord sounded really promising. I was almost done with the implementation (and it looked horrible...) when I found a few notes on the class:

    1. Streaming of large data types such as varchar(max), varbinary(max), nvarchar(max), sqlxml, and ntext isn't supported.

    2. The delimiter specified for the CSV file shouldn't appear anywhere in the data and should be escaped properly if it is a restricted character in Java regular expressions.

    3. In the CSV file implementation, double quotes are treated as part of the data. For example, the line hello,"world","hello,world" would be treated as having four columns with the values hello, "world", "hello, and world" if the delimiter is a comma.

    4. New line characters are used as row terminators and aren't allowed anywhere in the data.

    All of that applies to me. I have CLOBs/BLOBs holding very large data, and I have columns containing line breaks. Also, the way the CSV is written, it uses double quotes ( " ) as the escape mechanism when text data contains the delimiter or line breaks. I've tested that CSV writer/reader class extensively and could never find an error, but based on those notes it wouldn't work here. 

    So what now? Batched prepared statements should be a must for JDBC drivers, so is there a way to report this as a bug? 

    Thursday, July 30, 2020 7:22 AM
  • Although large types are not streamed, I would still expect it to provide much better performance than traditional prepared statements.

    If SQLServerBulkCSVFileRecord won't work for you due to the parsing limitations, you could try ISQLServerBulkData, since you apparently have already successfully parsed the source data for the prepared statement implementation.

    Report issues with the Microsoft JDBC Driver on the open source github project site


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Thursday, July 30, 2020 10:49 AM
  • Thanks for the project site. I searched around on there and found a few connection parameters which, I think, actually fix my problem.
    I don't fully understand how they work, but they definitely make the PreparedStatement path faster: 400K entries in 1 minute. I'm totally fine with that number. Maybe I'll find some more tweaks, but this way I don't have to rewrite the whole architecture.

    ISQLServerBulkData I don't fully get. There is SQLServerBulkBatchInsertRecord, which represents a record, but I'm not sure how to use it since there is no sample code or example for it. You can't exactly batch it, from what I could see. I could do

    SQLServerBulkBatchInsertRecord data =  new SQLServerBulkBatchInsertRecord(...);
    bulkCopy.writeToServer(data);

    But that is only one row. I want as few round trips as possible, so I don't really see how that could be a solution. Maybe you've got an idea for me. 

    Do you maybe know if there are other forums or places where I can ask about performance tweaking? There might be more JDBC parameters or methods I could call or maybe even SQL Server settings which I can change.

    The actual JDBC URL Parameters I set for the performance increase:

    statementPoolingCacheSize=5000;disableStatementPooling=false;enablePrepareOnFirstPreparedStatementCall=true;

    The statementPoolingCacheSize is equal to my batch size. I'm not sure whether that's the right value, but it seems logical. 
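
    For reference, a connection string combining those parameters with the bulk copy flag mentioned earlier might look like the sketch below (server, database, and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    class TunedConnectionSketch {
        static Connection open() throws SQLException {
            String url = "jdbc:sqlserver://server:1433;databaseName=test;"
                    + "useBulkCopyForBatchInsert=true;"
                    + "statementPoolingCacheSize=5000;"      // matched to the insert batch size
                    + "disableStatementPooling=false;"
                    + "enablePrepareOnFirstPreparedStatementCall=true;";
            return DriverManager.getConnection(url, "user", "password");
        }
    }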


    • Edited by Cibot Thursday, July 30, 2020 12:04 PM
    Thursday, July 30, 2020 11:55 AM
  • Thanks for the project site. I searched around on there and found a few connection parameters which, I think, actually fix my problem.
    I don't fully understand how they work, but they definitely make the PreparedStatement path faster: 400K entries in 1 minute. I'm totally fine with that number. Maybe I'll find some more tweaks, but this way I don't have to rewrite the whole architecture.

    Can you please share the specifics of what you've found? That may help others searching for a solution.

    Of course, you don't need to go down the bulk insert path if fast is fast enough. Other DBMS drivers may have similar, and proprietary, methods for bulk inserts but you'll need to weigh the benefits of the simpler architecture and shared code against the resultant performance improvements.

    ISQLServerBulkData I don't fully get. There is SQLServerBulkBatchInsertRecord, which represents a record, but I'm not sure how to use it since there is no sample code or example for it. You can't exactly batch it, from what I could see. 

    I think all you need to do is implement ISQLServerBulkData using your parsed data as the source. Below is the stubbed implementation generated by Eclipse for your reference. Although I could make an attempt at an implementation example, I suspect your Java skills are better than mine ;-) 

    class SQLServerBulkDataExampleStub implements ISQLServerBulkData {
    
    	private static final long serialVersionUID = 1L;
    
    	@Override
    	public String getColumnName(int arg0) {
    		// TODO Auto-generated method stub
    		return null;
    	}
    
    	@Override
    	public Set<Integer> getColumnOrdinals() {
    		// TODO Auto-generated method stub
    		return null;
    	}
    
    	@Override
    	public int getColumnType(int arg0) {
    		// TODO Auto-generated method stub
    		return 0;
    	}
    
    	@Override
    	public int getPrecision(int arg0) {
    		// TODO Auto-generated method stub
    		return 0;
    	}
    
    	@Override
    	public Object[] getRowData() throws SQLException {
    		// TODO Auto-generated method stub
    		return null;
    	}
    
    	@Override
    	public int getScale(int arg0) {
    		// TODO Auto-generated method stub
    		return 0;
    	}
    
    	@Override
    	public boolean next() throws SQLException {
    		// TODO Auto-generated method stub
    		return false;
    	}
    }
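
    For illustration only, a rough sketch of what filling in that stub over already-parsed rows could look like; the class name, constructor arguments, and the List<Object[]> source are assumptions, not part of the driver:

    import java.sql.SQLException;
    import java.util.Iterator;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    import com.microsoft.sqlserver.jdbc.ISQLServerBulkData;

    class ParsedRowsBulkData implements ISQLServerBulkData {

        private static final long serialVersionUID = 1L;

        private final String[] columnNames;   // destination column names, in ordinal order
        private final int[] columnTypes;      // java.sql.Types values, e.g. Types.VARCHAR
        private final int[] precisions;       // max length for character columns, precision for numerics
        private final Iterator<Object[]> rows;
        private Object[] currentRow;

        ParsedRowsBulkData(String[] columnNames, int[] columnTypes, int[] precisions, List<Object[]> rows) {
            this.columnNames = columnNames;
            this.columnTypes = columnTypes;
            this.precisions = precisions;
            this.rows = rows.iterator();
        }

        @Override
        public Set<Integer> getColumnOrdinals() {
            // column ordinals are 1-based
            Set<Integer> ordinals = new LinkedHashSet<>();
            for (int i = 1; i <= columnNames.length; i++) {
                ordinals.add(i);
            }
            return ordinals;
        }

        @Override
        public String getColumnName(int column) {
            return columnNames[column - 1];
        }

        @Override
        public int getColumnType(int column) {
            return columnTypes[column - 1];
        }

        @Override
        public int getPrecision(int column) {
            return precisions[column - 1];
        }

        @Override
        public int getScale(int column) {
            return 0;   // assumes no decimal columns; return the real scale otherwise
        }

        @Override
        public Object[] getRowData() throws SQLException {
            return currentRow;  // the row positioned by the last next() call
        }

        @Override
        public boolean next() throws SQLException {
            if (!rows.hasNext()) {
                return false;
            }
            currentRow = rows.next();
            return true;
        }
    }

    An instance could then be passed to bulkCopy.writeToServer(...) in place of the ResultSet used in the earlier example.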


    Do you maybe know if there are other forums or places where I can ask about performance tweaking? There might be more JDBC parameters or methods I could call or maybe even SQL Server settings which I can change.

    Someone else might still chime in here, but you could also cross-post to stackoverflow.com with the mssql-jdbc and jdbc tags. Include a link to this thread in your question and add the SO link to your question here too, to help avoid duplication of effort by your peers.


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Thursday, July 30, 2020 1:26 PM
  • I listed the JDBC URL parameters in my previous reply, so that's basically the solution. 
    Thank you for your assistance. Yes, that's the approach to the issue; I totally missed that there was an interface to implement, my bad. 
    Thursday, July 30, 2020 1:57 PM