An introduction to CSV

Definition

People can easily mess up between CVS and CSV. I once mistype one as the other in Google and get all the wrong results and end up downloading the wrong library.

CVS stands for  Concurrent Versioning System. Developers use CVS to control source file version especially when working in a team.
While CSV, Comma-separated Values is a file format. It is basically a specialized form of text file with a delimited data format that has fields separated by the comma character and records separated by newlines.

When synchronizing database in a realtime mode can take place, using CSV files to

Agreement on Format of CSV file

The most important thing is an agreement of the CSV format that will be used as the medium of data exchange. It includes among others the agreement of (1) whether to suppress headers or not (column names will show up in the CSV file or not), (2) which character will be used as field delimiter, (3) which characters should be treated specially, and (4) how large the size of those CSV files.

Creating a CSV file

Using a simple text editor, we can create a csv file. However, there are several rules concerning this process. The CSV example in Wikipedia illustrates that:

1. fields that contain commas, double-quotes, or line-breaks must be quoted,
2. a quote within a field must be escaped with an additional quote immediately preceding the literal quote,
3. space before and after delimiter commas may be trimmed, and
4. a line break within an element must be preserved.

Exporting a database table into a CSV file is a feature in major DBMS. In MySQL, via phpMyAdmin, it can conveniently be done using Export menu. To ensure that the reader code could accept the file, configure the parameters by referring to points of agreement as explained above.

Reading a CSV file

In the project I'm currently working on, the application needs to read CSV files to update local database. Using CSV JDBC, I follow steps below

1. read the source file input stream
2. create a CSV JDBC connection to the file location.
3. execute a query in the connection, treating the file as a common database table.
4. manipulating the result set produced by the query.

1. read the source file input stream

In a Struts web application environment, I need to get the CSV file uploaded to the server, and place it in a certain directory.
Hence the following code:

// this is where I get the file
FormFile csvFile = csv.getCSVFile();
String csvFileName = csvFile.getFileName();

//this is where I read the file and moved in into directory csv under web app root directory

String fileCSV = csvFileName.substring(0, csvFileName.length() - 4);
InputStream stream = csvFile.getInputStream();
String csvPath = this.getServlet().getServletContext().getRealPath("/csv") + "/";
OutputStream bos = new FileOutputStream(csvPath + csvFileName);

int bytesRead = 0;

//the buffer size limits the maximum size of the file, I use 1 MB, which translates into 1000 * 1024 Byte

byte[] buffer = new byte[1024000];
while ((bytesRead = stream.read(buffer, 0, 1024000)) != -1) {
bos.write(buffer, 0, bytesRead);
bos.close();
}

stream.close();

2. create a CSV JDBC connection to the file location.

Class.forName("org.relique.jdbc.csv.CsvDriver");
Properties props = new java.util.Properties();
props.put("separator", ";"); // separator is a bar

java.sql.Connection conn = DriverManager.getConnection("jdbc:relique:csv:" + filelocation, props);

Statement stmt = conn.createStatement();

When you use default parameter, like comma as separator, column headers in the first line of the CSV file, and source file has .csv extension, we can ommit the properties setting and create the connection like below.

Connection conn = DriverManager.getConnection("jdbc:relique:csv:" + filelocation);

When you don't, follow the steps of setting the parameter.
File location is the real path to the file. In step 1, variable csvPath contains this value.

3. execute a query in the connection, treating the file as a common database table.

String query = "SELECT id, name, birthday, address, city, zipcode FROM ";
java.sql.ResultSet rscsv = stmt.executeQuery(query + filename);

In step 2, variable fileCSV contains this value.

4. manipulating the result set produced by the query.

What you would do with the result set is all up to you. I choose to iterate over the result set and save each record into the database.



Comments

Popular Posts