public class CSVParser
extends java.lang.Object
Parses CSV input according to the specified CSVStrategy.
Parsing of a csv-string having tabs as separators, '"' as an optional value encapsulator, and comments starting with '#':
String[][] data =
(new CSVParser(new StringReader("a\tb\nc\td"), new CSVStrategy('\t','"','#'))).getAllValues();
Parsing of a csv-string in Excel CSV format:
String[][] data =
(new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
Internal parser state is completely covered by the strategy and the reader-state.
See the package documentation for more details.
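To make the role of the three strategy characters in the first example more concrete, the following is a minimal, self-contained sketch. It is not the CSVParser implementation: `MiniCsvSketch` and its `parse` method are hypothetical names, and the sketch assumes that a comment is any line starting with the comment character and that encapsulated values never span lines or contain escaped encapsulators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the CSVParser implementation: it only illustrates
// how a delimiter, an encapsulator and a comment-start character interact.
final class MiniCsvSketch {

    static String[][] parse(Reader input, char delimiter, char encapsulator, char commentStart) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: a comment is a line whose first character is commentStart.
                if (!line.isEmpty() && line.charAt(0) == commentStart) {
                    continue;
                }
                records.add(splitLine(line, delimiter, encapsulator));
            }
        } catch (IOException e) {
            // The sketch rethrows unchecked to keep callers short.
            throw new UncheckedIOException(e);
        }
        return records.toArray(new String[0][]);
    }

    private static String[] splitLine(String line, char delimiter, char encapsulator) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean encapsulated = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == encapsulator) {
                // Toggle the encapsulated state; the encapsulator itself is dropped.
                encapsulated = !encapsulated;
            } else if (c == delimiter && !encapsulated) {
                // A delimiter outside an encapsulated value ends the current value.
                values.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        values.add(current.toString());
        return values.toArray(new String[0]);
    }
}
```

Under these assumptions, `MiniCsvSketch.parse(new StringReader("a\tb\nc\td"), '\t', '"', '#')` yields two records of two values each, and a delimiter inside an encapsulated value is kept as part of that value.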
| Modifier and Type | Class and Description |
|---|---|
| (package private) static class | CSVParser.Token: Token is an internal token representation. |
| Modifier and Type | Field and Description |
|---|---|
| private CharBuffer | code |
| private static java.lang.String[] | EMPTY_STRING_ARRAY: Immutable empty String array. |
| private ExtendedBufferedReader | in |
| private static int | INITIAL_TOKEN_LENGTH: Length of the initial token (content) buffer. |
| private java.util.ArrayList | record: A record buffer for getLine(). |
| private CSVParser.Token | reusableToken |
| private CSVStrategy | strategy |
| protected static int | TT_EOF: Token (which can have content) when end of file is reached. |
| protected static int | TT_EORECORD: Token with content when end of a line is reached. |
| protected static int | TT_INVALID: Token has no valid content. |
| protected static int | TT_TOKEN: Token with content, at beginning or in the middle of a line. |
| private CharBuffer | wsBuf |
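The token types listed above describe which event terminated a value. The sketch below illustrates that idea under stated assumptions: the `Type` enum, the `Tok` class and the `lex` method are hypothetical and are not part of CSVParser, and the sketch ignores encapsulators, comments and `\r\n` handling. It mirrors only the documented meaning of TT_TOKEN, TT_EORECORD and TT_EOF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the enum and lexer below are illustrative only and
// mirror just the documented meaning of the TT_* token types.
final class TokenTypeSketch {

    enum Type { TOKEN, EORECORD, EOF }

    static final class Tok {
        final Type type;
        final String content;

        Tok(Type type, String content) {
            this.type = type;
            this.content = content;
        }

        @Override
        public String toString() {
            return type + "(" + content + ")";
        }
    }

    // Splits the input on the delimiter and on '\n', tagging each value with
    // the event that terminated it.
    static List<Tok> lex(String input, char delimiter) {
        List<Tok> tokens = new ArrayList<>();
        StringBuilder content = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == delimiter) {
                // Value at the beginning or in the middle of a line.
                tokens.add(new Tok(Type.TOKEN, content.toString()));
                content.setLength(0);
            } else if (c == '\n') {
                // Value that closes the current record.
                tokens.add(new Tok(Type.EORECORD, content.toString()));
                content.setLength(0);
            } else {
                content.append(c);
            }
        }
        // End of input: the final token may still carry content.
        tokens.add(new Tok(Type.EOF, content.toString()));
        return tokens;
    }
}
```

For the input `"a,b\nc"` with `','` as delimiter, this sketch produces a TOKEN carrying `a`, an EORECORD carrying `b`, and an EOF carrying `c`.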
| Constructor and Description |
|---|
| CSVParser(java.io.InputStream input): Deprecated. Use CSVParser(Reader). |
| CSVParser(java.io.Reader input): CSV parser using the default CSVStrategy. |
| CSVParser(java.io.Reader input, char delimiter): Deprecated. |
| CSVParser(java.io.Reader input, char delimiter, char encapsulator, char commentStart): Deprecated. |
| CSVParser(java.io.Reader input, CSVStrategy strategy): Customized CSV parser using the given CSVStrategy. |
| Modifier and Type | Method and Description |
|---|---|
| private CSVParser.Token | encapsulatedTokenLexer(CSVParser.Token tkn, int c): An encapsulated token lexer. Encapsulated tokens are surrounded by the given encapsulating string. |
| java.lang.String[][] | getAllValues(): Parses the CSV according to the given strategy and returns the content as an array of records (where each record is an array of single values). |
| java.lang.String[] | getLine(): Parses from the current point in the stream until the end of the current line. |
| int | getLineNumber(): Returns the current line number in the input stream. |
| CSVStrategy | getStrategy(): Obtains the specified CSV strategy. |
| private boolean | isEndOfFile(int c) |
| private boolean | isEndOfLine(int c): Greedy: accepts \n and \r\n. This checker silently consumes the second control character... |
| private boolean | isWhitespace(int c) |
| protected CSVParser.Token | nextToken(): Convenience method for nextToken(null). |
| protected CSVParser.Token | nextToken(CSVParser.Token tkn): Returns the next token. |
| java.lang.String | nextValue(): Parses the CSV according to the given strategy and returns the next CSV value as a string. |
| private int | readEscape(int c) |
| private CSVParser.Token | simpleTokenLexer(CSVParser.Token tkn, int c): A simple token lexer. Simple tokens are tokens which are not surrounded by encapsulators. |
| protected int | unicodeEscapeLexer(int c): Decodes Unicode escapes. |
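As an illustration of what decoding Unicode escapes (the job attributed to unicodeEscapeLexer) involves, here is a small sketch that works on a whole string rather than on a character stream. `UnicodeEscapeSketch` and `decode` are hypothetical names, and the use of IllegalArgumentException is an assumption of this sketch; the documented method reports a wrong escape sequence with an IOException.

```java
// Hypothetical sketch, not the CSVParser code: decodes escapes made of a
// backslash, the letter 'u' and exactly four hexadecimal digits.
final class UnicodeEscapeSketch {

    static String decode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            boolean startsEscape =
                    c == '\\' && i + 1 < text.length() && text.charAt(i + 1) == 'u';
            if (!startsEscape) {
                out.append(c);
                i++;
                continue;
            }
            if (i + 6 > text.length()) {
                throw new IllegalArgumentException("truncated unicode escape at index " + i);
            }
            int code = 0;
            for (int k = i + 2; k < i + 6; k++) {
                // Character.digit returns -1 for anything that is not a hex digit.
                int digit = Character.digit(text.charAt(k), 16);
                if (digit < 0) {
                    throw new IllegalArgumentException("invalid unicode escape at index " + i);
                }
                code = code * 16 + digit;
            }
            out.append((char) code);
            i += 6; // skip the backslash, the 'u' and the four digits
        }
        return out.toString();
    }
}
```

Text without escapes passes through unchanged, and an escape with fewer than four hexadecimal digits is rejected rather than partially decoded.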
private static final int INITIAL_TOKEN_LENGTH
protected static final int TT_INVALID
protected static final int TT_TOKEN
protected static final int TT_EOF
protected static final int TT_EORECORD
private static final java.lang.String[] EMPTY_STRING_ARRAY
private final ExtendedBufferedReader in
private final CSVStrategy strategy
private final java.util.ArrayList record
private final CSVParser.Token reusableToken
private final CharBuffer wsBuf
private final CharBuffer code
public CSVParser(java.io.InputStream input)
Deprecated. Use CSVParser(Reader).
CSV parser using the default CSVStrategy.
Parameters:
input - an InputStream containing "csv-formatted" stream
public CSVParser(java.io.Reader input)
CSV parser using the default CSVStrategy.
Parameters:
input - a Reader containing "csv-formatted" input
public CSVParser(java.io.Reader input,
char delimiter)
Deprecated. Use CSVParser(Reader,CSVStrategy).
CSV parser using the default CSVStrategy except for the delimiter setting.
Parameters:
input - a Reader based on "csv-formatted" input
delimiter - a Char used for value separation
public CSVParser(java.io.Reader input,
char delimiter,
char encapsulator,
char commentStart)
Deprecated. Use CSVParser(Reader,CSVStrategy).
Parameters:
input - a Reader based on "csv-formatted" input
delimiter - a Char used for value separation
encapsulator - a Char used as value encapsulation marker
commentStart - a Char used for comment identification
public CSVParser(java.io.Reader input,
CSVStrategy strategy)
Customized CSV parser using the given CSVStrategy.
Parameters:
input - a Reader containing "csv-formatted" input
strategy - the CSVStrategy used for CSV parsing
public java.lang.String[][] getAllValues()
throws java.io.IOException
The returned content starts at the current parse-position in the stream.
Throws:
java.io.IOException - on parse error or input read-failure
public java.lang.String nextValue()
throws java.io.IOException
Throws:
java.io.IOException - on parse error or input read-failure
public java.lang.String[] getLine()
throws java.io.IOException
Throws:
java.io.IOException - on parse error or input read-failure
public int getLineNumber()
protected CSVParser.Token nextToken() throws java.io.IOException
Convenience method for nextToken(null).
Throws:
java.io.IOException
protected CSVParser.Token nextToken(CSVParser.Token tkn) throws java.io.IOException
Parameters:
tkn - an existing Token object to reuse. The caller is responsible for initializing the Token.
Throws:
java.io.IOException - on stream access error
private CSVParser.Token simpleTokenLexer(CSVParser.Token tkn, int c) throws java.io.IOException
Parameters:
tkn - the current token
c - the current character
Throws:
java.io.IOException - on stream access error
private CSVParser.Token encapsulatedTokenLexer(CSVParser.Token tkn, int c) throws java.io.IOException
Parameters:
tkn - the current token
c - the current character
Throws:
java.io.IOException - on invalid state
protected int unicodeEscapeLexer(int c)
throws java.io.IOException
Parameters:
c - current char, which is discarded because it is the "\\" of "\\uXXXX"
Throws:
java.io.IOException - on wrong unicode escape sequence or read error
private int readEscape(int c)
throws java.io.IOException
Throws:
java.io.IOException
public CSVStrategy getStrategy()
private boolean isWhitespace(int c)
private boolean isEndOfLine(int c)
throws java.io.IOException
Throws:
java.io.IOException
private boolean isEndOfFile(int c)
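The greedy behaviour described for isEndOfLine (accepting \n and \r\n, and silently consuming the second control character) can be sketched as follows. This is a hedged illustration rather than the actual method: `EndOfLineSketch`, its string-backed `read` and `lookAhead` helpers, and `countLineEnds` are hypothetical, and the sketch assumes a lone \r is not treated as a line end.

```java
// Hypothetical sketch, not the CSVParser code: a string-backed cursor with a
// one-character look-ahead, used to show the greedy end-of-line check.
final class EndOfLineSketch {

    private final String text;
    private int pos = 0;

    EndOfLineSketch(String text) {
        this.text = text;
    }

    // Returns the next character, or -1 at end of input.
    int read() {
        return pos < text.length() ? text.charAt(pos++) : -1;
    }

    // Peeks at the next character without consuming it.
    private int lookAhead() {
        return pos < text.length() ? text.charAt(pos) : -1;
    }

    // Accepts "\n" and "\r\n"; for "\r\n" the "\n" is consumed here, so the
    // caller never sees it as a separate character.
    boolean isEndOfLine(int c) {
        if (c == '\r' && lookAhead() == '\n') {
            c = read();
        }
        return c == '\n';
    }

    // Convenience driver for the sketch: counts line ends in a string.
    static int countLineEnds(String text) {
        EndOfLineSketch cursor = new EndOfLineSketch(text);
        int count = 0;
        for (int c = cursor.read(); c != -1; c = cursor.read()) {
            if (cursor.isEndOfLine(c)) {
                count++;
            }
        }
        return count;
    }
}
```

Because the check consumes the \n of a \r\n pair itself, a Windows-style line ending is counted once rather than twice.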