Applying Regular Expressions on the Contents of a File

The matching routines in java.util.regex require that the input be a CharSequence object. This example implements a method that efficiently returns the contents of a file in a CharSequence object.
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

// Converts the contents of a file into a CharSequence
// suitable for use by the regex package.
public CharSequence fromFile(String filename) throws IOException {
    try (FileInputStream fis = new FileInputStream(filename);
         FileChannel fc = fis.getChannel()) {
        // Map the file read-only, then decode the bytes into a CharBuffer
        ByteBuffer bbuf = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        CharBuffer cbuf = Charset.forName("8859_1").newDecoder().decode(bbuf);
        return cbuf;
    }
}
Here is sample code that uses the method:
try {
    // Create matcher on file
    Pattern pattern = Pattern.compile("pattern");
    Matcher matcher = pattern.matcher(fromFile("infile.txt"));

    // Find all matches
    while (matcher.find()) {
        // Get the matching string
        String match = matcher.group();
    }
} catch (IOException e) {
}

Comments

25 Aug 2010 - 1:27pm by 12 Oz Mouse (not verified)

Not quite efficiently. Charset.forName("8859_1").newDecoder().decode(bbuf) copies (with decoding) the memory-mapped byte buffer, which can cause an out-of-memory error in situations where just accessing the byte buffer never would, thus negating the advantages of memory-mapping the file.
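
One way to avoid the copy described in the comment above is to let the regex package read the mapped bytes directly. The following is a minimal sketch, assuming the file is ISO-8859-1 encoded as in the example, so that each byte corresponds to exactly one char. The class name Latin1CharSequence is hypothetical; it is not part of java.util.regex or of the original example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a CharSequence view over a ByteBuffer that holds
// ISO-8859-1 text. No decoded copy of the whole buffer is made.
final class Latin1CharSequence implements CharSequence {
    private final ByteBuffer buf;
    private final int start; // absolute index of the first byte in the view
    private final int end;   // absolute index one past the last byte

    Latin1CharSequence(ByteBuffer buf) {
        this(buf, buf.position(), buf.limit());
    }

    private Latin1CharSequence(ByteBuffer buf, int start, int end) {
        this.buf = buf;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // ISO-8859-1 maps byte values 0..255 to the same code points,
        // so masking off the sign extension is all the decoding needed.
        return (char) (buf.get(start + index) & 0xFF);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        // Shares the underlying buffer; only the bounds change.
        return new Latin1CharSequence(buf, start + from, start + to);
    }

    @Override
    public String toString() {
        // Copies only the bytes of this view, e.g. a single match.
        byte[] bytes = new byte[length()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buf.get(start + i);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

Under these assumptions, fromFile could return new Latin1CharSequence(bbuf) instead of the decoded CharBuffer. Each charAt call then reads through to the mapped buffer, which is likely slower per character than a decoded CharBuffer, and multi-byte encodings such as UTF-8 would need a different approach.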
