| Class | CodeRay::Scanners::Scanner |
| In: | lib/coderay/scanner.rb |
| Parent: | StringScanner |
The base class for all Scanners.
It is a subclass of Ruby's StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
require 'coderay'
c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"
for text, kind in c_scanner
  puts text if kind == :operator
end
# prints: (*==)++;
This is a very simple example. You can also use map, any?, find, and even sort_by.
| ScanError | = | Class.new StandardError | Raised if a Scanner fails while scanning. |
| DEFAULT_OPTIONS | = | { } | The default options for all scanner classes. Define @default_options for subclasses. |
| KINDS_NOT_LOC | = | [:comment, :doctype, :docstring] | |
| state | [RW] |
The typical filename suffix for this scanner's language.
# File lib/coderay/scanner.rb, line 84
def file_extension extension = lang
  @file_extension ||= extension.to_s
end
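The memoizing class-level accessor above can be sketched in isolation. The module ExtSketch, its classes, and the lang helper are hypothetical stand-ins, not CodeRay code; the sketch only illustrates that the first call fixes the value per class and that the default falls back to the language name.

```ruby
# Hypothetical stand-in classes for illustration; they are not part of CodeRay.
module ExtSketch
  class Base
    class << self
      # Assumed helper: derives a language name from the class name.
      def lang
        name.split('::').last.downcase
      end

      # The first call stores the value; ||= makes later arguments a no-op.
      def file_extension extension = lang
        @file_extension ||= extension.to_s
      end
    end
  end

  class Ruby < Base
    file_extension 'rb'   # set explicitly in the class body
  end

  class C < Base          # falls back to the language name
  end
end

puts ExtSketch::Ruby.file_extension        # prints: rb
puts ExtSketch::C.file_extension           # prints: c
puts ExtSketch::Ruby.file_extension('xyz') # prints: rb (already memoized)
```

Because @file_extension is an instance variable of each class object, every subclass in this sketch keeps its own value.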
If options[:tokens] is not given, a new Tokens object is used.
# File lib/coderay/scanner.rb, line 143
def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end

  @options = self.class::DEFAULT_OPTIONS.merge options

  super self.class.normalize(code)

  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=

  setup
end
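Two patterns in the constructor above may be easier to see in a stripped-down form: the guard that keeps the base class from being instantiated, and the merge of per-class defaults with caller options. The following is a minimal sketch using only the standard library; InitSketch, Base, Child, and the option names are hypothetical and do not exist in CodeRay.

```ruby
require 'strscan'

# Hypothetical stand-ins for illustration; not CodeRay classes.
module InitSketch
  class Base < StringScanner
    DEFAULT_OPTIONS = {}

    attr_reader :options

    def initialize code = '', options = {}
      # Refuse direct instantiation of the abstract base class.
      if self.class == Base
        raise NotImplementedError, 'Base is abstract; use a subclass.'
      end

      # self.class:: resolves the constant on the subclass first,
      # so each subclass can supply its own defaults.
      @options = self.class::DEFAULT_OPTIONS.merge options

      super code.to_s
    end
  end

  class Child < Base
    DEFAULT_OPTIONS = { :strict => false, :tab_width => 8 }
  end
end

scanner = InitSketch::Child.new 'abc', :strict => true
puts scanner.options[:strict]     # prints: true
puts scanner.options[:tab_width]  # prints: 8
```

Caller-supplied keys win over the defaults because they are the argument to merge.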
Normalizes the given code into a string with UNIX newlines, in the scanner's internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.
# File lib/coderay/scanner.rb, line 69
def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?

  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
# File lib/coderay/scanner.rb, line 100
def encode_with_encoding code, target_encoding
  if code.encoding == target_encoding
    if code.valid_encoding?
      return to_unix(code)
    else
      source_encoding = guess_encoding code
    end
  else
    source_encoding = code.encoding
  end
  # print "encode_with_encoding from #{source_encoding} to #{target_encoding}"
  code.encode target_encoding, source_encoding, :universal_newline => true, :undef => :replace, :invalid => :replace
end
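The final String#encode call above does the transcoding and the newline conversion in one step. Here is a small sketch of that call on its own, assuming an ISO-8859-1 source and a UTF-8 target; the helper name transcode_sketch is made up for this example.

```ruby
# Hypothetical helper: treats the given bytes as ISO-8859-1 and transcodes
# them to UTF-8 with the same options as the encode call above.
def transcode_sketch bytes
  latin1 = bytes.dup.force_encoding(Encoding::ISO_8859_1)
  latin1.encode Encoding::UTF_8, Encoding::ISO_8859_1,
    :universal_newline => true, :undef => :replace, :invalid => :replace
end

# 0xE9 is "e with acute accent" in ISO-8859-1; the input mixes CRLF and bare CR.
result = transcode_sketch("caf\xE9\r\nbar\r".b)

puts result.encoding                  # prints: UTF-8
puts result == "caf\u00E9\nbar\n"     # prints: true
```

The :universal_newline option is what turns both "\r\n" and a lone "\r" into "\n" during the conversion.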
# File lib/coderay/scanner.rb, line 118
def guess_encoding s
  #:nocov:
  IO.popen("file -b --mime -", "w+") do |file|
    file.write s[0, 1024]
    file.close_write
    begin
      Encoding.find file.gets[/charset=([-\w]+)/, 1]
    rescue ArgumentError
      Encoding::BINARY
    end
  end
  #:nocov:
end
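The parsing step inside the block above can be tried without running the external file command. This sketch feeds sample MIME lines (assumed to resemble what file -b --mime prints) through the same regular expression and the same fallback; encoding_from_mime is a hypothetical name.

```ruby
# Hypothetical helper: extracts the charset from a MIME description line
# and maps it to an Encoding, falling back to binary for unknown names.
def encoding_from_mime mime_line
  Encoding.find mime_line[/charset=([-\w]+)/, 1]
rescue ArgumentError
  Encoding::BINARY
end

puts encoding_from_mime("text/plain; charset=utf-8") == Encoding::UTF_8
# prints: true
puts encoding_from_mime("text/plain; charset=iso-8859-1") == Encoding::ISO_8859_1
# prints: true
puts encoding_from_mime("application/octet-stream; charset=unknown-8bit") == Encoding::BINARY
# prints: true
```

Encoding.find raises ArgumentError for a name Ruby does not know, which is the case the rescue clause covers.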
# File lib/coderay/scanner.rb, line 114
def to_unix code
  code.index(?\r) ? code.gsub(/\r\n?/, "\n") : code
end
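As a quick illustration of the newline conversion above, the following stand-alone sketch applies the same expression to a few inputs. The name to_unix_sketch is hypothetical.

```ruby
# Hypothetical copy of the conversion for experimenting outside the class.
# gsub only runs when the string contains a carriage return.
def to_unix_sketch code
  code.index("\r") ? code.gsub(/\r\n?/, "\n") : code
end

puts to_unix_sketch("a\r\nb\rc\n") == "a\nb\nc\n"  # prints: true

plain = "no carriage returns\n"
puts to_unix_sketch(plain).equal?(plain)            # prints: true
```

The pattern /\r\n?/ matches Windows line endings ("\r\n") and old Mac line endings ("\r") alike. In this sketch, a string without any "\r" comes back as the same object, so no copy is made.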
The string in binary encoding.
To be used with pos, which is the index of the byte the scanner will scan next.
# File lib/coderay/scanner.rb, line 243
def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
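Why a binary view is needed becomes visible with a multibyte character: StringScanner#pos counts bytes, while String#[] on a UTF-8 string counts characters. The sketch below uses only the standard library; byte_view is a hypothetical free-standing version of the method above.

```ruby
require 'strscan'

# Hypothetical helper mirroring binary_string: returns a binary-encoded
# copy only when the string contains multibyte characters.
def byte_view string
  if string.bytesize != string.size
    string.dup.force_encoding('binary')
  else
    string
  end
end

scanner = StringScanner.new "\u00E9 = 1"   # the first character takes two bytes in UTF-8
scanner.scan(/\u00E9/)

puts scanner.pos                                          # prints: 2
puts byte_view(scanner.string)[0...scanner.pos].bytesize  # prints: 2
puts scanner.string[0...scanner.pos].bytesize             # prints: 3
```

Slicing the original string with a byte position overshoots (two characters, three bytes), while slicing the binary view returns exactly the bytes scanned so far.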
The default file extension for this scanner.
# File lib/coderay/scanner.rb, line 178
def file_extension
  self.class.file_extension
end
The current line position of the scanner, starting with 1. See also: column.
Beware, this is implemented inefficiently. It should be used for debugging only.
# File lib/coderay/scanner.rb, line 227
def line pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
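The line computation above amounts to counting newlines in everything before the byte position. A minimal stand-alone sketch, assuming a plain StringScanner and using String#b for the binary view; line_at is a hypothetical name.

```ruby
require 'strscan'

# Hypothetical helper: 1-based line number for a byte position.
# Every call copies the string and counts newlines in the prefix,
# so the cost grows with pos.
def line_at scanner, pos = scanner.pos
  return 1 if pos <= 0
  scanner.string.b[0...pos].count("\n") + 1
end

scanner = StringScanner.new "a\nbb\nccc"
puts line_at(scanner)      # prints: 1

scanner.scan_until(/bb/)
puts line_at(scanner)      # prints: 2

scanner.terminate
puts line_at(scanner)      # prints: 3
```

Rescanning the prefix on every call is what makes this approach slow inside a scanning loop, which fits the advice to use it for debugging only.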
Resets the scanner to its initial state. Subclasses should redefine the reset_instance method instead of this one.
# File lib/coderay/scanner.rb, line 160
def reset
  super
  reset_instance
end
Scans the code and returns all tokens in a Tokens object.
# File lib/coderay/scanner.rb, line 183
def tokenize source = nil, options = {}
  options = @options.merge(options)
  @tokens = options[:tokens] || @tokens || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=
  case source
  when Array
    self.string = self.class.normalize(source.join)
  when nil
    reset
  else
    self.string = self.class.normalize(source)
  end

  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end

  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end
Raises a ScanError with additional status information.
# File lib/coderay/scanner.rb, line 281
def raise_inspect msg, tokens, state = self.state || 'No state given!', ambit = 30, backtrace = caller
  raise ScanError, "\n\n***ERROR in %s: %s (after %d tokens)\n\ntokens:\n%s\n\ncurrent line: %d column: %d pos: %d\nmatched: %p state: %p\nbol? = %p, eos? = %p\n\nsurrounding code:\n%p ~~ %p\n\n\n***ERROR***\n\n" % [
    File.basename(caller[0]),
    msg,
    tokens.respond_to?(:size) ? tokens.size : 0,
    tokens.respond_to?(:last) ? tokens.last(10).map { |t| t.inspect }.join("\n") : '',
    line, column, pos,
    matched, state, bol?, eos?,
    binary_string[pos - ambit, ambit],
    binary_string[pos, ambit],
  ], backtrace
end
Resets the scanner.
# File lib/coderay/scanner.rb, line 274
def reset_instance
  @tokens.clear if @tokens.respond_to?(:clear) && !@options[:keep_tokens]
  @cached_tokens = nil
  @binary_string = nil if defined? @binary_string
end
Shorthand for scan_until(/\z/). This method also avoids a JRuby 1.9 mode bug.
# File lib/coderay/scanner.rb, line 315
def scan_rest
  rest = self.rest
  terminate
  rest
end
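The equivalence with scan_until(/\z/) can be checked on a plain StringScanner. This is a small sketch with a hypothetical free-standing helper, scan_rest_sketch, that takes the scanner as an argument.

```ruby
require 'strscan'

# Hypothetical helper: returns everything after the scan pointer
# and moves the pointer to the end of the string.
def scan_rest_sketch scanner
  rest = scanner.rest
  scanner.terminate
  rest
end

a = StringScanner.new "foo bar"
a.scan(/foo /)
puts scan_rest_sketch(a)   # prints: bar
puts a.eos?                # prints: true

b = StringScanner.new "foo bar"
b.scan(/foo /)
puts b.scan_until(/\z/)    # prints: bar
puts b.eos?                # prints: true
```

Both variants return the remaining text and leave the scanner at the end of the input; the helper simply does so without running a regular expression.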
This is the central method, and commonly the only one a subclass implements.
Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!
# File lib/coderay/scanner.rb, line 269
def scan_tokens tokens, options # :doc:
  raise NotImplementedError, "#{self.class}#scan_tokens not implemented."
end
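To give a feel for what a subclass implementation of this method might look like, here is a toy sketch built directly on StringScanner. It is not a CodeRay scanner: ScanSketch and WordScanner are hypothetical, a plain Array stands in for the Tokens object, and tokens are assumed to be [text, kind] pairs as in the for loop at the top of this page.

```ruby
require 'strscan'

# Hypothetical toy scanner for illustration; not part of CodeRay.
module ScanSketch
  class WordScanner < StringScanner
    # Walks the input once and appends one [text, kind] pair per match,
    # storing tokens only through <<.
    def scan_tokens tokens, options = {}
      until eos?
        if match = scan(/\s+/)
          tokens << [match, :space]
        elsif match = scan(/\d+/)
          tokens << [match, :integer]
        elsif match = scan(/\w+/)
          tokens << [match, :ident]
        elsif match = scan(/[-+*\/=]/)
          tokens << [match, :operator]
        else
          # Consume one character so the loop always makes progress.
          tokens << [getch, :error]
        end
      end
      tokens
    end
  end
end

tokens = ScanSketch::WordScanner.new("x1 = 42").scan_tokens([])
tokens.each { |text, kind| puts "#{kind}: #{text.inspect}" }
```

The fallback branch matters: without it, an unexpected character would leave the scan pointer in place and the loop would never end.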