

After a fairly large number of emails I've started working on my type-ahead, predictive text editor project. In support of this effort I'm looking at different algorithms to best predict the word the user next wants to type. The first part of this is looking at documents I've written in the past and analyzing the frequency of word occurrences within those documents. (I'll skip the details of my theory here, but it involves looking at how often we re-type words compared to how often we use new dictionary-based words beginning with the same characters.)

I wrote a little Ruby program to help me analyze this word frequency. The program opens a file, then adds each word in the file to a hash, where the word is the key and the number of occurrences of the word is the value. At the end of the program I print out the hash information, with the printout sorted in order by the hash value.

Without any further ado, here is the Ruby "word frequency" program:

    # a sample ruby program that determines the number of times each word in
    # a file appears in that file (i.e., the frequency of that word).
    The_file='/Users/Al/DD/Ruby/GettysburgAddress.txt'
    # sort the hash by value, and then print it in this sorted order

Here's a little more discussion of the program:

Read each line in the file, one line at a time.
Split each line into words (words separated by spaces).
Put the word and the word frequency into the Hash (the word is the key, the frequency is the value).
Print the hash, with the results sorted by the hash value.

It may also help in understanding the program to look at the last 10 lines of its output.

To find a string in the current file in RubyMine, highlight the string you want to find in the editor and press Ctrl+F. RubyMine places the highlighted string into the search field. Alternatively, place the caret at any string in your file and press Ctrl+F to find its occurrences, or select Edit | Find | Next Occurrence of the Word at Caret from the main menu.

There are 41 keywords in total in Ruby. One of them, __LINE__, gives the line number of this keyword in the current file.

In the four-line back-reference example, the regex (\d+)abc\1, with the \1 back-reference, would match the first and fourth lines only.
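The word-frequency steps described above (read each line, split it into words, count each word in a hash, print the hash sorted by value) can be sketched in Ruby roughly as follows. This is a minimal sketch under stated assumptions, not the original program: the method names word_frequencies and sorted_by_count and the two sample lines are made up for illustration, and the output format is only a guess at something readable.

```ruby
# Count how often each whitespace-separated word occurs in the given lines.
# `lines` can be any enumerable of strings: an Array, an open File,
# or File.foreach(path) for a file on disk.
def word_frequencies(lines)
  counts = Hash.new(0) # unseen words start at a count of 0
  lines.each do |line|
    line.split.each { |word| counts[word] += 1 }
  end
  counts
end

# Return [word, count] pairs sorted by ascending count.
def sorted_by_count(counts)
  counts.sort_by { |_word, count| count }
end

# Hypothetical sample input standing in for a real text file.
sample = [
  "the cat and the dog",
  "the dog barked"
]

# Print the (up to) 10 most frequent words, most frequent last.
sorted_by_count(word_frequencies(sample)).last(10).each do |word, count|
  puts "\"#{word}\" has #{count} occurrences"
end
```

To run it against a real file, pass File.foreach(path) instead of the sample array. Note that splitting on whitespace treats "dog" and "dog." as different words, so punctuation and case may need normalizing first, depending on what the analysis needs.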
However, in the general case, where the two words Word1 and Word2 are located in different lines, a search with the Find in Files dialog does NOT display, in the Find result panel, all the lines of the block beginning with Word1 and ending with Word2 (or the opposite), but ONLY the first line of each multi-line block.

So, instead, you could use the regex (?si)(Word1)(?=.*?(Word2))|(?2)(?=.*?(?1)), which searches for either of the words Word1 OR Word2, in a case-insensitive way, if it is followed, further on, by the other word.

The (?si) syntax, at the beginning of the regex, sets modifiers which ensure that:

The dot (.) special character matches absolutely any single character (standard or EOL).
The search is performed in a case-insensitive way (if you need a case-sensitive search, just use the syntax (?s-i)).

Then (Word1) matches the string Word1, stored as group 1 due to the parentheses, ONLY IF it is followed by the first string Word2 found afterwards, which is also stored, as group 2, due to the look-ahead construction (?=.*?(Word2)).

After the alternation symbol |, the case (?2)(?=.*?(?1)) just represents the opposite case, where we're searching for the string Word2 followed, further on, by the string Word1.

Here we use a specific regex construction, (?#), where # stands for a group number, named a called subpattern. This construct is just a particular case of a recursive subpattern, located outside the parentheses to which it refers.

If your two words, Word1 and Word2, are always both located in the same line, you could preferably use a simpler regex, which searches for the smallest range of characters, in a single line, between the strings Word1 and Word2 OR between Word2 and Word1. In that case:

The search is performed in a case-insensitive way.
The dot will match a single standard character, even if you previously checked the option that lets the dot match newlines.
If you need a case-sensitive search, change the modifiers part to the syntax (?-is).

After running these regexes, using the Find in Files dialog, you should get, in the Find result panel:

The absolute path of each file containing the two words Word1 and Word2.
Some lines containing either Word1 or Word2, or both.

If you simply need the list of all these files, follow the method below:

With a right mouse click, choose the Select All option in the Find result panel.
Hit the Ctrl + C shortcut (DO NOT use the context option Copy!).
Paste the clipboard contents in a new tab, with the Ctrl + V shortcut.
In this new tab, perform the simple search/replace (S/R).

It's very important to understand the fundamental difference between a subpattern used as a subroutine and a back-reference. For instance, take a four-line text whose first line is 123abc123.
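The two regex ideas discussed here, the two-word search and the difference between a back-reference and a subpattern call, can also be tried out in Ruby. The sketch below makes several assumptions. Ruby's regex engine writes a subpattern call as \g<name> rather than (?1), and uses the m option (not s) to let the dot match newlines. The two-word regex spells out both alternatives instead of calling the groups. Of the four sample lines, only the first comes from the text; the other three are invented so that the back-reference matches the first and fourth lines. The names TWO_WORDS, both_words?, BACKREF, SUBCALL and LINES are hypothetical.

```ruby
# Two-word search: match either word, provided the other one occurs later.
# The m option lets "." cross line breaks; the i option ignores case.
TWO_WORDS = /Word1(?=.*?Word2)|Word2(?=.*?Word1)/mi

# True when the text contains both words, in either order, on any lines.
def both_words?(text)
  text.match?(TWO_WORDS)
end

# Back-reference: \1 must repeat the exact text captured by group 1.
BACKREF = /(\d+)abc\1/

# Subpattern call: \g<num> re-runs the pattern \d+, so any digits will do.
SUBCALL = /(?<num>\d+)abc\g<num>/

# First line taken from the text; the remaining three are made-up samples.
LINES = ["123abc123", "123abc456", "789abc123", "456abc456"]

puts LINES.grep(BACKREF).inspect  # => ["123abc123", "456abc456"]
puts LINES.grep(SUBCALL).inspect  # all four lines match
puts both_words?("Word2 here\nand word1 there")  # => true
puts both_words?("only Word1 here")              # => false
```

The back-reference insists on the same digits on both sides of abc, while the subpattern call only insists on the same kind of text, which is why the second regex accepts every sample line.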
