Thursday, May 29, 2008
MySQL full text search and test data
I like using lipsum as test data. When I need test strings, I usually generate them from a lipsum passage stored in a text file.
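Here is a minimal sketch of that kind of generator, assuming the rows are built by sampling words at random from the lipsum text. The short LIPSUM string and the generate_rows helper are hypothetical stand-ins for my text file and script. However many rows you generate, the vocabulary can never grow beyond the distinct words in the source.

```python
import random

# Hypothetical stand-in for the lipsum text file; any small source behaves the same way.
LIPSUM = (
    "Lorem ipsum dolor sit amet consectetuer adipiscing elit "
    "Mauris pretium nulla sed mauris Integer vitae mauris"
)


def generate_rows(source: str, n_rows: int, words_per_row: int, seed: int = 0) -> list[str]:
    """Build n_rows test strings by sampling words from the source at random."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    vocabulary = source.lower().split()
    return [
        " ".join(rng.choice(vocabulary) for _ in range(words_per_row))
        for _ in range(n_rows)
    ]


# ~10,000 rows of ~130 words each (an assumed row size).
rows = generate_rows(LIPSUM, n_rows=10_000, words_per_row=130)
unique = {word for row in rows for word in row.split()}
print(len(rows), "rows,", len(unique), "unique words")
```

The row count grows with n_rows, but the unique word count stays capped at the size of the source vocabulary.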
There’s one thing to watch out for when you generate a large amount of test data from a small lipsum source.
I need to test MySQL's full-text search with my test data. However, my lipsum source is too small for the size of the generated table (~10,000 rows). myisam_ftdump shows the following statistics:
Total rows: 9939
Total words: 1299056
Unique words: 161
Longest word: 12 chars (consectetuer)
Median length: 6
Average global weight: -1.871320
Most common word: 9906 times, weight: -5.704388 (mauris)
Hm… there are only 161 unique words. Remember that MySQL natural language full-text search ignores any search word that appears in 50% or more of the rows. With only 161 unique words spread over 9,939 rows, you can be almost certain that any word you search for appears in more than 50% of them, so a natural language search returns nothing useful.
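To see why, here is a simplified Python model of the 50% rule: count in how many rows each word appears (once per row, not once per occurrence) and flag every word found in at least half of them. The natural_language_ignored function and the sample rows are hypothetical, and the model ignores MySQL's own tokenizer, minimum word length and stopword list.

```python
from collections import Counter


def natural_language_ignored(rows: list[str]) -> set[str]:
    """Return the words a natural language search would treat as too common (simplified model)."""
    # Document frequency: each word counts at most once per row.
    doc_freq = Counter(word for row in rows for word in set(row.lower().split()))
    # A word present in 50% or more of the rows is ignored.
    return {word for word, count in doc_freq.items() if count * 2 >= len(rows)}


rows = [
    "lorem ipsum dolor",
    "lorem ipsum mauris",
    "lorem dolor sit",
    "amet mauris elit",
]
print(sorted(natural_language_ignored(rows)))
```

Even in this four-row table, most of the vocabulary crosses the threshold; only sit, amet and elit remain searchable.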
Boolean mode searches do not apply the 50% threshold, so I need to use IN BOOLEAN MODE to get around this.
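In SQL the search looks like MATCH(body) AGAINST('+lorem -mauris' IN BOOLEAN MODE), where body is a hypothetical column name. The Python sketch below is a simplified model of that matching that supports only the + (required) and - (excluded) operators plus plain optional words. It has no 50% threshold, so a word as common as lorem still matches.

```python
def boolean_match(row: str, query: str) -> bool:
    """Simplified model of IN BOOLEAN MODE matching with +, - and optional words."""
    words = set(row.lower().split())
    required, excluded, optional = set(), set(), set()
    for term in query.lower().split():
        if term.startswith("+"):
            required.add(term[1:])
        elif term.startswith("-"):
            excluded.add(term[1:])
        else:
            optional.add(term)
    # Every + word must be present and no - word may be present.
    if not (required <= words) or (excluded & words):
        return False
    # Without any + word, a row must contain at least one optional word.
    return bool(required) or bool(optional & words)


rows = [
    "lorem ipsum dolor",
    "lorem ipsum mauris",
    "lorem dolor sit",
    "amet mauris elit",
]
print([row for row in rows if boolean_match(row, "+lorem -mauris")])
```

lorem appears in three of the four rows, yet the boolean query still returns the two rows that contain it without mauris.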