Every text and data mining project is unique, so the time needed to complete one varies. A successful, time-efficient project begins with preplanning.
Step One starts with completing the outline below:
TDM (Text and Data Mining) is frequently a fair use under US copyright law, but some resources may be restricted by license agreements.
TDM (Text and Data Mining) is a broad term for developing research practices that involve building and processing a corpus: a collection of text that may contain millions or even billions of words. Remember, less is (becomes) more as you build your corpus (see below): keeping it focused will help avoid hidden biases and unpredictable gaps in coverage.
A source or database describes data collected without a strong organizing principle.
A corpus describes data collected and organized with specific questions in mind about certain geographic regions, time periods, or social phenomena.
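The difference between a source and a corpus can be illustrated with a minimal sketch. It assumes each document carries a region and a year; the `Document` class, the `build_corpus` helper, and the sample records are all hypothetical placeholders invented for illustration, not part of any particular TDM tool or dataset.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """A hypothetical record: the text plus the metadata used for selection."""
    text: str
    region: str
    year: int


def build_corpus(source, region, start_year, end_year):
    """Keep only documents that match the research question's region and period."""
    return [
        doc for doc in source
        if doc.region == region and start_year <= doc.year <= end_year
    ]


def word_count(docs):
    """Count whitespace-separated words across a list of documents."""
    return sum(len(doc.text.split()) for doc in docs)


# A "source": documents gathered without a strong organizing principle.
# These four records are made-up placeholders, not real data.
source = [
    Document("Corn prices rose sharply this spring", "Illinois", 1925),
    Document("The harbor opened to new shipping lines", "Maine", 1925),
    Document("Local farmers formed a cooperative", "Illinois", 1931),
    Document("A new rail depot opened downtown", "Illinois", 1968),
]

# A "corpus": the subset chosen with a specific region and time period in mind.
corpus = build_corpus(source, region="Illinois", start_year=1920, end_year=1940)

print(len(source), word_count(source))
print(len(corpus), word_count(corpus))
```

The corpus here is smaller than the source, but every document in it bears on the question being asked, which is the sense in which less becomes more.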
“Data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it.” IBM Analytics
Illinois State University
Campus Box 8900
201 North School Street
Normal, IL 61790-8900