I have extensive experience with Python and R.
The libraries/technologies that I use most often are:
NLTK – Useful for data mining, POS (part-of-speech) tagging, and syntactic parsing in my corpus research.
Pandas – Filtering and presenting results from language data mining.
NumPy – Used in conjunction with NLTK and Pandas for mathematical manipulations and for summarizing data with basic statistics.
I use this software environment to carry out complex statistical analyses, ranging from analysis of variance and regression analysis to PCA and cluster analysis.
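As a rough illustration of this kind of workflow, the sketch below builds a per-text profile of part-of-speech frequencies with Pandas and then runs a small PCA with NumPy. It is a minimal example under stated assumptions, not the procedure used in any particular study: the three mini-texts, their tags, and all variable names are hypothetical, and the tags are written by hand here, whereas in practice they would come from a tagger such as the one in NLTK.

```python
import numpy as np
import pandas as pd

# Hypothetical, hand-tagged tokens: (text id, token, POS tag).
# In a real project the tags would come from a POS tagger (e.g. NLTK).
tagged = [
    ("t1", "ella", "PRON"), ("t1", "quiere", "VERB"),
    ("t1", "que", "SCONJ"), ("t1", "vengas", "VERB"),
    ("t2", "el", "DET"), ("t2", "niño", "NOUN"),
    ("t2", "lee", "VERB"), ("t2", "mucho", "ADV"),
    ("t3", "la", "DET"), ("t3", "casa", "NOUN"),
    ("t3", "grande", "ADJ"), ("t3", "brilla", "VERB"),
]
df = pd.DataFrame(tagged, columns=["text", "token", "pos"])

# Count each POS tag per text, then convert to relative frequencies
# so that every row sums to 1.
counts = pd.crosstab(df["text"], df["pos"])
rel = counts.div(counts.sum(axis=1), axis=0)

# Basic descriptive statistics: mean relative frequency of each tag.
means = rel.to_numpy().mean(axis=0)

# PCA via singular value decomposition of the mean-centered matrix.
X = rel.to_numpy() - means
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = S**2 / np.sum(S**2)  # share of variance per component
scores = U * S                   # coordinates of each text on the components

print(rel.round(2))
print(explained.round(3))
```

Each row of `scores` places one text on the principal components, which is the kind of reduced representation that can then feed a cluster analysis or a regression model.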
The following recent articles, which I co-authored, showcase the types of analysis that I have conducted and reported.
– Collentine, J. G., & Collentine, K. (2020). Organic models for measuring Spanish learners’ linguistic complexity. In Current Theoretical and Applied Perspectives on Hispanic and Lusophone Linguistics (pp. 39–62). Amsterdam: John Benjamins.
– Collentine, K., & Collentine, J. G. (2020). A corpus analysis of the structural elaboration of Spanish heritage language learners. In Variation and Evolution: Aspects of language contact and contrast across the Spanish-speaking world (pp. 56–73). Amsterdam: John Benjamins.
– Collentine, J. G., & Asención-Delaney, Y. (2020). L2 discourse functions of the Spanish subjunctive. In Routledge Handbook of Corpus Approaches to Discourse Analysis (pp. 252–268). New York, NY: Routledge.